feat: Add compatibility with chardet 6.0.0+ and fix encoding issues #1141
Merged
nathan-stender merged 4 commits into main on Mar 12, 2026
- Update pyproject.toml to require chardet >= 6.0.0
- Improve encoding detection logic to handle chardet 7.x behavior changes:
  - Add fallback to windows-1252 for very low confidence detections (<0.3)
  - Better handling of single-byte special characters (en dash, ®, µ)
  - Add BOM (Byte Order Mark) stripping for UTF-16 and UTF-8 files
- Fix test data that contained mojibake from incorrect encoding detection
  - Corrected mojibake renderings of "®" in expected JSON files
  - Fixed "�" (replacement character) to proper "µ" symbol

These changes ensure proper handling of various file encodings including UTF-16 LE (used by SoftMax Pro), Windows-1252, and UTF-8 with BOM.

Co-Authored-By: Claude Opus 4.1 <noreply@anthropic.com>
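The BOM stripping and low-confidence fallback described in this commit could be sketched roughly as follows. This is a minimal illustration, not the code from `encoding.py`: the helper names (`strip_bom`, `choose_encoding`, `decode`) are hypothetical, and the detector's result is passed in as plain arguments (the `encoding` and `confidence` values that `chardet.detect` returns) so the sketch runs without chardet installed.

```python
import codecs
from typing import Optional, Tuple

# Hypothetical helper table: BOMs this sketch recognizes, with the codec to use
# once the BOM has been removed. UTF-32 is deliberately not handled here.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)

# Threshold taken from the commit message ("<0.3").
LOW_CONFIDENCE = 0.3


def strip_bom(raw: bytes) -> Tuple[bytes, Optional[str]]:
    """Return the payload without its BOM, plus the encoding the BOM implies."""
    for bom, encoding in _BOMS:
        if raw.startswith(bom):
            return raw[len(bom):], encoding
    return raw, None


def choose_encoding(detected: Optional[str], confidence: float) -> str:
    """Fall back to windows-1252 when the detector is very unsure."""
    if detected is None or confidence < LOW_CONFIDENCE:
        return "windows-1252"
    return detected


def decode(raw: bytes, detected: Optional[str], confidence: float) -> str:
    """Decode bytes, trusting a BOM first and the detector second."""
    body, bom_encoding = strip_bom(raw)
    encoding = bom_encoding or choose_encoding(detected, confidence)
    return body.decode(encoding)
```

In this sketch a BOM always wins over the detector's guess, since it is an explicit signal in the file itself; whether the merged code makes the same choice is an assumption.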
- Changed chardet requirement from >= 6.0.0 to >= 5.2.0 to allow consumers flexibility
- Enhanced encoding detection to handle differences between chardet versions:
  - Always try UTF-8 first when Latin-1 family encodings are detected
  - This prevents mojibake when chardet 5.x misdetects UTF-8 as ISO-8859-1
- Tests now pass with both chardet 5.2.0 and 7.0.1

This allows consumers to use any chardet version >= 5.2.0 without being forced to upgrade.

Co-Authored-By: Claude Opus 4.1 <noreply@anthropic.com>
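The "try UTF-8 first" rule from this commit can be illustrated with a short sketch. The function name `decode_preferring_utf8` and the exact contents of `LATIN1_FAMILY` are assumptions made for illustration, and the detected encoding is passed in as a string rather than obtained from chardet.

```python
# Hypothetical set of names treated as the "Latin-1 family" in this sketch.
LATIN1_FAMILY = {"iso-8859-1", "latin-1", "latin1", "windows-1252", "cp1252"}


def decode_preferring_utf8(raw: bytes, detected: str) -> str:
    """Decode raw bytes, trying UTF-8 before any Latin-1 family guess.

    Latin-1 style codecs accept almost any byte sequence, so a wrong guess
    silently produces mojibake. UTF-8 is strict: if the bytes decode cleanly
    as UTF-8, that is far more likely to be the real encoding.
    """
    if detected.lower() in LATIN1_FAMILY:
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            pass  # Not valid UTF-8, so keep the detector's guess.
    return raw.decode(detected)


# UTF-8 bytes misdetected as ISO-8859-1 would otherwise gain a stray "Â".
utf8_bytes = "10 µL".encode("utf-8")
mojibake = utf8_bytes.decode("iso-8859-1")
fixed = decode_preferring_utf8(utf8_bytes, "ISO-8859-1")
```

Genuine Windows-1252 input still decodes correctly, because a lone `\xb5` byte is not valid UTF-8 and the function falls through to the detected encoding.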
stephenworlow approved these changes on Mar 12, 2026
nathan-stender added a commit that referenced this pull request on Mar 17, 2026
### Added

- Cytiva Biacore Insight - Add support for Affinity and Concentration analysis files (#1137)
- Add compatibility with chardet 6.0.0+ and fix encoding issues (#1141)

### Fixed

- Fix Perkin Elmer Envision parser to recognize A450 labels as absorbance (#1152)
- Optimize test encoding detection for 4x speedup (#1143)
- Fix GitHub Actions hatch/virtualenv compatibility (#1140)
Summary
Changes
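- Updated the chardet requirement from < 6.0.0 to >= 6.0.0 in pyproject.toml
- Updated encoding detection in src/allotropy/parsers/utils/encoding.py, including handling of single-byte special characters (en dash \x96, registered trademark \xae, micro symbol \xb5)

Background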
Chardet 6.0.0 was released on February 22, 2026, followed by 7.0.x releases in March 2026. These versions introduced breaking changes in how they detect character encodings.
Testing
🤖 Generated with Claude Code