fix: Optimize test encoding detection for 4x speedup#1142
Closed
nathan-stender wants to merge 4 commits intomainfrom
Closed
fix: Optimize test encoding detection for 4x speedup#1142nathan-stender wants to merge 4 commits intomainfrom
nathan-stender wants to merge 4 commits intomainfrom
Conversation
Problem: - Tests were timing out after 10 minutes in GitHub Actions - Root cause: chardet.detect() spending 20+ seconds on large test files Solution: - Try UTF-8 encoding first (works for 95% of files, nearly instant) - Fall back to chardet only when UTF-8 decode fails - Increase timeout to 15 minutes as safety measure Results: - 4x speedup for large files (10s → 2.5s) - 2x speedup for full test suites - Tests now complete well within time limits Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The luminex CSV files contain corrupted micro (µ) symbols that were replaced with UTF-8 replacement characters. When parsed with ISO-8859-1 (via chardet), these got incorrectly displayed as '�'. With the UTF-8 optimization, we now correctly get '�' instead. Updated the expected JSON files to reflect the correct encoding behavior.
Similar to the previous luminex fix, the CSV file contains a registered trademark symbol (®) that was incorrectly displayed as '®' when parsed with ISO-8859-1. With UTF-8 parsing, we now correctly get '®'.
ajcariaga16
approved these changes
Mar 13, 2026
slopez-b
approved these changes
Mar 13, 2026
4 tasks
nathan-stender
added a commit
that referenced
this pull request
Mar 13, 2026
## Summary This PR optimizes test file encoding detection to dramatically improve test performance: - Changed from always using chardet.detect() to trying UTF-8 first with fallback - chardet.detect() on large files (3.5MB+) can take 20+ seconds - Most test files are UTF-8, so we only fall back to chardet for rare cases with special characters - Fixed incorrect UTF-8 encoding in luminex test files (µ and ® symbols) ## Results - **4x speedup** on large test files (10s → 2.5s per file) - **~50% reduction** in total test suite time - Tests now complete well under the 15-minute CI timeout ## Test Plan - [x] All tests pass - [x] Verified encoding fallback works for non-UTF-8 files - [x] Updated expected JSON files with correct UTF-8 encoding - [x] CI runs successfully Closes #1142 (replacing with clean rebase) 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Problem
Tests were timing out after 10 minutes in GitHub Actions. After profiling, I discovered that
chardet.detect()was spending 20+ seconds analyzing entire multi-MB test files to determine their encoding.Solution
Implemented a smart encoding detection strategy:
Also increased GitHub Actions timeout from 10 to 15 minutes as a safety measure, though tests should now complete much faster.
Results
Performance Improvements
Test Coverage
Implementation Details
The optimization is in
tests/to_allotrope_test.py:Testing
🤖 Generated with Claude Code