fix: Optimize test encoding detection for 4x speedup #1143

Merged
nathan-stender merged 2 commits into main from fix-test-timeouts-encoding-clean
Mar 13, 2026

@nathan-stender
Collaborator

Summary

This PR optimizes test file encoding detection to dramatically improve test performance:

  • Changed from always using chardet.detect() to trying UTF-8 first with fallback
  • chardet.detect() on large files (3.5MB+) can take 20+ seconds
  • Most test files are valid UTF-8, so chardet is only needed for the rare files that fail to decode as UTF-8
  • Fixed incorrect UTF-8 encoding in luminex test files (µ and ® symbols)
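
The UTF-8-first strategy described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function names `decode_utf8_first` and `_chardet_guess`, the `guess_encoding` parameter, and the `latin-1` last resort are all assumptions made for the example. The only third-party call used is `chardet.detect()`, which returns a dict containing an `encoding` key.

```python
from typing import Callable, Optional, Tuple


def _chardet_guess(raw: bytes) -> Optional[str]:
    """Ask chardet for an encoding (hypothetical helper, not from the PR)."""
    # Imported lazily so the fast UTF-8 path never pays for chardet.
    import chardet

    return chardet.detect(raw)["encoding"]


def decode_utf8_first(
    raw: bytes,
    guess_encoding: Callable[[bytes], Optional[str]] = _chardet_guess,
    last_resort: str = "latin-1",  # assumption: any byte sequence decodes as latin-1
) -> Tuple[str, str]:
    """Return (text, encoding_used), trying strict UTF-8 before any detection."""
    try:
        # Fast path: strict decoding raises on the first invalid byte sequence.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    # Slow path: only reached for files that are not valid UTF-8.
    encoding = guess_encoding(raw) or last_resort
    return raw.decode(encoding), encoding
```

With this shape, a file that is valid UTF-8 never reaches the detector at all, which is where the time savings would come from. Passing the detector in as a parameter also lets a test exercise the fallback path without scanning a large file.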

Results

  • 4x speedup on large test files (10s → 2.5s per file)
  • ~50% reduction in total test suite time
  • Tests now complete well under the 15-minute CI timeout

Test Plan

  • All tests pass
  • Verified encoding fallback works for non-UTF-8 files
  • Updated expected JSON files with correct UTF-8 encoding
  • CI runs successfully

Closes #1142 (replacing with clean rebase)

🤖 Generated with Claude Code

nathan-stender and others added 2 commits March 13, 2026 12:58
Problem:
- Tests were timing out after 10 minutes in GitHub Actions
- Root cause: chardet.detect() spending 20+ seconds on large test files

Solution:
- Try UTF-8 encoding first (works for 95% of files, nearly instant)
- Fall back to chardet only when UTF-8 decode fails
- Increase timeout to 15 minutes as safety measure

Results:
- 4x speedup for large files (10s → 2.5s)
- 2x speedup for full test suites
- Tests now complete well within time limits

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Similar to the previous luminex fix, the CSV file contains a registered
trademark symbol (®) that was incorrectly displayed as 'Â®' when parsed
with ISO-8859-1. With UTF-8 parsing, we now correctly get '®'.
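
The garbling described in this commit message can be reproduced in a few lines. The sketch below is illustrative only; `misdecode` and `repair` are hypothetical helper names, not functions from this repository. It relies on the fact that ® and µ each occupy two bytes in UTF-8, so a single-byte codec such as ISO-8859-1 turns each symbol into two characters.

```python
def misdecode(text: str, wrong_encoding: str = "iso-8859-1") -> str:
    """Encode text as UTF-8, then decode the bytes with the wrong single-byte codec."""
    return text.encode("utf-8").decode(wrong_encoding)


def repair(mojibake: str, wrong_encoding: str = "iso-8859-1") -> str:
    """Undo misdecode() by reversing the wrong decode and reading the bytes as UTF-8."""
    return mojibake.encode(wrong_encoding).decode("utf-8")


# Show the round trip for the two symbols mentioned in this PR.
for symbol in ("®", "µ"):
    garbled = misdecode(symbol)
    print(repr(symbol), "->", repr(garbled), "->", repr(repair(garbled)))
```

Because the old parsing path produced the two-character form, expected output files generated under it would contain the garbled text, which is consistent with the expected JSON files needing an update once UTF-8 is tried first.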
@nathan-stender nathan-stender requested review from a team and slopez-b as code owners March 13, 2026 18:24
@nathan-stender nathan-stender merged commit 6f7ca7b into main Mar 13, 2026
7 checks passed
@nathan-stender nathan-stender deleted the fix-test-timeouts-encoding-clean branch March 13, 2026 18:35

2 participants