fix: handle UnicodeDecodeError gracefully when reading files#522

Open
KushalLukhi wants to merge 5 commits into bndr:master from KushalLukhi:fix/unicode-decode-error

@KushalLukhi

This PR fixes issue #469: a UnicodeDecodeError raised when scanning files with non-UTF-8 encodings.

Problem:

When pipreqs encounters files encoded with non-utf-8 encodings (e.g., latin-1), it crashes with a UnicodeDecodeError.

Solution:

  • Added try-except block in read_file_content() to catch UnicodeDecodeError
  • Falls back to latin-1 encoding if utf-8 fails
  • Logs warning when fallback encoding is used
  • Returns empty string if both encodings fail instead of crashing
  • Added test to verify graceful handling of non-utf-8 encoded files
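
The fallback described above could look roughly like the sketch below. It is not the PR's actual diff: the `encodings` parameter, the log messages, and the loop structure are assumptions made for illustration, and only the function name `read_file_content` comes from the description. One caveat worth knowing: latin-1 maps every byte to a character, so a latin-1 decode cannot itself raise UnicodeDecodeError; in this sketch the empty-string path is reachable only when a different list of encodings is passed.

```python
import logging


def read_file_content(file_path, encodings=("utf-8", "latin-1")):
    """Illustrative sketch: try each encoding in order, return "" if all fail.

    The `encodings` parameter is hypothetical and exists only so the
    fallback order is visible and testable.
    """
    for index, encoding in enumerate(encodings):
        try:
            with open(file_path, "r", encoding=encoding) as handle:
                content = handle.read()
        except UnicodeDecodeError:
            # This encoding cannot decode the file; try the next one.
            continue
        if index > 0:
            # A fallback encoding was needed, so tell the user.
            logging.warning(
                "Read %s using fallback encoding %s", file_path, encoding
            )
        return content
    logging.warning("Could not decode %s; treating it as empty", file_path)
    return ""
```

Because latin-1 accepts any byte sequence, a file in some other encoding (for example cp1252 or Shift-JIS) may decode to the wrong characters rather than fail. For import scanning that is usually harmless, since module names are ASCII in practice.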

Fixes:

This PR addresses two related issues:

Fixes bndr#485 - --ignore-errors flag not working with notebooks:

- ipynb_2_py now catches exceptions and logs warnings

- read_file_content raises ValueError when notebook parsing fails

- Errors now properly propagate to the ignore_errors handler
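
As a rough sketch of the propagation chain described above: the converter returns None on failure, the reader turns that into a ValueError, and the caller decides whether to skip or re-raise based on `ignore_errors`. Everything below is an assumption for illustration. It parses the notebook JSON with the standard library as a stand-in for the real conversion step, and `read_notebook_content` and `collect_sources` are hypothetical helper names, not pipreqs functions.

```python
import json
import logging


def ipynb_2_py(notebook_text):
    """Stand-in converter: return code cells as Python source, or None on failure."""
    try:
        notebook = json.loads(notebook_text)
        return "\n".join(
            "".join(cell["source"])
            for cell in notebook["cells"]
            if cell["cell_type"] == "code"
        )
    except (ValueError, KeyError, TypeError) as exc:
        # json.JSONDecodeError is a subclass of ValueError.
        logging.warning("Failed to convert notebook: %s", exc)
        return None


def read_notebook_content(notebook_text):
    """Hypothetical reader: raise ValueError so callers can see the failure."""
    source = ipynb_2_py(notebook_text)
    if source is None:
        raise ValueError("Could not parse notebook")
    return source


def collect_sources(notebooks, ignore_errors=False):
    """Hypothetical caller: skip unreadable notebooks only when asked to."""
    sources = []
    for name, text in notebooks.items():
        try:
            sources.append(read_notebook_content(text))
        except ValueError as exc:
            if not ignore_errors:
                raise
            logging.warning("Skipping %s: %s", name, exc)
    return sources
```

The point of raising rather than returning an empty string is that the decision to ignore the failure stays with the code that knows about the `--ignore-errors` flag.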

Fixes bndr#494 - SyntaxError with Python 2 syntax:

- Added better SyntaxError handling in get_all_imports

- Provides helpful warning message about Python 2 syntax

- Suggests using --ignore-errors flag when SyntaxError occurs
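
A minimal sketch of this kind of SyntaxError handling, assuming imports are collected with the standard `ast` module. The function name `get_imports_from_source`, its parameters, and the warning text are hypothetical; the PR's real change lives in `get_all_imports`, whose body is not shown here.

```python
import ast
import logging


def get_imports_from_source(source, file_name="<unknown>", ignore_errors=False):
    """Hypothetical helper: return top-level module names imported by `source`."""
    try:
        tree = ast.parse(source, filename=file_name)
    except SyntaxError as exc:
        logging.warning(
            "Failed to parse %s (line %s). The file may use Python 2 syntax; "
            "consider rerunning with --ignore-errors.",
            file_name,
            exc.lineno,
        )
        if ignore_errors:
            return set()
        raise
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 excludes relative imports such as "from . import x".
            names.add(node.module.split(".")[0])
    return names
```

For example, a Python 2 style `print "hello"` statement fails to parse under Python 3, so the file is skipped with a warning when `ignore_errors=True` and the SyntaxError is re-raised otherwise.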

Changes:

- Modified ipynb_2_py to catch exceptions and return None

- Modified read_file_content to raise ValueError on notebook failures

- Enhanced error handling in get_all_imports for syntax errors

- Added tests for ignore_errors with invalid notebooks and Python 2 syntax

Testing:

- Added test_ignore_errors_with_invalid_notebook

- Added test_ignore_errors_with_syntax_error

Fixes issue bndr#491 - libraries with hyphens in their names were incorrectly
mapped with underscores instead of hyphens.

- Changed sklearn mapping from scikit_learn to scikit-learn
- Added skimage mapping to scikit-image
- Added test to verify hyphenated package names are correctly mapped
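
To illustrate why the hyphen matters: the import name (`sklearn`) differs from the distribution name that belongs in requirements.txt (`scikit-learn`), so a lookup table translates one into the other. The sketch below uses an in-memory dict and a hypothetical function name; how pipreqs actually stores and loads its mapping is not shown in this PR description.

```python
# Hypothetical in-memory mapping with the two entries named in the PR.
IMPORT_TO_PACKAGE = {
    "sklearn": "scikit-learn",
    "skimage": "scikit-image",
}


def map_imports_to_packages(import_names, mapping=IMPORT_TO_PACKAGE):
    """Translate import names to package names, leaving unmapped names as-is."""
    packages = {mapping.get(name, name) for name in import_names}
    return sorted(packages, key=str.lower)
```

A test along the lines the PR describes would assert that `sklearn` resolves to `scikit-learn` and never to `scikit_learn`.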