Fix: Prevent embedded_within auto-set for PSV files labeled as .csv #25
When reading files with a .csv extension, IOStreams automatically sets embedded_within: '"' so that newlines embedded in quoted CSV fields are handled. However, this causes errors when the file is actually pipe-delimited (PSV format) but labeled as .csv, especially when the data contains an unpaired quote (e.g., O"neil).
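To illustrate why a single stray quote is enough to break reading, here is a minimal sketch using only plain Ruby. The `read_records` helper and the `PSV` sample are hypothetical and are not part of IOStreams; the quote-counting logic is a simplified assumption about how an `embedded_within` line reader behaves, and the error text simply mirrors the message quoted in this PR.

```ruby
# Hypothetical helper, NOT the IOStreams implementation.
# Joins physical lines into logical records. When `embedded_within` is set,
# an odd number of that character means a quoted field is still open, so the
# next physical line is appended to the same record.
def read_records(text, embedded_within: nil)
  records = []
  buffer  = nil

  text.each_line(chomp: true) do |line|
    buffer = buffer ? "#{buffer}\n#{line}" : line

    # Record is complete when quoting is disabled or the quotes are balanced.
    if embedded_within.nil? || buffer.count(embedded_within).even?
      records << buffer
      buffer = nil
    end
  end

  # Anything left over means a quote was opened and never closed.
  raise "Unbalanced delimited field, data: #{buffer}" if buffer

  records
end

# Pipe-delimited sample data (assumed for illustration) with a lone quote.
PSV = "id|name\n1|O\"neil\n2|Smith\n"

# Without quote handling, every physical line is its own record.
read_records(PSV) # three records

# With quote handling, the lone quote in O"neil never closes, so the reader
# keeps consuming lines until the end of input and then raises.
begin
  read_records(PSV, embedded_within: '"')
rescue RuntimeError => e
  warn e.message
end
```

In this sketch the same input reads cleanly once `embedded_within` is left unset, which is the behavior the PR is aiming for when a pipe-delimited file carries a .csv label.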
This fix:
Fixes an issue where pipe-delimited files with a .csv.pgp extension fail with 'Unbalanced delimited field' errors when quotes appear in the data.
Test coverage:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.