@mjcloutier
Contributor

When reading files with a .csv extension, IOStreams automatically sets embedded_within: '"' so that newlines embedded inside quoted CSV fields are handled. However, this causes errors when a file labeled .csv is actually pipe-delimited (PSV format), especially when the data contains a stray quote character (e.g., O"neil).

This fix:

  • Checks the detected or explicitly set format before auto-setting embedded_within
  • If the format is :psv (even when the file is named .csv), embedded_within is not set
  • Allows an explicit embedded_within: nil to disable quote parsing
  • Uses :auto as the default value of embedded_within so that "not provided" can be distinguished from "explicitly nil"
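A minimal Ruby sketch of the defaulting rule in the list above, assuming a simplified picture of format detection. This is not the IOStreams source: resolve_embedded_within, format_from_file_name, the AUTO sentinel and the list of wrapper extensions that get skipped (.pgp, .gz, etc.) are all hypothetical names and choices made for illustration.

```ruby
# Hypothetical sketch of the embedded_within defaulting rule; not the
# actual IOStreams implementation. All names here are illustrative.

# Sentinel meaning "the caller did not pass embedded_within at all",
# so that an explicit nil can be told apart from "not provided".
AUTO = :auto

# Guess a format from the file name, skipping wrapper extensions such as
# .pgp or .gz (an assumed, simplified list).
def format_from_file_name(file_name)
  exts = File.basename(file_name).downcase.split(".").drop(1)
  exts.reject! { |ext| %w[pgp gpg gz zip bz2].include?(ext) }
  case exts.last
  when "csv" then :csv
  when "psv" then :psv
  when "json" then :json
  end
end

def resolve_embedded_within(file_name, format: nil, embedded_within: AUTO)
  # Any explicit value, including nil, wins over the automatic default.
  return embedded_within unless embedded_within == AUTO

  # An explicitly requested format overrides what the extension suggests.
  effective_format = format || format_from_file_name(file_name)

  # Only CSV gets quote-aware line splitting by default; :psv does not.
  effective_format == :csv ? '"' : nil
end
```

Under these assumptions, a file named data.csv.pgp still defaults to '"', while passing format: :psv or embedded_within: nil both resolve to nil, which is why either option avoids quote parsing.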

Fixes an issue where pipe-delimited files with a .csv.pgp extension fail with 'Unbalanced delimited field' errors when quotes appear in the data.
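To show why a single stray quote is fatal, here is a deliberately simplified Ruby model of quote-aware record splitting. It is an assumption for illustration, not the IOStreams line reader: split_records and UnbalancedFieldError are hypothetical names, and the real reader works on a stream rather than a whole string. The idea it captures is that physical lines are joined until the quote character appears an even number of times, so an unmatched quote swallows the rest of the input.

```ruby
# Simplified model of quote-aware record splitting; NOT the IOStreams
# implementation. Physical lines are joined until the embedded_within
# character has appeared an even number of times.
class UnbalancedFieldError < StandardError; end

def split_records(text, embedded_within: nil)
  lines = text.split("\n")
  # Without embedded_within, every physical line is its own record.
  return lines if embedded_within.nil?

  records = []
  buffer = nil
  lines.each do |line|
    buffer = buffer.nil? ? line : "#{buffer}\n#{line}"
    # An even count means every opening quote has been closed.
    if buffer.count(embedded_within).even?
      records << buffer
      buffer = nil
    end
  end
  # Input ended while still "inside" a quoted field.
  raise UnbalancedFieldError, "Unbalanced delimited field" unless buffer.nil?

  records
end

csv = %(id,note\n1,"line one\nline two"\n)
psv = %(id|name\n1|O"neil\n2|Smith\n)
```

In this model the CSV sample splits into 2 records (the quoted newline is kept), the PSV sample splits into 3 records when embedded_within is nil, and it raises UnbalancedFieldError when embedded_within: '"' is applied, because the quote in O"neil is never closed.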

Test coverage:

  • Added test file with pipe-delimited data containing quotes
  • Added tests that reproduce the production failure (pipe-delimited data in a .csv file read without either override)
  • Added tests to verify both fix options work (embedded_within: nil and format: :psv)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.