Skip to content

Conversation

@ollietulloch
Copy link
Contributor

This PR fixes two bugs.

  1. vcf_file_metadata was not defined as an attr_reader on Vcf::Table. Vcf::Table updated accordingly.
  2. Where a column mapping was defined using a regex, the @columns remained mutated after the table was transformed, so would then cause a header row error on subsequent files.

The fix was to reset @columns back to its original state after the table has been transformed. We also ensure that @masked_mappings are reculated if @columns have been mutated. The test added in UniversalImporterHelperTest proves that mutated @columns do not persist beyond the transforming of a table.

Copy link
Contributor

@bshand bshand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs an ensure block, or a little reworking to simplify the logic. Further details in the code comment.

Comment on lines 54 to 68
# Keep a copy of the original column mappings for use later if columns are mutated
original_columns = @columns.deep_dup
@columns_mutated = false
last_col = last_column_to_transform
skip_footer_lines(lines, footer_lines).each do |line|
line.is_a?(Array) ? process_line(line[0..last_col], &block) : process_line(line, &block)
end

# @columns may have been mutated where column is a regular expression.
# We want to restore `@columns` back to its original state, so the column regexp
# will work on the next file
@columns = original_columns if @columns_mutated
# Also ensure that @masked_mappings are recalculated if @columns has been mutated
@masked_mappings = nil if @columns_mutated

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels clunky.

Can we rather set @original_columns = @columns in the initializer, and then always set @columns = @original_columns.deep_dup in this method?

Alternatively, the last block (lines 62-67) should be in an ensure block, in case the earlier code fails.

Is it worth keeping the @columns_mutated flag? It seems like a lot of extra complexity for a small performance optimisation. Do we ever call #transform inside a tight loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bshand - some changes made in e72a86a

  • Removed @columns_mutated flag
  • Set @original_columns in the initializer.

and then always set @columns = @original_columns.deep_dup in this method?

I needed to use .deep_dup in the initializer when setting @original_columns, otherwise @original_columns was mutated when @columns was mutated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this is breaking some downstream tests when pointing at this branch, I'll investigate

@ollietulloch ollietulloch marked this pull request as draft June 2, 2025 11:03
# @columns may have been mutated where column is a regular expression.
# We want to restore `@columns` back to its original state, so the column regexp
# will work on the next file
@columns = @original_columns
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be @columns = @original_columns.deep_dup otherwise it will work twice but not 3 times.

Maybe also move this to immediately after @notifier.try(:started) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I've moved it up to @notifier.try(:started) locally, but lot of tests are erroring.

Despite instance_variable_set("@#{key}", options[key], it seems like @columns is not set in the initializer, so @original_columns is nil.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A downstream issue appears to be loading NdrImport::Table classes via YAML; this doesn't hit the initializer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants