[Reporting][DataFiltering] Expand data padding to full simulation range#2867

Open
ew3361zh wants to merge 21 commits into dev from expand-data-fix-2

Conversation

@ew3361zh
Collaborator

New default behavior for data padding is to pad data across the entire simulation length.

Context

Issue(s) closed by this pull request: closes #2725

What

  1. New default data padding behavior to pad to full simulation length.
  2. New option, use_fill_value_before_start, to pad data before the first observed report.

Why

Users were repeatedly confused when data was not padded as expected for variables whose reporting frequency is not daily.

How

  • The default behavior of the Utility expand_data_temporally() is now to pad the data for the full length of the simulation. This lets any reported data more easily match the dataset length of variables reported daily.
  • Added an option to turn this behavior off by setting the expand_data_to_observed_range filter option to true.
  • Added an option to pad data before the first day the data is reported (back to the start of the simulation at most), in case the first day this variable is reported is not the first day of the simulation. This padding takes the first observed value and prepends it to the padded dataset.
  • The option to use a prescribed fill value is still available for the start, gaps, and end of the dataset (each separately or together).
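
The padding rules listed above can be sketched roughly as follows. This is a minimal illustration, not the actual expand_data_temporally() code: the function name pad_series, its signature, the day-keyed dict input, and the explicit first_sim_day / last_sim_day arguments are all hypothetical, and the behavior when no fill_value is supplied (here it defaults to None) is an assumption.

```python
from typing import Any


def pad_series(
    observed: dict[int, Any],
    first_sim_day: int,
    last_sim_day: int,
    *,
    expand_data_to_observed_range: bool = False,
    fill_value: Any = None,
    use_fill_value_before_start: bool = False,
    use_fill_value_in_gaps: bool = False,
    use_fill_value_at_end: bool = False,
) -> dict[int, Any]:
    """Pad sparse day -> value data (illustrative sketch only)."""
    if not observed:
        return {}
    first_obs, last_obs = min(observed), max(observed)
    # Observed-range mode pads only between the first and last reported days;
    # otherwise the padding spans the whole simulation.
    start = first_obs if expand_data_to_observed_range else first_sim_day
    end = last_obs if expand_data_to_observed_range else last_sim_day

    padded: dict[int, Any] = {}
    last_known = observed[first_obs]
    for day in range(start, end + 1):
        if day in observed:
            last_known = observed[day]
            padded[day] = last_known
        elif day < first_obs:
            # Before the first report: reuse the first observed value
            # unless the fill value was requested for the start.
            padded[day] = fill_value if use_fill_value_before_start else observed[first_obs]
        elif day > last_obs:
            # After the last report: carry the last value forward or fill.
            padded[day] = fill_value if use_fill_value_at_end else last_known
        else:
            # Gap between two reports: carry the previous value or fill.
            padded[day] = fill_value if use_fill_value_in_gaps else last_known
    return padded


weekly = {3: 10.0, 6: 12.0}
full = pad_series(weekly, 1, 8)
# full == {1: 10.0, 2: 10.0, 3: 10.0, 4: 10.0, 5: 10.0, 6: 12.0, 7: 12.0, 8: 12.0}
observed_only = pad_series(weekly, 1, 8, expand_data_to_observed_range=True)
# observed_only == {3: 10.0, 4: 10.0, 5: 10.0, 6: 12.0}
```

Keeping the three use_fill_value_* flags independent mirrors the filter options in this PR, so each region (start, gaps, end) can be switched between the carried value and the fill value on its own.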

Test plan

I tried making an exhaustive filter to see the behavior:

{
  "multiple": [
  {
    "name": "Expand to Observed (Info Maps) Range - no fill value",
    "filters": [
      "Feed.Silage.Bunker.corn_silage_storage_1.total_dry_matter_mass"
    ],
    "expand_data": true,
    "expand_data_to_observed_range": true,
    "use_fill_value_before_start": false,
    "use_fill_value_in_gaps": false,
    "use_fill_value_at_end": false
  },
  {
    "name": "Expand full simulation - fill everywhere",
    "filters": [
      "Feed.Silage.Bunker.corn_silage_storage_1.total_dry_matter_mass"
    ],
    "expand_data": true,
    "fill_value": "FILL",
    "use_fill_value_before_start": true,
    "use_fill_value_in_gaps": true,
    "use_fill_value_at_end": true
  },
  {
    "name": "Expand full simulation - fill everywhere - no fill value provided",
    "filters": [
      "Feed.Silage.Bunker.corn_silage_storage_1.total_dry_matter_mass"
    ],
    "expand_data": true,
    "use_fill_value_before_start": true,
    "use_fill_value_in_gaps": true,
    "use_fill_value_at_end": true
  },
  {
    "name": "Expand full simulation - use carried values everywhere",
    "filters": [
      "Feed.Silage.Bunker.corn_silage_storage_1.total_dry_matter_mass"
    ],
    "expand_data": true,
    "use_fill_value_before_start": false,
    "use_fill_value_in_gaps": false,
    "use_fill_value_at_end": false
  },
  {
    "name": "Expand full simulation - fill start only",
    "filters": [
      "Feed.Silage.Bunker.corn_silage_storage_1.total_dry_matter_mass"
    ],
    "expand_data": true,
    "fill_value": "FILL",
    "use_fill_value_before_start": true,
    "use_fill_value_in_gaps": false,
    "use_fill_value_at_end": false
  },
  {
    "name": "Expand full simulation - fill gaps only",
    "filters": [
      "Feed.Silage.Bunker.corn_silage_storage_1.total_dry_matter_mass"
    ],
    "expand_data": true,
    "fill_value": "FILL",
    "use_fill_value_before_start": false,
    "use_fill_value_in_gaps": true,
    "use_fill_value_at_end": false
  },
  {
    "name": "Expand full simulation - fill end only",
    "filters": [
      "Feed.Silage.Bunker.corn_silage_storage_1.total_dry_matter_mass"
    ],
    "expand_data": true,
    "fill_value": "FILL",
    "use_fill_value_before_start": false,
    "use_fill_value_in_gaps": false,
    "use_fill_value_at_end": true
  },
  {
    "name": "No expansion",
    "filters": [
      "Feed.Silage.Bunker.corn_silage_storage_1.total_dry_matter_mass"
    ]
  }
  ]
}

Input Changes

Output Changes

  • N/A

Filter

@github-actions
Contributor

Current Coverage: 99%

Mypy errors on expand-data-fix-2 branch: 1213
Mypy errors on dev branch: 1213
No difference in error counts

filtered_simulation_days = sorted(set(all_simulation_days))

first_day = filtered_simulation_days[0] if expand_data_to_observed_range else 1
last_day = filtered_simulation_days[-1] if expand_data_to_observed_range else simulation_length

Collaborator

Suggested change
- last_day = filtered_simulation_days[-1] if expand_data_to_observed_range else simulation_length
+ last_day = filtered_simulation_days[-1] if expand_data_to_observed_range else simulation_length - 1

Collaborator

Also, I think RuFaSTime.simulation_length_days is calculated incorrectly; it should be self.simulation_length_days: int = (self.end_date - self.start_date).days + 1 instead.
simulation_length_days is also used in RuFaSTime.convert_slice_to_simulation_day(), which GraphGenerator._draw_graph() uses to determine the slice start and end. Please also evaluate whether line 165 in RuFaSTime.convert_slice_to_simulation_day() needs to be updated to remove the extra "+1".
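
To make the off-by-one concrete, here is a small standalone sketch of the inclusive day count proposed above. The helper name inclusive_length_days and the example dates are hypothetical, and the zero-based day indexing in the last lines is an assumption used only to show why a length of N pairs with a last index of N - 1; the indexing convention used by RuFaSTime is not stated in this thread.

```python
from datetime import date


def inclusive_length_days(start_date: date, end_date: date) -> int:
    """Count simulated days when both the start and end dates are simulated."""
    # Subtracting dates gives the number of day boundaries crossed,
    # which is one less than the number of days covered.
    return (end_date - start_date).days + 1


start = date(2020, 1, 1)
end = date(2020, 1, 10)

boundaries_crossed = (end - start).days       # 9
days_simulated = inclusive_length_days(start, end)  # 10

# Assumption: if days were indexed from 0, the last valid index
# would be days_simulated - 1.
zero_based_days = list(range(days_simulated))
last_zero_based_day = zero_based_days[-1]     # 9
```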

Collaborator Author

Thank you, Allister. After our discussion at the WT this morning, I think this is correct, so I will make these changes. Good catch!


first_known_value, first_known_info_map = indexed_data[first_day_of_original_data]
last_known_value = fill_value
last_known_info_map = {"simulation_day": 0, "units": original_units}

Collaborator

If this last_known_info_map is used before it is overwritten, other information from info_map (like "prefix") may be missing. Please look into this!

Collaborator Author

OK, I added a fix for this: the original info_map is preserved, similar to how it was in the previous version of the function.
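
One way to picture the concern and the fix is the sketch below. It is not the code from this PR: the helper name info_map_for_padded_day and the sample keys and values are hypothetical, chosen only to show how copying the original info_map keeps fields such as "prefix" that a hand-built placeholder would drop.

```python
from typing import Any


def info_map_for_padded_day(
    original_info_map: dict[str, Any], simulation_day: int
) -> dict[str, Any]:
    """Return a copy of the source info_map with only the day replaced."""
    # Spreading the original first keeps every existing key (units, prefix, ...);
    # the explicit simulation_day then overrides the copied one.
    return {**original_info_map, "simulation_day": simulation_day}


# Hypothetical info_map for the first observed report.
first_known_info_map = {"simulation_day": 7, "units": "kg", "prefix": "Feed"}

# A placeholder built by hand loses any key it does not list.
placeholder = {"simulation_day": 0, "units": first_known_info_map["units"]}

# Copying the original keeps "prefix" and leaves the source untouched.
preserved = info_map_for_padded_day(first_known_info_map, 1)
```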

@ew3361zh ew3361zh requested a review from allisterakun March 18, 2026 13:05
Collaborator

@allisterakun allisterakun left a comment

LGTM! Thank you Niko! Great improvements!

Development

Successfully merging this pull request may close these issues.

expand_data_temporally doesn't expand values to end of simulation

2 participants