Fix rebinning of partially occupied histograms #235
johannes-mueller wants to merge 4 commits into develop
Signed-off-by: Johannes Mueller <johannes.mueller4@de.bosch.com>
24e15da to a5613c1
Pull request overview
Fixes #234 by correcting how rebin_histogram(..., binning=int) determines the target binning for a partially occupied MultiIndex histogram (e.g., a sparse 2D histogram), which avoids overly fine bins after rebinning.
Changes:
- Adjust rebin_histogram MultiIndex/int handling to compute a per-dimension "global" target IntervalIndex, then subset it per group to the occupied range.
- Update _do_rebin_histogram NaN handling by dropping NaNs once up-front (instead of per-interval).
- Add regression tests for fully/partially occupied 2D histograms (same/up/down) and document the fix in the changelog.
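The first change can be pictured with a minimal sketch. It is not the code from this pull request: the helper names `global_target_bins` and `occupied_subset` are hypothetical, and the sketch only assumes that each dimension of the histogram is described by a pandas IntervalIndex. The idea is to derive the target bins from the full range of a dimension once, and then keep only the bins that overlap the range a given group actually occupies.

```python
import pandas as pd


def global_target_bins(intervals, n_bins):
    """Build n_bins equal-width bins spanning the full range of one dimension.

    Hypothetical helper, not part of pyLife.
    """
    return pd.interval_range(
        start=intervals.left.min(), end=intervals.right.max(), periods=n_bins
    )


def occupied_subset(target, intervals):
    """Keep only the target bins that overlap the range occupied by one group.

    Hypothetical helper, not part of pyLife.
    """
    lo, hi = intervals.left.min(), intervals.right.max()
    mask = (target.right > lo) & (target.left < hi)
    return target[mask]


# Full dimension: eight unit-width bins covering 0..8.
full = pd.interval_range(start=0.0, end=8.0, periods=8)
target = global_target_bins(full, 4)

# One sparse group occupies only the bins covering 2..5.
group = full[2:5]
subset = occupied_subset(target, group)
```

Because the bin width comes from the global range rather than from the occupied range of each group, a sparsely occupied group ends up with bins of the same width as a fully occupied one.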
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/pylife/utils/histogram.py | Fixes bin calculation for sparse MultiIndex histograms when binning is an int; adjusts NaN filtering and a binning validity edge case. |
| tests/utils/test_histogram.py | Adds regression coverage for sparse vs fully occupied 2D histograms across same/up/down rebin scenarios. |
| CHANGELOG.md | Notes the bug fix under upcoming release bug fixes. |
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
        return histogram.reorder_levels(original_names)

    def _with_range_index(hist, names_to_drop):
        new_hist = hist.copy().reset_index(drop=False)
PEP 8 expects two blank lines between top-level function definitions. Please add an extra blank line before the new helper defs so formatting stays consistent with the rest of this module.
    def _with_range_index(hist, names_to_drop):
        new_hist = hist.copy().reset_index(drop=False)
        for level in names_to_drop:
The helper parameter name names_to_drop is misleading because the levels are not dropped; they are encoded to integer codes and later restored. Consider renaming it to something like names_to_encode/names_to_map to make the intent clearer.
Suggested change:

    -def _with_range_index(hist, names_to_drop):
    -    new_hist = hist.copy().reset_index(drop=False)
    -    for level in names_to_drop:
    +def _with_range_index(hist, names_to_encode):
    +    new_hist = hist.copy().reset_index(drop=False)
    +    for level in names_to_encode:
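To illustrate what "encoded to integer codes and later restored" can look like, here is a minimal sketch. It is an assumption about the general technique, not the body of `_with_range_index`, of which only the first three lines are visible in this review. The helpers `encode_levels` and `decode_levels` are hypothetical and rely on `pd.factorize` to build the code tables.

```python
import pandas as pd


def encode_levels(hist, names_to_encode):
    """Replace the named index levels by integer codes.

    Hypothetical helper: returns the flattened frame and the lookup
    tables needed to restore the original level values.
    """
    frame = hist.reset_index(drop=False)
    lookups = {}
    for level in names_to_encode:
        codes, uniques = pd.factorize(frame[level])
        frame[level] = codes
        lookups[level] = uniques
    return frame, lookups


def decode_levels(frame, lookups):
    """Map the integer codes back to the original level values."""
    restored = frame.copy()
    for level, uniques in lookups.items():
        restored[level] = uniques.take(restored[level].to_numpy()).to_numpy()
    return restored


index = pd.MultiIndex.from_tuples(
    [("a", 1), ("b", 2), ("a", 3)], names=["kind", "n"]
)
hist = pd.Series([1.0, 2.0, 3.0], index=index, name="count")

frame, lookups = encode_levels(hist, ["kind"])
restored = decode_levels(frame, lookups)
```

Nothing is dropped along the way: the level survives as a column of codes plus a lookup table, which is why a name such as names_to_encode describes the parameter more accurately than names_to_drop.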
        assert rebinned.shape == (8,)


    def test_rebin_irregular_1d_histogam():
Typo in test name: histogam should be histogram for clarity and discoverability.
        pd.testing.assert_series_equal(rebinned, expected)


    def test_rebin_irregular_2d_histogam():
Typo in test name: histogam should be histogram for clarity and discoverability.
Signed-off-by: Johannes Mueller <johannes.mueller4@de.bosch.com>
Work around pandas-dev/pandas#64825
Signed-off-by: Johannes Mueller <johannes.mueller4@de.bosch.com>
1568c69 to e4ad206
Signed-off-by: Johannes Mueller <johannes.mueller4@de.bosch.com>
Fix #234