Skip to content

Fix rebinning of partially occupied histograms#235

Open
johannes-mueller wants to merge 4 commits intodevelopfrom
234-histogram-rebinning
Open

Fix rebinning of partially occupied histograms#235
johannes-mueller wants to merge 4 commits intodevelopfrom
234-histogram-rebinning

Conversation

@johannes-mueller
Copy link
Copy Markdown
Member

Fix #234

Signed-off-by: Johannes Mueller <johannes.mueller4@de.bosch.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes #234 by correcting how rebin_histogram(..., binning=int) determines target binning for partially occupied MultiIndex (e.g., sparse 2D histograms), avoiding over-fine bins when rebinned.

Changes:

  • Adjust rebin_histogram MultiIndex/int handling to compute a per-dimension “global” target IntervalIndex, then subset it per group to the occupied range.
  • Update _do_rebin_histogram NaN handling by dropping NaNs once up-front (instead of per-interval).
  • Add regression tests for fully/partially occupied 2D histograms (same/up/down) and document the fix in the changelog.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
src/pylife/utils/histogram.py Fixes bin calculation for sparse MultiIndex histograms when binning is an int; adjusts NaN filtering and a binning validity edge case.
tests/utils/test_histogram.py Adds regression coverage for sparse vs fully occupied 2D histograms across same/up/down rebin scenarios.
CHANGELOG.md Notes the bug fix under upcoming release bug fixes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 250 to +253
return histogram.reorder_levels(original_names)

def _with_range_index(hist, names_to_drop):
new_hist = hist.copy().reset_index(drop=False)
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP 8 expects two blank lines between top-level function definitions. Please add an extra blank line before the new helper defs so formatting stays consistent with the rest of this module.

Copilot uses AI. Check for mistakes.
Comment on lines +252 to +254
def _with_range_index(hist, names_to_drop):
new_hist = hist.copy().reset_index(drop=False)
for level in names_to_drop:
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The helper parameter name names_to_drop is misleading because the levels are not dropped; they are encoded to integer codes and later restored. Consider renaming it to something like names_to_encode/names_to_map to make the intent clearer.

Suggested change
def _with_range_index(hist, names_to_drop):
new_hist = hist.copy().reset_index(drop=False)
for level in names_to_drop:
def _with_range_index(hist, names_to_encode):
new_hist = hist.copy().reset_index(drop=False)
for level in names_to_encode:

Copilot uses AI. Check for mistakes.
assert rebinned.shape == (8,)


def test_rebin_irregular_1d_histogam():
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in test name: histogam should be histogram for clarity and discoverability.

Copilot uses AI. Check for mistakes.
pd.testing.assert_series_equal(rebinned, expected)


def test_rebin_irregular_2d_histogam():
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in test name: histogam should be histogram for clarity and discoverability.

Copilot uses AI. Check for mistakes.
Signed-off-by: Johannes Mueller <johannes.mueller4@de.bosch.com>
Work around pandas-dev/pandas#64825

Signed-off-by: Johannes Mueller <johannes.mueller4@de.bosch.com>
Signed-off-by: Johannes Mueller <johannes.mueller4@de.bosch.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Rebinning of partially occupied histogram results in too fine bins

2 participants