
@KRRT7 KRRT7 commented Oct 27, 2025

Problem

When installing datacompy in a fresh environment, the dependency resolver installs pandas==2.1.1 alongside numpy>=2.2.6, which causes a binary incompatibility error:

ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

This prevents any tests from running and breaks imports.
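A minimal sanity check along these lines can surface the problem early (hypothetical CI guard, not part of datacompy): a clean `import pandas` already exercises its compiled extensions, so a broken install fails fast with the dtype-size ValueError.

```python
# Hypothetical pre-test guard (not part of datacompy): importing pandas
# loads its compiled C extensions, so a binary-incompatible install
# raises the "numpy.dtype size changed" ValueError right here.
import numpy as np
import pandas as pd

pd_major_minor = tuple(int(p) for p in pd.__version__.split(".")[:2])
np_major = int(np.__version__.split(".")[0])

# pandas wheels older than 2.2 were built against the numpy 1.x ABI.
if np_major >= 2 and pd_major_minor < (2, 2):
    raise RuntimeError(
        f"pandas {pd.__version__} is binary-incompatible with numpy {np.__version__}"
    )
print(f"numpy {np.__version__} / pandas {pd.__version__}: compatible")
```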

Root Cause

The current dependency specification allows pandas>=0.25,<=2.3.3, but pandas 2.1.1 has pre-compiled C extensions that are incompatible with numpy 2.2.6. Pandas needs to be at least version 2.2.0 to work with numpy 2.2+.

Changes

  • Updated pandas minimum version constraint from >=0.25 to >=2.2.0 in pyproject.toml
  • Removed upper bound to allow flexibility with newer pandas versions
  • This ensures binary compatibility between pandas and numpy 2.x series
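For reference, the resulting dependency entry would look roughly like this (a sketch; surrounding fields elided, and only the pandas line is the substantive change):

```toml
# pyproject.toml (sketch of the constraint after this change)
[project]
dependencies = [
    "pandas>=2.2.0",  # previously "pandas>=0.25,<=2.3.3"
]
```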

Testing

Verified that after this change:

  • uv sync installs pandas 2.3.3 (compatible version)
  • pytest --collect-only successfully collects 269 tests without errors
  • No import errors when loading datacompy modules

Environment Tested

  • Python: 3.10.18
  • Platform: macOS (ARM64)
  • Package manager: uv

KRRT7 added 5 commits October 27, 2025 16:36
Update pandas minimum version from 0.25 to 2.2.0 to ensure compatibility
with numpy 2.2+. Pandas versions prior to 2.2.0 have pre-compiled C
extensions that cause binary incompatibility errors when used with
numpy 2.2.6, resulting in 'numpy.dtype size changed' errors on fresh
installs.

Fixes test collection and import errors in fresh environments.

CLAassistant commented Oct 27, 2025

CLA assistant check
All committers have signed the CLA.


fdosani commented Oct 31, 2025

@KRRT7 Thanks for flagging this and opening up a PR. I just made a new conda env using the following:

conda create -n test python=3.10 ipython uv pip conda openjdk=8

and then within the env I installed datacompy

> uv pip install datacompy
Using Python 3.10.19 environment at: miniconda3/envs/dc_test
Resolved 12 packages in 4.56s
Prepared 1 package in 6.89s
Installed 12 packages in 174ms
 + datacompy==0.18.1
 + jinja2==3.1.6
 + markupsafe==3.0.3
 + numpy==2.2.6
 + ordered-set==4.1.0
 + pandas==2.3.3
 + polars==1.33.1
 + pyarrow==22.0.0
 + python-dateutil==2.9.0.post0
 + pytz==2025.2
 + six==1.17.0
 + tzdata==2025.2

It seems to resolve fine for me. I do know the error you mentioned; I've experienced it before myself. We are working on a new 1.0 release here:

datacompy/pyproject.toml

Lines 36 to 42 in cb82bda

dependencies = [
    "jinja2>=3",
    "numpy>=2,<=2.3.3",
    "ordered-set>=4.0.2,<=4.1",
    "pandas>=2,<=2.3.3",
    "polars[pandas]>=1.19,<=1.31.1",
]

where we are bumping dependencies up to the latest releases. Not sure if you have any thoughts on this, but maybe end users should be pinning on their end to ensure compatible versions? @ak-gupta @rhaffar any thoughts?
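For reference, end-user pinning as suggested above could look something like this (versions taken from the resolution shown earlier; the file name and workflow are just one common convention):

```text
# constraints.txt (hypothetical) -- apply with: pip install -c constraints.txt datacompy
numpy==2.2.6
pandas==2.3.3
```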

fdosani added a commit that referenced this pull request Nov 13, 2025
@fdosani fdosani changed the base branch from develop to refactor/init-setup November 13, 2025 14:05

fdosani commented Nov 13, 2025

Capturing this change in https://github.com/capitalone/datacompy/tree/refactor/init-setup. Appreciate the PR!

@fdosani fdosani closed this Nov 13, 2025
@fdosani fdosani mentioned this pull request Nov 13, 2025
jeklein pushed a commit that referenced this pull request Nov 14, 2025
* Refactor imports to remove __init__ calls

* fix: update import paths for Fugue, Pandas, Polars, and Spark usage documentation

* fix: update numpy and polars version constraints in dependencies

* fix from #454 for binary incompatibility

* chore: remove version checks and warnings.

* fix: update numpy version constraints for compatibility

* refactor: simplify Spark installation workflow by removing Pandas and NumPy versioning

* feat: add availability checks for optional dependencies in fugue, snowflake, and spark modules

* feat: implement decorators to check availability of fugue, snowflake, and spark extras

* typo: move to init

* feat: implement utility function to check module availability and refactor existing checks for fugue, snowflake, and spark

* test: add unit tests for module availability checks
