Summary
Implement a Bring-Your-Own-DataFrame (BYOD) strategy in TM1py to support both pandas and polars as interchangeable DataFrame backends.
Description
Currently, TM1py functions that work with tabular data rely on pandas DataFrames. To increase flexibility and performance, these functions should be able to accept either pandas or polars DataFrames as input and return the same type as output.
Both pandas and polars should be optional dependencies to keep the core installation lightweight.
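One way to keep both libraries optional is to declare them as extras and import them only when a DataFrame function is actually called. The following is a minimal sketch, not TM1py's actual packaging: `EXTRAS_REQUIRE` and `import_optional` are hypothetical names introduced for illustration, and no version pins are implied.

```python
import importlib
from types import ModuleType

# Hypothetical extras mapping, in the shape setuptools accepts for
# setup(extras_require=...). No version constraints are implied.
EXTRAS_REQUIRE = {
    "pandas": ["pandas"],
    "polars": ["polars"],
}


def import_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency, or fail with an install hint.

    `extra` is the name of the extras group that provides `module_name`.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f"Install it with: pip install tm1py[{extra}]"
        ) from exc
```

A DataFrame-returning function could then call `import_optional("polars", "polars")` inside its body, so that importing TM1py itself never requires either library.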
Proposed Changes
- Update all functions that handle DataFrames (e.g., `write_dataframe`, `execute_view_dataframe`) to:
  - Accept either pandas or polars DataFrames.
  - Preserve the user’s chosen DataFrame type in outputs.
- Add lightweight detection logic to determine which backend is being used.
- Introduce optional dependencies in `setup.py` (e.g., `tm1py[pandas]`, `tm1py[polars]`).
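The detection step could be as small as the sketch below. It is an assumption about how this might be done, not a settled design: `detect_backend` is a hypothetical name, and the check relies on the module name of the object's class, so neither pandas nor polars has to be imported to run it.

```python
from typing import Any

SUPPORTED_BACKENDS = ("pandas", "polars")


def detect_backend(df: Any) -> str:
    """Return 'pandas' or 'polars' for a DataFrame from either library.

    The class hierarchy is walked so that subclasses of a supported
    DataFrame are recognized too. Only class and module names are
    inspected, so this runs even if only one library is installed.
    """
    for cls in type(df).__mro__:
        root = cls.__module__.split(".")[0]
        if root in SUPPORTED_BACKENDS and cls.__name__ == "DataFrame":
            return root
    raise TypeError(
        f"Unsupported DataFrame type: "
        f"{type(df).__module__}.{type(df).__qualname__}"
    )
```

A function that builds its result in one backend can use the returned string to decide which library to convert back to before returning, which is what preserves the caller's DataFrame type.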
Motivation
Preliminary testing shows promising results with polars:
- ~10% faster end-to-end write operations.
- ~20% lower memory usage during large dataset handling.
This approach enables users to choose their preferred DataFrame engine without sacrificing TM1py’s ease of use.
Example
# Using pandas
df = pandas.DataFrame(...)
tm1.cubes.cells.write_dataframe(df, use_blob=True)
# Using polars
df = polars.DataFrame(...)
tm1.cubes.cells.write_dataframe(df, use_blob=True)
Benefits
- Improved performance and memory efficiency for large workloads.
- Greater flexibility for developers using different DataFrame ecosystems.
- Backward compatibility with existing pandas-based code.
Next Steps
- Identify all functions currently requiring pandas DataFrames.
- Abstract common DataFrame operations (indexing, melting, etc.) to backend-neutral utilities.
- Add test coverage for both backends.
- Update documentation accordingly.
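As an illustration of the backend-neutral utilities step, the sketch below wraps a wide-to-long reshape and dispatches to each library's native method, so the result has the same type as the input. The names `melt` and `_backend_of` are hypothetical; the underlying calls are `DataFrame.melt` in pandas and `DataFrame.unpivot` in polars (polars 1.0 and later; earlier releases called it `melt`).

```python
from typing import Any, Sequence


def _backend_of(df: Any) -> str:
    # Hypothetical helper: infer the library from the class's module name.
    return type(df).__module__.split(".")[0]


def melt(
    df: Any,
    id_vars: Sequence[str],
    var_name: str = "variable",
    value_name: str = "value",
) -> Any:
    """Wide-to-long reshape that returns the same DataFrame type it received."""
    backend = _backend_of(df)
    if backend == "pandas":
        return df.melt(
            id_vars=list(id_vars), var_name=var_name, value_name=value_name
        )
    if backend == "polars":
        # Assumes polars >= 1.0, where this operation is named `unpivot`.
        return df.unpivot(
            index=list(id_vars), variable_name=var_name, value_name=value_name
        )
    raise TypeError(f"Unsupported DataFrame backend: {backend!r}")
```

Because the wrapper delegates to native methods, no data is copied between libraries. The same call can be exercised against both backends in a parametrized test, skipping whichever library is not installed.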