Description
I want to bring up the idea of replacing pandas with polars. I can think of three reasons why this would be beneficial:
Processing speed
polars is much faster. @khoroshevskyi has been investigating this, and adopting polars could drastically reduce the time it takes to process PEPs on the PEPhub server, enabling real-time edits to PEPs.
It's hard to find unbiased, fair comparisons, especially considering the polars hype, but this post does a pretty good job of highlighting some of the large improvements.
Import speed
From my own experimentation, importing polars is almost 4 times faster than importing pandas. This would help with things like the looper CLI import issues: pepkit/looper#476
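For anyone who wants to check the import-time claim on their own machine, here is a minimal sketch. It is not the script behind the number above; the helper name `time_import` and the choice of three repeats are made up for illustration. Each import runs in a fresh interpreter so that cached modules don't skew the result, which means the timing also includes interpreter startup.

```python
import subprocess
import sys
import time


def time_import(module_name: str, repeats: int = 3) -> float:
    """Best wall-clock time (seconds) to start Python and import a module.

    Hypothetical helper for illustration. Each run uses a fresh
    interpreter, so the result includes interpreter startup time.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(
            [sys.executable, "-c", f"import {module_name}"],
            check=True,           # raise if the module cannot be imported
            capture_output=True,  # keep the child's output off the console
        )
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    # Assumes pandas and polars are installed; skips whichever is missing.
    for name in ("pandas", "polars"):
        try:
            print(f"{name}: {time_import(name):.3f} s")
        except subprocess.CalledProcessError:
            print(f"{name}: not installed")
```

Running `python -X importtime -c "import pandas"` gives a per-module breakdown if more detail is needed.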
Interface with genimtools
Genimtools is native-Rust with pyo3 bindings. polars follows this model as well. Because of this, the integration of peppy objects with genimtools becomes seamless. In fact, there is an entire crate maintained by the polars group dedicated to this interface.
This sets the stage for processing PEPs and their data in genimtools, further improving server speeds for real-time PEP editing. eido comes to mind as a potential bottleneck with real-time PEP editing.
Potential downsides
I think some downsides to such a switch are:
- polars is new, and not as "battle-tested" as pandas.
- polars breaks down when you want to do data visualization, as libraries like matplotlib don't natively support it.
- time invested in a refactor of the sample table in peppy