Replace pandas with polars #486

@nleroy917

I want to bring up the idea of replacing pandas with polars. I can think of three reasons why this would be beneficial:

Processing speed

polars is generally much faster than pandas. @khoroshevskyi has been investigating this, and adopting polars could drastically reduce the time it takes to process PEPs on the PEPhub server, enabling real-time edits to PEPs.

It's hard to find unbiased, fair comparisons, especially considering the polars hype, but this post does a pretty good job of highlighting some of the larger improvements.

Import speed

From my own experimentation, importing polars is almost 4 times faster than importing pandas. This would help with things like the looper CLI import issues: pepkit/looper#476
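For anyone who wants to check import times on their own machine, here is a rough sketch. It is not necessarily how the number above was measured, and the helper name `time_import` is made up for illustration. It times `import <module>` in a fresh interpreter, so the figure includes interpreter startup and is only useful for comparing modules against each other.

```python
import subprocess
import sys
import time


def time_import(module: str, repeats: int = 3) -> float:
    """Best wall-clock time in seconds to run `import <module>` in a fresh interpreter.

    Hypothetical helper for illustration. The result includes interpreter
    startup, so compare modules against each other rather than reading the
    absolute value. Raises CalledProcessError if the module is not installed.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # A new process per run, so the module is never already in sys.modules.
        subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
        best = min(best, time.perf_counter() - start)
    return best
```

Calling `time_import("pandas")` and `time_import("polars")` in an environment where both are installed gives two numbers to compare. Running `python -X importtime -c "import pandas"` gives a per-module breakdown if more detail is needed.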

Interface with genimtools

genimtools is written natively in Rust and exposed to Python through pyo3 bindings, and polars follows the same model. Because of this, integrating peppy objects with genimtools should become close to seamless. In fact, there is an entire crate maintained by the polars group dedicated to this interface.

This sets the stage for processing PEPs and their data in genimtools, further improving server speeds for real-time PEP editing. eido comes to mind as a potential bottleneck for real-time PEP editing.

Potential downsides

I think some downsides to such a switch are:

  • polars is new, and not as "battle-tested" as pandas.
  • polars is awkward for data visualization, since libraries like matplotlib don't natively support it.
  • Refactoring the sample table in peppy would take time.
