Commit c940d14

📝 Add comparison between pandas, Polars, Dask and DuckDB
1 parent 2c863cd commit c940d14

1 file changed: +26 -1 lines changed

docs/workspace/pandas/index.rst

Lines changed: 26 additions & 1 deletion
@@ -28,7 +28,7 @@ Python code. Mostly pandas is used to
 
 .. tip::
    `Analysing data with pandas
-   <https://cusy.io/en/our-training-courses/analysing-data-with-pandas>`_
+   <https://cusy.io/en/our-training-courses/analysing-data-with-pandas.html>`_
 
 .. seealso::
    * `Home
@@ -40,6 +40,31 @@ Python code. Mostly pandas is used to
    * `GitHub
      <https://github.com/pandas-dev/pandas/>`_
 
+pandas vs. Polars vs. Dask vs. DuckDB
+-------------------------------------
+
+The choice between pandas, `Polars <https://pola.rs>`_,
+:doc:`/performance/dask`, and `DuckDB <https://duckdb.org>`_ depends on the type
+of workload:
+
+pandas
+    is the canonical Python DataFrame library for analysis on a single machine.
+Polars
+    is written in Rust and suits fast analysis on a single node, especially when
+    `lazy evaluation <https://en.wikipedia.org/wiki/Lazy_evaluation>`_ and the
+    `expressions API
+    <https://docs.pola.rs/api/python/stable/reference/expressions/index.html>`_
+    are important.
+Dask
+    is a Python library for parallel computing that scales familiar APIs,
+    including pandas and `scikit-learn <https://scikit-learn.org/stable/>`_, to
+    clusters.
+DuckDB
+    is an in-process `OLAP
+    <https://en.wikipedia.org/wiki/Online_analytical_processing>`_ database for
+    analysis and SQL over **local** files. It often complements pandas
+    DataFrames because it can query them in place with SQL.
+
 .. toctree::
    :hidden:
    :titlesonly:
