@ludwiglierhammer: Do I need to add polars to the CI requirements/environment files?
Yes please.
@jtsiddons: In general, we have to read the input data line by line, as the sentinels etc. can vary from line to line. I just wanted to let you know, now that you are putting time and effort into it.
Using polars, I scan the column for the sentinel. I then separate the section (as a single column) if the column is present (values are … If the sentinel is … No splitting on values is performed if the sentinel is otherwise missing. Slicing on … The operations are performed on the correct lines.
@jtsiddons: I did some code restructuring to unify code snippets in mdf_reader and cdm_mapper. Unfortunately, I created some merge conflicts in this PR. Could you resolve them, or should we try to fix them together? These conflicts can only be resolved using the command line.
No worries - I can fix the conflicts. I am working on other things this week so I'll resolve them on Monday morning. |
…name open_polars -> open_text.
…ave/write out to StringIO
@ludwiglierhammer: I have moved the decode/convert and validate steps into a _read_loop method, so only a single TextParser loop is required. This avoids a significant refactor of the …
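A rough sketch of the single-loop idea, assuming stand-in convert and validate functions (both hypothetical; the real decode/convert steps and validators are not shown in this thread) and a plain generator in place of the pandas chunked reader. The read_loop name is also illustrative, not the PR's _read_loop. Each chunk is converted and validated in the same pass, so the input is iterated only once.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def convert(chunk: List[str]) -> List[int]:
    """Hypothetical decode/convert step: parse raw strings into integers."""
    return [int(value) for value in chunk]


def validate(values: List[int]) -> List[bool]:
    """Hypothetical validator: flag values inside an assumed valid range."""
    return [0 <= value <= 100 for value in values]


def read_loop(
    lines: Iterable[str], chunksize: int
) -> Iterator[Tuple[List[int], List[bool]]]:
    """Convert and validate each chunk in one pass over the input."""
    iterator = iter(lines)
    while True:
        chunk = list(islice(iterator, chunksize))
        if not chunk:
            return  # input exhausted
        values = convert(chunk)
        # Validation runs on the freshly converted chunk, inside the same loop.
        yield values, validate(values)


raw = (line for line in ["5", "250", "42"])  # a one-shot iterator, like a file
results = list(read_loop(raw, chunksize=2))
print(results)
```

Since the input is a one-shot iterator, a second loop for validation would find it already consumed; doing both steps per chunk sidesteps that without re-reading the source.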
Current status: refactored validators for … Next steps:
edit: add extra todo step
This may be of some use: https://narwhals-dev.github.io/narwhals/ (it allows interoperability between Python DataFrame libraries, e.g. polars and pandas).
An in-progress rewrite of the pandas components of mdf_reader into polars. This could allow for improved performance in terms of memory usage and speed.

Todo:
- pandas.MultiIndex for the columns, allowing core.YR access. Polars does not support this behaviour.
- missing_values and fields to ignore
- pandas.DataFrame