@arienandalibi
Collaborator

Takes care of #2394

Also in this PR, but not covered by the issue above:

  • Add a schema parameter to the load functions to support casting columns
  • Add CSV reading options:
    • delimiter character
    • comment character
    • escape character
    • quote character
    • terminator character
    • allow_truncated_rows flag
    • has_header flag
    • more options may be added later
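
To make the options listed above concrete, here is a minimal sketch in plain Python of what each one typically controls. It uses only the standard library `csv` module and is not the loader added in this PR: the function name `read_rows`, its parameter names, and the choice to pad truncated rows with `None` are all assumptions made for illustration.

```python
import csv


def read_rows(text, delimiter=",", quote='"', escape=None, comment=None,
              terminator=None, has_header=True, allow_truncated_rows=False):
    """Parse CSV text into (header, rows). Hypothetical helper, illustration only."""
    # terminator: split records on a custom character, else on any newline.
    # Note: this simple split does not support newlines inside quoted fields.
    lines = text.split(terminator) if terminator else text.splitlines()
    lines = [line for line in lines if line]  # drop empty records
    # comment: skip records that start with the comment character.
    if comment is not None:
        lines = [line for line in lines if not line.startswith(comment)]
    # delimiter, quote, escape: passed straight to the stdlib reader.
    rows = list(csv.reader(lines, delimiter=delimiter,
                           quotechar=quote, escapechar=escape))
    if not rows:
        return [], []
    # has_header: the first record names the columns, else generate names.
    if has_header:
        header, rows = rows[0], rows[1:]
    else:
        header = [f"column_{i}" for i in range(len(rows[0]))]
    checked = []
    for row in rows:
        if len(row) < len(header):
            # allow_truncated_rows: accept short rows and pad them (assumed
            # semantics), otherwise reject the input.
            if not allow_truncated_rows:
                raise ValueError(f"row has {len(row)} fields, expected {len(header)}")
            row = row + [None] * (len(header) - len(row))
        checked.append(row)
    return header, checked
```

For example, `read_rows("# note\nsrc;dst;time\na;b;1\nc;d\n", delimiter=";", comment="#", allow_truncated_rows=True)` skips the comment line and pads the short last row, while the same call without `allow_truncated_rows=True` raises `ValueError`.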

… load_edges_from_polars that internally calls to_pandas() on the polars dataframe. Fireducks works.
… in rust instead of obtaining each column of each batch from Python individually.
…g up. Committing benchmarks and tests that check graph equality when using different ingestion pathways.
…id of them and always stream. Added benchmark for loading from fireducks.
…e_props, that all use the __arrow_c_stream__() interface. If a data source is passed with no __len__ function, we calculate the length ourselves. Updated ingestion benchmarks to also test pandas_streaming, fireducks_streaming, and polars_streaming.
# Conflicts:
#	python/python/raphtory/__init__.pyi
…renamed to load_edge_metadata/load_node_metadata.
…if the data source provides __len__(), and if not, the loading/progress bar for loading nodes and edges doesn't show progression, only iterations per second.
…ess bar for loading updates properly when using the __arrow_c_stream__ interface.
# Conflicts:
#	python/python/raphtory/vectors/__init__.pyi
# Conflicts:
#	python/python/raphtory/graphql/__init__.pyi
#	python/python/raphtory/vectors/__init__.pyi
…ta from python. Replaced it with PyRecordBatchReader::from_arrow_pycapsule for safety and future changes.
…rrow format at once. Now streams 1 million rows at a time.
… function on PyProperties. Added test for schema casting.
# Conflicts:
#	raphtory/src/python/graph/io/arrow_loaders.rs
…nested type using pyarrow Table. Cast whole RecordBatch at once now using StructArray.
…taTypes can be extracted from Python without feature gating behind arrow (larger dependency). Refactored data_type_as_prop_type to be in raphtory-api as long as any of "arrow", "storage", or "python" features is enabled, since they all have dep:arrow-schema.
…rison for PropType. Fixed previous tests and added tests for dict schema input, pyarrow types, nested (StructArray) properties, nested schemas, mixed and matched PropType and pyarrow types, both in property and in schema,...
# Conflicts:
#	python/python/raphtory/__init__.pyi
#	python/python/raphtory/iterables/__init__.pyi
#	python/python/raphtory/node_state/__init__.pyi
#	python/tests/test_ingestion_equivalence_df.py
#	python/tests/test_load_from_df.py
#	raphtory-api/src/python/mod.rs
#	raphtory/src/python/graph/graph.rs
#	raphtory/src/python/graph/graph_with_deletions.rs
#	raphtory/src/python/graph/io/arrow_loaders.rs
#	raphtory/src/python/packages/base_modules.rs
…d parquet/csv). Make sure each ingestion path returns the same node ids.
…lformed (or any column). Added tests for malformed inputs in csv.
…umn is not found. Removed the extra_field parquet test because it didn't work. Cleaned up the test file.
…v optional-dependencies in pyproject.toml. General clean-up before adding other functions (load_edge, load_node_metadata, ...) in python graph.
…unctions. Added load_edges, load_node_metadata, load_edge_metadata functions to PyGraph and PyPersistentGraph. Removed Pandas loaders.
…folder which is not available in the crate root. Fixed parquet_loaders.rs.
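
Several of the commit messages above describe streaming data in fixed-size batches and using `__len__()`, when the source provides it, to drive the progress bar. The following is a rough pure-Python sketch of that idea only. The real implementation lives in Rust and reads through the `__arrow_c_stream__` interface; the names `iter_batches` and `progress_total` are hypothetical and do not appear in the PR.

```python
from itertools import islice


def iter_batches(rows, batch_size=1_000_000):
    """Yield lists of at most batch_size rows from any iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # source exhausted
        yield batch


def progress_total(source):
    """Return the row count if the source exposes __len__, else None.

    A None total corresponds to a progress bar that can only report
    iterations per second, not a percentage.
    """
    return len(source) if hasattr(source, "__len__") else None


if __name__ == "__main__":
    source = range(2_500_000)
    total = progress_total(source)
    loaded = 0
    for batch in iter_batches(source):
        loaded += len(batch)
        print(f"loaded {loaded} of {total if total is not None else '?'} rows")
```

Batching keeps memory bounded by the batch size rather than by the size of the whole source, which is the motivation the commit messages give for no longer converting everything at once.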

2 participants