write_dataframe_async with increment=True should aggregate before splitting dataframe #1307

@MariusWirtz

When using write_dataframe_async with increment=True, the aggregation/incrementation should happen before the dataframe is split into smaller chunks for parallel processing.

Currently, the aggregation appears to happen per chunk, so if the original dataframe contains duplicate records (rows for the same intersection), those duplicates can end up in different chunks. The parallel writes can then overwrite each other instead of incrementing the values correctly.
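
To illustrate the proposed order of operations, here is a minimal pandas sketch. It does not call TM1py, and it is not the library's actual implementation: `aggregate_then_split`, the column names, and the chunking scheme are all hypothetical. It only assumes that every column except the value column identifies the intersection.

```python
import math

import pandas as pd


def aggregate_then_split(df: pd.DataFrame, value_column: str, n_chunks: int) -> list:
    """Hypothetical helper: sum duplicate intersections first, then split into chunks."""
    # Assumption: every column other than the value column is a dimension column.
    dimension_columns = [c for c in df.columns if c != value_column]

    # Collapse duplicate intersections into one row each by summing their values.
    aggregated = df.groupby(dimension_columns, as_index=False, sort=False)[value_column].sum()
    if aggregated.empty:
        return [aggregated]

    # Split only after aggregation, so no intersection can appear in two chunks.
    chunk_size = math.ceil(len(aggregated) / max(1, n_chunks))
    return [
        aggregated.iloc[start:start + chunk_size]
        for start in range(0, len(aggregated), chunk_size)
    ]


df = pd.DataFrame(
    {
        "Region": ["DE", "DE", "FR", "DE"],
        "Month": ["Jan", "Jan", "Jan", "Feb"],
        "Value": [10.0, 5.0, 7.0, 1.0],
    }
)
chunks = aggregate_then_split(df, value_column="Value", n_chunks=2)
```

Because the two `("DE", "Jan")` rows are summed before the split, each intersection lands in exactly one chunk, and the parallel increment writes no longer compete for the same cell.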
