bug(parquet): Disabling global statistics but enabling for particular column breaks reading #4587

@ozgrakkurt

Description

If I write files with these WriterProperties builder settings:

.set_statistics_enabled(EnabledStatistics::None)
.set_column_statistics_enabled("block_number".into(), EnabledStatistics::Page)

When I query the file with DataFusion, or read it directly with parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder, it fails with the error: "missing offset index"

It seems the writer skips writing offset indices when page statistics are disabled globally, even if they are enabled for an individual column?

I would expect that if the writer doesn't write offset indices, the reader shouldn't try to filter pages by statistics. It should also be documented that set_column_statistics_enabled doesn't override the global setting in this way.

Labels: bug, parquet (Changes to the parquet crate)