fix: Add schema validation for native_datafusion Parquet scan by vaibhawvipul · Pull Request #3902 · apache/datafusion-comet

vaibhawvipul · 2026-04-04T17:42:01Z

Which issue does this PR close?

Closes #3720.

Rationale for this change

The native_datafusion Parquet scan path silently produces wrong results, or fails to raise an error, when the read schema is incompatible with the actual file schema. Spark's vectorized reader throws SchemaColumnConvertNotSupportedException in these cases (e.g., reading binary as timestamp, reading a scalar as an array, decimal precision mismatches), but Comet's native scan bypasses these checks entirely.

What changes are included in this PR?

Adds per-file Parquet schema validation to both scan paths (CometNativeScanExec and CometScanExec): each file's footer is read and its columns are checked against the read schema.
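To illustrate the kind of check described above, here is a minimal sketch in Python. It is not the code in this PR. `Column`, `SchemaMismatchError`, `check_column`, and `validate_file` are hypothetical names, and the compatibility rules are deliberately simplified assumptions that cover only the three cases named in the rationale: a type mismatch, a scalar read as an array, and a decimal precision or scale mismatch. The sketch also assumes that a column missing from the file is not an error.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


class SchemaMismatchError(Exception):
    """Hypothetical stand-in for SchemaColumnConvertNotSupportedException."""


@dataclass(frozen=True)
class Column:
    """Hypothetical, simplified description of one column."""
    name: str
    kind: str                        # e.g. "binary", "timestamp", "decimal"
    repeated: bool = False           # True for array columns
    precision: Optional[int] = None  # only meaningful for decimals
    scale: Optional[int] = None      # only meaningful for decimals


def check_column(file_col: Column, read_col: Column) -> None:
    """Raise if read_col cannot be read from file_col (simplified rules)."""
    if file_col.repeated != read_col.repeated:
        raise SchemaMismatchError(
            f"{read_col.name}: scalar/array mismatch between file and read schema"
        )
    if file_col.kind != read_col.kind:
        raise SchemaMismatchError(
            f"{read_col.name}: file has {file_col.kind}, read schema expects {read_col.kind}"
        )
    # Assumption: decimals must match exactly; real rules may allow widening.
    if file_col.kind == "decimal" and (
        (file_col.precision, file_col.scale) != (read_col.precision, read_col.scale)
    ):
        raise SchemaMismatchError(f"{read_col.name}: decimal precision/scale mismatch")


def validate_file(file_schema: Dict[str, Column], read_schema: List[Column]) -> None:
    """Check every requested column against the schema taken from one file's footer."""
    for read_col in read_schema:
        file_col = file_schema.get(read_col.name)
        if file_col is None:
            continue  # Assumption: a column absent from the file is not an error.
        check_column(file_col, read_col)
```

With this sketch, calling `validate_file({"c": Column("c", "binary")}, [Column("c", "timestamp")])` raises `SchemaMismatchError` instead of letting the scan proceed with an incompatible type.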

How are these changes tested?

All of the expected tests listed in the issue pass.

fix 3720

4ced84e

vaibhawvipul changed the title ~~fix 3720~~ fix: Add schema validation for native_datafusion Parquet scan Apr 4, 2026

vaibhawvipul mentioned this pull request Apr 4, 2026

native_datafusion: no error thrown for schema mismatch when reading Parquet with incompatible types #3720

Open

fix failing tests for spark 4.0

89b5c4c

vaibhawvipul wants to merge 2 commits into apache:main from vaibhawvipul:vipul-issue-3720
