@mbutrovich mbutrovich commented Jan 6, 2026

Which issue does this PR close?

N/A

Rationale for this change

Profiling Iceberg native scans revealed significant overhead in async stream polling, particularly:

  • tokio::drop_waker and tokio::park::clone consuming substantial time in IcebergStreamWrapper::poll_next
  • futures_util::stream::flatten_unordered::SharedPollState::{start_polling,stop_polling} showing lock contention

I think this is due to:

  1. Per-batch schema adapter allocation: the scan created SparkParquetOptions, a SparkSchemaAdapterFactory, and a schema adapter for every single batch inside the .and_then() combinator.
  2. Competing parallelization logic: IcebergFileStream passed one FileScanTask at a time to iceberg-rust, so flatten_unordered coordinated parallelization across a single task, which is pure overhead. The nested streams also caused excessive waker churn.
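
To illustrate the first cause, here is a minimal std-only sketch of per-batch construction versus a cached adapter. `Adapter`, `adapt_per_batch`, and `adapt_cached` are hypothetical stand-ins, not the real Comet types; the `builds` counter only exists to make the number of constructions visible.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a schema adapter (not the real Comet type).
struct Adapter {
    suffix: String,
}

impl Adapter {
    /// Builds an adapter and records the construction in `builds`.
    fn new(builds: &Cell<usize>) -> Self {
        builds.set(builds.get() + 1);
        Adapter { suffix: "_adapted".to_string() }
    }

    /// Pretends to adapt a batch; here a batch is just a string label.
    fn adapt(&self, batch: &str) -> String {
        format!("{}{}", batch, self.suffix)
    }
}

/// Models the old behavior: a fresh adapter is built for every batch.
fn adapt_per_batch(batches: &[&str], builds: &Cell<usize>) -> Vec<String> {
    batches.iter().map(|b| Adapter::new(builds).adapt(b)).collect()
}

/// Models the cached behavior: one adapter is built up front and reused.
fn adapt_cached(batches: &[&str], builds: &Cell<usize>) -> Vec<String> {
    let adapter = Adapter::new(builds);
    batches.iter().map(|b| adapter.adapt(b)).collect()
}
```

Both functions produce the same output; the only difference is that the number of constructions grows with the batch count in the first and stays at one in the second.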

What changes are included in this PR?

  • Cache the schema adapter and Parquet options instead of rebuilding them for every batch
  • Remove IcebergFileStream and pass all FileScanTasks directly to iceberg-rust. I tried this in the past but I can't remember why I abandoned it. Let's try again.
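
As a rough model of the second change, the std-only sketch below contrasts handing a parallel reader one task per call with handing it the whole task list. `Task`, `read_tasks`, and the `setups` counter are hypothetical and stand in for a FileScanTask, iceberg-rust's reader, and its per-call coordination cost; the real code uses async streams, not threads.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Hypothetical stand-in for a file scan task: only a row count.
struct Task {
    rows: usize,
}

/// Hypothetical reader that parallelizes across the tasks it is given.
/// Each call bumps `setups` once to model the per-call coordination cost.
fn read_tasks(tasks: &[Task], setups: &AtomicUsize) -> usize {
    setups.fetch_add(1, Ordering::Relaxed);
    thread::scope(|s| {
        // One worker per task; with a single task there is nothing to overlap.
        let handles: Vec<_> = tasks.iter().map(|t| s.spawn(move || t.rows)).collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum::<usize>()
    })
}

/// Models the old wrapper: the reader sees one task per call.
fn scan_one_at_a_time(tasks: &[Task], setups: &AtomicUsize) -> usize {
    tasks
        .iter()
        .map(|t| read_tasks(std::slice::from_ref(t), setups))
        .sum::<usize>()
}

/// Models the new behavior: the reader sees every task in one call.
fn scan_all_at_once(tasks: &[Task], setups: &AtomicUsize) -> usize {
    read_tasks(tasks, setups)
}
```

In this model the one-at-a-time path pays the coordination cost once per task and never has more than one task in flight, while the all-at-once path pays it once and lets the reader overlap tasks.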

How are these changes tested?

Existing tests.

@mbutrovich mbutrovich marked this pull request as draft January 6, 2026 22:41
codecov-commenter commented Jan 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 59.47%. Comparing base (f09f8af) to head (f17099a).
⚠️ Report is 834 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3051      +/-   ##
============================================
+ Coverage     56.12%   59.47%   +3.34%     
- Complexity      976     1382     +406     
============================================
  Files           119      167      +48     
  Lines         11743    15512    +3769     
  Branches       2251     2575     +324     
============================================
+ Hits           6591     9226    +2635     
- Misses         4012     4989     +977     
- Partials       1140     1297     +157     


# Conflicts:
#	native/core/src/execution/operators/iceberg_scan.rs
@mbutrovich mbutrovich changed the title from "fix: [WIP] [iceberg] Remove IcebergFileStream and use iceberg-rust's parallelization" to "fix: [iceberg] Remove IcebergFileStream and use iceberg-rust's parallelization" Jan 7, 2026
@mbutrovich mbutrovich marked this pull request as ready for review January 9, 2026 23:50