Delta: Updating Delta to Iceberg conversion #15407
vladislav-sidorovich wants to merge 28 commits into apache:main
Conversation
anoopj left a comment
Thank you for the PR. Moving to the Delta kernel is a great improvement. Here is my initial feedback.
Resolved review threads on:
...c/integration/java/org/apache/iceberg/delta/DeltaLakeToIcebergMigrationSparkIntegration.java
delta-lake/src/main/java/org/apache/iceberg/delta/BaseSnapshotDeltaLakeKernelTableAction.java
import io.delta.kernel.exceptions.TableNotFoundException;
import io.delta.kernel.internal.DeltaHistoryManager;
import io.delta.kernel.internal.DeltaLogActionUtils;
import io.delta.kernel.internal.SnapshotImpl;
We are using internal APIs of the kernel. This is fragile - can we refactor this to use the public APIs instead? Snapshot, Table etc. Or are we doing this because we are trying to preserve the table history during the conversion? I would try to avoid this as much as possible.
No, there are no public APIs available for what we need.
Yes, I want to go through the table history step by step, so we will have exactly the same granularity in the history.
At the same time, it's fairly safe to use the internal API because it depends on the Delta protocol, which is stable.
The internal APIs can change or disappear without notice. I would think hard about avoiding dependencies on internal APIs, even if that means changing semantics (e.g. not preserving all the history by default).
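The granularity trade-off being debated can be modeled without the kernel at all. The sketch below is a hypothetical, self-contained model (none of these types are Delta Kernel classes): replaying commits one version at a time yields one target snapshot per source commit, whereas a latest-snapshot-only conversion keeps just the final state.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class HistoryReplaySketch {
  // Hypothetical model of a Delta commit: files added and removed at one version.
  record Commit(long version, Set<String> adds, Set<String> removes) {}

  // Replay commits in order, producing the live file set after each version.
  // This mirrors the "step through history" approach: one snapshot per source commit.
  static List<Set<String>> replay(List<Commit> commits) {
    List<Set<String>> snapshots = new ArrayList<>();
    Set<String> live = new LinkedHashSet<>();
    for (Commit c : commits) {
      live.addAll(c.adds);
      live.removeAll(c.removes);
      snapshots.add(new LinkedHashSet<>(live));
    }
    return snapshots;
  }

  public static void main(String[] args) {
    List<Commit> log = List.of(
        new Commit(0, Set.of("a.parquet"), Set.of()),
        new Commit(1, Set.of("b.parquet"), Set.of()),
        new Commit(2, Set.of("c.parquet"), Set.of("a.parquet")));

    List<Set<String>> history = replay(log);
    // Replaying preserves granularity: three commits -> three snapshots.
    System.out.println(history.size());  // 3
    System.out.println(history.get(2));  // [b.parquet, c.parquet]
    // A latest-snapshot-only conversion would keep only history.get(2).
  }
}
```

The reviewer's point maps onto this model: dropping the replay loop removes the need for internal history APIs, at the cost of the intermediate snapshots.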
while (rows.hasNext()) {
  Row row = rows.next();
  if (DeltaLakeActionsTranslationUtil.isAdd(row)) {
    AddFile addFile = DeltaLakeActionsTranslationUtil.toAdd(row);
Can we avoid the use of the internal AddFile class and read fields directly from the Row using ordinals defined by the scan file schema?
Yes, I will refactor this part once all the conversion features are in place.
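A minimal sketch of the ordinal-based access the reviewer suggests. The `Schema` and `Row` types below are hypothetical stand-ins, not Delta Kernel's real classes; they only mirror the pattern: resolve each column's ordinal once from the scan file schema, then read fields by index instead of materializing an internal `AddFile`.

```java
import java.util.List;

public class OrdinalAccessSketch {
  // Hypothetical stand-in for a struct schema: an ordered list of field names.
  record Schema(List<String> fields) {
    int indexOf(String name) {
      return fields.indexOf(name);
    }
  }

  // Hypothetical stand-in for a kernel Row: values addressed by ordinal.
  record Row(Schema schema, List<Object> values) {
    boolean isNullAt(int ordinal) { return values.get(ordinal) == null; }
    String getString(int ordinal) { return (String) values.get(ordinal); }
    long getLong(int ordinal)     { return (Long) values.get(ordinal); }
  }

  public static void main(String[] args) {
    // Example scan-file schema: path and size columns (names are illustrative).
    Schema scanSchema = new Schema(List.of("path", "size"));

    // Resolve ordinals once, outside the per-row loop.
    int pathOrdinal = scanSchema.indexOf("path");
    int sizeOrdinal = scanSchema.indexOf("size");

    Row row = new Row(scanSchema, List.of("part-0000.parquet", 1024L));

    // Read fields directly by ordinal -- no internal AddFile object needed.
    String path = row.getString(pathOrdinal);
    long size = row.getLong(sizeOrdinal);

    System.out.println(path + " " + size);  // part-0000.parquet 1024
  }
}
```

The design benefit is that only the scan file schema (a public contract) is depended on, not the internal action classes.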
public SnapshotDeltaLakeTable deltaLakeConfiguration(Configuration conf) {
  deltaEngine = DefaultEngine.create(conf);
  deltaLakeFileIO = new HadoopFileIO(conf);
  deltaTable = (TableImpl) Table.forPath(deltaEngine, deltaTableLocation);
The cast is necessary because I use the internal API below: the getChanges API is available only on TableImpl, not on the Table interface.
@nastra since you kindly reviewed the earlier version, I'd love to get your thoughts on the updated core logic before I do the final refactoring to remove internal Delta classes.

@aokolnychyi since you've contributed so much to the Deletion Vectors implementation in Iceberg, could you take a quick look at the DV conversion logic in my PR to make sure I've wired everything up correctly?
The current PR contains an initial version of the code to update the existing functionality (https://iceberg.apache.org/docs/1.4.3/delta-lake-migration/) to the recent Delta Lake protocol (reader version 3, writer version 7). The motivation for the PR is to get early feedback from the community.
Note: the PR doesn't remove the old logic but adds a new interface implementation, so it will be easier to compare/review. Also, based on the usage scenario of the module, this approach will not introduce any issues.
The PR scope:
- Add action
- Remove action

Future steps:
Tests:
Unit tests: cover all supported data types, including complex arrays and structures.
Integration tests: cover an insert-only scenario with Spark 3.5. The tests must be updated for a newer Delta Lake version once the previous solution is deleted from the code.
In the following PRs, I will add all the tables from: Delta golden tables