Collaborator

@ScottDugas commented Sep 9, 2025

This introduces a new method, KeySpacePath.importData, which imports DataInKeySpacePath entries as gathered by KeySpacePath.exportAllData.

The new method works when importing data exported from other clusters.

Resolves: #3573
Resolves: #3751 -- I thought I was going to pull this out, but went back and resolved it with a mapPipelined cursor.

@ScottDugas added the "enhancement" (New feature or request) label Nov 9, 2025
@ScottDugas changed the title from "Keyspace import" to "Introduce KeySpacePath.importData to import previously exported data" Nov 9, 2025
@ScottDugas marked this pull request as ready for review November 10, 2025 16:08
}

// Store the data
byte[] keyBytes = keyTuple.pack();
Contributor

Should we add some fdb timer metrics for future use (imported_count)?

Collaborator Author

Yeah, I think a timer around importFuture makes sense.
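To make the suggestion concrete, here is a minimal, self-contained sketch of what a timer around an import future could look like. It does not use the Record Layer's own timer classes: `ImportTimerSketch`, `instrument`, and the counters are hypothetical names for illustration, and the `importFuture` in `main` is an empty stand-in rather than a real write.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a store timer; not the Record Layer's timer class. */
class ImportTimerSketch {
    private final AtomicLong importedCount = new AtomicLong();
    private final AtomicLong totalNanos = new AtomicLong();

    /** Wraps a future so that its completion records the elapsed time, and counts it if it succeeded. */
    <T> CompletableFuture<T> instrument(CompletableFuture<T> future) {
        final long start = System.nanoTime();
        return future.whenComplete((result, error) -> {
            totalNanos.addAndGet(System.nanoTime() - start);
            if (error == null) {
                // Only successful imports contribute to imported_count.
                importedCount.incrementAndGet();
            }
        });
    }

    long getImportedCount() {
        return importedCount.get();
    }

    long getTotalNanos() {
        return totalNanos.get();
    }

    public static void main(String[] args) {
        ImportTimerSketch timer = new ImportTimerSketch();
        // Assumption: an empty task stands in for the future that writes one key and value.
        CompletableFuture<Void> importFuture = CompletableFuture.runAsync(() -> { });
        timer.instrument(importFuture).join();
        System.out.println("imported_count=" + timer.getImportedCount());
    }
}
```

Wrapping the future, rather than timing the call that creates it, means the recorded time covers the asynchronous work and not just the setup.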


verifySingleKey(dataPath, Tuple.from("item"), Tuple.from("final_value"));
}

Contributor

Additional potential tests:

  • Large data (or any out-of-band error) during import
  • Import into a partial path (no leaves in the import data) plus some remainders
  • Import where the data is of the wrong type

Collaborator Author

  1. Yes, a test with more data than can be inserted in a single transaction would make sense, but not if I move this to ResolvedKeySpacePath and have it take a single DataInKeySpacePath.
  2. I'm not sure what you mean by a partial path.
  3. If by data you mean the value, there is no validation, and it is not KeySpacePath's responsibility to know what is in the data. If you mean the object in the path, that should be validated above this call, and should be trustworthy by the time you get a DataInKeySpacePath. Ideally this would be validated when you create the KeySpacePath, but it is covered in the serialization work, and I explain a bit more about the situation there: https://github.com/FoundationDB/fdb-record-layer/pull/3747/files#diff-15120b2e222e6bb7c2647b670f676b719cce8602e410487604bc87e9ea30a3b0R179

Contributor

By the second bullet I meant importing into the middle of the path, for example when you have a path defined for /company/employee/id/profile and the import only has /company/employee.
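As a rough illustration of the case being described, the sketch below models paths as plain strings and checks whether the imported path stops at an interior directory of the defined path. It does not use KeySpacePath or any Record Layer type; `PartialPathSketch`, `segments`, and `isPartialPath` are hypothetical names used only to pin down what "partial" means here.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that models key space paths as slash-separated strings. */
class PartialPathSketch {

    /** Splits "/a/b/c" into [a, b, c]. */
    static List<String> segments(String path) {
        return Arrays.asList(path.replaceAll("^/+", "").split("/"));
    }

    /** True when importedPath is a strict prefix of definedPath, so it ends before the leaf. */
    static boolean isPartialPath(String definedPath, String importedPath) {
        List<String> defined = segments(definedPath);
        List<String> imported = segments(importedPath);
        return imported.size() < defined.size()
                && defined.subList(0, imported.size()).equals(imported);
    }

    public static void main(String[] args) {
        String defined = "/company/employee/id/profile";
        // The case from the comment above: the import stops at /company/employee.
        System.out.println(isPartialPath(defined, "/company/employee"));
        // A full path down to the leaf is not partial.
        System.out.println(isPartialPath(defined, defined));
    }
}
```

A test for this case would presumably build import data whose path satisfies such a check and then assert whatever behavior the import is expected to have, which this thread leaves open.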

In doing this, I had to rework the test for overwriting data, and
while doing so I decided it would be better to have 3 tests.
Now all of them run both by copying back to the same cluster and
by copying between clusters.
/** The amount of time checking if a {@link com.google.common.collect.RangeSet} is empty. */
RANGE_SET_IS_EMPTY("range set is empty"),
/** The amount of time importing a single KeyValue into a path. */
IMPORT_DATA("import KeyValue"),
Contributor

Is this being used?

Development

Successfully merging this pull request may close these issues.

Improve handling of importing with DirectoryLayerDirectory
Add a KeySpacePath.import method to import the results of an export