
Conversation

@McKnight22
Contributor

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

#6220

What's changed and what's your intention?

Summary (mandatory):

This PR introduces export command support for Fs, S3, OSS, GCS and Azblob.

Details:

This PR refactors the export command to use the unified ObjectStoreConfig from the common module instead of duplicating the logic for each storage type in src/cli/src/data/export.rs.
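
For illustration, a minimal sketch of the unified shape (the field names and clap attributes here are assumptions, not the PR's exact code):

// Hypothetical sketch of a unified storage config; real field names may differ.
use clap::Parser;

#[derive(Debug, Parser)]
pub struct ObjectStoreConfig {
    /// Enable the S3 backend for export.
    #[clap(long)]
    pub enable_s3: bool,
    /// S3 bucket name; only meaningful together with --enable-s3.
    #[clap(long, requires = "enable_s3")]
    pub s3_bucket: Option<String>,
    // ...analogous flag groups for OSS, GCS, and Azblob...
}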

PR Checklist

Please convert it to a draft if some of the following conditions are not met.

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.
  • API changes are backward compatible.
  • Schema or data changes are backward compatible.

@McKnight22 McKnight22 requested a review from a team as a code owner November 22, 2025 04:18
@github-actions github-actions bot added size/L docs-not-required This change does not impact docs. labels Nov 22, 2025

impl ObjectStoreConfig {
    /// Builds the object store with S3.
    pub fn build_s3(&self) -> Result<ObjectStore, BoxedError> {
Member

Question: Why not use `ObjectStoreConfig::build` directly?

Contributor Author

First, the routing logic for storage types within ExportCommand::build cannot be omitted: it parses command-line arguments to determine which storage backend to construct.
Exposing a separate ObjectStoreConfig::build_xxx method per backend lets ExportCommand::build call the appropriate one directly for the storage type being exported, eliminating redundant routing checks inside a monolithic ObjectStoreConfig::build. This benefit grows as future storage types are added beyond the current four.
It also mitigates the risk of a logical inconsistency in which the operator returned by ObjectStoreConfig::build does not match the backend constructed by ExportCommand::build. For instance, however unlikely, ObjectStoreConfig::build could contain a bug like if self.enable_s3 { self.build_oss().map(Some) ... }.
This decouples the two, allowing each to make routing decisions independently.
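
For example, the dispatch could look roughly like this (a sketch using the flag and method names from the discussion above; error handling elided):

// Hypothetical routing inside ExportCommand::build.
let operator = if config.enable_s3 {
    config.build_s3()?
} else if config.enable_oss {
    config.build_oss()?
} else if config.enable_gcs {
    config.build_gcs()?
} else if config.enable_azblob {
    config.build_azblob()?
} else {
    config.build_fs()?
};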

}
}

impl PrefixedAzblobConnection {
Member

How about making all fields public in the macro?

Contributor Author

My intent here was to preserve the legacy code's private encapsulation of the fields.
I can change them to public.

Member

It could be better to move this repeated logic into the macro.

Contributor Author

Got it.
Done.
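
For reference, a macro along these lines can generate the repeated accessors (the macro name and shape here are hypothetical, not the PR's actual wrap_with_clap_prefix):

// Hypothetical sketch; the real macro may generate different items.
macro_rules! impl_prefixed_connection {
    ($name:ident { $($getter:ident => $field:ident: $ty:ty),* $(,)? }) => {
        impl $name {
            $(
                /// Accessor generated by the macro instead of hand-written code.
                pub fn $getter(&self) -> &$ty {
                    &self.$field
                }
            )*
        }
    };
}

// Usage (field names hypothetical):
// impl_prefixed_connection!(PrefixedAzblobConnection {
//     container => azblob_container: String,
// });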

@WenyXu WenyXu requested a review from Copilot November 24, 2025 06:45
Copilot finished reviewing on behalf of WenyXu November 24, 2025 06:49
Contributor

Copilot AI left a comment

Pull request overview

This PR refactors the export command to use a unified storage configuration approach, eliminating code duplication and adding support for multiple cloud storage backends (S3, OSS, GCS, Azure Blob) alongside the existing filesystem storage.

Key changes:

  • Introduced a new storage_export module with a trait-based design for different storage backends (see the sketch after this list)
  • Replaced individual storage flags and configuration parameters with a unified ObjectStoreConfig
  • Added comprehensive test coverage for all storage backend configurations
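
Based on the method names that appear later in this review, such a trait plausibly looks something like the following sketch (signatures inferred, not the module's exact API):

// Sketch of a storage-backend trait; inferred from the review comments below.
trait StorageBackend {
    /// Returns the remote path and the CONNECTION clause used by the export statement.
    fn get_storage_path(&self, catalog: &str, schema: &str) -> (String, String);
    /// Formats the final output path for logging.
    fn format_output_path(&self, catalog: &str, file_path: &str) -> String;
    /// True for remote backends (S3, OSS, GCS, Azblob); false for Fs.
    fn is_remote_storage(&self) -> bool;
}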

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 8 comments.

Show a summary per file

src/common/base/src/secrets.rs: Modified Display implementation for SecretString to return empty string when secret is empty
src/cli/src/error.rs: Removed unused S3ConfigNotSet error variant
src/cli/src/data/storage_export.rs: New module implementing storage backend abstraction with trait-based design for Fs, S3, OSS, GCS, and Azblob
src/cli/src/data/export.rs: Refactored to use unified storage config, removed duplicated operator-building logic, added unit tests for all backends
src/cli/src/data.rs: Added storage_export module declaration
src/cli/src/common/object_store.rs: Added accessor methods for storage connection configs and split build methods for individual backends
src/cli/src/common.rs: Exported new connection type aliases for use in storage backends
src/cli/Cargo.toml: Added common-test-util dependency for testing
Cargo.lock: Updated dependency tree with common-test-util


Comment on lines 99 to 101
/// Get the SAS token.
pub fn sas_token(&self) -> Option<&String> {
    self.azblob_sas_token.as_ref()

Copilot AI Nov 24, 2025

The SAS token is stored as Option<String> but should be Option<SecretString> for consistency with other sensitive credentials (account_name, account_key). This exposes the SAS token in logs and debug output, creating a potential security issue.

Change the field type in AzblobConnection:

sas_token: Option<SecretString>,

And update the sas_token() getter to return Option<&SecretString>:

pub fn sas_token(&self) -> Option<&SecretString> {
    self.azblob_sas_token.as_ref()
}

Then use expose_secret() when accessing the value in the storage export code.

Contributor Author

Rejected.
Because "if a SAS token is passed over HTTP, anyone listening on the network can read the SAS token in plain text".

Comment on lines 51 to 55
if self.expose_secret().is_empty() {
    write!(f, "")
} else {
    write!(f, "SecretString([REDACTED])")
}

Copilot AI Nov 24, 2025

[nitpick] The Display implementation for SecretString returns an empty string when the secret is empty. This creates inconsistent behavior and could be confusing for debugging. When formatting an empty SecretString, it would be clearer to still show "SecretString([REDACTED])" or use "SecretString(empty)" to make it explicit that it's a SecretString type.

Consider using:

fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
    write!(f, "SecretString([REDACTED])")
}

This maintains consistent formatting regardless of whether the secret is empty or not.

Suggested change:

- if self.expose_secret().is_empty() {
-     write!(f, "")
- } else {
-     write!(f, "SecretString([REDACTED])")
- }
+ write!(f, "SecretString([REDACTED])")

Contributor Author

Keeping the legacy code style is the right way.
I will revert my changes here.

Comment on lines 230 to 232
if !self.config.endpoint().is_empty() {
    connection_options.push(format!("ENDPOINT='{}'", self.config.endpoint()));
}

Copilot AI Nov 24, 2025

[nitpick] The endpoint validation in OssBackend::new() ensures the endpoint is not empty (lines 185-192), but get_storage_path() then repeats the check if !self.config.endpoint().is_empty() (line 230).

Since the endpoint is guaranteed to be non-empty after validation, this check is unnecessary and the endpoint should always be included in the connection options. Consider removing the conditional check:

connection_options.push(format!("ENDPOINT='{}'", self.config.endpoint()));

This makes the code clearer and more consistent with the validation logic.

Suggested change:

- if !self.config.endpoint().is_empty() {
-     connection_options.push(format!("ENDPOINT='{}'", self.config.endpoint()));
- }
+ connection_options.push(format!("ENDPOINT='{}'", self.config.endpoint()));

Contributor Author

Refactored the validation.
The redundant check is fixed.

StorageType::Azblob(backend) => backend.format_output_path(catalog, file_path),
}
}


Copilot AI Nov 24, 2025

Missing rustdoc comment for the public is_remote_storage() method. While the other methods in this impl block have documentation, this one doesn't.

Add a doc comment:

/// Returns true if the storage backend is remote (not local filesystem).
pub fn is_remote_storage(&self) -> bool {
Suggested change:

+ /// Returns true if the storage backend is remote (not local filesystem).
Contributor Author

Done.

Comment on lines 215 to 229
fn get_storage_path(&self, catalog: &str, schema: &str) -> (String, String) {
    let bucket = self.config.bucket();
    let oss_path = format!("oss://{}/{}/{}/", bucket, catalog, schema);

    let mut connection_options = vec![
        format!(
            "ACCESS_KEY_ID='{}'",
            self.config.access_key_id().expose_secret()
        ),
        format!(
            "ACCESS_KEY_SECRET='{}'",
            self.config.access_key_secret().expose_secret()
        ),
    ];

    if !self.config.endpoint().is_empty() {
        connection_options.push(format!("ENDPOINT='{}'", self.config.endpoint()));
    }

    let connection_str = format!(" CONNECTION ({})", connection_options.join(", "));
    (oss_path, connection_str)
}

Copilot AI Nov 24, 2025

The OSS backend's get_storage_path method doesn't include the root path from the configuration, unlike the S3 and GCS backends. This means that if a user specifies --oss-root, it will be ignored in the export path generation.

The path should be:

let root = if self.config.root().is_empty() {
    String::new()
} else {
    format!("/{}", self.config.root())
};
let oss_path = format!("oss://{}{}/{}/{}/", bucket, root, catalog, schema);

Contributor Author

The root path specified by --oss-root is not required in the export path generation, so it is intentionally ignored.


fn format_output_path(&self, catalog: &str, file_path: &str) -> String {
    let bucket = self.config.bucket();
    format!("oss://{}/{}/{}", bucket, catalog, file_path)

Copilot AI Nov 24, 2025

The format_output_path method for OSS backend doesn't include the root path from the configuration, unlike the S3, GCS, and Azblob backends. This inconsistency means the logged output path won't match the actual storage location when --oss-root is specified.

The method should format the path similarly to S3:

let root = if self.config.root().is_empty() {
    String::new()
} else {
    format!("/{}", self.config.root())
};
format!("oss://{}{}/{}/{}", bucket, root, catalog, file_path)
Suggested change:

- format!("oss://{}/{}/{}", bucket, catalog, file_path)
+ let root = if self.config.root().is_empty() {
+     String::new()
+ } else {
+     format!("/{}", self.config.root())
+ };
+ format!("oss://{}{}/{}/{}", bucket, root, catalog, file_path)

Contributor Author

The root path specified by --oss-root is not required in the export path generation, so it is intentionally ignored.
Rejected.

Comment on lines 116 to 120
let root = if self.config.root().is_empty() {
    String::new()
} else {
    format!("/{}", self.config.root())
};

Copilot AI Nov 24, 2025

[nitpick] The root path formatting logic is duplicated across S3, GCS, and Azblob backends (lines 116-120, 290-294, 390-394). Consider extracting this into a helper function to reduce duplication:

fn format_root_path(root: &str) -> String {
    if root.is_empty() {
        String::new()
    } else {
        format!("/{}", root)
    }
}

This would make the code more maintainable and consistent.

Contributor Author

Good idea.
Done.

Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@github-actions github-actions bot added size/XL and removed size/L labels Nov 28, 2025
@McKnight22 McKnight22 force-pushed the pr-refactor_cli_export branch from 7a18ea9 to 154a2b3 Compare November 28, 2025 06:35
McKnight22 and others added 10 commits November 30, 2025 13:30
- Utilize ObjectStoreConfig to unify storage configuration for export command
- Support export command for Fs, S3, OSS, GCS and Azblob
- Fix the Display implementation for SecretString, which always returned the
  string "SecretString([REDACTED])" even when the internal secret was empty.

Signed-off-by: McKnight22 <tao.wang.22@outlook.com>
- Change the encapsulation of the configuration options for every storage
  backend to public access.

Signed-off-by: McKnight22 <tao.wang.22@outlook.com>
Co-authored-by: WenyXu <wenymedia@gmail.com>
- Update the implementation of ObjectStoreConfig::build_xxx() using macro solutions

Signed-off-by: McKnight22 <tao.wang.22@outlook.com>
Co-authored-by: WenyXu <wenymedia@gmail.com>
- Introduce config validation for each storage type

Signed-off-by: McKnight22 <tao.wang.22@outlook.com>
- Enable trait-based polymorphism for storage type handling
  (from inherent impl to trait impl)
- Extract helper functions to reduce code duplication

Signed-off-by: McKnight22 <tao.wang.22@outlook.com>
- Improve SecretString handling and validation
  (Distinguishing between "not provided" and "empty string")
- Add validation when using filesystem storage

Signed-off-by: McKnight22 <tao.wang.22@outlook.com>
- Refactor storage field validation with macro

Signed-off-by: McKnight22 <tao.wang.22@outlook.com>
- Support GCS Application Default Credentials (e.g., GKE, Cloud Run, or local development) in export
  (Enabling ADC without requiring explicit credentials to be present)
  (Making the endpoint optional in GCS validation; defaults to https://storage.googleapis.com)

Signed-off-by: McKnight22 <tao.wang.22@outlook.com>
This commit refactors the validation logic for object store configurations in the CLI to leverage clap features and reduce boilerplate.

Key changes:
- Update wrap_with_clap_prefix macro to use clap's requires attribute.
  This ensures that storage-specific options (e.g., --s3-bucket) are only accepted when the corresponding backend is enabled (e.g., --s3).
- Simplify FieldValidator trait by removing the is_provided method, as dependency checks are now handled by clap.
- Introduce validate_backend! macro to standardize the validation of required fields for enabled backends.
- Refactor ExportCommand to remove explicit validation calls (validate_s3, etc.) and rely on the validation within backend constructors.
- Add integration tests for ExportCommand to verify build success with S3, OSS, GCS, and Azblob configurations.

Signed-off-by: McKnight22 <tao.wang.22@outlook.com>
- Use macros to simplify storage export implementation

Signed-off-by: McKnight22 <tao.wang.22@outlook.com>
Co-authored-by: WenyXu <wenymedia@gmail.com>
@McKnight22 McKnight22 force-pushed the pr-refactor_cli_export branch from 899028c to fa6898c Compare November 30, 2025 05:36
@McKnight22 McKnight22 requested a review from WenyXu December 1, 2025 02:01