Skip to content

Conversation

@getumen
Copy link
Owner

@getumen getumen commented Dec 16, 2025

This pull request introduces support for file deletion in the distributed file system (DFS) and adds a new S3-compatible server component. The main changes include implementing the delete_file operation across the client and master components, updating the Raft state machine, and integrating a new s3_server crate for S3 REST API compatibility. Additionally, the build system and documentation are updated to reflect these enhancements.

S3 Server Integration and S3 API Progress

  • Added a new dfs/s3_server crate, including its dependencies and a basic main.rs to handle S3-compatible REST API requests using Axum. This enables S3 compatibility for the DFS. [1] [2]
  • Updated the workspace and Dockerfile to include and build the new s3_server binary. [1] [2]
  • Marked major S3 API milestones as completed in TODO.md, reflecting significant progress on S3 compatibility features and integration testing. [1] [2] [3]

File Deletion Support

  • Implemented the delete_file RPC in the DFS client (dfs/client/src/mod.rs), including error handling and integration with sharding and master routing logic. [1] [2]
  • Added the delete_file gRPC endpoint to the master server (dfs/metaserver/src/master.rs) and integrated it into the Raft state machine for consensus-based file deletion. [1] [2] [3] [4]

Client API Improvements

  • Updated client methods to return error types compatible with async error propagation (Send + Sync), and added a list_all_files method to aggregate file listings across shards. [1] [2] [3] [4] [5]

These changes collectively enhance the DFS by enabling S3 interoperability and allowing users to delete files in a distributed, fault-tolerant manner.

Copilot AI review requested due to automatic review settings December 16, 2025 05:37
@getumen
Copy link
Owner Author

getumen commented Dec 16, 2025

Integration Test failed

Waiting for S3 Server (port 9000)...
Waiting... (30)
Connection to localhost port 9000 [tcp/cslistener] succeeded!
S3 Server is ready!
Running Integration Test...
=== TEST FAILED (Exit Code: 1) ===
--- Python Test Output ---
/home/yoshihiro/.asdf/installs/python/3.9.25/lib/python3.9/site-packages/boto3/compat.py:89: PythonDeprecationWarning: Boto3 will no longer support Python 3.9 starting April 29, 2026. To continue receiving service updates, bug fixes, and security updates please upgrade to Python 3.10 or later. More information can be found here: https://aws.amazon.com/blogs/developer/python-support-policy-updates-for-aws-sdks-and-tools/
warnings.warn(warning, PythonDeprecationWarning)
Traceback (most recent call last):
File "/home/yoshihiro/github/getumen/rust-hadoop/test_scripts/s3_integration_test.py", line 192, in main
test_list_objects_v2(s3)
File "/home/yoshihiro/github/getumen/rust-hadoop/test_scripts/s3_integration_test.py", line 155, in test_list_objects_v2
assert resp.get('IsTruncated'), "Should be truncated"
AssertionError: Should be truncated
--- Testing Bucket Operations ---
Creating bucket: test-bucket-4ff757c3-27b8-49f2-9e6b-aa3b9a6aaf99
Bucket created successfully.
Listing buckets:
Buckets found: ['test-bucket-4ff757c3-27b8-49f2-9e6b-aa3b9a6aaf99']

--- Testing Object Operations ---
Putting object 'test-object.txt' with metadata.
Getting object 'test-object.txt'.
Content: b'Hello S3 Server!'
Metadata: {'user-type': 'integration-test', 'version': '1.0'}

--- Testing Range Requests ---
Requesting range bytes=0-4
Partial Content: b'01234'

--- Testing Multipart Upload ---
Initiating Multipart Upload
Upload ID: 61eecffb-5571-4f5c-a637-c59ce2631d39
Uploading Part 1
Uploading Part 2
Uploading Part 3
Completing Multipart Upload
Multipart Upload Completed
Multipart Upload failed: An error occurred (404) when calling the HeadObject operation: Not Found
Multipart Test Failed (Optional?): An error occurred (404) when calling the HeadObject operation: Not Found

--- Testing ListObjectsV2 ---
Listing with MaxKeys=10
Keys returned: 10

FAILURE! Test failed with error: Should be truncated

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request introduces S3-compatible REST API support for the distributed file system (DFS) through a new s3_server component, alongside implementing file deletion functionality across the DFS architecture.

Key Changes:

  • Added a new dfs/s3_server crate that provides S3-compatible REST API endpoints using Axum, enabling standard S3 clients to interact with the DFS
  • Implemented delete_file RPC operation across the client, master server, and Raft state machine to support file deletion with consensus
  • Enhanced client API with list_all_files method for aggregating file listings across multiple shards

Reviewed changes

Copilot reviewed 16 out of 18 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
test_scripts/s3_integration_test.py Python-based integration test suite for S3 API operations including bucket management, object operations, multipart uploads, and pagination
test_scripts/run_s3_test.sh Shell script to orchestrate S3 integration testing with Docker Compose setup and teardown
test_scripts/requirements.txt Python dependencies for S3 integration tests (boto3, botocore)
proto/dfs.proto Added DeleteFile RPC definition to the MasterService protocol
docker-compose.yml Added S3 server service configuration to the Docker Compose setup
dfs/s3_server/src/state.rs Application state management for S3 server holding DFS client instance
dfs/s3_server/src/s3_types.rs S3 XML response type definitions for bucket and object operations
dfs/s3_server/src/main.rs S3 server entry point with Axum router configuration and DFS client initialization
dfs/s3_server/src/handlers.rs S3 API request handlers implementing bucket operations, object CRUD, multipart uploads, and metadata handling
dfs/s3_server/Cargo.toml Cargo manifest for S3 server dependencies
dfs/metaserver/src/simple_raft.rs Added DeleteFile command to Raft state machine
dfs/metaserver/src/master.rs Implemented delete_file RPC endpoint with Raft consensus integration
dfs/client/src/mod.rs Added delete_file and list_all_files methods, updated error types to support async Send + Sync
TODO.md Updated S3 API milestones to reflect completed implementation status
Dockerfile Added s3-server binary to Docker image
Cargo.toml Added s3_server to workspace members
Comments suppressed due to low confidence (1)

dfs/s3_server/src/handlers.rs:1

  • Use the tracing framework (already imported) instead of eprintln! for consistency with the rest of the codebase. Replace with tracing::error! or tracing::warn!.
use crate::{s3_types::*, state::AppState as S3AppState};

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +82 to +83
part1_data = b"Part1" * 1024 * 1024 # 5MB
part2_data = b"Part2" * 1024 * 1024 # 5MB
Copy link

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment states '5MB' but the calculation produces 5,242,880 bytes (approximately 5.24 MB). For exactly 5MB, use b"Part1" * (1024 * 1024) which equals 5,242,880 bytes, or clarify the comment to indicate the approximate size.

Suggested change
part1_data = b"Part1" * 1024 * 1024 # 5MB
part2_data = b"Part2" * 1024 * 1024 # 5MB
part1_data = b"Part1" * 1024 * 1024 # 5,242,880 bytes (~5.24MB)
part2_data = b"Part2" * 1024 * 1024 # 5,242,880 bytes (~5.24MB)

Copilot uses AI. Check for mistakes.
key,
last_modified: "2025-01-01T00:00:00.000Z".into(),
etag: "\"000\"".into(),
size: 0,
Copy link

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Object size is hardcoded to 0, which provides incorrect information to S3 clients. For regular files, retrieve the actual file size from the DFS. For multipart upload objects, calculate the size by summing the sizes of all parts.

Copilot uses AI. Check for mistakes.
key,
last_modified: "2025-01-01T00:00:00.000Z".into(),
etag: "\"000-MPU\"".into(),
size: 0, // Calculate size?
Copy link

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MPU object size is hardcoded to 0. The comment acknowledges this needs calculation. Implement size calculation by listing and summing part sizes, as this affects S3 client behavior and data integrity checks.

Copilot uses AI. Check for mistakes.
key: key.clone(),
last_modified: "2025-01-01T00:00:00.000Z".into(),
etag: "\"000\"".into(),
size: 0,
Copy link

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Object size is hardcoded to 0 in the ListObjectsV2 response. Retrieve and return the actual file size from the DFS to provide accurate information to S3 clients.

Copilot uses AI. Check for mistakes.
.await;

let result = CompleteMultipartUploadResult {
location: format!("http://localhost:9000/{}/{}", bucket, key),
Copy link

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The location URL is hardcoded to 'http://localhost:9000'. This breaks in non-local deployments. Use the actual request host or make the base URL configurable via environment variable.

Copilot uses AI. Check for mistakes.
Comment on lines +224 to +227
// Parse body for part verification (skip actual verification for now, just trust client)
if let Ok(str_body) = std::str::from_utf8(&body) {
let _parts: Result<CompleteMultipartUpload, _> = from_str(str_body);
}
Copy link

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CompleteMultipartUpload handler skips ETag verification of parts. Without verification, corrupted or tampered parts could be assembled into the final object. Implement ETag verification to ensure data integrity.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants