Skip to content

Conversation

@XiaoHongbo-Hope
Copy link
Contributor

@XiaoHongbo-Hope XiaoHongbo-Hope commented Nov 14, 2025

Purpose

Fixes URI scheme loss issue in blob-as-descriptor mode by replacing pathlib.Path with str throughout the codebase. This ensures URI schemes (e.g., oss://, s3://, file://) are preserved when handling blob descriptors and file paths.

Key Changes

Replaced pathlib.Path with str: Unified path handling to preserve URI schemes across all filesystem operations.

Impact

  • I/O & Catalog: FileIO, FileSystemCatalog, RESTCatalog - all path operations use str
  • Manifest & Metadata: DataFileMeta, SchemaManager, manifest managers - - path storage use str
  • Read/Write Operations: All readers and writers (FormatBlobReader, DataWriter, BlobWriter, etc.) - consistent str path handling
  • Table Management: FileStoreTable, SnapshotManager --table path use str
  • Blob Support: Blob, BlobDescriptor - URI scheme preservation for external blob storage

Tests

  • test_blob_write_read_end_to_end_with_descriptor: Verifies URI scheme preservation in blob descriptors
  • Sample script oss_blob_as_descriptor.py: Demonstrates OSS blob-as-descriptor functionality with scheme preservation
  • test_file_io
  • rest_token_file_io

@XiaoHongbo-Hope XiaoHongbo-Hope marked this pull request as ready for review November 14, 2025 02:18
@XiaoHongbo-Hope XiaoHongbo-Hope marked this pull request as draft November 14, 2025 03:26
@XiaoHongbo-Hope XiaoHongbo-Hope changed the title [python] fixes URI scheme miss issue in blob-as-descriptor mode [python] Use urlpath.URL to replace pathlib.Path to avoid scheme missing in pypaimon Nov 14, 2025
@XiaoHongbo-Hope XiaoHongbo-Hope marked this pull request as ready for review November 14, 2025 12:48
@JingsongLi
Copy link
Contributor

Maybe we can just use str.

@XiaoHongbo-Hope XiaoHongbo-Hope marked this pull request as draft November 18, 2025 03:44
@XiaoHongbo-Hope XiaoHongbo-Hope force-pushed the read_blob-as-descriptor_dev branch from 5b4721f to e87ada1 Compare November 18, 2025 11:33
@XiaoHongbo-Hope XiaoHongbo-Hope force-pushed the read_blob-as-descriptor_dev branch from e87ada1 to e4d22d6 Compare November 18, 2025 11:34
@XiaoHongbo-Hope XiaoHongbo-Hope marked this pull request as ready for review November 18, 2025 13:54
@XiaoHongbo-Hope XiaoHongbo-Hope marked this pull request as draft November 19, 2025 02:40
@XiaoHongbo-Hope XiaoHongbo-Hope changed the title [python] Use urlpath.URL to replace pathlib.Path to avoid scheme missing in pypaimon [python] use str to replace pathlib.Path to avoid scheme missing in pypaimon Nov 19, 2025
@XiaoHongbo-Hope XiaoHongbo-Hope marked this pull request as ready for review November 19, 2025 03:39
@XiaoHongbo-Hope XiaoHongbo-Hope marked this pull request as draft November 19, 2025 03:40
@XiaoHongbo-Hope XiaoHongbo-Hope marked this pull request as ready for review November 19, 2025 03:48
return self._trim_schema(self.warehouse) / f"{name}{Catalog.DB_SUFFIX}"
def get_database_path(self, name) -> str:
warehouse = self.warehouse.rstrip('/')
return f"{warehouse}/{name}{Catalog.DB_SUFFIX}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about use os.path.join to construct the path? then you don’t need to use rstrip to remove trailing slashes from the warehouse path

Copy link
Contributor Author

@XiaoHongbo-Hope XiaoHongbo-Hope Nov 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about use os.path.join to construct the path? then you don’t need to use rstrip to remove trailing slashes from the warehouse path

Thanks for your kind suggestion. I tried use os.path but found that os.path.join(self.warehouse, "db.db") produces oss://bucket/path\db.db on Windows instead of oss://bucket/path/db.db. So I choose str concat finally

@lucasfang
Copy link
Contributor

+1

2 similar comments
@discivigour
Copy link
Contributor

+1

@JingsongLi
Copy link
Contributor

+1

@JingsongLi JingsongLi merged commit d04ca57 into apache:master Nov 20, 2025
19 of 24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants