
Conversation

@Mahalaxmibejugam (Contributor) commented Dec 19, 2025:

The rmdir method implements deletion of empty directories for Hierarchical Namespace (HNS)-enabled buckets.

  • HNS-Aware Deletion: For HNS-enabled buckets, it uses the delete_folder API to perform an atomic deletion of the specified empty folder.
  • Fallback Mechanism: If a bucket is not HNS-enabled, or if the path refers to the bucket itself (with no folder specified), it falls back to the GCSFileSystem _rmdir implementation.
  • Error Handling: It translates GCS-specific API exceptions into standard Python errors for fsspec compatibility:
    1. NotFound is raised as FileNotFoundError. The API throws NotFound if the path is a file or does not exist.
    2. FailedPrecondition (for non-empty directories) is raised as OSError.
  • Cache Invalidation: On successful deletion, it invalidates the cache for the deleted path and its parent so the filesystem view remains consistent.
  • Testing: Tests are added to validate the new rmdir logic, covering successful deletions on HNS buckets, correct error handling for non-empty or non-existent directories, and the fallback to standard behavior for non-HNS buckets.
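The error-translation flow above can be sketched as follows. This is a minimal, self-contained illustration, not the actual gcsfs implementation: `NotFound` and `FailedPrecondition` here stand in for the `google.api_core.exceptions` classes, and `delete_folder` is a stub over a plain dict rather than the real API call.

```python
# Sketch of the HNS rmdir error-translation flow described above.
# NotFound / FailedPrecondition stand in for google.api_core
# exception classes; the "store" is a dict mapping folder path ->
# list of child paths.

class NotFound(Exception):
    pass

class FailedPrecondition(Exception):
    pass

def delete_folder(store, path):
    """Stand-in for the atomic delete_folder API call."""
    if path not in store:
        raise NotFound(path)            # path is a file or does not exist
    if store[path]:
        raise FailedPrecondition(path)  # folder is not empty
    del store[path]

def rmdir(store, path):
    """fsspec-style wrapper: translate API errors to Python errors."""
    try:
        delete_folder(store, path)
    except NotFound as exc:
        raise FileNotFoundError(path) from exc
    except FailedPrecondition as exc:
        raise OSError(f"Directory not empty: {path}") from exc
```

A non-empty folder surfaces as OSError and a missing path as FileNotFoundError, which is the mapping the bullet list describes.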

@martindurant (Member) commented:

Question: since HNS buckets have a true folder structure, it should be possible to amend the directory-listings cache rather than simply invalidating it when deleting, no? That could avoid unnecessary listings later on.

@ankitaluthra1 (Collaborator) commented:

/gcbrun

@Mahalaxmibejugam (Contributor, author) commented:

Question: since HNS buckets have a true folder structure, it should be possible to amend the directory listings cache rather than simply invalidate it when deleting, no? That could avoid unnecessary listings later on.

@martindurant Invalidating the cache for a specific path was also invalidating the cache for all parent directories in the hierarchy. I have updated the logic to remove the cache entry only for the deleted directory, and to update its immediate parent to drop the deleted directory's entry.
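The amendment this reply describes can be sketched like so. This is a simplified model, assuming an fsspec-style `dircache` that maps a path to a list of entry dicts; the function name and shapes are illustrative, not gcsfs's actual internals.

```python
def amend_cache_after_rmdir(dircache, path):
    """Sketch of the cache-amendment approach described above."""
    # Drop only the deleted directory's own cached listing...
    dircache.pop(path, None)
    # ...then remove its entry from the immediate parent's listing,
    # leaving every other cached listing (including grandparents)
    # untouched, so no unnecessary re-listing is triggered.
    parent = path.rsplit("/", 1)[0]
    if parent in dircache:
        for i, entry in enumerate(dircache[parent]):
            if entry.get("name") == path:
                dircache[parent].pop(i)
                break
```

The net effect is that only two cache entries change: the deleted directory's listing disappears, and the parent's listing loses one element.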

@Mahalaxmibejugam force-pushed the hns-rmdir branch 3 times, most recently from 8a74f38 to 0e9718c on December 26, 2025 at 15:20.
@ankitaluthra1 (Collaborator) commented:

/gcbrun


@Mahalaxmibejugam (Contributor, author) commented:


@martindurant Can you please review the latest changes following the cache update logic?

Comment on lines +593 to +596
for i, entry in enumerate(self.dircache[parent]):
if entry.get("name") == path:
self.dircache[parent].pop(i)
break
Review comment (Member):

This ought to be correct, but if for some reason the entry is duplicated in the parent, you would end up removing the wrong thing. I think a comprehension would be clearer:

Suggested change:

self.dircache[parent] = [ent for ent in self.dircache[parent] if ent["name"] != path]

Reply (Contributor, author):

Thanks for the suggestion. Initially I planned something similar to your version, but I wanted to skip iterating over the entire list once the path had already been removed from the parent cache (though it likely makes little performance difference).

you would end up removing the wrong thing

Just to make sure I understand this correctly: were you referring to the fact that a duplicate entry, if one exists, would not be removed?
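For concreteness, with a hypothetical duplicated entry the two approaches diverge as follows: the loop with `break` removes only the first matching copy, while the comprehension removes every copy. (Stand-alone sketch; `entries` models `self.dircache[parent]`.)

```python
def remove_first(entries, path):
    """The loop-with-break form from the diff: pop the first match only."""
    for i, entry in enumerate(entries):
        if entry.get("name") == path:
            entries.pop(i)
            break
    return entries

def remove_all(entries, path):
    """The suggested comprehension: drop every matching entry."""
    return [ent for ent in entries if ent.get("name") != path]

# With a duplicated "b/d" entry, remove_first leaves one stale copy
# behind, while remove_all leaves none.
dup = [{"name": "b/d"}, {"name": "b/d"}, {"name": "b/x"}]
```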
