Priority Level
Medium (Nice to have)
Is your feature request related to a problem? Please describe.
As a client application of the library, I want to do two things with ManagedBlobStorage (the Nemotron personas datasets) that I cannot do today:
- Provide a custom DuckDB connection with a custom fsspec client registered, so that the datasets can be read from remote storage.
  - Impossible today because DuckDBDatasetRepository always creates its own DuckDB connection. Contrast this with implementations of the SeedReader ABC, which provide their own DuckDB connections.
- Opt out of the local dataset caching that happens in the _register_datasets method.
  - Impossible today because load_managed_dataset_repository uses an isinstance check that always enables caching for any non-LocalBlobStorageProvider implementation.
Describe the solution you'd like
Ideally, the ManagedBlobStorage ABC would expose hooks for supplying the DuckDB connection and for turning local caching on or off, so that implementors control both.
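To make the request concrete, here is a rough sketch of what such hooks might look like. It is not the library's actual API: the class bodies are simplified stand-ins, and the method names create_duckdb_connection and should_cache_locally, the RemoteBlobStorage class, and the default_connection_factory parameter are all hypothetical names chosen for illustration. The defaults are picked so that existing implementations would keep today's behavior.

```python
from abc import ABC
from typing import Any, Callable, Optional


class ManagedBlobStorage(ABC):
    """Simplified stand-in for the real ABC, showing only the two proposed hooks."""

    def create_duckdb_connection(self) -> Optional[Any]:
        # Hypothetical hook: return a preconfigured connection, or None to let
        # the repository create its own (which is what happens today).
        return None

    def should_cache_locally(self) -> bool:
        # Hypothetical hook: return False to skip the local dataset cache.
        return True


class RemoteBlobStorage(ManagedBlobStorage):
    """Hypothetical client implementation that overrides both hooks."""

    def __init__(self, connection: Any) -> None:
        # In a real client this could be the result of duckdb.connect(),
        # after calling connection.register_filesystem(fs) with an fsspec filesystem.
        self._connection = connection

    def create_duckdb_connection(self) -> Optional[Any]:
        return self._connection

    def should_cache_locally(self) -> bool:
        return False


class DuckDBDatasetRepositorySketch:
    """Stand-in showing how the repository could consult the hooks."""

    def __init__(
        self,
        storage: ManagedBlobStorage,
        default_connection_factory: Callable[[], Any],
    ) -> None:
        provided = storage.create_duckdb_connection()
        # Fall back to the repository's own connection only when none is provided.
        self.connection = provided if provided is not None else default_connection_factory()
        # Replaces an isinstance check on the storage type with a decision
        # made by the implementor.
        self.use_cache = storage.should_cache_locally()
```

With hooks along these lines, load_managed_dataset_repository would no longer need to inspect the storage type to decide whether to cache, and a client could pass in a connection that already has its fsspec filesystem registered.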
Describe alternatives you've considered
No real alternatives are available.
Additional context
No response