
More control in ManagedBlobStorage implementations #392

@mikeknep


Priority Level

Medium (Nice to have)

Is your feature request related to a problem? Please describe.

As the author of a client application built on the library, I want to do two things with ManagedBlobStorage (the Nemotron personas datasets) that are not possible today:

  1. Provide a custom DuckDB connection with a custom fsspec client registered, for reading the datasets from remote storage
    • Impossible today because DuckDBDatasetRepository always creates its own DuckDB connection. Contrast this with implementations of the SeedReader ABC, which provide their own DuckDB connections.
  2. Opt out of the local dataset caching that happens in the _register_datasets method
    • Impossible today because load_managed_dataset_repository decides with an isinstance check, so every implementation other than LocalBlobStorageProvider is always cached
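To illustrate the second point, here is a simplified model of a type-based caching decision. It is not the library's actual code: the `uri_for_key` method, the `MyRemoteStorage` class, and the `decides_to_cache` function are hypothetical stand-ins, and the two library class names are redefined locally so the snippet runs on its own.

```python
from abc import ABC, abstractmethod


class ManagedBlobStorage(ABC):
    """Local stand-in for the library's ABC; the real interface may differ."""

    @abstractmethod
    def uri_for_key(self, key: str) -> str:
        """Hypothetical method, used only to make the subclasses concrete."""


class LocalBlobStorageProvider(ManagedBlobStorage):
    def uri_for_key(self, key: str) -> str:
        return f"/data/{key}"


class MyRemoteStorage(ManagedBlobStorage):
    """A client implementation that would prefer to skip local caching."""

    def uri_for_key(self, key: str) -> str:
        return f"s3://example-bucket/{key}"


def decides_to_cache(storage: ManagedBlobStorage) -> bool:
    # Simplified model of the check described above: the decision depends
    # only on the concrete type, so the implementation has no say in it.
    return not isinstance(storage, LocalBlobStorageProvider)
```

In this model `MyRemoteStorage` has no way to influence the result, which is the limitation described in item 2.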

Describe the solution you'd like

Ideally, the ManagedBlobStorage ABC would expose hooks for supplying the DuckDB connection and for enabling or disabling local caching, so that implementors control both.
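One possible shape for such hooks is sketched below. Everything here is an assumption for discussion, not the library's API: the hook names `create_duckdb_connection` and `cache_locally`, the `uri_for_key` method, the `RepositorySettings` dataclass, and the `resolve_settings` helper are all hypothetical. The connection is typed as `Any` so the sketch runs without DuckDB installed.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Callable, Optional


class ManagedBlobStorage(ABC):
    """Stand-in for the library's ABC, extended with two hypothetical hooks."""

    @abstractmethod
    def uri_for_key(self, key: str) -> str:
        """Hypothetical method; the real abstract interface may differ."""

    def create_duckdb_connection(self) -> Optional[Any]:
        # Hypothetical hook: return a preconfigured connection, or None to
        # let the repository create its own (the behavior described today).
        return None

    @property
    def cache_locally(self) -> bool:
        # Hypothetical hook: defaulting to True preserves existing caching.
        return True


class RemoteNoCacheStorage(ManagedBlobStorage):
    """Example implementor that supplies its own connection and skips caching."""

    def __init__(self, connection: Any) -> None:
        # In practice this might be a DuckDB connection on which the client
        # has already registered an fsspec filesystem.
        self._connection = connection

    def uri_for_key(self, key: str) -> str:
        return f"s3://example-bucket/{key}"

    def create_duckdb_connection(self) -> Optional[Any]:
        return self._connection

    @property
    def cache_locally(self) -> bool:
        return False


@dataclass
class RepositorySettings:
    connection: Any
    cache_locally: bool


def resolve_settings(
    storage: ManagedBlobStorage,
    default_connection_factory: Callable[[], Any],
) -> RepositorySettings:
    # Ask the storage implementation first; fall back to the default only
    # when it declines to provide a connection.
    connection = storage.create_duckdb_connection()
    if connection is None:
        connection = default_connection_factory()
    return RepositorySettings(connection, storage.cache_locally)
```

Because both hooks have defaults, existing implementations would keep their current behavior, and only implementors that override them would opt in to a custom connection or opt out of caching.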

Describe alternatives you've considered

No real alternatives are available.

Additional context

No response

Labels: enhancement (New feature or request)
