
roadmap: what should ipfsspec do? #7

@d70-t


This issue is meant to discuss the purpose of the ipfsspec fsspec backend and to sharpen the overall design.

background

Because IPFS-to-HTTP gateways are available, a specialized IPFS backend is not strictly required for fsspec-based read access: any CID can be opened with the http backend by accessing

http(s)://<gateway>/ipfs/<CID>
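For illustration, a minimal sketch of building such a gateway-aware URL. The helper name `gateway_url` and its parameters are hypothetical, not part of ipfsspec or fsspec.

```python
def gateway_url(gateway: str, cid: str, path: str = "", scheme: str = "https") -> str:
    """Build a location-based (gateway-aware) URL for a CID.

    `gateway` is a host such as "ipfs.io" or "127.0.0.1:8080";
    `path` is an optional sub-path inside a directory CID.
    """
    url = f"{scheme}://{gateway}/ipfs/{cid}"
    if path:
        url += "/" + path.lstrip("/")
    return url
```

The resulting string can be handed to fsspec's http backend, e.g. `fsspec.open(gateway_url("ipfs.io", cid), "rb")` (this backend needs `aiohttp`). Note that the gateway host is now baked into user code, which is exactly the coupling discussed next.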

The downside of this approach is that user code has to translate from content-based addressing to location-based addressing. Using gateway-aware URLs in user code makes it harder

  • to use local gateways
  • to do automatic fallback between multiple gateways
  • to define a preferred gateway based on the local computing environment

To overcome these downsides, it seems beneficial to refer to IPFS resources via a gateway-unaware URL like

ipfs://<CID>

and to translate it to HTTP or native IPFS access only when the resource is accessed, based on the local computing environment and settings. This was the initial idea behind ipfsspec.
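A minimal sketch of that late translation, assuming the preferred gateway comes from an environment variable. The variable name `IPFS_GATEWAY`, the default gateway, and the function name are assumptions for illustration only.

```python
import os
from urllib.parse import urlsplit

# Assumption: a public gateway used when nothing is configured locally.
DEFAULT_GATEWAY = "https://ipfs.io"


def resolve_ipfs_url(url: str, env=None) -> str:
    """Translate a gateway-unaware ipfs://<CID>/<path> URL into an HTTP gateway URL.

    The gateway is chosen at access time from the (hypothetical) IPFS_GATEWAY
    environment variable, so the same ipfs:// URL works on every machine.
    """
    parts = urlsplit(url)
    if parts.scheme != "ipfs":
        raise ValueError(f"not an ipfs:// URL: {url!r}")
    env = os.environ if env is None else env
    gateway = env.get("IPFS_GATEWAY", DEFAULT_GATEWAY).rstrip("/")
    # Use .netloc rather than .hostname: .hostname lowercases its value,
    # which would corrupt case-sensitive CIDv0 strings ("Qm...").
    cid = parts.netloc
    return f"{gateway}/ipfs/{cid}{parts.path}"
```

User code only ever sees `ipfs://...`; switching to a local node becomes a configuration change rather than a code change.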

design questions

Is such a library useful at all?

Or should this translation be implemented on a different layer?

Should this library do automatic load balancing / fallback between multiple gateways?

  • Doing load balancing or fallback properly is not trivial to implement (especially with async).
  • If the library should just work without user configuration, a solution with fallback is likely required; otherwise it is not possible to use public gateways while still preferring the local gateway if it is available.
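A simple (synchronous, sequential) fallback could look like the sketch below: gateways are tried in order of preference, e.g. a local node first and public gateways afterwards. The function name and the injected `fetch` callable are hypothetical; a real backend would also need timeouts, retries and an async variant, which is where the complexity mentioned above comes in.

```python
from typing import Callable, Iterable


def fetch_with_fallback(cid: str, gateways: Iterable[str],
                        fetch: Callable[[str], bytes]) -> bytes:
    """Try each gateway in order and return the first successful response body.

    `fetch` takes a URL and returns bytes; it is injected so that any HTTP
    client can be used. Failures must surface as OSError (this covers
    ConnectionError, TimeoutError and urllib's URLError/HTTPError).
    """
    errors = []
    for gateway in gateways:
        url = f"{gateway.rstrip('/')}/ipfs/{cid}"
        try:
            return fetch(url)
        except OSError as exc:
            # Remember why this gateway failed and move on to the next one.
            errors.append((url, exc))
    raise OSError(f"all gateways failed for {cid}: {errors}")
```

With the standard library, `fetch` could be `lambda url: urllib.request.urlopen(url, timeout=10).read()`. Trying gateways one after another keeps the logic easy to follow but adds latency whenever the preferred gateway is down; racing requests concurrently would avoid that at the cost of extra load on the gateways.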

Should the library provide write support?

... and if yes, how?

IPFS is content-addressable storage, so one cannot choose the filename when adding content. Instead, the "filename" (the CID) is computed from the stored content. As a result, the signature of a put function would look more like

cid = put(content)

instead of

put(content, filename)

and thus wouldn't directly fit into fsspec.
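To make the mismatch concrete, here is a toy content-addressed store in which `put` returns the key instead of accepting one. SHA-256 hex digests stand in for CIDs; this is an assumption for illustration, since real CIDs add multihash/multicodec/multibase prefixes and, for files, also depend on the chunking and DAG layout.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the key is derived from the content itself."""

    def __init__(self):
        self._blocks = {}

    def put(self, content: bytes) -> str:
        # The caller cannot pick the name; it is a function of the bytes.
        key = hashlib.sha256(content).hexdigest()
        self._blocks[key] = content
        return key

    def get(self, key: str) -> bytes:
        return self._blocks[key]
```

Storing the same bytes twice yields the same key, and there is no place for a caller-supplied `filename`, which is why this shape does not map directly onto fsspec's path-based write methods.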

A way out might be to use the IPFS Mutable File System (MFS), which adds a local mutable overlay on top of the immutable filesystem. Using MFS, it would be possible to incrementally construct a local filesystem hierarchy and ask for a root CID once construction has finished. The downside of this approach is that it only works locally (or at least local to one gateway) and is thus probably not suited for larger datasets. So there is probably not much benefit compared to writing data into a local temporary folder and then running ipfs add -r -H on the entire folder.
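The local-folder route could be sketched as follows, assuming the Kubo `ipfs` CLI is installed and a node is available. The helper names are hypothetical; `-r` adds recursively, `-H` includes hidden files, and `-Q` prints only the final root CID.

```python
import subprocess
import tempfile
from pathlib import Path


def stage_dataset(files: dict) -> Path:
    """Write {relative_path: bytes} into a fresh temporary folder and return it."""
    root = Path(tempfile.mkdtemp(prefix="ipfs-stage-"))
    for rel, data in files.items():
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
    return root


def ipfs_add_command(folder: Path) -> list:
    # -r: recursive, -H: include hidden files, -Q: print only the root CID
    return ["ipfs", "add", "-r", "-H", "-Q", str(folder)]


def add_folder(folder: Path) -> str:
    """Run `ipfs add` on the staged folder and return the root CID (needs the ipfs CLI)."""
    result = subprocess.run(ipfs_add_command(folder), check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()
```

Everything has to fit on the local disk before a single root CID exists, which is the limitation for larger datasets noted above.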

A related option might be to pin data blocks one by one and keep the virtual directory in memory. After writing out a larger dataset this way, a root CID for the remotely stored dataset could be created. One advantage of this approach might be that writing could be distributed across multiple remote gateways.
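A rough sketch of this idea: each block is uploaded as soon as it is written (here round-robin over several stores standing in for remote gateways), only a path-to-hash map stays in memory, and a root hash is derived Merkle-style at the end. All names are hypothetical, and SHA-256 over a simple text listing replaces real UnixFS directory nodes and CIDs.

```python
import hashlib
import itertools
from typing import Callable, Dict, List

Store = Callable[[str, bytes], None]


def _h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def round_robin_store(targets: List[Dict[str, bytes]]) -> Store:
    """Return a store(key, block) that spreads blocks across several targets."""
    cycle = itertools.cycle(targets)

    def store(key: str, block: bytes) -> None:
        next(cycle)[key] = block

    return store


class VirtualDir:
    """In-memory directory: path -> block hash; the blocks themselves live remotely."""

    def __init__(self):
        self.entries: Dict[str, str] = {}

    def add_file(self, path: str, content: bytes, store: Store) -> None:
        key = _h(content)
        store(key, content)  # upload/pin the block right away
        self.entries[path] = key

    def root(self) -> str:
        # Rebuild the nested tree from flat paths, then hash it bottom-up.
        tree: dict = {}
        for path, key in self.entries.items():
            *dirs, name = path.split("/")
            node = tree
            for d in dirs:
                node = node.setdefault(d, {})
            node[name] = key
        return _hash_tree(tree)


def _hash_tree(node: dict) -> str:
    # Sorted names make the root independent of insertion order.
    lines = []
    for name in sorted(node):
        child = node[name]
        if isinstance(child, dict):
            lines.append(f"d {name} {_hash_tree(child)}")
        else:
            lines.append(f"f {name} {child}")
    return _h("\n".join(lines).encode())
```

Because each directory hash depends only on its children's hashes, the root can be computed without re-reading any uploaded data.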
