Skip to content

Scalable Web3 Storage: Gap Analysis vs TODOs vs where to continue #1

@bkontur

Description

@bkontur

Scalable Web3 Storage: Gap Analysis

tags: web3-storage substrate polkadot

Overview

This document compares the design specification against the current implementation to identify completed features, in-progress work, and remaining gaps.

Legend

Symbol Meaning
Implemented
⚠️ Partial/Simplified
Not Implemented

1. On-Chain (Pallet)

Feature Status Notes
Buckets Creation, membership, min_providers
Bucket Freezing Append-only mode via frozen_start_seq
Provider Registry Stake, multiaddr, stats
Provider Capacity max_capacity field with MinStakePerByte validation
Storage Agreements Primary + Replica types
Checkpoints MMR root with signature verification
Provider-Initiated Checkpoints 5 extrinsics: provider_checkpoint, configure_checkpoint_window, report_missed_checkpoint, claim_checkpoint_rewards, fund_checkpoint_pool
Challenge System 3 variants (checkpoint, off-chain, replica)
Replica Sync Confirmation confirm_replica_sync, top_up_replica_sync_balance
Historical Roots 6 prime-based positions for late replica sync
Weights/Benchmarks FRAME benchmarking for all 36 extrinsics (PR #6)
Cost Sharing by Response Time 90%→50% sliding scale not implemented
Burn Option Not implemented
Proof-of-DOT Phase 3 feature

Pallet Extrinsics (37 total)

Index Extrinsic Status
0-4 Provider registration & settings
10-15 Bucket management
20-28 Agreements & challenges
30-31 Client checkpoints
32-36 Provider-initiated checkpoints
40-43 Challenge responses & settlement
50-51 Replica sync management

2. Off-Chain (Provider Node)

Feature Status Notes
Upload/Download Content-addressed chunks
Commit to MMR Proper MMR implementation
Read Operations Byte-range reads
MMR Proofs Real proofs with peak tracking
Chunk Proofs Merkle proofs for challenges
Provider Signatures sr25519 signatures via SEED env var
Challenge Responder Automatic challenge detection & response
Checkpoint Coordinator Provider-initiated checkpoint coordination
Replica Sync Coordinator Full chain integration with query functions
Replica Sync Top-down MMR traversal from primaries
Deletion with Admin Proof ⚠️ Stub only
Latency-Based Selection Not implemented

Provider Node Modules

Module Purpose Status
challenge_responder.rs Detects challenges, generates proofs, submits responses
checkpoint_coordinator.rs Provider-initiated checkpoints, leader election
replica_sync_coordinator.rs Autonomous replica sync with chain queries
replica_sync.rs Top-down MMR sync from primaries
mmr.rs Merkle Mountain Range with proof generation
disk_storage.rs Persistent storage backend

Replica Sync Coordinator Chain Queries

Function Purpose Status
query_replica_agreements() Find all replica agreements for this provider
query_agreement() Query specific agreement by bucket_id
query_bucket_snapshot() Get authoritative checkpoint from chain
query_primary_endpoints() Look up primary provider multiaddrs

HTTP Endpoints

Endpoint Method Purpose
/checkpoint/sign POST Sign checkpoint proposal
/checkpoint/duty GET Query checkpoint duty status
/replica/historical_roots GET Get historical roots for bucket
/replica/sync_status GET Get replica sync status

3. Client SDK

Feature Status Notes
Upload/Commit Basic operations
Read Byte-range reads
Checkpoint Manager Multi-provider checkpoint coordination
Checkpoint Persistence State persistence with backup rotation
Event Subscription Real-time blockchain event monitoring
Provider Discovery Marketplace matching with scoring
Challenger Client Challenge submission and tracking
StorageUserClient Simplified API for storage operations
Automated Spot-Checking Not implemented
Background Sampling Not implemented
Multi-Provider Reads Single provider only

Client SDK Modules

Module Purpose
checkpoint.rs Multi-provider checkpoint coordination
checkpoint_persistence.rs State persistence, backup rotation
event_subscription.rs Real-time event monitoring
discovery.rs Provider marketplace matching (0-100 scoring)
challenger.rs Challenge creation and tracking
storage_user.rs High-level storage user client
admin.rs Bucket administration operations

4. Layer 1: File System Interface

Feature Status Notes
File System Primitives DriveInfo, DirectoryNode, FileManifest, CommitStrategy
Drive Registry Pallet On-chain drive management, root CID tracking
File System Client High-level file/folder interface with subxt
Commit Strategies Immediate, Batched, Manual
Content Addressing CID computation and verification
Drive Cleanup clear_drive, delete_drive with refunds

File System Structure

storage-interfaces/file-system/
├── primitives/        # Core types (no_std compatible)
├── pallet-registry/   # On-chain drive management (19 tests)
├── client/            # High-level SDK with blockchain integration
│   └── examples/      # basic_usage.rs workflow
└── examples/          # Integration examples

5. Test Coverage

Package Tests Status
pallet-storage-provider 34 ✅ All passing
storage-provider-node 22 ✅ All passing
storage-client 45 ✅ All passing
pallet-drive-registry 19 ✅ All passing
file-system-primitives 5 ✅ All passing

Total: 125 tests passing


6. Documentation

Document Status
CHECKPOINT_PROTOCOL.md ✅ Automated checkpoint protocol
provider-initiated-checkpoints.md ✅ Leader election design
marketplace.md ✅ Provider capacity & discovery
EXECUTION_FLOWS.md ✅ Sequence diagrams
BENCHMARKING.md ✅ Weight generation guide
FILE_SYSTEM_QUICKSTART.md ✅ Layer 1 quick start
docs/filesystems/ ✅ User Guide, Admin Guide, API Reference, Architecture

7. Design Phases vs Implementation

Phase Description Status
Phase 1 Buckets and Basic Storage ✅ 95% - core working
Phase 2 Challenges and Guarantees ⚠️ 70% - cost scaling missing
Phase 3 Proof-of-DOT ❌ 0% - not started
Phase 4 Third-Party Providers & Replicas ✅ 85% - full chain integration

8. Work Remaining

Critical for Production

Task Priority Status
Benchmark all extrinsics 🔴 Critical ✅ PR #6
Challenge cost scaling 🔴 Critical ❌ Not implemented
Security audit 🔴 Critical ❌ Not done

High Priority

Task Status
Burn option for poor service
Full integration test suite ⚠️ Partial
Admin deletion signatures ⚠️ Stub only

Medium Priority

Task Status
Proof-of-DOT identity layer
Multi-provider reads with failover
Automated spot-checking

9. Quick Start

# Setup (one-time)
just setup

# Start infrastructure
just start-chain      # Terminal 1
just start-provider   # Terminal 2

# Run demo
just demo             # Full workflow

# Layer 1 File System
just fs-integration-test

10. Conclusion

Production Readiness

⚠️ TESTNET READY with caveats

Completed:

  • ✅ Core storage functionality (upload, download, commit)
  • ✅ Signature verification with sr25519
  • ✅ Provider coordination (checkpoints, challenges)
  • ✅ Client SDK with checkpoint management
  • ✅ Event subscription system
  • ✅ Layer 1 File System interface
  • ✅ Replica sync coordinator with full chain integration
  • ✅ FRAME benchmarking for all 36 extrinsics
  • ✅ 125 tests passing

Required for Production:

  • ❌ Challenge cost-sharing sliding scale
  • ❌ Burn option
  • ❌ Security audit

Last updated: February 2026

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions