Scalable Web3 Storage: Gap Analysis
tags: web3-storage substrate polkadot
Overview
This document compares the design specification against the current implementation to identify completed features, in-progress work, and remaining gaps.
Legend
| Symbol | Meaning |
|--------|---------|
| ✅ | Implemented |
| ⚠️ | Partial/Simplified |
| ❌ | Not Implemented |
1. On-Chain (Pallet)
| Feature | Status | Notes |
|---------|--------|-------|
| Buckets | ✅ | Creation, membership, min_providers |
| Bucket Freezing | ✅ | Append-only mode via frozen_start_seq |
| Provider Registry | ✅ | Stake, multiaddr, stats |
| Provider Capacity | ✅ | max_capacity field with MinStakePerByte validation |
| Storage Agreements | ✅ | Primary + Replica types |
| Checkpoints | ✅ | MMR root with signature verification |
| Provider-Initiated Checkpoints | ✅ | 5 extrinsics: provider_checkpoint, configure_checkpoint_window, report_missed_checkpoint, claim_checkpoint_rewards, fund_checkpoint_pool |
| Challenge System | ✅ | 3 variants (checkpoint, off-chain, replica) |
| Replica Sync Confirmation | ✅ | confirm_replica_sync, top_up_replica_sync_balance |
| Historical Roots | ✅ | 6 prime-based positions for late replica sync |
| Weights/Benchmarks | ✅ | FRAME benchmarking for all 36 extrinsics (PR #6) |
| Cost Sharing by Response Time | ❌ | 90%→50% sliding scale not implemented |
| Burn Option | ❌ | Not implemented |
| Proof-of-DOT | ❌ | Phase 3 feature |
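
The Provider Capacity row ties a provider's declared `max_capacity` to its stake through `MinStakePerByte`. A minimal sketch of what such a check could look like follows. The constant value, the error type, and the function name `validate_capacity` are illustrative assumptions and are not taken from the pallet.

```rust
/// Hypothetical stand-in for the pallet's `MinStakePerByte` parameter:
/// stake units required per byte of declared capacity (value assumed).
const MIN_STAKE_PER_BYTE: u128 = 10;

/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq)]
enum CapacityError {
    /// The provider's stake does not cover the declared capacity.
    InsufficientStake { required: u128, staked: u128 },
    /// The required stake does not fit in a u128.
    Overflow,
}

/// Accept `max_capacity` (in bytes) only if `stake` covers
/// `max_capacity * MIN_STAKE_PER_BYTE`.
fn validate_capacity(stake: u128, max_capacity: u64) -> Result<(), CapacityError> {
    let required = (max_capacity as u128)
        .checked_mul(MIN_STAKE_PER_BYTE)
        .ok_or(CapacityError::Overflow)?;
    if stake >= required {
        Ok(())
    } else {
        Err(CapacityError::InsufficientStake { required, staked: stake })
    }
}
```

Under these assumed numbers, a provider staking 10,000 units could declare at most 1,000 bytes of capacity; the real ratio is whatever the runtime configures.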
Pallet Extrinsics (37 total)
| Index | Extrinsic | Status |
|-------|-----------|--------|
| 0-4 | Provider registration & settings | ✅ |
| 10-15 | Bucket management | ✅ |
| 20-28 | Agreements & challenges | ✅ |
| 30-31 | Client checkpoints | ✅ |
| 32-36 | Provider-initiated checkpoints | ✅ |
| 40-43 | Challenge responses & settlement | ✅ |
| 50-51 | Replica sync management | ✅ |
2. Off-Chain (Provider Node)
| Feature | Status | Notes |
|---------|--------|-------|
| Upload/Download | ✅ | Content-addressed chunks |
| Commit to MMR | ✅ | Proper MMR implementation |
| Read Operations | ✅ | Byte-range reads |
| MMR Proofs | ✅ | Real proofs with peak tracking |
| Chunk Proofs | ✅ | Merkle proofs for challenges |
| Provider Signatures | ✅ | sr25519 signatures via SEED env var |
| Challenge Responder | ✅ | Automatic challenge detection & response |
| Checkpoint Coordinator | ✅ | Provider-initiated checkpoint coordination |
| Replica Sync Coordinator | ✅ | Full chain integration with query functions |
| Replica Sync | ✅ | Top-down MMR traversal from primaries |
| Deletion with Admin Proof | ⚠️ | Stub only |
| Latency-Based Selection | ❌ | Not implemented |
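
Byte-range reads over chunked data come down to mapping a requested range onto the chunks that cover it. The sketch below shows that mapping under the assumption of a fixed chunk size. `CHUNK_SIZE`, `ChunkSlice`, and `chunks_for_range` are hypothetical names for illustration, and the provider node's actual chunking may differ.

```rust
/// Assumed fixed chunk size in bytes; the real node may use another value.
const CHUNK_SIZE: u64 = 256 * 1024;

/// The part of one chunk that a byte-range read needs.
#[derive(Debug, PartialEq)]
struct ChunkSlice {
    /// Zero-based index of the chunk within the object.
    index: u64,
    /// Start offset inside the chunk (inclusive).
    start: u64,
    /// End offset inside the chunk (exclusive).
    end: u64,
}

/// Map the byte range `offset..offset + len` onto the chunks covering it.
fn chunks_for_range(offset: u64, len: u64) -> Vec<ChunkSlice> {
    if len == 0 {
        return Vec::new();
    }
    let range_end = offset.saturating_add(len); // exclusive
    let first = offset / CHUNK_SIZE;
    let last = (range_end - 1) / CHUNK_SIZE;
    (first..=last)
        .map(|index| {
            let chunk_start = index * CHUNK_SIZE;
            ChunkSlice {
                index,
                // Clamp the requested range to this chunk's boundaries.
                start: offset.max(chunk_start) - chunk_start,
                end: range_end.min(chunk_start + CHUNK_SIZE) - chunk_start,
            }
        })
        .collect()
}
```

A read that straddles a chunk boundary yields two slices: the tail of one chunk and the head of the next.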
Provider Node Modules
| Module | Purpose | Status |
|--------|---------|--------|
| `challenge_responder.rs` | Detects challenges, generates proofs, submits responses | ✅ |
| `checkpoint_coordinator.rs` | Provider-initiated checkpoints, leader election | ✅ |
| `replica_sync_coordinator.rs` | Autonomous replica sync with chain queries | ✅ |
| `replica_sync.rs` | Top-down MMR sync from primaries | ✅ |
| `mmr.rs` | Merkle Mountain Range with proof generation | ✅ |
| `disk_storage.rs` | Persistent storage backend | ✅ |
Replica Sync Coordinator Chain Queries
| Function | Purpose | Status |
|----------|---------|--------|
| `query_replica_agreements()` | Find all replica agreements for this provider | ✅ |
| `query_agreement()` | Query specific agreement by bucket_id | ✅ |
| `query_bucket_snapshot()` | Get authoritative checkpoint from chain | ✅ |
| `query_primary_endpoints()` | Look up primary provider multiaddrs | ✅ |
HTTP Endpoints
| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/checkpoint/sign` | POST | Sign checkpoint proposal |
| `/checkpoint/duty` | GET | Query checkpoint duty status |
| `/replica/historical_roots` | GET | Get historical roots for bucket |
| `/replica/sync_status` | GET | Get replica sync status |
3. Client SDK
| Feature | Status | Notes |
|---------|--------|-------|
| Upload/Commit | ✅ | Basic operations |
| Read | ✅ | Byte-range reads |
| Checkpoint Manager | ✅ | Multi-provider checkpoint coordination |
| Checkpoint Persistence | ✅ | State persistence with backup rotation |
| Event Subscription | ✅ | Real-time blockchain event monitoring |
| Provider Discovery | ✅ | Marketplace matching with scoring |
| Challenger Client | ✅ | Challenge submission and tracking |
| StorageUserClient | ✅ | Simplified API for storage operations |
| Automated Spot-Checking | ❌ | Not implemented |
| Background Sampling | ❌ | Not implemented |
| Multi-Provider Reads | ❌ | Single provider only |
Client SDK Modules
| Module | Purpose |
|--------|---------|
| `checkpoint.rs` | Multi-provider checkpoint coordination |
| `checkpoint_persistence.rs` | State persistence, backup rotation |
| `event_subscription.rs` | Real-time event monitoring |
| `discovery.rs` | Provider marketplace matching (0-100 scoring) |
| `challenger.rs` | Challenge creation and tracking |
| `storage_user.rs` | High-level storage user client |
| `admin.rs` | Bucket administration operations |
4. Layer 1: File System Interface
| Feature | Status | Notes |
|---------|--------|-------|
| File System Primitives | ✅ | DriveInfo, DirectoryNode, FileManifest, CommitStrategy |
| Drive Registry Pallet | ✅ | On-chain drive management, root CID tracking |
| File System Client | ✅ | High-level file/folder interface with subxt |
| Commit Strategies | ✅ | Immediate, Batched, Manual |
| Content Addressing | ✅ | CID computation and verification |
| Drive Cleanup | ✅ | clear_drive, delete_drive with refunds |
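
The three commit strategies named above (Immediate, Batched, Manual) can be pictured as an enum that decides when pending file-system changes are committed. This is only a sketch: the variant payload `max_pending` and the method `should_commit` are assumptions made for illustration, and the actual `CommitStrategy` type in the primitives crate may carry different fields or batch on other criteria.

```rust
/// Illustrative model of the three strategies listed in the table.
/// The `max_pending` field is an assumption, not the crate's definition.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CommitStrategy {
    /// Commit as soon as any change is pending.
    Immediate,
    /// Commit once at least `max_pending` changes have accumulated.
    Batched { max_pending: usize },
    /// Never commit automatically; the caller commits explicitly.
    Manual,
}

impl CommitStrategy {
    /// Decide whether an automatic commit should run now.
    fn should_commit(&self, pending_changes: usize) -> bool {
        match *self {
            CommitStrategy::Immediate => pending_changes > 0,
            CommitStrategy::Batched { max_pending } => {
                pending_changes > 0 && pending_changes >= max_pending
            }
            CommitStrategy::Manual => false,
        }
    }
}
```

Batching trades latency for fewer on-chain root updates, while Manual leaves the timing entirely to the application.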
File System Structure
```
storage-interfaces/file-system/
├── primitives/       # Core types (no_std compatible)
├── pallet-registry/  # On-chain drive management (19 tests)
├── client/           # High-level SDK with blockchain integration
│   └── examples/     # basic_usage.rs workflow
└── examples/         # Integration examples
```
5. Test Coverage
| Package | Tests | Status |
|---------|-------|--------|
| `pallet-storage-provider` | 34 | ✅ All passing |
| `storage-provider-node` | 22 | ✅ All passing |
| `storage-client` | 45 | ✅ All passing |
| `pallet-drive-registry` | 19 | ✅ All passing |
| `file-system-primitives` | 5 | ✅ All passing |
Total: 125 tests passing
6. Documentation
| Document | Status |
|----------|--------|
| `CHECKPOINT_PROTOCOL.md` | ✅ Automated checkpoint protocol |
| `provider-initiated-checkpoints.md` | ✅ Leader election design |
| `marketplace.md` | ✅ Provider capacity & discovery |
| `EXECUTION_FLOWS.md` | ✅ Sequence diagrams |
| `BENCHMARKING.md` | ✅ Weight generation guide |
| `FILE_SYSTEM_QUICKSTART.md` | ✅ Layer 1 quick start |
| `docs/filesystems/` | ✅ User Guide, Admin Guide, API Reference, Architecture |
7. Design Phases vs Implementation
| Phase | Description | Status |
|-------|-------------|--------|
| Phase 1 | Buckets and Basic Storage | ✅ 95% - core working |
| Phase 2 | Challenges and Guarantees | ⚠️ 70% - cost scaling missing |
| Phase 3 | Proof-of-DOT | ❌ 0% - not started |
| Phase 4 | Third-Party Providers & Replicas | ✅ 85% - full chain integration |
8. Work Remaining
Critical for Production
| Task | Priority | Status |
|------|----------|--------|
| Benchmark all extrinsics | 🔴 Critical | ✅ PR #6 |
| Challenge cost scaling | 🔴 Critical | ❌ Not implemented |
| Security audit | 🔴 Critical | ❌ Not done |
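
The missing challenge cost scaling is described elsewhere in this document as a 90%→50% sliding scale keyed to response time. One way to picture it is a linear interpolation over the response window, sketched below. The linear shape, the use of block counts as the time unit, and the names `share_percent`, `elapsed_blocks`, and `window_blocks` are all assumptions. The design specification defines which party the percentage applies to and how the curve is actually shaped.

```rust
/// Share applied to the fastest possible response (from the 90%→50% scale).
const MAX_SHARE_PERCENT: u32 = 90;
/// Share applied at or after the end of the response window.
const MIN_SHARE_PERCENT: u32 = 50;

/// Linearly slide from 90% down to 50% as `elapsed_blocks` approaches
/// `window_blocks`. Linear interpolation is an assumption of this sketch.
fn share_percent(elapsed_blocks: u32, window_blocks: u32) -> u32 {
    if window_blocks == 0 {
        // Degenerate window: treat every response as arriving at the deadline.
        return MIN_SHARE_PERCENT;
    }
    // Responses later than the window are clamped to the window's end.
    let elapsed = elapsed_blocks.min(window_blocks) as u64;
    let span = (MAX_SHARE_PERCENT - MIN_SHARE_PERCENT) as u64;
    let reduction = span * elapsed / window_blocks as u64;
    MAX_SHARE_PERCENT - reduction as u32
}
```

With a 100-block window, a response after 25 blocks would fall at 80% and one at the deadline at 50%, under the linear assumption.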
High Priority
| Task | Status |
|------|--------|
| Burn option for poor service | ❌ |
| Full integration test suite | ⚠️ Partial |
| Admin deletion signatures | ⚠️ Stub only |
Medium Priority
| Task | Status |
|------|--------|
| Proof-of-DOT identity layer | ❌ |
| Multi-provider reads with failover | ❌ |
| Automated spot-checking | ❌ |
9. Quick Start
```bash
# Setup (one-time)
just setup

# Start infrastructure
just start-chain      # Terminal 1
just start-provider   # Terminal 2

# Run demo
just demo             # Full workflow

# Layer 1 File System
just fs-integration-test
```
10. Conclusion
Production Readiness
⚠️ TESTNET READY with caveats
Completed:
- ✅ Core storage functionality (upload, download, commit)
- ✅ Signature verification with sr25519
- ✅ Provider coordination (checkpoints, challenges)
- ✅ Client SDK with checkpoint management
- ✅ Event subscription system
- ✅ Layer 1 File System interface
- ✅ Replica sync coordinator with full chain integration
- ✅ FRAME benchmarking for all 36 extrinsics
- ✅ 125 tests passing
Required for Production:
- ❌ Challenge cost-sharing sliding scale
- ❌ Burn option
- ❌ Security audit
Last updated: February 2026