Data Fabrication is a WASM evaluation module for generating and validating AI training datasets on the Bittensor network. It runs inside platform validators to evaluate miner submissions that produce conversation datasets. Miners submit Python harnesses that generate synthetic conversations, and the network scores them through a multi-stage pipeline including AST structural similarity checks and LLM-based plagiarism detection.
```bash
# Build WASM module
cargo build --target wasm32-unknown-unknown -p data-fabrication-wasm

# Build CLI tool
cargo build --release -p df-cli

# Run the TUI monitor
df-cli monitor
```

```mermaid
flowchart LR
    Miner[Miner] -->|Submit Python Harness| RPC[Validator RPC]
    RPC --> Validators[Validator Network]
    Validators --> WASM[data-fabrication WASM]
    WASM --> Storage[(Blockchain Storage)]
    Validators --> Executor[df-executor]
    Executor -->|Dataset Results| Validators
    Validators -->|Scores + Weights| BT[Bittensor Chain]
    CLI[df-cli TUI] -->|JSON-RPC| RPC
    CLI -->|Display| Monitor[Progress / Logs / Leaderboard]
```
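The df-cli TUI talks to the validator RPC over JSON-RPC 2.0. A minimal sketch of building such a request envelope (the method name `df_getLeaderboard` is a hypothetical placeholder, not a documented endpoint):

```python
import json

def make_jsonrpc_request(method: str, params: dict, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 request body. The method name passed in is a
    placeholder; the actual RPC methods are defined by the validator RPC."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": method,
        "params": params,
    })

body = make_jsonrpc_request("df_getLeaderboard", {"limit": 10})
```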
```mermaid
sequenceDiagram
    participant M as Miner
    participant V as Validators
    participant W as WASM Module
    participant E as df-executor
    participant BT as Bittensor
    M->>V: Submit Python harness (JSON)
    V->>W: Store code, validate format
    W-->>V: Validation pass/fail
    V->>W: Run AST similarity check
    W-->>V: Similarity score
    V->>E: Execute harness, generate dataset
    E-->>V: Generated conversations
    V->>W: LLM plagiarism evaluation
    W-->>V: Plagiarism verdict
    V->>W: Compute final score
    V->>BT: Submit weights at epoch boundary
```
```mermaid
flowchart TB
    Code[Python Harness] --> Parse[Parse AST]
    Parse --> Normalize[Normalize Variables]
    Normalize --> Hash[Structure Hash]
    Hash --> Compare[Compare with Others]
    Compare --> LCS[LCS Algorithm]
    LCS --> Score[Similarity Score 0-100]
    Score --> Status{Plagiarism Status}
    Status -->|>= 97%| Plagiarized[Plagiarized]
    Status -->|30-96%| NeedsLLM[NeedsLlmVerification]
    Status -->|< 30%| Clean[Clean]
```
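The branch at the end of the pipeline is a simple threshold function; a sketch using the cutoffs and status names from the diagram above:

```python
def classify_similarity(score: int) -> str:
    """Map an LCS similarity score (0-100) to a plagiarism status,
    using the thresholds from the pipeline diagram."""
    if score >= 97:
        return "Plagiarized"
    if score >= 30:
        return "NeedsLlmVerification"
    return "Clean"
```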
- WASM Module: Compiles to wasm32-unknown-unknown, loaded by platform validators
- AST Structural Similarity: Normalizes Python code and compares structure via LCS algorithm
- LLM Plagiarism Detection: Retry-enabled LLM inference for semantic comparison
- Submission Validation: Size limits, format checks, and signature verification
- Conversation Dataset Generation: Python harnesses produce JSONL conversation datasets
- Resource Limits: CPU time, memory, and file size constraints for sandboxed execution
- Plagiarism Clustering: Groups similar submissions by structure hash prefix
- CLI (df-cli): Native TUI for monitoring evaluations and network status
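A miner harness emits one conversation per line of JSONL. A minimal, hypothetical sketch of what that output could look like (the `id`/`messages` schema here is an assumption for illustration, not the documented format):

```python
import json

def generate_conversations(n: int) -> str:
    """Emit n synthetic conversations as JSONL, one JSON object per line.
    The role/content message shape is an assumed schema, not the real one."""
    lines = []
    for i in range(n):
        conversation = {
            "id": i,
            "messages": [
                {"role": "user", "content": f"Question {i}"},
                {"role": "assistant", "content": f"Answer {i}"},
            ],
        }
        lines.append(json.dumps(conversation))
    return "\n".join(lines)

jsonl = generate_conversations(3)
```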
```bash
# Via Platform CLI (recommended)
platform download data-fabrication

# Or build from source
git clone https://github.com/PlatformNetwork/data-fabrication
cd data-fabrication
cargo build --release
```

```bash
# Build WASM module (for platform validators)
cargo build --target wasm32-unknown-unknown -p data-fabrication-wasm

# The output .wasm file is at:
# target/wasm32-unknown-unknown/release/data_fabrication_wasm.wasm

# Build CLI (native)
cargo build --release -p df-cli

# Build executor
cargo build --release -p df-executor

# Build all workspace members
cargo build --release --workspace
```

```bash
# Launch interactive TUI (connects to https://chain.platform.network)
df-cli monitor

# Submit a Python harness
df-cli submit --harness ./my-harness/

# Check submission status
df-cli status --hotkey 5Abc...

# Monitor a specific miner
df-cli --hotkey 5GrwvaEF... monitor

# Custom RPC endpoint
df-cli --rpc-url http://localhost:8080 monitor
```

Subcommands: submit · status · monitor (default)

TUI Controls: Tab/Shift+Tab switch tabs · ↑/↓ scroll · r refresh · q quit
```
data-fabrication/
├── wasm/                      # WASM evaluation module (compiled to wasm32-unknown-unknown)
│   └── src/
│       ├── lib.rs             # Challenge trait implementation
│       └── types.rs           # Submission and config types
├── core/                      # Shared types (no_std compatible)
│   └── src/
│       ├── lib.rs             # Domain types (HarnessSubmission, GeneratedDataset)
│       ├── ast_similarity.rs  # AST normalization, structure hashing, LCS comparison
│       ├── ast_validation.rs  # Python code security validation
│       ├── scoring_types.rs   # Score types (ConversationScore, DatasetScore)
│       ├── schema.rs          # JSONL parsing for conversation datasets
│       ├── consensus.rs       # Multi-validator consensus
│       ├── cache.rs           # Evaluation result caching
│       └── resource_limits.rs # CPU, memory, file constraints
├── executor/                  # Native execution engine
│   └── src/
│       ├── lib.rs             # Executor entry point
│       └── llm_inference.rs   # LLM client with retry logic
├── cli/                       # Native TUI monitoring tool
│   └── src/
│       ├── main.rs            # Entry point, event loop
│       ├── app.rs             # Application state
│       ├── ui.rs              # Ratatui UI rendering
│       └── rpc.rs             # JSON-RPC 2.0 client
├── server/                    # Native HTTP server
│   └── src/
│       └── main.rs            # HTTP evaluation server
└── src/                       # Root crate library
    └── lib.rs                 # HuggingFace dataset handler
```
- Miners submit Python harness code via `df-cli submit`
- Platform validators load this WASM module
- WASM validates submission format, size limits, and signature
- Executor runs the Python harness in a sandboxed environment
- Generated conversations are parsed from JSONL output
- AST similarity compares submission structure against others using normalized variables and LCS algorithm
- LLM inference performs semantic plagiarism detection with retry logic
- Final score combines dataset quality and originality metrics
- Validators submit weights to Bittensor at epoch boundaries
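The final score blends dataset quality with originality. One plausible sketch of such an aggregation (the weighted-average formula and the 0.7/0.3 split are assumptions for illustration, not the network's actual weighting):

```python
def final_score(quality: float, originality: float, quality_weight: float = 0.7) -> float:
    """Weighted blend of dataset quality and originality, both in [0, 1].
    The 0.7/0.3 split is illustrative, not the network's real weighting."""
    if not (0.0 <= quality <= 1.0 and 0.0 <= originality <= 1.0):
        raise ValueError("scores must be in [0, 1]")
    return quality_weight * quality + (1.0 - quality_weight) * originality
```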
The plagiarism detection uses a two-pass approach:
```python
# Original code
x = 1
y = x + 2

# After normalization (both produce an identical AST)
var_0 = 1
var_1 = var_0 + 2
```

Variables are normalized to var_0, var_1, etc., and the AST structure is hashed. Submissions with matching structure hashes are clustered for detailed comparison.
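The normalization idea can be approximated with Python's own ast module: rename every variable to a positional placeholder in first-seen order, then hash the dumped tree. This is a simplified sketch of the concept, not the module's actual Rust implementation:

```python
import ast
import hashlib

class VariableNormalizer(ast.NodeTransformer):
    """Rewrite every variable name to var_0, var_1, ... in first-seen order."""
    def __init__(self):
        self.names = {}

    def visit_Name(self, node):
        if node.id not in self.names:
            self.names[node.id] = f"var_{len(self.names)}"
        node.id = self.names[node.id]
        return node

def structure_hash(source: str) -> str:
    """Hash the variable-normalized AST so renamed copies collide."""
    tree = VariableNormalizer().visit(ast.parse(source))
    return hashlib.sha256(ast.dump(tree).encode()).hexdigest()

# Two snippets with different names but identical structure hash the same.
h1 = structure_hash("x = 1\ny = x + 2")
h2 = structure_hash("a = 1\nb = a + 2")
```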
For submissions above a similarity threshold, an LLM evaluates:
- Logic flow patterns
- Naming convention similarities
- Comment and docstring patterns
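The retry behavior around LLM inference can be sketched as generic exponential backoff (a generic pattern standing in for the executor's llm_inference.rs, not its actual implementation):

```python
import time

def with_retries(call, max_attempts: int = 3, base_delay: float = 1.0):
    """Retry a flaky callable with exponential backoff. `call` stands in
    for an LLM inference request; this mirrors the retry idea, not the
    executor's real code."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Example: a call that fails twice, then succeeds on the third attempt.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient error")
    return "verdict: clean"

result = with_retries(flaky, max_attempts=3, base_delay=0.0)
```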
- Architecture Overview – System components, host functions, storage schema
- Miner Quickstart – How to build and submit harnesses
- Executor Setup – Deploy your evaluation node
- Evaluation Pipeline – Scoring and plagiarism detection
- API Reference – Public and authenticated endpoints
- Validator Setup – Hardware requirements and configuration
Apache-2.0