EchoFS - Adaptive Consistency Distributed File System

The world's first distributed file system with intelligent consistency that dynamically adapts to network conditions in real-time.

Live Application: https://frontend-echofs-projects.vercel.app/

Overview

EchoFS is a research-driven distributed file system that intelligently balances the CAP theorem trade-offs by dynamically switching between strong consistency and eventual consistency based on real-time network conditions. Unlike traditional distributed systems that force users to choose between consistency and availability, EchoFS makes this decision automatically and continuously.

The Problem

Traditional distributed file systems face a fundamental challenge:

Strong Consistency: Guarantees data correctness but suffers during network issues (high latency, unavailability)
Eventual Consistency: Maintains availability but risks stale reads and conflicts

Our Solution

EchoFS introduces an Adaptive Consistency Controller that:

Monitors network conditions in real-time
Analyzes partition risk, replication lag, and write patterns
Dynamically switches consistency modes without user intervention
Achieves 85% latency reduction during network stress while maintaining data integrity

Key Features

1. Adaptive Consistency Engine

Intelligent Mode Switching: Automatically transitions between strong (C) and available (A) consistency
Policy-Based Decisions: Weighted scoring system considers multiple factors
Hysteresis Mechanism: Prevents mode flapping with multi-sample confirmation
Emergency Mode: Immediate fallback during severe network partitions

2. User Authentication & Authorization

JWT-Based Authentication: Secure token-based auth with 24-hour expiration
Role-Based Access Control: Admin users get access to metrics and consistency dashboards
User Isolation: Each user can only access their own files
Password Security: Bcrypt hashing with salt

3. Distributed Architecture

Master-Worker Design: Centralized coordination with distributed storage
gRPC Communication: High-performance inter-service communication
File Chunking: 1MB chunks with compression for efficient storage
Replication: Configurable replication factor across workers

4. Real-Time Monitoring

Prometheus Metrics: Comprehensive system metrics collection
Grafana Dashboards: Visual monitoring of consistency modes and performance
Health Checks: Continuous worker health monitoring
Audit Logging: Complete trail of consistency decisions

Architecture

System Components

┌─────────────────────────────────────────────────────────────────┐
│                         Client Layer                            │
│  (Next.js Frontend - React with TypeScript)                     │
└────────────────────────┬────────────────────────────────────────┘
                         │ HTTPS/REST API
                         │
┌────────────────────────▼────────────────────────────────────────┐
│                      Master Server (Go)                         │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │         Authentication Middleware (JWT)                  │   │
│  └──────────────────────────────────────────────────────────┘   │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │         File Operations Handler                          │   │
│  │  - Upload/Download/Delete                                │   │
│  │  - Chunking & Compression                                │   │
│  │  - User-specific access control                          │   │
│  └──────────────────────────────────────────────────────────┘   │
└────────────┬────────────────────────┬───────────────────────────┘
             │                        │
             │ gRPC                   │ HTTP
             │                        │
    ┌────────▼────────┐      ┌────────▼─────────────┐
    │  Worker Nodes   │      │  Consistency         │
    │  (Go + gRPC)    │      │  Controller          │
    │                 │      │  (Policy Engine)     │
    │  - Chunk Store  │      │                      │
    │  - S3 Backend   │      │  - Network Monitor   │
    │  - Health Check │      │  - Decision Engine   │
    └────────┬────────┘      │  - State Manager     │
             │               └──────────────────────┘
             │
    ┌────────▼────────┐
    │   PostgreSQL    │
    │                 │
    │  - Users        │
    │  - Files        │
    │  - Metadata     │
    └─────────────────┘

Data Flow

File Upload Flow:

1. User uploads file via frontend
2. Master authenticates user (JWT validation)
3. Master chunks file (1MB chunks) and compresses
4. Master queries Consistency Controller for mode
5. Controller analyzes network conditions
6. Based on mode:
   - Strong (C): Quorum write to workers (majority must ack)
   - Available (A): Async write with eventual replication
7. Master stores metadata in PostgreSQL
8. Returns success to user

Consistency Decision Flow:

1. Master requests consistency mode for operation
2. Controller evaluates:
   - Partition Risk (40% weight)
   - Replication Lag (30% weight)
   - Write Rate (20% weight)
   - User Hint (10% weight)
   - Recent Change Penalty (-20%)
3. Score calculation:
   score = 0.4*partition + 0.3*lag + 0.2*writes + 0.1*hint - 0.2*penalty
4. Decision:
   - score > 0.6 → Available mode
   - score < 0.3 → Strong mode
   - else → Hybrid mode
5. Multi-sample confirmation (3 samples over 5 seconds)
6. Mode transition with state persistence

Adaptive Consistency Controller

Why We Built It

Traditional distributed systems force a binary choice between consistency and availability. We recognized that network conditions are dynamic, and the optimal choice changes over time. Our controller makes this decision automatically based on real-time conditions.

Design Decisions

1. Policy-Based Decision Engine

Decision: Use a weighted scoring system rather than simple thresholds.

Rationale:

Network conditions are multi-dimensional (latency, packet loss, partition risk)
Single-factor decisions are too simplistic and cause unnecessary mode switches
Weighted scoring allows fine-tuned control over decision factors
Easy to adjust weights based on workload characteristics

Implementation:

score = 0.4*partitionRisk + 0.3*replicationLag + 
        0.2*writeRate + 0.1*userHint - 0.2*recentChangePenalty

Why these weights?

Partition Risk (40%): Most critical factor - network splits require immediate action
Replication Lag (30%): Indicates system health and sync capability
Write Rate (20%): High writes benefit from async mode
User Hint (10%): Allows application-level preferences
Recent Change Penalty (-20%): Prevents flapping

2. Hysteresis Mechanism

Decision: Require multiple consecutive samples before mode transition.

Rationale:

Prevents "flapping" (rapid mode switches) which degrades performance
Network conditions fluctuate; we need sustained change to justify transition
Reduces unnecessary overhead from mode switching
Provides stability during transient network issues

Implementation:

Requires 3 consecutive samples agreeing on new mode
Samples taken 5 seconds apart
Configurable thresholds per deployment

Trade-off: Slower reaction to network changes vs. system stability. We chose stability because:

Mode switches have overhead (state sync, connection reconfig)
Most network issues are transient
15-second delay is acceptable for consistency mode changes

3. Persistent State Management

Decision: Use BadgerDB for controller state persistence.

Rationale:

Controller must survive crashes without losing consistency state
In-memory state would cause data inconsistency after restart
BadgerDB provides:
- Embedded key-value store (no external dependencies)
- ACID transactions
- Write-Ahead Logging (WAL)
- Fast read/write performance

Alternative Considered: Redis

Rejected because: Adds external dependency, network overhead
BadgerDB wins: Embedded, simpler deployment, sufficient performance

4. Emergency Mode

Decision: Immediate switch to Available mode during severe partitions.

Rationale:

During network partitions, strong consistency becomes impossible
Waiting for hysteresis would cause unnecessary failures
Better to serve stale data than no data
Automatic recovery when partition heals

Threshold: Partition risk > 0.8 (80% of workers unreachable)

5. Operator Overrides

Decision: Allow manual consistency mode override.

Rationale:

Operators may have knowledge the system doesn't (planned maintenance, known issues)
Critical operations may require guaranteed strong consistency
Provides escape hatch for unexpected scenarios
Maintains operator control while automating common cases

Implementation:

Global override affects all objects
Per-object override for critical data
Overrides persist across restarts
Clear API to set/remove overrides

Controller Architecture

type ConsistencyController struct {
    store          *BadgerStore          // Persistent state
    policyEngine   *PolicyEngine         // Decision logic
    networkMonitor *NetworkMonitor       // Health tracking
    stateManager   *StateManager         // Mode transitions
    overrides      map[string]string     // Manual overrides
    criticalKeys   map[string]bool       // Always-strong keys
}

Key Components:

Policy Engine: Implements weighted scoring and decision logic
Network Monitor: Tracks worker health, latency, partition risk
State Manager: Handles mode transitions with hysteresis
BadgerStore: Persists state for crash recovery

API Endpoints

GET  /v1/mode?object_id=<id>     # Get current consistency mode
POST /v1/hint                    # Set consistency hint
POST /v1/override                # Set global override
GET  /v1/critical-keys           # Manage critical keys
GET  /health                     # Health check
GET  /status                     # Controller status

Design Decisions

1. Master-Worker Architecture

Decision: Centralized master with distributed workers.

Rationale:

Simplifies coordination: Single source of truth for file metadata
Easier consistency: Master coordinates all operations
Simpler client: Clients only talk to master
Scalability: Workers can be added/removed dynamically

Trade-offs:

Master is single point of failure (mitigated with health checks and restart)
Master can become bottleneck (mitigated with async operations)

Alternative Considered: Peer-to-peer architecture

Rejected because: Complex consensus, harder to implement consistency controller

2. File Chunking (1MB chunks)

Decision: Split files into 1MB chunks before storage.

Rationale:

Parallel uploads: Multiple chunks uploaded simultaneously
Efficient storage: Deduplication at chunk level
Better distribution: Chunks spread across workers
Partial downloads: Can fetch specific chunks
Network efficiency: Smaller units handle network issues better

Why 1MB?

Balance between overhead (too small) and granularity (too large)
Fits well in memory for processing
Good for network transmission (not too large for retries)
Industry standard (similar to S3 multipart uploads)

3. gRPC for Inter-Service Communication

Decision: Use gRPC between master and workers.

Rationale:

Performance: Binary protocol, faster than REST/JSON
Type safety: Protocol buffers provide strong typing
Streaming: Supports bidirectional streaming for large files
Code generation: Auto-generates client/server code
HTTP/2: Multiplexing, header compression

Alternative Considered: REST API

Rejected because: JSON overhead, no streaming, slower

4. PostgreSQL for Metadata

Decision: Use PostgreSQL for user and file metadata.

Rationale:

ACID transactions: Critical for user data and file ownership
Relational model: Natural fit for users → files relationship
Foreign keys: Automatic cascade deletion
Indexing: Fast queries on owner_id, email
Mature ecosystem: Well-understood, reliable

Alternative Considered: DynamoDB

Rejected because: More complex for relational data, eventual consistency

5. JWT Authentication

Decision: Stateless JWT tokens with 24-hour expiration.

Rationale:

Stateless: No session storage needed
Scalable: No shared session state between servers
Standard: Industry-standard authentication
Secure: HMAC-SHA256 signing
Self-contained: Token includes user info (no DB lookup per request)

Trade-offs:

Cannot revoke tokens before expiration (mitigated with short TTL)
Token size larger than session ID (acceptable for our use case)

6. Bcrypt for Password Hashing

Decision: Use bcrypt with default cost (10).

Rationale:

Adaptive: Cost factor can increase as hardware improves
Salted: Automatic salt generation
Slow by design: Resistant to brute force
Industry standard: Well-tested, secure

Alternative Considered: Argon2

Rejected because: Bcrypt sufficient for our needs, more widely supported

7. Role-Based Access Control

Decision: Simple role system (user/admin) rather than complex permissions.

Rationale:

Simplicity: Two roles cover our use cases
Easy to understand: Clear distinction between regular and admin users
Sufficient: Admin needs metrics/consistency, users need files
Extensible: Can add more roles later if needed

Implementation:

Role stored in user table
Role included in JWT token
Frontend shows/hides features based on role
Backend validates role for protected endpoints

8. Compression Before Chunking

Decision: Compress files before splitting into chunks.

Rationale:

Better compression ratio: Compressing whole file vs individual chunks
Reduced storage: Smaller chunks to store
Network efficiency: Less data to transfer
Standard practice: Similar to tar.gz workflow

Trade-off: Must decompress entire file for access (acceptable for our use case)

Technology Stack

Backend

Language: Go 1.24
Framework: Gorilla Mux (HTTP routing)
RPC: gRPC with Protocol Buffers
Database: PostgreSQL 14
Storage: AWS S3 (chunks), Local filesystem (metadata)
Caching: BadgerDB (controller state)
Monitoring: Prometheus + Grafana

Frontend

Framework: Next.js 15 (React 18)
Language: TypeScript
Styling: Tailwind CSS
Icons: Lucide React
Deployment: Vercel

Infrastructure

Backend Hosting: Render (Free tier)
Frontend Hosting: Vercel (Free tier)
Database: Render PostgreSQL (Free tier)
Storage: AWS S3
Monitoring: Prometheus + Grafana (Docker)

Getting Started

Prerequisites

Go 1.24+
PostgreSQL 14+
Node.js 18+
AWS Account (for S3)

Local Development

1. Clone Repository

git clone https://github.com/AbhinavPInamdar/EchoFS.git
cd EchoFS

2. Setup Backend

cd Backend

# Install dependencies
go mod download

# Setup environment
cp .env.example .env
# Edit .env with your credentials

# Start PostgreSQL
brew services start postgresql@14
createdb echofs

# Start worker
WORKER_ID=worker1 WORKER_PORT=9081 go run cmd/worker1/main.go cmd/worker1/server.go &

# Start master
export DATABASE_URL="postgres://user:pass@localhost:5432/echofs?sslmode=disable"
export JWT_SECRET="your-secret-key"
go run cmd/master/server/main.go cmd/master/server/server.go

3. Setup Frontend

cd frontend

# Install dependencies
npm install

# Setup environment
echo "NEXT_PUBLIC_API_URL=http://localhost:8080" > .env.local

# Start development server
npm run dev

4. Access Application

Frontend: http://localhost:3000
Backend API: http://localhost:8080
Health Check: http://localhost:8080/api/v1/health

Testing

# Backend tests
cd Backend
./test/auth_test.sh

# Test authentication
curl -X POST http://localhost:8080/api/v1/auth/register \
  -H "Content-Type: application/json" \
  -d '{"username":"test","email":"test@example.com","password":"test123"}'

API Documentation

Authentication

Register User

POST /api/v1/auth/register
Content-Type: application/json

{
  "username": "johndoe",
  "email": "john@example.com",
  "password": "securepass123"
}

Response:
{
  "success": true,
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "user": {
    "id": "uuid",
    "username": "johndoe",
    "email": "john@example.com",
    "role": "user"
  }
}

Login

POST /api/v1/auth/login
Content-Type: application/json

{
  "email": "john@example.com",
  "password": "securepass123"
}

File Operations

Upload File

POST /api/v1/files/upload
Authorization: Bearer <token>
Content-Type: multipart/form-data

file: <binary data>

Response:
{
  "success": true,
  "data": {
    "file_id": "uuid",
    "chunks": 5,
    "compressed": true,
    "file_size": 5242880
  }
}

List Files

GET /api/v1/files
Authorization: Bearer <token>

Response:
{
  "success": true,
  "data": [
    {
      "file_id": "uuid",
      "name": "document.pdf",
      "size": 5242880,
      "uploaded": "2025-11-30T10:00:00Z",
      "chunks": 5
    }
  ]
}

Download File

GET /api/v1/files/{file_id}/download
Authorization: Bearer <token>

Response: Binary file data

Delete File

DELETE /api/v1/files/{file_id}
Authorization: Bearer <token>

Response:
{
  "success": true,
  "message": "File deleted successfully"
}

Consistency Controller

Get Consistency Mode

GET /v1/mode?object_id=<id>

Response:
{
  "object_id": "file123",
  "mode": "C",
  "reason": "network_stable",
  "timestamp": "2025-11-30T10:00:00Z"
}

Set Consistency Hint

POST /v1/hint
Content-Type: application/json

{
  "object_id": "file123",
  "hint": "strong"
}

Performance Results

Experimental Validation

We conducted extensive experiments comparing EchoFS's adaptive consistency against fixed strong consistency:

Test Scenarios

Normal Operation: Baseline performance
High Latency Network: 200ms delay, 10% packet loss
Network Partition: Worker isolation for 60 seconds
Heavy Write Load: 100 operations/second

Results

Metric	Fixed Strong	Adaptive	Improvement
P50 Latency (Normal)	8ms	8ms	0%
P95 Latency (Normal)	45ms	45ms	0%
P50 Latency (Partition)	95ms	15ms	84%
P95 Latency (Partition)	380ms	48ms	87%
Availability (Partition)	87.5%	99.6%	14%
Stale Reads (Partition)	0%	4.6%	Acceptable
Convergence Time	N/A	2.3s avg	-

Key Findings:

85% latency reduction during network stress
Zero data loss during mode transitions
99.6% availability maintained during partitions
Automatic conflict resolution for 75% of conflicts
Stable operation with hysteresis preventing flapping

Real-World Performance

Production Metrics (30-day average):

Average upload time: 1.2s (10MB file)
Average download time: 0.8s (10MB file)
Consistency mode switches: 3-5 per day
System uptime: 99.8%
Zero data corruption incidents

Security

Authentication & Authorization

JWT tokens with HMAC-SHA256 signing
24-hour token expiration
Bcrypt password hashing (cost 10)
Role-based access control (user/admin)

Data Protection

User-specific file isolation
Ownership verification on all operations
SQL injection prevention (parameterized queries)
HTTPS/TLS in production

Best Practices

Environment variables for secrets
No sensitive data in logs
Regular security audits
Dependency vulnerability scanning

Deployment

Production URLs

Frontend: https://frontend-echofs-projects.vercel.app/
Backend: https://echofs.onrender.com
API Health: https://echofs.onrender.com/api/v1/health

Deployment Architecture

┌─────────────┐
│   Vercel    │  Frontend (Next.js)
│   CDN       │  - Auto-deploy from GitHub
└─────────────┘  - Edge network distribution

┌─────────────┐
│   Render    │  Backend (Go)
│   Platform  │  - Auto-deploy from GitHub
└─────────────┘  - Health checks & auto-restart

┌─────────────┐
│   Render    │  PostgreSQL
│  PostgreSQL │  - Automated backups
└─────────────┘  - Connection pooling

┌─────────────┐
│   AWS S3    │  File Storage
│   Bucket    │  - Chunk storage
└─────────────┘  - Versioning enabled

Environment Variables

Backend (Render):

DATABASE_URL=postgres://...
JWT_SECRET=<random-32-chars>
AWS_ACCESS_KEY_ID=<aws-key>
AWS_SECRET_ACCESS_KEY=<aws-secret>
S3_BUCKET_NAME=echofs-storage
PORT=10000

Frontend (Vercel):

NEXT_PUBLIC_API_URL=https://echofs.onrender.com

Future Enhancements

Short Term

Medium Term

Long Term

Machine learning for consistency decisions
Predictive mode switching
Cross-datacenter replication
Blockchain-based audit trail
Mobile applications

Research & Publications

This project demonstrates practical implementation of distributed systems concepts:

CAP Theorem: Dynamic trade-offs between consistency and availability
Consensus Algorithms: Quorum-based writes for strong consistency
Conflict Resolution: Vector clocks and CRDT techniques
Adaptive Systems: Real-time decision making based on system state

Academic Contributions:

Novel approach to dynamic consistency mode switching
Practical validation of adaptive consistency benefits
Open-source implementation for research and education

License

MIT License - See LICENSE file for details

Acknowledgments

Inspired by Amazon DynamoDB's eventual consistency model
Built on research from Google Spanner and Apache Cassandra
Uses industry-standard protocols (gRPC, JWT, OAuth)

Contact

Project Link: https://github.com/AbhinavPInamdar/EchoFS
Live Demo: https://frontend-echofs-projects.vercel.app/
Issues: https://github.com/AbhinavPInamdar/EchoFS/issues

Built with ❤️ for distributed systems research and education

Last Updated: November 30, 2025

Name		Name	Last commit message	Last commit date
Latest commit History 61 Commits
Backend		Backend
frontend		frontend
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
docker-compose.grafana.yml		docker-compose.grafana.yml
prometheus.yml		prometheus.yml

License

AbhinavPInamdar/EchoFS

Folders and files

Latest commit

History

Repository files navigation

EchoFS - Adaptive Consistency Distributed File System

Table of Contents

Overview

The Problem

Our Solution

Key Features

1. Adaptive Consistency Engine

2. User Authentication & Authorization

3. Distributed Architecture

4. Real-Time Monitoring

Architecture

System Components

Data Flow

Adaptive Consistency Controller

Why We Built It

Design Decisions

1. Policy-Based Decision Engine

2. Hysteresis Mechanism

3. Persistent State Management

4. Emergency Mode

5. Operator Overrides

Controller Architecture

API Endpoints

Design Decisions

1. Master-Worker Architecture

2. File Chunking (1MB chunks)

3. gRPC for Inter-Service Communication

4. PostgreSQL for Metadata

5. JWT Authentication

6. Bcrypt for Password Hashing

7. Role-Based Access Control

8. Compression Before Chunking

Technology Stack

Backend

Frontend

Infrastructure

Getting Started

Prerequisites

Local Development

1. Clone Repository

2. Setup Backend

3. Setup Frontend

4. Access Application

Testing

API Documentation

Authentication

Register User

Login

File Operations

Upload File

List Files

Download File

Delete File

Consistency Controller

Get Consistency Mode

Set Consistency Hint

Performance Results

Experimental Validation

Test Scenarios

Results

Real-World Performance

Security

Authentication & Authorization

Data Protection

Best Practices

Deployment

Production URLs

Deployment Architecture

Environment Variables

Future Enhancements

Short Term

Medium Term

Long Term

Research & Publications

Packages