
Torna Gateway

A production-grade LLM inference platform with API gateway, FastAPI backend, Redis queue, vLLM inference engine, GPU workers, vector database (RAG), and full monitoring stack.

Features

  • 🔐 Authentication: JWT-based auth with API key support
  • 💬 Chat Completions: OpenAI-compatible API with streaming support
  • 🔤 Embeddings: Generate vector embeddings for RAG
  • 📚 RAG: Retrieval-Augmented Generation with vector search
  • 📊 Usage Tracking: Token accounting and cost tracking
  • 🚦 Rate Limiting: Token bucket algorithm
  • 🤖 Multiple Models: Support for various LLM providers
  • 📈 Monitoring: Prometheus + Grafana integration
  • 🐳 Docker: Full containerization support
  • ☸️ Kubernetes: Production-ready deployment manifests
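The rate limiter above is described as a token bucket. As a reference for how that algorithm behaves (a minimal sketch, not the platform's actual implementation), tokens refill continuously up to a capacity and each request spends one:

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter sketch.

    Tokens refill continuously at `rate` per second up to `capacity`;
    each request consumes one token and is rejected when none remain.
    """

    def __init__(self, capacity: int, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# RATE_LIMIT_PER_MINUTE=60 corresponds to a refill rate of 1 token/second.
bucket = TokenBucket(capacity=3, rate=1.0)
print([bucket.allow() for _ in range(5)])  # → [True, True, True, False, False]
```

Burst traffic up to `capacity` passes immediately; sustained traffic is held to the refill rate.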

Quick Start

Prerequisites

  • Docker & Docker Compose
  • Python 3.10+
  • Node.js 18+ (for frontend)

Development Setup

  1. Clone the repository
git clone https://github.com/your-org/torna-gateway.git
cd torna-gateway
  2. Set up environment variables
cp .env.example .env
# Edit .env with your settings
  3. Start all services
docker-compose up -d
  4. Seed initial data
python scripts/seed_data.py
  5. Access the application at http://localhost:8000
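The variables covered in the Configuration section below are the main ones to set. An illustrative development `.env` (values are placeholders, not recommendations):

```env
# Development settings -- replace secrets before deploying
MOCK_DB=true
DEBUG=true
SECRET_KEY=change-me
JWT_SECRET_KEY=change-me-too
RATE_LIMIT_PER_MINUTE=60
INFERENCE_ENGINE=mock
```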

Architecture

┌─────────────────┐
│  Client / Web   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  API Gateway    │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  FastAPI Backend│
│  • Auth (JWT)   │
│  • Rate Limit   │
│  • Token Track  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Redis Queue    │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  vLLM Inference │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  GPU Workers    │
└─────────────────┘

API Endpoints

Authentication

POST /api/v1/auth/register    # Register new user
POST /api/v1/auth/login       # Login and get JWT token
POST /api/v1/auth/refresh     # Refresh JWT token
POST /api/v1/auth/logout      # Invalidate token
GET  /api/v1/auth/me          # Get current user profile

API Keys

GET    /api/v1/api-keys               # List API keys
POST   /api/v1/api-keys               # Create API key
DELETE /api/v1/api-keys/{key_id}      # Revoke API key
GET    /api/v1/api-keys/{key_id}/usage # Get key usage

Chat Completions (OpenAI-compatible)

POST /v1/chat/completions      # Chat completion with streaming
GET  /v1/chat/completions/{id} # Get completion by ID

Embeddings

POST /v1/embeddings          # Generate embeddings

Models

GET  /v1/models              # List available models
GET  /v1/models/{model_id}   # Get model details

RAG

POST   /api/v1/documents      # Upload document
GET    /api/v1/documents      # List documents
DELETE /api/v1/documents/{id} # Delete document
POST   /api/v1/rag/query      # Query with RAG context

Usage & Billing

GET /api/v1/usage            # Get usage stats
GET /api/v1/billing/cost     # Get cost estimation
GET /api/v1/billing/providers # List provider pricing

Health & Metrics

GET /health                  # Health check
GET /metrics                 # Prometheus metrics
GET /api/v1/system/status    # System status

Example Usage

Register and Login

# Register
curl -X POST http://localhost:8000/api/v1/auth/register \
  -H "Content-Type: application/json" \
  -d '{"email": "test@example.com", "password": "secure123"}'

# Login
curl -X POST http://localhost:8000/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email": "test@example.com", "password": "secure123"}'
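After logging in, the returned JWT goes into an `Authorization: Bearer` header on subsequent requests. A small helper sketch (the `access_token` response field is an assumption, common for JWT APIs; verify against the actual login response):

```python
def bearer_headers(login_response: dict) -> dict:
    """Build request headers from a login response.

    Assumes the JWT is returned under "access_token"; adjust the key
    if the /api/v1/auth/login response uses a different field name.
    """
    return {
        "Authorization": f"Bearer {login_response['access_token']}",
        "Content-Type": "application/json",
    }

print(bearer_headers({"access_token": "eyJhbGciOi..."}))
```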

Chat Completion

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-API-Key: sk-your-api-key" \
  -d '{"model": "mistral-7b", "messages": [{"role": "user", "content": "Hello"}]}'

Streaming Chat

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-API-Key: sk-your-api-key" \
  -d '{"model": "mistral-7b", "messages": [{"role": "user", "content": "Hello"}], "stream": true}'
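OpenAI-compatible streaming responses arrive as server-sent events: `data: {...}` lines terminated by a `data: [DONE]` sentinel. A small client-side parser sketch, assuming that wire format:

```python
import json

def parse_sse_chunks(lines):
    """Yield decoded JSON payloads from OpenAI-style SSE lines,
    stopping at the `[DONE]` sentinel."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        yield json.loads(payload)

# Sample chunks in the shape the streaming endpoint would emit.
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(c["choices"][0]["delta"].get("content", "") for c in parse_sse_chunks(sample))
print(text)  # → Hello
```

In a real client you would iterate over the HTTP response body line by line instead of a list.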

Testing

Run Unit Tests

pytest

Run with Coverage

pytest --cov=api --cov-report=html

Load Testing

locust -f tests/load/locustfile.py --host=http://localhost:8000

Kubernetes Deployment

Apply manifests

kubectl apply -f kubernetes/

Check deployment

kubectl get pods -n llm-platform
kubectl get hpa -n llm-platform

Configuration

| Variable              | Description                    | Default |
|-----------------------|--------------------------------|---------|
| MOCK_DB               | Use mock database              | true    |
| DEBUG                 | Enable debug mode              | true    |
| SECRET_KEY            | Application secret             | -       |
| JWT_SECRET_KEY        | JWT signing key                | -       |
| RATE_LIMIT_PER_MINUTE | Rate limit per minute          | 60      |
| INFERENCE_ENGINE      | Inference engine: mock or vllm | mock    |
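A backend typically loads these variables through a settings object. A hedged standard-library sketch applying the documented defaults (the project's actual config module may use Pydantic or similar instead):

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    mock_db: bool
    debug: bool
    secret_key: str
    jwt_secret_key: str
    rate_limit_per_minute: int
    inference_engine: str

def load_settings(env=os.environ) -> Settings:
    """Read the variables from the table above, applying the documented defaults."""
    return Settings(
        mock_db=env.get("MOCK_DB", "true").lower() == "true",
        debug=env.get("DEBUG", "true").lower() == "true",
        secret_key=env.get("SECRET_KEY", ""),
        jwt_secret_key=env.get("JWT_SECRET_KEY", ""),
        rate_limit_per_minute=int(env.get("RATE_LIMIT_PER_MINUTE", "60")),
        inference_engine=env.get("INFERENCE_ENGINE", "mock"),
    )

print(load_settings({}))  # defaults: mock DB, debug on, 60 req/min, mock engine
```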

Development

Backend

pip install -r requirements.txt
uvicorn api.main:app --reload

Frontend

cd frontend
npm install
npm run dev

Monitoring

Prometheus

Access at http://localhost:9090

Grafana

Access at http://localhost:3001

  • Username: admin
  • Password: admin_change_me

License

MIT License - see LICENSE file for details

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a Pull Request
