A production-grade LLM inference platform with API gateway, FastAPI backend, Redis queue, vLLM inference engine, GPU workers, vector database (RAG), and full monitoring stack.
## Features

- 🔐 Authentication: JWT-based auth with API key support
- 💬 Chat Completions: OpenAI-compatible API with streaming support
- 🔤 Embeddings: Generate vector embeddings for RAG
- 📚 RAG: Retrieval-Augmented Generation with vector search
- 📊 Usage Tracking: Token accounting and cost tracking
- 🚦 Rate Limiting: Token bucket algorithm
- 🤖 Multiple Models: Support for various LLM providers
- 📈 Monitoring: Prometheus + Grafana integration
- 🐳 Docker: Full containerization support
- ☸️ Kubernetes: Production-ready deployment manifests
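Rate limiting uses a token bucket. A minimal in-memory sketch of the algorithm (the platform's actual limiter is presumably backed by Redis; the class and parameter names here are illustrative):

```python
import time

class TokenBucket:
    """In-memory token bucket: refills at a steady rate, allows bursts up to capacity."""

    def __init__(self, rate_per_minute, capacity=None):
        self.rate = rate_per_minute / 60.0        # tokens refilled per second
        self.capacity = capacity or rate_per_minute
        self.tokens = float(self.capacity)        # start full
        self.last = time.monotonic()

    def allow(self):
        # Refill proportionally to elapsed time, then try to spend one token.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(rate_per_minute=60)
print(sum(bucket.allow() for _ in range(61)))  # 60: the 61st request is throttled
```

Because the bucket starts full, short bursts up to the per-minute limit pass immediately; sustained traffic is smoothed to the refill rate.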
## Prerequisites

- Docker & Docker Compose
- Python 3.10+
- Node.js 18+ (for frontend)
## Quick Start

- Clone the repository

  ```bash
  git clone https://github.com/your-org/torna-gateway.git
  cd torna-gateway
  ```

- Set up environment variables

  ```bash
  cp .env.example .env
  # Edit .env with your settings
  ```

- Start all services

  ```bash
  docker-compose up -d
  ```

- Seed initial data

  ```bash
  python scripts/seed_data.py
  ```

- Access the application
  - API: http://localhost:8000
  - Frontend: http://localhost:3000
  - API Docs: http://localhost:8000/docs
  - Health: http://localhost:8000/health
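The API is OpenAI-compatible, so chat responses follow the familiar chat-completions shape. A small sketch of pulling the assistant's reply out of one (the field values below are illustrative, not real output):

```python
def extract_reply(resp: dict) -> str:
    # OpenAI-style responses carry the generated text under choices[0].message.content
    return resp["choices"][0]["message"]["content"]

# Illustrative response in the OpenAI chat-completions shape
sample = {
    "id": "chatcmpl-abc123",
    "model": "mistral-7b",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello! How can I help?"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 9, "completion_tokens": 7, "total_tokens": 16},
}

print(extract_reply(sample))  # Hello! How can I help?
```

The `usage` block is what the platform's token accounting and cost tracking are built on.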
## Architecture

```
┌─────────────────┐
│  Client / Web   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   API Gateway   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ FastAPI Backend │
│  • Auth (JWT)   │
│  • Rate Limit   │
│  • Token Track  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   Redis Queue   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ vLLM Inference  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   GPU Workers   │
└─────────────────┘
```
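The queue-to-worker hop in the diagram can be sketched in-process, with Python's `queue.Queue` standing in for Redis and a stub in place of the vLLM engine (all names here are illustrative, not the platform's actual code):

```python
import queue
import threading

request_q = queue.Queue()       # stand-in for the Redis queue
results = {}                    # stand-in for a result store

def mock_infer(prompt: str) -> str:
    # Stub for the vLLM engine call
    return f"echo: {prompt}"

def worker() -> None:
    # GPU-worker loop: pull jobs until a shutdown sentinel arrives
    while True:
        job = request_q.get()
        if job is None:
            break
        results[job["id"]] = mock_infer(job["prompt"])
        request_q.task_done()

t = threading.Thread(target=worker)
t.start()
request_q.put({"id": "req-1", "prompt": "Hello"})
request_q.put({"id": "req-2", "prompt": "Hi"})
request_q.put(None)   # sentinel: shut the worker down
t.join()
print(results)  # {'req-1': 'echo: Hello', 'req-2': 'echo: Hi'}
```

Decoupling the API from the workers this way lets requests buffer under load instead of being dropped, and lets GPU workers scale independently of the backend.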
## API Endpoints

### Authentication

```
POST   /api/v1/auth/register    # Register new user
POST   /api/v1/auth/login       # Login and get JWT token
POST   /api/v1/auth/refresh     # Refresh JWT token
POST   /api/v1/auth/logout      # Invalidate token
GET    /api/v1/auth/me          # Get current user profile
```

### API Keys

```
GET    /api/v1/api-keys                  # List API keys
POST   /api/v1/api-keys                  # Create API key
DELETE /api/v1/api-keys/{key_id}         # Revoke API key
GET    /api/v1/api-keys/{key_id}/usage   # Get key usage
```

### Chat Completions

```
POST /v1/chat/completions        # Chat completion with streaming
GET  /v1/chat/completions/{id}   # Get completion by ID
```

### Embeddings & Models

```
POST /v1/embeddings          # Generate embeddings
GET  /v1/models              # List available models
GET  /v1/models/{model_id}   # Get model details
```

### Documents & RAG

```
POST   /api/v1/documents       # Upload document
GET    /api/v1/documents       # List documents
DELETE /api/v1/documents/{id}  # Delete document
POST   /api/v1/rag/query       # Query with RAG context
```

### Usage & Billing

```
GET /api/v1/usage              # Get usage stats
GET /api/v1/billing/cost       # Get cost estimation
GET /api/v1/billing/providers  # List provider pricing
```

### System

```
GET /health                  # Health check
GET /metrics                 # Prometheus metrics
GET /api/v1/system/status    # System status
```
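The auth endpoints issue and verify JWTs. The mechanics can be sketched with the stdlib alone (HS256 is assumed here; a real deployment would use a library such as PyJWT and include `exp`/`iat` claims):

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    # JWT uses unpadded base64url encoding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(payload: dict, secret: str) -> str:
    # HS256-signed JWT: header.payload.signature
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    sig = hmac.new(secret.encode(), f"{header}.{body}".encode(), hashlib.sha256).digest()
    return f"{header}.{body}.{b64url(sig)}"

def verify_jwt(token: str, secret: str) -> dict:
    header, body, sig = token.split(".")
    expected = hmac.new(secret.encode(), f"{header}.{body}".encode(), hashlib.sha256).digest()
    # Constant-time comparison to avoid timing attacks
    if not hmac.compare_digest(b64url(expected), sig):
        raise ValueError("bad signature")
    padded = body + "=" * (-len(body) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

token = sign_jwt({"sub": "user-1"}, "dev-secret")
print(verify_jwt(token, "dev-secret"))  # {'sub': 'user-1'}
```

Any token signed with a different secret, or with a tampered payload, fails verification.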
## Examples

### Register & Login

```bash
# Register
curl -X POST http://localhost:8000/api/v1/auth/register \
  -H "Content-Type: application/json" \
  -d '{"email": "test@example.com", "password": "secure123"}'

# Login
curl -X POST http://localhost:8000/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email": "test@example.com", "password": "secure123"}'
```

### Chat Completion

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-API-Key: sk-your-api-key" \
  -d '{"model": "mistral-7b", "messages": [{"role": "user", "content": "Hello"}]}'
```

### Streaming

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-API-Key: sk-your-api-key" \
  -d '{"model": "mistral-7b", "messages": [{"role": "user", "content": "Hello"}], "stream": true}'
```

## Testing

```bash
# Unit tests
pytest

# With coverage
pytest --cov=api --cov-report=html

# Load testing
locust -f tests/load/locustfile.py --host=http://localhost:8000
```

## Deployment

```bash
kubectl apply -f kubernetes/
kubectl get pods -n llm-platform
kubectl get hpa -n llm-platform
```

## Configuration

| Variable | Description | Default |
|---|---|---|
| `MOCK_DB` | Use mock database | `true` |
| `DEBUG` | Enable debug mode | `true` |
| `SECRET_KEY` | Application secret | - |
| `JWT_SECRET_KEY` | JWT signing key | - |
| `RATE_LIMIT_PER_MINUTE` | Rate limit per minute | `60` |
| `INFERENCE_ENGINE` | `mock` or `vllm` | `mock` |
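These variables might be parsed at startup roughly like this (a stdlib-only sketch with the documented defaults; the project may well use pydantic settings instead, and the function and key names here are illustrative):

```python
import os

def load_config(env=None):
    """Parse the environment variables from the table above, applying documented defaults."""
    env = os.environ if env is None else env
    return {
        "mock_db": env.get("MOCK_DB", "true").lower() == "true",
        "debug": env.get("DEBUG", "true").lower() == "true",
        "secret_key": env.get("SECRET_KEY"),          # no default: must be provided
        "jwt_secret_key": env.get("JWT_SECRET_KEY"),  # no default: must be provided
        "rate_limit_per_minute": int(env.get("RATE_LIMIT_PER_MINUTE", "60")),
        "inference_engine": env.get("INFERENCE_ENGINE", "mock"),  # "mock" or "vllm"
    }

print(load_config({"INFERENCE_ENGINE": "vllm", "DEBUG": "false"}))
```

Note that `SECRET_KEY` and `JWT_SECRET_KEY` have no defaults, so they must be set in `.env` before the stack will run securely.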
## Development

Run the backend locally:

```bash
pip install -r requirements.txt
uvicorn api.main:app --reload
```

Run the frontend:

```bash
cd frontend
npm install
npm run dev
```

## Monitoring

- Prometheus: http://localhost:9090
- Grafana: http://localhost:3001
  - Username: admin
  - Password: admin_change_me
## License

MIT License - see LICENSE file for details
## Contributing

1. Fork the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Create a Pull Request