A production-grade LLM inference platform with API gateway, FastAPI backend, Redis queue, vLLM inference engine, GPU workers, vector database (RAG), and full monitoring stack.
## Features

- 🔐 Authentication: JWT-based auth with API key support
- 💬 Chat Completions: OpenAI-compatible API with streaming support
- 🔤 Embeddings: Generate vector embeddings for RAG
- 📚 RAG: Retrieval-Augmented Generation with vector search
- 📊 Usage Tracking: Token accounting and cost tracking
- 🚦 Rate Limiting: Token bucket algorithm
- 🤖 Multiple Models: Support for various LLM providers
- 📈 Monitoring: Prometheus + Grafana integration
- 🐳 Docker: Full containerization support
- ☸️ Kubernetes: Production-ready deployment manifests
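Rate limiting uses a token bucket. A minimal in-memory sketch of the algorithm (the platform's actual limiter is presumably backed by Redis; the class and parameter names here are illustrative):

```python
import time

class TokenBucket:
    """In-memory token bucket: refills at a steady rate, allows bursts up to capacity."""

    def __init__(self, rate_per_minute, capacity=None):
        self.rate = rate_per_minute / 60.0        # tokens refilled per second
        self.capacity = capacity or rate_per_minute
        self.tokens = float(self.capacity)        # start full
        self.last = time.monotonic()

    def allow(self):
        # Refill proportionally to elapsed time, then try to spend one token.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(rate_per_minute=60)
print(sum(bucket.allow() for _ in range(61)))  # 60: the 61st request is throttled
```

Because the bucket starts full, short bursts up to the per-minute limit pass immediately; sustained traffic is smoothed to the refill rate.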
## Prerequisites

- Docker & Docker Compose
- Python 3.10+
- Node.js 18+ (for frontend)
## Quick Start

- Clone the repository

  ```bash
  git clone https://github.com/your-org/torna-gateway.git
  cd torna-gateway
  ```

- Set up environment variables

  ```bash
  cp .env.example .env
  # Edit .env with your settings
  ```

- Start all services

  ```bash
  docker-compose up -d
  ```

- Seed initial data

  ```bash
  python scripts/seed_data.py
  ```

- Access the application
  - API: http://localhost:8000
  - Frontend: http://localhost:3000
  - API Docs: http://localhost:8000/docs
  - Health: http://localhost:8000/health
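The API is OpenAI-compatible, so chat responses follow the familiar chat-completions shape. A small sketch of pulling the assistant's reply out of one (the field values below are illustrative, not real output):

```python
def extract_reply(resp: dict) -> str:
    # OpenAI-style responses carry the generated text under choices[0].message.content
    return resp["choices"][0]["message"]["content"]

# Illustrative response in the OpenAI chat-completions shape
sample = {
    "id": "chatcmpl-abc123",
    "model": "mistral-7b",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello! How can I help?"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 9, "completion_tokens": 7, "total_tokens": 16},
}

print(extract_reply(sample))  # Hello! How can I help?
```

The `usage` block is what the platform's token accounting and cost tracking are built on.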
## Architecture

```
┌─────────────────┐
│  Client / Web   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   API Gateway   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ FastAPI Backend │
│  • Auth (JWT)   │
│  • Rate Limit   │
│  • Token Track  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   Redis Queue   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ vLLM Inference  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   GPU Workers   │
└─────────────────┘
```
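The queue-to-worker hop in the diagram can be sketched in-process, with Python's `queue.Queue` standing in for Redis and a stub in place of the vLLM engine (all names here are illustrative, not the platform's actual code):

```python
import queue
import threading

request_q = queue.Queue()       # stand-in for the Redis queue
results = {}                    # stand-in for a result store

def mock_infer(prompt: str) -> str:
    # Stub for the vLLM engine call
    return f"echo: {prompt}"

def worker() -> None:
    # GPU-worker loop: pull jobs until a shutdown sentinel arrives
    while True:
        job = request_q.get()
        if job is None:
            break
        results[job["id"]] = mock_infer(job["prompt"])
        request_q.task_done()

t = threading.Thread(target=worker)
t.start()
request_q.put({"id": "req-1", "prompt": "Hello"})
request_q.put({"id": "req-2", "prompt": "Hi"})
request_q.put(None)   # sentinel: shut the worker down
t.join()
print(results)  # {'req-1': 'echo: Hello', 'req-2': 'echo: Hi'}
```

Decoupling the API from the workers this way lets requests buffer under load instead of being dropped, and lets GPU workers scale independently of the backend.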
## API Endpoints

### Authentication

```
POST   /api/v1/auth/register    # Register new user
POST   /api/v1/auth/login       # Login and get JWT token
POST   /api/v1/auth/refresh     # Refresh JWT token
POST   /api/v1/auth/logout      # Invalidate token
GET    /api/v1/auth/me          # Get current user profile
```

### API Keys

```
GET    /api/v1/api-keys                  # List API keys
POST   /api/v1/api-keys                  # Create API key
DELETE /api/v1/api-keys/{key_id}         # Revoke API key
GET    /api/v1/api-keys/{key_id}/usage   # Get key usage
```

### Chat Completions

```
POST /v1/chat/completions        # Chat completion with streaming
GET  /v1/chat/completions/{id}   # Get completion by ID
```

### Embeddings & Models

```
POST /v1/embeddings          # Generate embeddings
GET  /v1/models              # List available models
GET  /v1/models/{model_id}   # Get model details
```

### Documents & RAG

```
POST   /api/v1/documents       # Upload document
GET    /api/v1/documents       # List documents
DELETE /api/v1/documents/{id}  # Delete document
POST   /api/v1/rag/query       # Query with RAG context
```

### Usage & Billing

```
GET /api/v1/usage              # Get usage stats
GET /api/v1/billing/cost       # Get cost estimation
GET /api/v1/billing/providers  # List provider pricing
```

### System

```
GET /health                  # Health check
GET /metrics                 # Prometheus metrics
GET /api/v1/system/status    # System status
```
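The auth endpoints issue and verify JWTs. The mechanics can be sketched with the stdlib alone (HS256 is assumed here; a real deployment would use a library such as PyJWT and include `exp`/`iat` claims):

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    # JWT uses unpadded base64url encoding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(payload: dict, secret: str) -> str:
    # HS256-signed JWT: header.payload.signature
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    sig = hmac.new(secret.encode(), f"{header}.{body}".encode(), hashlib.sha256).digest()
    return f"{header}.{body}.{b64url(sig)}"

def verify_jwt(token: str, secret: str) -> dict:
    header, body, sig = token.split(".")
    expected = hmac.new(secret.encode(), f"{header}.{body}".encode(), hashlib.sha256).digest()
    # Constant-time comparison to avoid timing attacks
    if not hmac.compare_digest(b64url(expected), sig):
        raise ValueError("bad signature")
    padded = body + "=" * (-len(body) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

token = sign_jwt({"sub": "user-1"}, "dev-secret")
print(verify_jwt(token, "dev-secret"))  # {'sub': 'user-1'}
```

Any token signed with a different secret, or with a tampered payload, fails verification.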
## Examples

### Register & Login

```bash
# Register
curl -X POST http://localhost:8000/api/v1/auth/register \
  -H "Content-Type: application/json" \
  -d '{"email": "test@example.com", "password": "secure123"}'

# Login
curl -X POST http://localhost:8000/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email": "test@example.com", "password": "secure123"}'
```

### Chat Completion

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-API-Key: sk-your-api-key" \
  -d '{"model": "mistral-7b", "messages": [{"role": "user", "content": "Hello"}]}'
```

### Streaming

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-API-Key: sk-your-api-key" \
  -d '{"model": "mistral-7b", "messages": [{"role": "user", "content": "Hello"}], "stream": true}'
```

## Testing

```bash
# Unit tests
pytest

# With coverage
pytest --cov=api --cov-report=html

# Load testing
locust -f tests/load/locustfile.py --host=http://localhost:8000
```

## Deployment

```bash
kubectl apply -f kubernetes/
kubectl get pods -n llm-platform
kubectl get hpa -n llm-platform
```

## Configuration

| Variable | Description | Default |
|---|---|---|
| `MOCK_DB` | Use mock database | `true` |
| `DEBUG` | Enable debug mode | `true` |
| `SECRET_KEY` | Application secret | - |
| `JWT_SECRET_KEY` | JWT signing key | - |
| `RATE_LIMIT_PER_MINUTE` | Rate limit per minute | `60` |
| `INFERENCE_ENGINE` | `mock` or `vllm` | `mock` |
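These variables might be parsed at startup roughly like this (a stdlib-only sketch with the documented defaults; the project may well use pydantic settings instead, and the function and key names here are illustrative):

```python
import os

def load_config(env=None):
    """Parse the environment variables from the table above, applying documented defaults."""
    env = os.environ if env is None else env
    return {
        "mock_db": env.get("MOCK_DB", "true").lower() == "true",
        "debug": env.get("DEBUG", "true").lower() == "true",
        "secret_key": env.get("SECRET_KEY"),          # no default: must be provided
        "jwt_secret_key": env.get("JWT_SECRET_KEY"),  # no default: must be provided
        "rate_limit_per_minute": int(env.get("RATE_LIMIT_PER_MINUTE", "60")),
        "inference_engine": env.get("INFERENCE_ENGINE", "mock"),  # "mock" or "vllm"
    }

print(load_config({"INFERENCE_ENGINE": "vllm", "DEBUG": "false"}))
```

Note that `SECRET_KEY` and `JWT_SECRET_KEY` have no defaults, so they must be set in `.env` before the stack will run securely.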
## Development

Run the backend locally:

```bash
pip install -r requirements.txt
uvicorn api.main:app --reload
```

Run the frontend:

```bash
cd frontend
npm install
npm run dev
```

## Monitoring

- Prometheus: http://localhost:9090
- Grafana: http://localhost:3001
  - Username: admin
  - Password: admin_change_me
## License

MIT License - see LICENSE file for details
## Contributing

1. Fork the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Create a Pull Request