NeuroServe is an AI inference server built on FastAPI, designed to run seamlessly on GPU (CUDA/ROCm), CPU, and macOS MPS. It provides ready-to-use REST APIs, a modular plugin system, runtime utilities, and a unified response format, making it a solid foundation for AI-powered services.
Virtualenv quick guide: see docs/README_venv.md.
Detailed API reference and usage examples are available here: ➡️ API Documentation
- REST APIs out of the box, with Swagger UI (`/docs`) and ReDoc (`/redoc`).
- PyTorch integration with automatic device selection (`cuda`, `cpu`, `mps`, `rocm`); see the sketch after this list.
- Plugin system to extend functionality with custom AI models or services.
- Runtime tools for GPU info, warm-up routines, and environment inspection.
- Built-in utilities such as a toy model and a model size calculator.
- Unified JSON responses for predictable API behavior.
- Cross-platform CI/CD (Ubuntu, Windows, macOS, self-hosted GPU).
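For illustration, automatic device selection in PyTorch usually boils down to a few lines. This is a minimal sketch under the assumption that PyTorch is installed, not the server's actual implementation (note that ROCm builds of PyTorch expose their GPU through `torch.cuda`):

```python
# Minimal device-selection sketch (assumes PyTorch is installed).
# ROCm builds of PyTorch also report their GPU via torch.cuda.
import torch


def pick_device() -> torch.device:
    if torch.cuda.is_available():           # NVIDIA CUDA (or ROCm builds)
        return torch.device("cuda")
    if torch.backends.mps.is_available():   # Apple Silicon GPU
        return torch.device("mps")
    return torch.device("cpu")              # portable fallback


model = torch.nn.Linear(4, 2).to(pick_device())  # place a toy model on the chosen device
```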
```text
repo-fastapi/
├── app/                         # application package
│   ├── core/                    # settings & configuration
│   │   └── config.py            # app settings (Pydantic v2)
│   ├── routes/                  # HTTP API routes
│   ├── plugins/                 # extensions / integrations
│   ├── workflows/               # workflow definitions & orchestrators
│   └── templates/               # Jinja templates (if used)
├── docs/                        # documentation & generated diagrams
│   ├── ARCHITECTURE.md          # main architecture report
│   ├── architecture.mmd         # Mermaid source (no fences)
│   ├── architecture.html        # browser-friendly diagram
│   ├── architecture.png         # exported PNG (if mmdc installed)
│   ├── runtime.mmd              # runtime/infra diagram
│   ├── imports.mmd              # Python import graph (if generated)
│   ├── endpoints.md             # discovered API endpoints (if generated)
│   └── README_venv.md           # virtualenv quick guide
├── tools/                       # project tooling & scripts
│   └── build_workflows_index.py # builds docs/workflows-overview.md
├── tests/                       # test suite
│   └── test_run.py              # smoke test for app startup
├── gen_arch.py                  # architecture generator script
├── requirements.txt             # runtime dependencies
├── requirements-dev.txt         # dev dependencies (ruff, pre-commit, pytest, ...)
├── .pre-commit-config.yaml      # pre-commit hooks configuration
├── README.md                    # project overview & usage
└── LICENSE                      # project license
```
For a deeper look into the internal design, modules, and flow of the system, see: ➡️ Architecture Guide
```bash
git clone https://github.com/USERNAME/gpu-server.git
cd gpu-server

python -m venv .venv
# Linux/macOS
source .venv/bin/activate
# Windows
.venv\Scripts\activate

pip install -r requirements.txt
python -m scripts.install_torch --gpu   # or --cpu / --rocm

uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

Available endpoints:
- Home → http://localhost:8000/
- Health → http://localhost:8000/health
- Swagger UI → http://localhost:8000/docs
- ReDoc → http://localhost:8000/redoc
- Env Summary → http://localhost:8000/env
- Plugins → http://localhost:8000/plugins
Quick test:
```bash
curl http://localhost:8000/health
# {"status": "ok"}
```

Each plugin lives in `app/plugins/<name>/` and typically includes:
```text
manifest.json   # plugin metadata
plugin.py       # defines a Plugin class inheriting AIPlugin
README.md       # documentation
```
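For illustration only, a minimal manifest.json might carry the plugin's name, provider, and supported tasks; the exact schema is defined by the server, so treat these fields as assumptions:

```json
{
  "name": "my_plugin",
  "provider": "your-name",
  "version": "0.1.0",
  "tasks": ["infer"]
}
```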
API Endpoints:

- `GET /plugins` → list all plugins with metadata.
- `POST /plugins/{name}/{task}` → execute a task inside a plugin.
Example:

```python
from app.plugins.base import AIPlugin


class Plugin(AIPlugin):
    name = "my_plugin"
    tasks = ["infer"]

    def load(self):
        # Load models/resources once
        ...

    def infer(self, payload: dict) -> dict:
return {"message": "ok", "payload": payload}A lightweight orchestration layer to chain plugins into reproducible pipelines (steps β plugin + task + payload).
A lightweight orchestration layer chains plugins into reproducible pipelines (steps → plugin + task + payload). All workflow endpoints are exposed under `/workflow`.
- Endpoints: `GET /workflow/ping`, `GET /workflow/presets`, `POST /workflow/run`
- System Guide (EN): app/workflows/README.md
- Workflows Index: docs/workflows-overview.md
A full list of available workflows with their versions, tags, and step counts is maintained in the Workflows Index.
➡️ View Workflows Index
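For a quick illustration, a run request could look like the following; the actual payload schema is documented in the System Guide, so the `steps` shape below is an assumption based on the step → plugin + task + payload description:

```bash
# Hypothetical workflow run; payload schema assumed, see app/workflows/README.md.
curl -X POST http://localhost:8000/workflow/run \
  -H "Content-Type: application/json" \
  -d '{"steps": [{"plugin": "my_plugin", "task": "infer", "payload": {"text": "hello"}}]}'
```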
A full list of available plugins with their providers, tasks, and source files is maintained in the Plugins Index.
➡️ View Plugins Index
Install dev dependencies:
```bash
pip install -r requirements-dev.txt
pre-commit install
```

Ruff (lint + format check) runs automatically via the pre-commit hooks.

Run tests:

```bash
pytest
```
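A minimal smoke test in the spirit of tests/test_run.py could look like this sketch, assuming the FastAPI instance is exposed as `app.main:app` (as in the uvicorn command above):

```python
# Hypothetical smoke test; assumes the app object lives in app/main.py.
from fastapi.testclient import TestClient

from app.main import app

client = TestClient(app)


def test_health() -> None:
    response = client.get("/health")
    assert response.status_code == 200
    assert response.json() == {"status": "ok"}
```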
We enforce a clean and consistent code style using Ruff (linter, import sorter, and formatter). For full details on configuration, commands, helper scripts, and CI integration, see:
➡️ Code Style & Linting Guide
Download models in advance:
```bash
python -m scripts.prefetch_models
```

Models are cached in `models_cache/` (see docs/LICENSES.md for licenses).
- Use `uvicorn`/`hypercorn` behind a reverse proxy (e.g., Nginx).
- Configure the environment with `APP_*` variables instead of hardcoding values; see the sketch after this list.
- Enable HTTPS and configure CORS carefully in production.
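As a sketch of the `APP_*` pattern, app/core/config.py presumably centralizes settings with Pydantic v2; the snippet below assumes pydantic-settings and hypothetical variable names:

```python
# Hypothetical settings sketch using pydantic-settings (Pydantic v2).
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="APP_")

    host: str = "0.0.0.0"  # override with APP_HOST
    port: int = 8000       # override with APP_PORT
    debug: bool = False    # override with APP_DEBUG


settings = Settings()  # values are read from the environment at instantiation
```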
A complete history of changes and improvements: ➡️ CHANGELOG
Details about the initial release v0.1.0: ➡️ Release Notes v0.1.0
- Add a `/cuda` endpoint → return detailed CUDA info.
- Add a `/warmup` endpoint for GPU readiness.
- Provide a plugin generator CLI.
- Implement API Key / JWT authentication.
- Example plugins: translation, summarization, image classification.
- Docker support for one-click deployment.
- Benchmark suite for model inference speed.
Contributions are welcome!
- Open Issues for bugs or ideas.
- Submit Pull Requests for improvements.
- Follow style guidelines (Ruff + pre-commit).
Licensed under the MIT License; see LICENSE.
Some AI/ML models are licensed separately; see Model Licenses.