Centralized queue system for AI model execution with VRAM management. Clients submit jobs via API; the system handles all VRAM coordination and Ollama execution automatically.
```
pip install flask platformdirs pyyaml requests
```

Ensure Ollama is running on localhost:11434.

Start the service:

```
systemctl --user start model-manager
```
Submit a job via the HTTP API:
```
curl -X POST http://localhost:5001/api/submit \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2.5:1.5b", "prompt": "Hello world"}'
```
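The same submission can be scripted from Python using only the standard library. A minimal sketch; the helper names and the `"normal"` default priority are assumptions for illustration, not part of the documented API:

```python
import json
import urllib.request

BASE_URL = "http://localhost:5001"  # default Model Manager port

def build_job(model, prompt, priority="normal", images=None, metadata=None):
    """Assemble a payload with the fields accepted by /api/submit."""
    return {
        "model": model,
        "prompt": prompt,
        "priority": priority,
        "images": images or [],
        "metadata": metadata or {},
    }

def submit_job(payload, base_url=BASE_URL):
    """POST a job to /api/submit and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{base_url}/api/submit",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `submit_job(build_job("qwen2.5:1.5b", "Hello world"))` mirrors the curl call above.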
The ./cli tool provides commands for batch queries and vision analysis. It communicates with the running Model Manager service.
```
# Batch query: run a question against every line in a file
./cli batch-query items.txt "Is {item} a tool?"

# Full image analysis
./cli analyze photo.png

# Quick image Q&A
./cli quick photo.png "What color is the car?"

# Count objects in an image
./cli count photo.png "people"

# Introspection
./cli --version
./cli --print-defaults
./cli --print-resolved
./cli --print-config-schema
./cli --validate-config
```

Subcommands accept additional options (use ./cli <command> -h for details):
- batch-query supports --model, --priority, --timeout, --max-workers
- analyze supports --model, --focus, --action, --checklist, --prompt
| Command | Description |
|---|---|
| batch-query FILE QUESTION | Run multiple queries from a file with {item} placeholder |
| analyze FILE | Full image analysis with optional --focus, --action, --checklist, --prompt |
| quick FILE QUESTION | Quick Q&A about an image |
| count FILE OBJECT_TYPE | Count specific objects in an image |
| Endpoint | Method | Description |
|---|---|---|
| /api/submit | POST | Submit inference job |
| /api/job/<job_id> | GET | Get job status/result |
| /api/models | GET | List available models |
| /api/models/refresh | POST | Refresh model list from Ollama |
| /api/stats | GET | Queue and resource statistics |
| /api/health | GET | Health check |
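Job submission is asynchronous: /api/submit returns immediately, and /api/job/<job_id> is polled for the result. A sketch of a polling helper; the `status` field and the `"completed"`/`"failed"` terminal values are assumptions about the response schema, not documented guarantees:

```python
import json
import time
import urllib.request

# Assumed terminal states; the service's actual status values may differ.
TERMINAL_STATUSES = {"completed", "failed"}

def is_terminal(job):
    """True once a job record has reached a final state."""
    return job.get("status") in TERMINAL_STATUSES

def wait_for_job(job_id, base_url="http://localhost:5001",
                 poll_interval=1.0, timeout=120.0):
    """Poll GET /api/job/<job_id> until the job settles or timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with urllib.request.urlopen(f"{base_url}/api/job/{job_id}") as resp:
            job = json.load(resp)
        if is_terminal(job):
            return job
        time.sleep(poll_interval)
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
```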
```
curl -X POST http://localhost:5001/api/submit \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5:1.5b",
    "prompt": "Analyze this process",
    "priority": "high",
    "images": [],
    "metadata": {}
  }'
```

Check a job's status:

```
curl http://localhost:5001/api/job/<job_id>
```

Copy config.example.yaml to config.local.yaml and edit:
```
cp config.example.yaml config.local.yaml
```

Settings include the Ollama URL, VRAM margins, scheduler strategy, queue size, and HTTP port. Storage directories (data_dir, cache_dir) are auto-detected via platformdirs but can be overridden in config.
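The settings above might look roughly like the following in config.local.yaml. This is an illustrative sketch only: the key names here are guesses, and config.example.yaml is the authoritative schema.

```yaml
# Hypothetical key names -- consult config.example.yaml for the real ones.
ollama_url: http://localhost:11434   # where the Ollama API listens
vram_margin_mb: 1024                 # headroom kept free on the GPU
scheduler_strategy: priority         # how batches are ordered
max_queue_size: 100                  # pending-job limit
http_port: 5001                      # external Flask API port
# data_dir: /custom/path/data        # override the platformdirs default
# cache_dir: /custom/path/cache
```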
Six core components with strict boundaries:
- HTTP API (Flask) — external interface on port 5001
- Internal API — client interface for job submission
- Queue Manager — priority-based job storage and batching
- VRAM Scheduler — load/unload decisions based on available GPU memory
- Execution Engine — Ollama API integration for inference
- Resource Monitor — VRAM state via nvidia-smi and Ollama /api/ps
A background scheduler loop runs every 100 ms, pulling batches from the queue, creating execution plans, and dispatching them to the engine.
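The loop just described can be sketched as follows. The component interfaces (`pull_batch`, `make_plan`, `dispatch`) are hypothetical names standing in for the Queue Manager, VRAM Scheduler, and Execution Engine; only the 100 ms cadence and the pull/plan/dispatch sequence come from the description above.

```python
import time

SCHEDULER_INTERVAL = 0.1  # 100 ms, per the loop description

def scheduler_loop(queue, scheduler, engine, running):
    """Illustrative background loop: pull a batch of jobs, plan VRAM
    loads/unloads for it, and dispatch the plan for execution."""
    while running():                          # e.g. an Event-backed flag
        batch = queue.pull_batch()            # priority-ordered jobs
        if batch:
            plan = scheduler.make_plan(batch)  # load/unload decisions
            engine.dispatch(plan)              # run inference via Ollama
        time.sleep(SCHEDULER_INTERVAL)
```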
MIT License. See LICENSE.