Agent-aware code quality scoring for multi-agent codebases.
In 2026, code is written by fleets of AI agents. Arbiter knows who wrote each line -- human or AI -- and scores quality accordingly.
```bash
pip install arbiter                # core (stdlib only)
pip install "arbiter[analyzers]"   # + ruff, radon, vulture, bandit

# Or from source
git clone https://github.com/hummbl-dev/arbiter.git && cd arbiter
pip install -e ".[analyzers]"
```

```bash
# Quick score -- no persistence, instant feedback
arbiter score /path/to/repo

# Full analysis with per-commit agent attribution
arbiter analyze /path/to/repo

# Agent leaderboard -- who writes the best code?
arbiter agents

# Start the dashboard (single HTML file, no build step)
arbiter serve --port 8080
```

Arbiter wraps tools you already trust and combines them into a deterministic composite score:
| Analyzer | Tool | Weight | What It Finds |
|---|---|---|---|
| Lint | ruff | 35% | Style violations, import errors, bugbear patterns |
| Security | bandit | 30% | Hardcoded secrets, shell injection, dangerous patterns |
| Complexity | radon | 35% | Cyclomatic complexity (grade A-F per function) |
| Dead Code | vulture | penalty | Unused functions, imports, variables |
| Duplication | AST hash | penalty | Near-duplicate function bodies |
| Semgrep | semgrep | opt-in | Custom rule enforcement (enable via config) |
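The duplication check hashes near-duplicate function bodies via their ASTs. A minimal sketch of that idea (illustrative only, not Arbiter's actual implementation), using Python's `ast` module to strip away naming differences before hashing:

```python
import ast
import hashlib
from collections import defaultdict

class _Normalize(ast.NodeTransformer):
    """Rename every identifier to "_" so near-duplicate bodies hash identically."""
    def visit_Name(self, node: ast.Name) -> ast.Name:
        return ast.copy_location(ast.Name(id="_", ctx=node.ctx), node)

def duplicate_groups(source: str) -> list:
    """Group function names whose normalized bodies produce the same hash."""
    groups = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            body = ast.Module(body=list(node.body), type_ignores=[])
            dumped = ast.dump(_Normalize().visit(body))
            digest = hashlib.sha256(dumped.encode()).hexdigest()
            groups[digest].append(node.name)
    return [names for names in groups.values() if len(names) > 1]

src = """
def add_tax(price): return price * 1.2
def add_vat(amount): return amount * 1.2
def greet(name): return "hi " + name
"""
print(duplicate_groups(src))  # [['add_tax', 'add_vat']]
```

`add_tax` and `add_vat` differ only in identifier names, so they land in the same group; `greet` has a different body shape and does not.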
Scoring: `100 - (penalty / LOC) * normalization`. Grades: A (90+) | B (80+) | C (70+) | D (60+) | F (<60).
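As a back-of-the-envelope illustration of the weighted-penalty scheme (the helper names, the `normalization` constant, and all numbers below are made up for the example; only the weights and grade cutoffs come from the tables above):

```python
def grade(score: float) -> str:
    """Map a 0-100 score to the letter grades above."""
    for letter, floor in (("A", 90), ("B", 80), ("C", 70), ("D", 60)):
        if score >= floor:
            return letter
    return "F"

def composite_score(penalties: dict, loc: int, normalization: float = 1000.0) -> float:
    """Hypothetical weighted penalty-per-LOC score; weights mirror the analyzer table."""
    weights = {"lint": 0.35, "security": 0.30, "complexity": 0.35}
    weighted = sum(weights[k] * penalties.get(k, 0) for k in weights)
    # Dead code and duplication land as flat penalties on top of the weighted analyzers.
    weighted += penalties.get("dead_code", 0) + penalties.get("duplication", 0)
    return max(0.0, 100.0 - (weighted / loc) * normalization)

score = composite_score({"lint": 12, "security": 4, "complexity": 9, "dead_code": 2}, loc=2500)
print(round(score, 1), grade(score))  # 95.8 A
```

The key property is that penalties are scaled by lines of code, so a large repo with a few findings is not punished like a small one with the same findings.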
| Feature | Traditional Tools | Arbiter |
|---|---|---|
| Agent attribution | None | First-class: tracks Claude, Codex, Gemini, Copilot, humans |
| Per-commit scoring | Repo-wide only | Scores each commit's changed files individually |
| Diff analysis | N/A | Score only what changed in a PR/branch |
| Agent-specific gates | N/A | Different quality thresholds per agent trust tier |
| Dashboard | SaaS login | Single HTML file with per-agent timelines and fleet view |
| Dependencies | Heavy | Analysis tools only; core is stdlib Python |
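The table does not say how commits get attributed to agents; one plausible mechanism (purely illustrative, not necessarily what Arbiter does) is to match agent signatures in commit metadata such as `Co-Authored-By` trailers:

```python
import re

# Hypothetical mapping from metadata patterns to agent names.
AGENT_PATTERNS = {
    "claude": re.compile(r"claude", re.I),
    "codex": re.compile(r"codex", re.I),
    "gemini": re.compile(r"gemini", re.I),
    "copilot": re.compile(r"copilot", re.I),
}

def attribute_commit(message: str, author: str) -> str:
    """Classify a commit as a known agent or 'human' from its metadata."""
    haystack = f"{author}\n{message}"
    for agent, pattern in AGENT_PATTERNS.items():
        if pattern.search(haystack):
            return agent
    return "human"

msg = "Fix lint errors\n\nCo-Authored-By: Claude <noreply@anthropic.com>"
print(attribute_commit(msg, "Jane Doe <jane@example.com>"))  # claude
```

Anything that does not match a known agent signature falls back to `human`, which is the safe default for trust-tier gating.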
```bash
arbiter analyze <repo>                       # Full analysis + per-commit scoring + persist
arbiter score <repo> [--json] [--exclude]    # Quick score (no persist)
arbiter diff <repo> [--base main] [--json]   # Score changed files vs base branch
arbiter agents                               # Agent leaderboard
arbiter trend [--days 30]                    # Quality trend
arbiter worst [--limit 20]                   # Worst files
arbiter commits [--agent claude]             # Recent commits with scores
arbiter audit-fleet <directory>              # Audit all repos in a directory
arbiter triage                               # Auto-classify repos: green/yellow/red/archive
arbiter fix <repo> [--dry-run]               # Auto-fix ruff findings + before/after score
arbiter serve [--port 8080]                  # API + dashboard
```

Run the test suite with:

```bash
pip install ".[test]"
PYTHONPATH=src python -m pytest tests/ -v    # 78 tests, <7 seconds
```

Requirements:

- Python 3.11+
- git (for historian)
- Optional: ruff, radon, vulture, bandit (install via the `[analyzers]` extra)
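The `--json` flag makes `arbiter score` straightforward to gate in CI. A sketch of a threshold check, assuming (this is an assumption, not documented above) that the JSON output carries a top-level `score` field:

```python
import json

THRESHOLD = 80.0  # fail the build below a B

def passes_gate(json_output: str, threshold: float = THRESHOLD) -> bool:
    """Check a hypothetical top-level "score" field against a threshold."""
    return json.loads(json_output)["score"] >= threshold

# In CI you would feed this the real output, e.g.:
#   arbiter score . --json > score.json
sample = '{"score": 87.4, "grade": "B"}'  # made-up payload shape
print(passes_gate(sample))  # True
```

A hard exit on failure (`sys.exit(1)` when the gate does not pass) is enough for most CI systems to block the merge.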
Part of the HUMMBL cognitive AI architecture:
- hummbl-governance -- Governance primitives that Arbiter scores against
- base120 -- 120 mental models for structured reasoning
- mcp-server -- MCP server for AI agent integration
Learn more at hummbl.io.
MIT -- see LICENSE.
Built by HUMMBL LLC from production experience coordinating Claude, Codex, Gemini, and human engineers on a 14,000+ test codebase.