
Arbiter


Agent-aware code quality scoring for multi-agent codebases.

In 2026, code is written by fleets of AI agents. Arbiter knows who wrote each line -- human or AI -- and scores quality accordingly.

Install

pip install arbiter                  # core (stdlib only)
pip install "arbiter[analyzers]"     # + ruff, radon, vulture, bandit

# Or from source
git clone https://github.com/hummbl-dev/arbiter.git && cd arbiter
pip install -e ".[analyzers]"

Usage

# Quick score -- no persistence, instant feedback
arbiter score /path/to/repo

# Full analysis with per-commit agent attribution
arbiter analyze /path/to/repo

# Agent leaderboard -- who writes the best code?
arbiter agents

# Start the dashboard (single HTML file, no build step)
arbiter serve --port 8080

What It Scores

Arbiter wraps tools you already trust and combines them into a deterministic composite score:

| Analyzer    | Tool     | Weight  | What It Finds |
|-------------|----------|---------|---------------|
| Lint        | ruff     | 35%     | Style violations, import errors, bugbear patterns |
| Security    | bandit   | 30%     | Hardcoded secrets, shell injection, dangerous patterns |
| Complexity  | radon    | 35%     | Cyclomatic complexity (grade A-F per function) |
| Dead Code   | vulture  | penalty | Unused functions, imports, variables |
| Duplication | AST hash | penalty | Near-duplicate function bodies |
| Semgrep     | semgrep  | opt-in  | Custom rule enforcement (enable via config) |
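The AST-hash duplication check can be sketched roughly as follows. This is a minimal illustration of the technique, not Arbiter's actual implementation; the function names and the identifier-blanking normalization are assumptions:

```python
import ast
import hashlib

def _normalized_dump(body: list[ast.stmt]) -> str:
    """Dump a function body with identifiers blanked, so renamed
    copies of the same logic produce identical text."""
    module = ast.Module(body=body, type_ignores=[])
    for node in ast.walk(module):
        if isinstance(node, ast.Name):
            node.id = "_"
        elif isinstance(node, ast.arg):
            node.arg = "_"
    return ast.dump(module)

def duplicate_bodies(source: str) -> list[tuple[str, str]]:
    """Return (first, duplicate) name pairs of top-level functions
    whose normalized body hashes collide."""
    seen: dict[str, str] = {}
    pairs: list[tuple[str, str]] = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            digest = hashlib.sha256(
                _normalized_dump(node.body).encode()
            ).hexdigest()
            if digest in seen:
                pairs.append((seen[digest], node.name))
            else:
                seen[digest] = node.name
    return pairs
```

With this sketch, `def add_one(x): return x + 1` and `def increment(value): return value + 1` hash identically and are reported as a pair, since only the identifier names differ.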

Scoring: 100 - (penalty / LOC) * normalization. Grades: A (90+) | B (80+) | C (70+) | D (60+) | F (<60).
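The formula above can be sketched in code. The weights come from the table; the normalization constant, the flat-penalty handling, and the function signatures are illustrative assumptions, not Arbiter's exact internals:

```python
# Weighted analyzer penalties per the table; NORMALIZATION is an
# assumed scale factor (penalty points per 1000 LOC), not Arbiter's.
WEIGHTS = {"lint": 0.35, "security": 0.30, "complexity": 0.35}
NORMALIZATION = 1000

def composite_score(findings: dict[str, float], loc: int,
                    dead_code: float = 0.0, duplication: float = 0.0) -> float:
    """score = 100 - (penalty / LOC) * normalization, clamped to [0, 100]."""
    penalty = sum(WEIGHTS[k] * findings.get(k, 0.0) for k in WEIGHTS)
    penalty += dead_code + duplication  # flat penalties, not weighted
    return max(0.0, min(100.0, 100.0 - (penalty / loc) * NORMALIZATION))

def grade(score: float) -> str:
    """Map a score to the A (90+) | B (80+) | C (70+) | D (60+) | F scale."""
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return letter
    return "F"
```

For example, 4 lint, 1 security, and 2 complexity findings over 5,000 LOC give a weighted penalty of 2.4, a score of 99.52, and grade A.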

What Makes Arbiter Different

| Feature              | Traditional Tools | Arbiter |
|----------------------|-------------------|---------|
| Agent attribution    | None              | First-class: tracks Claude, Codex, Gemini, Copilot, humans |
| Per-commit scoring   | Repo-wide only    | Scores each commit's changed files individually |
| Diff analysis        | N/A               | Score only what changed in a PR/branch |
| Agent-specific gates | N/A               | Different quality thresholds per agent trust tier |
| Dashboard            | SaaS login        | Single HTML file with per-agent timelines and fleet view |
| Dependencies         | Heavy             | Analysis tools only; core is stdlib Python |
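Agent-specific gates amount to a per-author threshold lookup. A minimal sketch, where the tier names and threshold values are invented for illustration and do not reflect Arbiter's defaults:

```python
# Hypothetical trust tiers: less-trusted agents face stricter gates.
GATES = {
    "human": 60.0,     # trusted: D or better passes
    "claude": 75.0,
    "copilot": 80.0,
    "unknown": 90.0,   # unrecognized agents must score an A
}

def gate_passes(agent: str, score: float) -> bool:
    """A commit passes if its score meets the threshold for its author."""
    return score >= GATES.get(agent, GATES["unknown"])
```

Under these assumed thresholds, a 78-point commit attributed to Claude passes, while the same score from an unrecognized agent fails.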

CLI Reference

arbiter analyze <repo>                     # Full analysis + per-commit scoring + persist
arbiter score <repo> [--json] [--exclude]  # Quick score (no persist)
arbiter diff <repo> [--base main] [--json] # Score changed files vs base branch
arbiter agents                             # Agent leaderboard
arbiter trend [--days 30]                  # Quality trend
arbiter worst [--limit 20]                 # Worst files
arbiter commits [--agent claude]           # Recent commits with scores
arbiter audit-fleet <directory>            # Audit all repos in a directory
arbiter triage                             # Auto-classify repos: green/yellow/red/archive
arbiter fix <repo> [--dry-run]             # Auto-fix ruff findings + before/after score
arbiter serve [--port 8080]                # API + dashboard

Tests

pip install ".[test]"
PYTHONPATH=src python -m pytest tests/ -v    # 78 tests, <7 seconds

Requirements

  • Python 3.11+
  • git (for historian)
  • Optional: ruff, radon, vulture, bandit (install via [analyzers] extra)

HUMMBL Ecosystem

Part of the HUMMBL cognitive AI architecture:

  • hummbl-governance -- Governance primitives that Arbiter scores against
  • base120 -- 120 mental models for structured reasoning
  • mcp-server -- MCP server for AI agent integration

Learn more at hummbl.io.

License

MIT -- see LICENSE.


Built by HUMMBL LLC from production experience coordinating Claude, Codex, Gemini, and human engineers on a 14,000+ test codebase.
