agent-control-drift-evaluator

Temporal behavioral drift evaluator for Agent Control. Detects gradual degradation patterns that point-in-time evaluators miss.

The Problem

Agent Control's built-in evaluators (regex, list, SQL, JSON) assess individual interactions. They answer: "Is this response safe right now?" But they don't answer: "Is this agent becoming less reliable over time?"

Empirical observation across 13 LLM agents showed:

  • Agents scoring 1.0 on point-in-time tests drifted ~7% on behavioral consistency over 28-day windows
  • Self-reported capability claims diverged from measured behavior by 7% on average
  • Degradation patterns were non-monotonic — stability windows followed by abrupt shifts, not gradual decline

This evaluator fills that gap.

How It Works

  1. Records behavioral observations per agent over time
  2. Compares the most recent window against an established baseline
  3. Flags drift when the mean shift exceeds a configurable threshold
  4. Dampens false signals from tasks that have multiple valid behavioral patterns (spec_clarity)

Drift Detection Method

  • Baseline vs window: First N observations establish baseline; last M observations are compared
  • Mean shift: Absolute delta between baseline mean and recent mean
  • Cohen's d: Standardized effect size for practical significance
  • Confidence: Weighted combination of sample size and effect size
  • Specification clarity: MULTI_VALID tasks suppress drift flags when effect size is small (agents legitimately behave differently on ambiguous tasks)
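The comparison described above can be sketched in a few lines. This is a simplified illustration, not the package's actual implementation: `detect_drift` is a hypothetical name, and Cohen's d is approximated here with the standard deviation pooled over both samples combined.

```python
import statistics

def detect_drift(scores, baseline_size=10, window_size=7, threshold=0.10):
    """Compare the most recent window against the initial baseline.

    Illustrative sketch: returns None until enough observations exist,
    otherwise reports mean shift, effect size, and a drift flag.
    """
    if len(scores) < baseline_size + window_size:
        return None  # not enough observations yet
    baseline = scores[:baseline_size]
    window = scores[-window_size:]
    mean_shift = statistics.mean(window) - statistics.mean(baseline)
    # Simplified Cohen's d: |mean shift| over the pooled standard deviation
    pooled_sd = statistics.pstdev(baseline + window) or 1e-9
    effect_size = abs(mean_shift) / pooled_sd
    return {
        "mean_shift": mean_shift,
        "effect_size": effect_size,
        "drift_detected": abs(mean_shift) > threshold,
    }
```

A stable series (constant scores) yields a zero mean shift and no flag; a step down from 0.9 to 0.7 produces a −0.2 shift, well past the default 0.10 threshold.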

Installation

```
pip install agent-control-drift-evaluator
```

With Redis backend:

```
pip install agent-control-drift-evaluator[redis]
```

Usage

```python
from agent_control import control

@control(
    name="behavioral-drift-check",
    evaluator="drift",
    config={
        "window_size": 7,         # recent observations to analyze
        "baseline_size": 10,      # observations for baseline
        "drift_threshold": 0.10,  # 10% mean shift triggers
        "dimensions": ["calibration", "adaptation", "robustness"],
        "action": "warn",         # or "deny" for critical agents
        "spec_clarity": "unambiguous",
    },
)
async def my_agent_step(input):
    ...
```

Observation Format

The evaluator expects data with agent_id and score:

```json
{
    "agent_id": "my-agent-001",
    "score": 0.92,
    "dimension": "calibration",
    "timestamp": 1710844800.0,
    "metadata": {"probe": "pii-detection"}
}
```
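For illustration, a small helper that validates the score range and applies the documented defaults before recording. This is hypothetical code, not part of the package API:

```python
import time

def make_observation(agent_id, score, dimension="default",
                     timestamp=None, metadata=None):
    """Build an observation dict with the documented defaults.

    Illustrative helper: scores outside [0.0, 1.0] are rejected,
    timestamp defaults to the current time, metadata to {}.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0.0, 1.0]")
    return {
        "agent_id": agent_id,
        "score": score,
        "dimension": dimension,
        "timestamp": timestamp if timestamp is not None else time.time(),
        "metadata": metadata or {},
    }
```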
| Field | Required | Description |
|---|---|---|
| agent_id | yes | Identifies which agent this observation is for |
| score | yes | Behavioral measurement (0.0–1.0, higher = more reliable) |
| dimension | no | Category for separate tracking (default: "default") |
| timestamp | no | Unix epoch seconds (default: current time) |
| metadata | no | Extra context (probe type, model version, etc.) |

Configuration

| Parameter | Default | Description |
|---|---|---|
| window_size | 7 | Recent observations to compare. Empirical minimum: 5 |
| baseline_size | 10 | Initial observations for baseline |
| drift_threshold | 0.10 | Mean-shift threshold (0.0–1.0) |
| dimensions | ["default"] | Dimensions to track separately |
| action | "warn" | Action on drift: warn, deny, or log |
| storage_backend | "file" | Storage: file or redis |
| storage_dir | ~/.agent-control-drift/observations | File backend directory |
| spec_clarity | "unambiguous" | Task clarity: unambiguous, multi_valid, underspecified |

Result Metadata

When drift is detected, the EvaluatorResult.metadata includes:

```json
{
    "agent_id": "my-agent-001",
    "dimension": "calibration",
    "mean_shift": -0.15,
    "effect_size": 0.82,
    "drift_detected": true,
    "window_size": 7,
    "baseline_size": 10,
    "specification_clarity": "unambiguous",
    "total_observations": 28
}
```
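One way to consume this metadata downstream, such as routing it to the configured action. This handler is illustrative and not part of the package; `handle_result` is a hypothetical name:

```python
def handle_result(metadata, action="warn"):
    """Route a drift result to an action (illustrative sketch).

    Returns "pass" when no drift was detected, "warn" after logging,
    and raises when the configured action is "deny".
    """
    if not metadata.get("drift_detected"):
        return "pass"
    msg = (f"drift on {metadata['agent_id']}/{metadata['dimension']}: "
           f"mean_shift={metadata['mean_shift']:+.2f}, "
           f"d={metadata['effect_size']:.2f}")
    if action == "deny":
        raise RuntimeError(msg)
    print(f"WARNING: {msg}")
    return "warn"
```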

Empirical Findings

From production validation across two independent systems:

  • Window ≥ 5 required: Below 5 observations, drift detection is noise. Validated on Gerundium (3-node swarm) and NexusGuard (19-agent fleet).
  • Non-monotonic drift: Agents don't degrade gradually. They show stability → abrupt shift → stability. Rolling windows catch this; cumulative averages blur it.
  • Specification clarity matters: Under identical prompts, one agent produced a stable 6A/4B split across two reasoning paths. Without spec_clarity: multi_valid, this would be flagged as drift.
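The rolling-versus-cumulative point can be seen on a toy series (hypothetical numbers): after an abrupt step down, a rolling window reflects the new level immediately, while a cumulative average dilutes it.

```python
def rolling_mean(xs, k):
    """Mean of the last k elements (the rolling window)."""
    return sum(xs[-k:]) / min(k, len(xs))

def cumulative_mean(xs):
    """Mean over the entire history."""
    return sum(xs) / len(xs)

# Stability -> abrupt shift: 20 observations at 0.9, then 5 at 0.6
series = [0.9] * 20 + [0.6] * 5
window_view = rolling_mean(series, 5)       # 0.60: the shift is fully visible
history_view = cumulative_mean(series)      # 0.84: the shift is diluted
```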

Storage Backends

File (default)

Observations stored as JSON lines in ~/.agent-control-drift/observations/{agent_id}/{dimension}.jsonl. Atomic appends via O_APPEND. Good for single-host deployments.
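A minimal sketch of that append pattern, assuming plain JSON-lines records (illustrative, not the package's actual writer; `append_observation` is a hypothetical name):

```python
import json
import os

def append_observation(path, obs):
    """Append one observation as a JSON line using O_APPEND.

    With O_APPEND the kernel performs the seek-and-write atomically,
    so concurrent writers on one host don't interleave records.
    """
    line = json.dumps(obs) + "\n"
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, line.encode("utf-8"))
    finally:
        os.close(fd)
```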

Redis

```python
config = DriftEvaluatorConfig(
    storage_backend="redis",
    redis_url="redis://localhost:6379/0",
)
```

Uses Redis lists with RPUSH/LRANGE. Better for multi-host or high-throughput setups.

Development

```shell
git clone https://github.com/nanookclaw/agent-control-drift-evaluator
cd agent-control-drift-evaluator
pip install -e ".[dev]"
pytest
```

License

MIT
