105 changes: 105 additions & 0 deletions skills/openclaw-native/heartbeat-governor/SKILL.md
@@ -0,0 +1,105 @@
---
name: heartbeat-governor
version: "1.0"
category: openclaw-native
description: Enforces per-skill execution budgets for scheduled cron skills; pauses runaway skills that exceed their spend, run-count, or wall-clock budget before they drain your monthly API allowance.
stateful: true
cron: "0 * * * *"
---

# Heartbeat Governor

## What it does

Cron skills run autonomously. A skill with a bug — an infinite retry, an unexpectedly large context, a model call inside a loop — can silently consume hundreds of dollars before you notice.

Heartbeat Governor tracks cumulative execution cost and wall-clock time per scheduled skill on a rolling 30-day basis. When a skill exceeds its budget, the governor pauses it and sends an alert. The skill won't fire again until you explicitly review and resume it.

It runs hourly, so a monthly-budget overrun is detected within an hour of occurring.
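
The rolling 30-day spend described above can be sketched as a filter over a skill's run log. This is a minimal illustration, not the code in `governor.py`: the function name `rolling_spend` is hypothetical, and the run entries are assumed to be dictionaries with ISO-8601 `ran_at` timestamps and a `usd_spent` figure, as in the state schema.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=30)  # the rolling window named in the text


def rolling_spend(runs, now):
    """Sum usd_spent over runs whose ran_at falls within the last 30 days."""
    cutoff = now - WINDOW
    return sum(
        run["usd_spent"]
        for run in runs
        if datetime.fromisoformat(run["ran_at"]) >= cutoff
    )


runs = [
    {"ran_at": "2026-03-15T07:00:05", "usd_spent": 0.18},
    {"ran_at": "2026-03-14T07:00:03", "usd_spent": 0.21},
    {"ran_at": "2026-01-02T07:00:00", "usd_spent": 9.99},  # outside the window
]
total = rolling_spend(runs, now=datetime(2026, 3, 15, 8, 0))
print(f"30-day spend: ${total:.2f}")
```

Because the window rolls, an expensive run stops counting against the budget 30 days after it happened rather than at a calendar-month boundary.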

## When to invoke

- Automatically, every hour (cron)
- Manually after noticing an unexpected API bill spike
- When a cron skill has been running unusually long

## Budget types

| Budget type | Default | Configurable |
|---|---|---|
| `max_usd_monthly` | $5.00 | Yes, per skill |
| `max_usd_per_run` | $0.50 | Yes, per skill |
| `max_wall_minutes` | 30 | Yes, per skill |
| `max_runs_daily` | 48 | Yes, per skill |
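
One way to model "default, overridable per skill" is a frozen dataclass whose field defaults mirror the table. This is a sketch under assumptions: the `Budget` class and `budget_for` helper are hypothetical names, not part of the governor's documented interface.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Budget:
    # Defaults copied from the budget table.
    max_usd_monthly: float = 5.00
    max_usd_per_run: float = 0.50
    max_wall_minutes: int = 30
    max_runs_daily: int = 48


DEFAULT_BUDGET = Budget()


def budget_for(overrides):
    """Return the default budget with any per-skill overrides applied."""
    return replace(DEFAULT_BUDGET, **overrides)


# Override two limits for one skill; the others keep their defaults.
morning = budget_for({"max_usd_monthly": 4.00, "max_runs_daily": 1})
print(morning)
```

Using `replace` means a misspelled budget key raises a `TypeError` instead of being silently ignored.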

## Actions on budget breach

| Breach type | Action |
|---|---|
| `monthly_usd` exceeded | Pause skill, log breach, alert |
| `per_run_usd` exceeded | Abort current run, log breach |
| `wall_clock` exceeded | Abort current run, log breach |
| `daily_runs` exceeded | Skip remaining runs today, log |
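
The two per-run rows of the table can be illustrated with a small classifier. This is a hedged sketch: `per_run_breaches` and the `ACTIONS` mapping are hypothetical names, "exceeded" is assumed to mean strictly greater than the limit, and how the governor actually aborts a run is not shown here.

```python
# Breach type -> action, as listed in the table.
ACTIONS = {
    "monthly_usd": "Pause skill, log breach, alert",
    "per_run_usd": "Abort current run, log breach",
    "wall_clock": "Abort current run, log breach",
    "daily_runs": "Skip remaining runs today, log",
}


def per_run_breaches(usd_spent, wall_minutes, max_usd_per_run=0.50, max_wall_minutes=30):
    """Return the per-run breach types triggered by a single run."""
    breaches = []
    if usd_spent > max_usd_per_run:  # assumption: strict comparison
        breaches.append("per_run_usd")
    if wall_minutes > max_wall_minutes:
        breaches.append("wall_clock")
    return breaches


for breach in per_run_breaches(usd_spent=0.72, wall_minutes=12):
    print(f"{breach}: {ACTIONS[breach]}")
```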

## How to use

```bash
python3 governor.py --status                                  # Show all skills and budget utilisation
python3 governor.py --record <skill> --usd 0.12 --minutes 4   # Record a run
python3 governor.py --pause <skill>                           # Manually pause a skill
python3 governor.py --resume <skill>                          # Resume a paused skill after review
python3 governor.py --set-budget <skill> --monthly 10.00      # Override budget
python3 governor.py --check                                   # Run hourly check (called by cron)
python3 governor.py --report                                  # Full monthly spend report
python3 governor.py --format json                             # JSON output
```

## Cron wakeup behaviour

Every hour the governor runs `--check`:

1. Load all skill ledgers from state
2. For each skill with `paused: false`:
   - If 30-day rolling spend exceeds `max_usd_monthly` → `paused: true`, log
   - If runs today exceed `max_runs_daily` → skip, log
3. Print summary of paused skills and budget utilisation
4. Save updated state
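
Step 2 of the list above can be sketched as a loop over in-memory ledgers. This is an illustration under assumptions, not the actual `--check` implementation: the `check` function is hypothetical, loading and saving state (steps 1 and 4) are omitted, and the daily-run branch only logs because the text does not say how skipping is enforced.

```python
from datetime import datetime, timedelta


def check(ledgers, now):
    """Apply the hourly check to every unpaused ledger and return log lines."""
    log = []
    cutoff = now - timedelta(days=30)
    for name, ledger in ledgers.items():
        if ledger["paused"]:
            continue  # already paused; waits for a manual resume
        runs = [(datetime.fromisoformat(r["ran_at"]), r["usd_spent"]) for r in ledger["runs"]]
        spend = sum(usd for ran_at, usd in runs if ran_at >= cutoff)
        runs_today = sum(1 for ran_at, _ in runs if ran_at.date() == now.date())
        limit = ledger["budget"]["max_usd_monthly"]
        if spend > limit:
            ledger["paused"] = True
            ledger["pause_reason"] = f"30-day spend ${spend:.2f} exceeded monthly limit ${limit:.2f}"
            ledger["paused_at"] = now.isoformat()
            log.append(f"{name}: monthly_usd breach, paused")
        elif runs_today > ledger["budget"]["max_runs_daily"]:
            log.append(f"{name}: daily_runs breach, skipping remaining runs today")
    return log


# Illustrative ledger; the skill name and figures are made up.
ledgers = {
    "hypothetical-runaway": {
        "budget": {"max_usd_monthly": 5.00, "max_runs_daily": 48},
        "paused": False,
        "runs": [
            {"ran_at": "2026-03-14T09:00:00", "usd_spent": 3.00},
            {"ran_at": "2026-03-15T07:00:00", "usd_spent": 2.12},
        ],
    },
}
log = check(ledgers, now=datetime(2026, 3, 15, 8, 0))
print(log)
```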

## Procedure

**Step 1 — Set sensible budgets**

After installing any new cron skill, set its monthly budget:

```bash
python3 governor.py --set-budget daily-review --monthly 2.00
python3 governor.py --set-budget morning-briefing --monthly 3.00
```

Defaults are conservative ($5/month), but an explicit budget per skill is better.

**Step 2 — Monitor utilisation**

```bash
python3 governor.py --status
```

Review the utilisation column. Any skill above 80% of its monthly budget warrants investigation.
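
The 80% rule is simple arithmetic: utilisation is 30-day spend divided by the monthly limit. The sketch below assumes that definition; the helper names and the third skill are hypothetical, and the first two rows reuse figures from the example state.

```python
WARN_THRESHOLD = 0.80  # "above 80%" from the text


def utilisation(spend, max_usd_monthly):
    """Fraction of the monthly budget consumed by 30-day spend."""
    return spend / max_usd_monthly


def needs_investigation(spend, max_usd_monthly):
    return utilisation(spend, max_usd_monthly) > WARN_THRESHOLD


rows = [
    ("cron-hygiene", 0.07, 1.00),
    ("morning-briefing", 0.39, 4.00),
    ("hypothetical-skill", 1.70, 2.00),  # made-up skill at 85%
]
for name, spend, limit in rows:
    flag = "investigate" if needs_investigation(spend, limit) else "ok"
    print(f"{name:<20} {utilisation(spend, limit):>5.0%}  {flag}")
```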

**Step 3 — Respond to pause alerts**

When the governor pauses a skill, investigate why it's over budget:
- Was there a one-time expensive run (large context)?
- Is there a bug causing repeated expensive calls?
- Does the budget simply need to be raised?

Resume after investigating:
```bash
python3 governor.py --resume <skill>
```

## State

Per-skill ledgers and pause flags are stored in `~/.openclaw/skill-state/heartbeat-governor/state.yaml`.

Fields: `skill_ledgers` map (each entry holds the skill's `budget`, its `paused` flag with `pause_reason` and `paused_at`, and its `runs` log), `breach_log`, `monthly_summary`.
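
To show how a ledger's `runs` list could stay a rolling 30-day log, here is a minimal sketch of recording a run and pruning old entries. The `record_run` function is hypothetical and works on an in-memory dictionary; reading and writing `state.yaml` is left out, and whether `governor.py --record` prunes at write time is an assumption.

```python
from datetime import datetime, timedelta


def record_run(ledger, usd, minutes, now):
    """Append a run to the ledger and drop entries older than 30 days."""
    ledger["runs"].append(
        {"ran_at": now.isoformat(), "usd_spent": usd, "wall_minutes": minutes}
    )
    cutoff = now - timedelta(days=30)
    ledger["runs"] = [
        run for run in ledger["runs"]
        if datetime.fromisoformat(run["ran_at"]) >= cutoff
    ]
    return ledger


ledger = {
    "runs": [
        # Older than 30 days relative to `now` below, so it is pruned.
        {"ran_at": "2026-01-10T09:00:07", "usd_spent": 0.07, "wall_minutes": 2.1},
    ]
}
record_run(ledger, usd=0.12, minutes=4, now=datetime(2026, 3, 15, 9, 0))
print(ledger["runs"])
```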
41 changes: 41 additions & 0 deletions skills/openclaw-native/heartbeat-governor/STATE_SCHEMA.yaml
@@ -0,0 +1,41 @@
version: "1.0"
description: Per-skill execution budgets, spend ledgers, pause flags, and breach log.
fields:
  skill_ledgers:
    type: object
    description: Map of skill_name -> budget + rolling spend ledger
    items:
      budget:
        type: object
        properties:
          max_usd_monthly: { type: float, default: 5.0 }
          max_usd_per_run: { type: float, default: 0.5 }
          max_wall_minutes: { type: integer, default: 30 }
          max_runs_daily: { type: integer, default: 48 }
      paused: { type: boolean, default: false }
      pause_reason: { type: string }
      paused_at: { type: datetime }
      runs:
        type: list
        description: Rolling 30-day run log
        items:
          ran_at: { type: datetime }
          usd_spent: { type: float }
          wall_minutes: { type: float }
  breach_log:
    type: list
    description: All budget breach events
    items:
      skill_name: { type: string }
      breach_type: { type: enum, values: [monthly_usd, per_run_usd, wall_clock, daily_runs] }
      value: { type: float }
      limit: { type: float }
      breached_at: { type: datetime }
      resolved: { type: boolean }
  monthly_summary:
    type: object
    description: Aggregated spend by skill for current calendar month
    items:
      skill_name: { type: string }
      total_usd: { type: float }
      total_runs: { type: integer }
64 changes: 64 additions & 0 deletions skills/openclaw-native/heartbeat-governor/example-state.yaml
@@ -0,0 +1,64 @@
# Example runtime state for heartbeat-governor
skill_ledgers:
  morning-briefing:
    budget:
      max_usd_monthly: 4.00
      max_usd_per_run: 0.30
      max_wall_minutes: 15
      max_runs_daily: 1
    paused: false
    pause_reason: null
    paused_at: null
    runs:
      - ran_at: "2026-03-15T07:00:05.000000"
        usd_spent: 0.18
        wall_minutes: 6.2
      - ran_at: "2026-03-14T07:00:03.000000"
        usd_spent: 0.21
        wall_minutes: 7.1
  long-running-task-management:
    budget:
      max_usd_monthly: 5.00
      max_usd_per_run: 0.50
      max_wall_minutes: 30
      max_runs_daily: 96
    paused: true
    pause_reason: "30-day spend $5.12 reached monthly limit $5.00"
    paused_at: "2026-03-15T08:00:00.000000"
    runs: []
  cron-hygiene:
    budget:
      max_usd_monthly: 1.00
      max_usd_per_run: 0.10
      max_wall_minutes: 10
      max_runs_daily: 2
    paused: false
    pause_reason: null
    paused_at: null
    runs:
      - ran_at: "2026-03-10T09:00:07.000000"
        usd_spent: 0.07
        wall_minutes: 2.1
breach_log:
  - skill_name: long-running-task-management
    breach_type: monthly_usd
    value: 5.12
    limit: 5.00
    breached_at: "2026-03-15T08:00:00.000000"
    resolved: false
monthly_summary: {}
# ── Walkthrough ──────────────────────────────────────────────────────────────
# Hourly cron runs: python3 governor.py --check
#
# Heartbeat Governor — 2026-03-15 08:00
# ──────────────────────────────────────────────────────────────
# ⏸ Paused: long-running-task-management
#
# python3 governor.py --status
# Skill                          Spend   Budget   %      Status
# cron-hygiene                   $0.07   $1.00    7%     ✓
# long-running-task-management   $5.12   $5.00    102%   ⏸ PAUSED
# morning-briefing               $0.39   $4.00    10%    ✓
#
# python3 governor.py --resume long-running-task-management
# ✓ Resumed 'long-running-task-management'. Will fire on next scheduled run.