Merged
2 changes: 2 additions & 0 deletions TESTING.md
@@ -139,6 +139,7 @@ npx tsx tools/agent-scenario-tester/src/index.ts run receive-webhooks express

- **receive-webhooks** — Setup Hookdeck, build handler with signature verification, run `hookdeck listen`, document inspect/retry workflow. Tests stages 01–04 (iterate is documentation-only: agent documents how to list request → event → attempt and retry; no live traffic required).
- **receive-provider-webhooks** — Same plus a provider (e.g. Stripe). Use `--provider stripe`. Only the event-gateway skill is pre-installed; the agent is expected to discover and use the provider skill from webhook-skills (e.g. stripe-webhooks) and use the provider SDK in the handler. Tests composition and the provider-webhooks checklist.
- **investigate-delivery-health** — Documentation-only: assume the user has had webhooks for a week and wants to understand delivery performance (success vs failure, backlog, latency). The prompt does **not** mention "metrics" or "hookdeck gateway metrics"; the assessor checks whether the agent used metrics CLI commands. Use to verify that agents discover and use metrics from the skill when the task implies it.

### Scenario run checklist

@@ -150,6 +151,7 @@ Run these and evaluate results; iterate on skills or prompts as needed.
| 2 | receive-webhooks | Next.js | `./scripts/test-agent-scenario.sh run receive-webhooks nextjs` | Done |
| 3 | receive-webhooks | FastAPI | `./scripts/test-agent-scenario.sh run receive-webhooks fastapi` | Done |
| 4 | receive-provider-webhooks | Express | `./scripts/test-agent-scenario.sh run receive-provider-webhooks express --provider stripe` | Done |
| 5 | investigate-delivery-health | Express | `./scripts/test-agent-scenario.sh run investigate-delivery-health express` | — |

**Output:** `test-results/<scenario>-<framework>-<provider?>-<timestamp>/` containing `report.md` (checklist + automated score), `run.log` (full Claude output), and generated project files. To re-run only the assessor (e.g. after fixing the tool): `./scripts/test-agent-scenario.sh assess <resultDir>`.
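
As a rough illustration of the `<scenario>-<framework>-<provider?>-<timestamp>` layout described above, the sketch below assembles a result directory path and the matching `assess` invocation. The helper name `resultDirName`, the `RunInfo` shape, and the timestamp format are assumptions made for this example; the tool's actual naming code is not shown in this diff.

```typescript
// Hypothetical helper: the real tool's naming code is not part of this diff.
interface RunInfo {
  scenario: string;
  framework: string;
  provider?: string; // only set for provider scenarios, e.g. "stripe"
  timestamp: string; // format is an assumption; the tool may use another one
}

function resultDirName(run: RunInfo): string {
  // Drop the provider segment when it is absent, mirroring `<provider?>`.
  const parts = [run.scenario, run.framework, run.provider, run.timestamp].filter(
    (p): p is string => typeof p === "string" && p.length > 0,
  );
  return `test-results/${parts.join("-")}/`;
}

const dir = resultDirName({
  scenario: "investigate-delivery-health",
  framework: "express",
  timestamp: "20260225T120000",
});
console.log(dir);
// Re-run only the assessor against that directory.
console.log(`./scripts/test-agent-scenario.sh assess ${dir}`);
```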

24 changes: 24 additions & 0 deletions scenarios.yaml
@@ -71,6 +71,30 @@ scenarios:
          - Idiomatic to framework
          - No syntax errors or obvious bugs

  - name: investigate-delivery-health
    displayName: Investigate delivery health (metrics usage)
    description: >
      Documentation-only scenario: assume traffic exists; ask the agent how to
      get a "performance picture" from the CLI. Verifies the agent discovers and
      uses hookdeck gateway metrics without the prompt mentioning metrics.
    stages:
      - iterate
    prompt: >
      Assume the user has been receiving webhooks via Hookdeck for the past week.
      They want to understand how delivery has been performing: e.g. how many
      events succeeded vs failed, whether there's a backlog, and if latency has
      been acceptable. In the README or in your reply, list the exact CLI
      commands (or steps) you would use to get that picture from the terminal,
      without opening the Dashboard. If you use the installed event-gateway
      skill, say which file you referenced.
    evaluation:
      - stage: Stage - Investigate delivery
        points: 3
        checks:
          - References monitoring-debugging or metrics material
          - Uses metrics CLI (hookdeck gateway metrics or equivalent)
          - At least one concrete metrics command with time range and measures

  - name: receive-provider-webhooks
    displayName: Receive Provider Webhooks (with composition)
    description: >
4 changes: 2 additions & 2 deletions skills/event-gateway/references/03-listen.md
@@ -60,15 +60,15 @@ hookdeck gateway connection create \
--source-type WEBHOOK \
--destination-name "cli-slack-local" \
--destination-type CLI \
-  --destination-path /slack
+  --destination-cli-path /slack

hookdeck gateway connection create \
--name "github-local" \
--source-name "github" \
--source-type WEBHOOK \
--destination-name "cli-github-local" \
--destination-type CLI \
-  --destination-path /github
+  --destination-cli-path /github
```

### Listen in One Session
11 changes: 9 additions & 2 deletions skills/event-gateway/references/cli-workflows.md
@@ -223,7 +223,14 @@ hookdeck gateway attempt list --event-id evt_xxx
hookdeck gateway attempt get att_xxx
```

-For full flag and option details, fetch [/docs/cli.md](https://hookdeck.com/docs/cli.md) or the per-command pages: [/docs/cli/source.md](https://hookdeck.com/docs/cli/source.md), [/docs/cli/destination.md](https://hookdeck.com/docs/cli/destination.md), [/docs/cli/transformation.md](https://hookdeck.com/docs/cli/transformation.md), [/docs/cli/request.md](https://hookdeck.com/docs/cli/request.md), [/docs/cli/event.md](https://hookdeck.com/docs/cli/event.md), [/docs/cli/attempt.md](https://hookdeck.com/docs/cli/attempt.md).
**Metrics** (event/request/attempt/queue/pending/transformations over time): use `hookdeck gateway metrics` with subcommands `events`, `requests`, `attempts`, `queue-depth`, `pending`, `events-by-issue`, `transformations`. Required: `--start`, `--end`, `--measures`. See [monitoring-debugging.md](monitoring-debugging.md#cli-metrics) or [Metrics docs](https://hookdeck.com/docs/metrics) for examples. For the full CLI metrics reference, fetch [/docs/cli/metrics.md](https://hookdeck.com/docs/cli/metrics.md).

```sh
hookdeck gateway metrics events --start 2026-02-01T00:00:00Z --end 2026-02-25T00:00:00Z --measures count,failed_count,error_rate
hookdeck gateway metrics queue-depth --start 2026-02-01T00:00:00Z --end 2026-02-25T00:00:00Z --measures max_depth,max_age
```

+For full flag and option details, fetch [/docs/cli.md](https://hookdeck.com/docs/cli.md) or the per-command pages: [/docs/cli/source.md](https://hookdeck.com/docs/cli/source.md), [/docs/cli/destination.md](https://hookdeck.com/docs/cli/destination.md), [/docs/cli/transformation.md](https://hookdeck.com/docs/cli/transformation.md), [/docs/cli/request.md](https://hookdeck.com/docs/cli/request.md), [/docs/cli/event.md](https://hookdeck.com/docs/cli/event.md), [/docs/cli/attempt.md](https://hookdeck.com/docs/cli/attempt.md), [/docs/cli/metrics.md](https://hookdeck.com/docs/cli/metrics.md).
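
The required-flag rule in the Metrics note (`--start`, `--end`, `--measures`) can be sketched as a small command builder. This is a minimal illustration, not part of the Hookdeck CLI or this repository: `buildMetricsCommand` and `MetricsQuery` are hypothetical names, and `events-by-issue` is left out because it also takes an issue ID argument.

```typescript
// Illustrative helper; the names here are assumptions, not a real API.
type MetricsSubcommand =
  | "events"
  | "requests"
  | "attempts"
  | "queue-depth"
  | "pending"
  | "transformations";

interface MetricsQuery {
  subcommand: MetricsSubcommand;
  start: string; // ISO 8601, e.g. "2026-02-01T00:00:00Z"
  end: string; // ISO 8601
  measures: string[]; // e.g. ["count", "failed_count"]
}

function buildMetricsCommand(q: MetricsQuery): string {
  // Enforce the three required flags before building the command line.
  if (!q.start || !q.end) {
    throw new Error("--start and --end are required");
  }
  if (q.measures.length === 0) {
    throw new Error("--measures needs at least one value");
  }
  return [
    "hookdeck gateway metrics",
    q.subcommand,
    "--start", q.start,
    "--end", q.end,
    "--measures", q.measures.join(","),
  ].join(" ");
}

console.log(
  buildMetricsCommand({
    subcommand: "events",
    start: "2026-02-01T00:00:00Z",
    end: "2026-02-25T00:00:00Z",
    measures: ["count", "failed_count", "error_rate"],
  }),
);
```

Building the string from a typed object keeps a missing time range or an empty measure list from reaching the terminal as a malformed command.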

## Project Management

@@ -242,4 +249,4 @@ For the full project reference, fetch [/docs/cli/project.md](https://hookdeck.co
- [Listen command](https://hookdeck.com/docs/cli/listen)
- [Connection commands](https://hookdeck.com/docs/cli/connection)
- [Project commands](https://hookdeck.com/docs/cli/project)
-- [Source](https://hookdeck.com/docs/cli/source) · [Destination](https://hookdeck.com/docs/cli/destination) · [Transformation](https://hookdeck.com/docs/cli/transformation) · [Request](https://hookdeck.com/docs/cli/request) · [Event](https://hookdeck.com/docs/cli/event) · [Attempt](https://hookdeck.com/docs/cli/attempt)
+- [Source](https://hookdeck.com/docs/cli/source) · [Destination](https://hookdeck.com/docs/cli/destination) · [Transformation](https://hookdeck.com/docs/cli/transformation) · [Request](https://hookdeck.com/docs/cli/request) · [Event](https://hookdeck.com/docs/cli/event) · [Attempt](https://hookdeck.com/docs/cli/attempt) · [Metrics](https://hookdeck.com/docs/cli/metrics)
59 changes: 58 additions & 1 deletion skills/event-gateway/references/monitoring-debugging.md
@@ -6,6 +6,7 @@
- [Data Model](#data-model)
- [Event Statuses](#event-statuses)
- [Debugging Surfaces](#debugging-surfaces)
- [CLI metrics](#cli-metrics)
- [Troubleshooting Flowchart](#troubleshooting-flowchart)
- [Issues and Notifications](#issues-and-notifications)
- [Replay](#replay)
@@ -77,7 +78,63 @@ hookdeck gateway attempt get att_xxx

See [Request commands](https://hookdeck.com/docs/cli/request.md), [Event commands](https://hookdeck.com/docs/cli/event.md), and [Attempt commands](https://hookdeck.com/docs/cli/attempt.md) for full options.

-**Metrics:** CLI metrics commands (e.g. request/event/attempt counts over time) may be added in a future release. Until then, use the [Dashboard](https://dashboard.hookdeck.com) or [Metrics API](https://hookdeck.com/docs/metrics).
### CLI metrics {#cli-metrics}

Metrics over time are available in the [Dashboard](https://dashboard.hookdeck.com) (the [Metrics page](https://dashboard.hookdeck.com/metrics) and the Source/Connection/Destination pages) and via `hookdeck gateway metrics` and its subcommands. Every metrics subcommand requires a date range (`--start` and `--end`, in ISO 8601) and at least one `--measures` value. Optional flags include `--granularity`, `--dimensions`, `--source-id`, `--destination-id`, `--connection-id`, and `--status`. See [Metrics](https://hookdeck.com/docs/metrics) and the [CLI metrics reference](https://hookdeck.com/docs/cli/metrics) for details.

| Subcommand | Purpose |
|------------|---------|
| `metrics events` | Event volume, success/failure counts, error rate over time |
| `metrics requests` | Request acceptance vs rejection counts |
| `metrics attempts` | Delivery latency and success/failure |
| `metrics queue-depth` | Queue backlog per destination (e.g. max_depth, max_age) |
| `metrics pending` | Pending events timeseries |
| `metrics events-by-issue` | Events grouped by issue (debugging); requires issue ID as argument |
| `metrics transformations` | Transformation run counts and error rate |

**Example commands (use cases):**

Event volume and failure rate over time:

```sh
hookdeck gateway metrics events --start 2026-02-01T00:00:00Z --end 2026-02-25T00:00:00Z --granularity 1d --measures count,failed_count,error_rate
```

Request acceptance vs rejection:

```sh
hookdeck gateway metrics requests --start 2026-02-01T00:00:00Z --end 2026-02-25T00:00:00Z --measures count,accepted_count,rejected_count
```

Delivery latency (attempts):

```sh
hookdeck gateway metrics attempts --start 2026-02-01T00:00:00Z --end 2026-02-25T00:00:00Z --measures response_latency_avg,response_latency_p95
```

Queue backlog per destination:

```sh
hookdeck gateway metrics queue-depth --start 2026-02-01T00:00:00Z --end 2026-02-25T00:00:00Z --measures max_depth,max_age --destination-id dest_xxx
```

Pending events over time:

```sh
hookdeck gateway metrics pending --start 2026-02-01T00:00:00Z --end 2026-02-25T00:00:00Z --granularity 1h --measures count
```

Events grouped by issue (debugging):

```sh
hookdeck gateway metrics events-by-issue iss_xxx --start 2026-02-01T00:00:00Z --end 2026-02-25T00:00:00Z --measures count
```

Transformation errors:

```sh
hookdeck gateway metrics transformations --start 2026-02-01T00:00:00Z --end 2026-02-25T00:00:00Z --measures count,failed_count,error_rate
```
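
For a "past week" investigation, the fixed dates in the examples above can be swapped for a trailing window. The sketch below computes one and emits three of the commands shown above (events, queue depth, attempt latency). The helper names and the seven-day window are assumptions made for illustration; only the subcommands, flags, and measures come from the examples.

```typescript
// Illustrative only: helper names and the window length are assumptions.
const DAY_MS = 24 * 60 * 60 * 1000;

// Format a Date like the timestamps in the examples (no milliseconds).
function toIsoSeconds(d: Date): string {
  return d.toISOString().replace(/\.\d{3}Z$/, "Z");
}

// Compute a [now - days, now] window as ISO 8601 strings.
function trailingWindow(now: Date, days: number): { start: string; end: string } {
  return {
    start: toIsoSeconds(new Date(now.getTime() - days * DAY_MS)),
    end: toIsoSeconds(now),
  };
}

// One command each for success vs failure, backlog, and latency.
function deliveryHealthCommands(now: Date): string[] {
  const { start, end } = trailingWindow(now, 7);
  const range = `--start ${start} --end ${end}`;
  return [
    `hookdeck gateway metrics events ${range} --measures count,failed_count,error_rate`,
    `hookdeck gateway metrics queue-depth ${range} --measures max_depth,max_age`,
    `hookdeck gateway metrics attempts ${range} --measures response_latency_avg,response_latency_p95`,
  ];
}

for (const cmd of deliveryHealthCommands(new Date())) {
  console.log(cmd);
}
```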

### REST API

10 changes: 10 additions & 0 deletions tools/agent-scenario-tester/src/assess.ts
@@ -155,6 +155,16 @@ function passesCheck(
    }
    return false;
  }
  if (stage === 'Stage - Investigate delivery') {
    if (index === 0) return /monitoring-debugging|metrics/i.test(doc);
    if (index === 1) return /hookdeck gateway metrics|gateway metrics/i.test(doc);
    if (index === 2) {
      const hasTimeRange = /--start|--end|\bstart\b.*\bend\b/i.test(doc);
      const hasMeasures = /--measures|metrics\s+(events|requests|attempts|queue-depth|pending|events-by-issue|transformations)/i.test(doc);
      return hasTimeRange && hasMeasures;
    }
    return false;
  }
  return false;
}
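
To see how these three checks behave, the sketch below copies the regular expressions into a standalone function and runs them against sample text. The function name `investigateDeliveryCheck` and the sample strings are made up for this illustration; the surrounding `passesCheck` signature is not visible in the hunk, so it is not reproduced here.

```typescript
// Standalone copy of the three checks from the diff, for experimentation only.
function investigateDeliveryCheck(index: number, doc: string): boolean {
  if (index === 0) return /monitoring-debugging|metrics/i.test(doc);
  if (index === 1) return /hookdeck gateway metrics|gateway metrics/i.test(doc);
  if (index === 2) {
    const hasTimeRange = /--start|--end|\bstart\b.*\bend\b/i.test(doc);
    const hasMeasures = /--measures|metrics\s+(events|requests|attempts|queue-depth|pending|events-by-issue|transformations)/i.test(doc);
    return hasTimeRange && hasMeasures;
  }
  return false;
}

// Hypothetical agent outputs used as test input.
const complete =
  "hookdeck gateway metrics events --start 2026-02-01T00:00:00Z --end 2026-02-25T00:00:00Z --measures count";
const dashboardOnly = "Open the Dashboard and look at the charts.";
const noMeasuresFlag = "hookdeck gateway metrics events --start 2026-02-01T00:00:00Z";

console.log([0, 1, 2].map((i) => investigateDeliveryCheck(i, complete))); // [ true, true, true ]
console.log([0, 1, 2].map((i) => investigateDeliveryCheck(i, dashboardOnly))); // [ false, false, false ]
console.log(investigateDeliveryCheck(2, noMeasuresFlag)); // true
```

The last case shows that a subcommand name such as `metrics events` satisfies the measures check even without a `--measures` flag, so the third check is lenient as written.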
