hookdeck · leggetter · Feb 26, 2026
diff --git a/TESTING.md b/TESTING.md
@@ -139,6 +139,7 @@ npx tsx tools/agent-scenario-tester/src/index.ts run receive-webhooks express
 
 - **receive-webhooks** — Setup Hookdeck, build handler with signature verification, run `hookdeck listen`, document inspect/retry workflow. Tests stages 01–04 (iterate is documentation-only: agent documents how to list request → event → attempt and retry; no live traffic required).
 - **receive-provider-webhooks** — Same plus a provider (e.g. Stripe). Use `--provider stripe`. Only the event-gateway skill is pre-installed; the agent is expected to discover and use the provider skill from webhook-skills (e.g. stripe-webhooks) and use the provider SDK in the handler. Tests composition and the provider-webhooks checklist.
+- **investigate-delivery-health** — Documentation-only: assume the user has been receiving webhooks for a week and wants to understand delivery performance (success vs failure, backlog, latency). The prompt does **not** mention "metrics" or "hookdeck gateway metrics"; the assessor checks whether the agent used the metrics CLI commands. Use it to verify that agents discover and use metrics from the skill when the task implies it.
 
 ### Scenario run checklist
 
@@ -150,6 +151,7 @@ Run these and evaluate results; iterate on skills or prompts as needed.
 | 2 | receive-webhooks | Next.js | `./scripts/test-agent-scenario.sh run receive-webhooks nextjs` | Done |
 | 3 | receive-webhooks | FastAPI | `./scripts/test-agent-scenario.sh run receive-webhooks fastapi` | Done |
 | 4 | receive-provider-webhooks | Express | `./scripts/test-agent-scenario.sh run receive-provider-webhooks express --provider stripe` | Done |
+| 5 | investigate-delivery-health | Express | `./scripts/test-agent-scenario.sh run investigate-delivery-health express` | — |
 
 **Output:** `test-results/<scenario>-<framework>-<provider?>-<timestamp>/` containing `report.md` (checklist + automated score), `run.log` (full Claude output), and generated project files. To re-run only the assessor (e.g. after fixing the tool): `./scripts/test-agent-scenario.sh assess <resultDir>`.
 

diff --git a/scenarios.yaml b/scenarios.yaml
@@ -71,6 +71,30 @@ scenarios:
           - Idiomatic to framework
           - No syntax errors or obvious bugs
 
+  - name: investigate-delivery-health
+    displayName: Investigate delivery health (metrics usage)
+    description: >
+      Documentation-only scenario: assume traffic exists; ask the agent how to
+      get a "performance picture" from the CLI. Verifies the agent discovers and
+      uses hookdeck gateway metrics without the prompt mentioning metrics.
+    stages:
+      - iterate
+    prompt: >
+      Assume the user has been receiving webhooks via Hookdeck for the past week.
+      They want to understand how delivery has been performing: e.g. how many
+      events succeeded vs failed, whether there's a backlog, and if latency has
+      been acceptable. In the README or in your reply, list the exact CLI
+      commands (or steps) you would use to get that picture from the terminal,
+      without opening the Dashboard. If you use the installed event-gateway
+      skill, say which file you referenced.
+    evaluation:
+      - stage: Stage - Investigate delivery
+        points: 3
+        checks:
+          - References monitoring-debugging or metrics material
+          - Uses metrics CLI (hookdeck gateway metrics or equivalent)
+          - At least one concrete metrics command with time range and measures
+
   - name: receive-provider-webhooks
     displayName: Receive Provider Webhooks (with composition)
     description: >

diff --git a/skills/event-gateway/references/03-listen.md b/skills/event-gateway/references/03-listen.md
@@ -60,15 +60,15 @@ hookdeck gateway connection create \
   --source-type WEBHOOK \
   --destination-name "cli-slack-local" \
   --destination-type CLI \
-  --destination-path /slack
+  --destination-cli-path /slack
 
 hookdeck gateway connection create \
   --name "github-local" \
   --source-name "github" \
   --source-type WEBHOOK \
   --destination-name "cli-github-local" \
   --destination-type CLI \
-  --destination-path /github
+  --destination-cli-path /github
 ```
 
 ### Listen in One Session

diff --git a/skills/event-gateway/references/cli-workflows.md b/skills/event-gateway/references/cli-workflows.md
@@ -223,7 +223,14 @@ hookdeck gateway attempt list --event-id evt_xxx
 hookdeck gateway attempt get att_xxx
 ```
 
-For full flag and option details, fetch [/docs/cli.md](https://hookdeck.com/docs/cli.md) or the per-command pages: [/docs/cli/source.md](https://hookdeck.com/docs/cli/source.md), [/docs/cli/destination.md](https://hookdeck.com/docs/cli/destination.md), [/docs/cli/transformation.md](https://hookdeck.com/docs/cli/transformation.md), [/docs/cli/request.md](https://hookdeck.com/docs/cli/request.md), [/docs/cli/event.md](https://hookdeck.com/docs/cli/event.md), [/docs/cli/attempt.md](https://hookdeck.com/docs/cli/attempt.md).
+**Metrics** (event/request/attempt/queue/pending/transformations over time): use `hookdeck gateway metrics` with subcommands `events`, `requests`, `attempts`, `queue-depth`, `pending`, `events-by-issue`, `transformations`. Required: `--start`, `--end`, `--measures`. See [monitoring-debugging.md](monitoring-debugging.md#cli-metrics) or [Metrics docs](https://hookdeck.com/docs/metrics) for examples. For the full CLI metrics reference, fetch [/docs/cli/metrics.md](https://hookdeck.com/docs/cli/metrics.md).
+
+```sh
+hookdeck gateway metrics events --start 2026-02-01T00:00:00Z --end 2026-02-25T00:00:00Z --measures count,failed_count,error_rate
+hookdeck gateway metrics queue-depth --start 2026-02-01T00:00:00Z --end 2026-02-25T00:00:00Z --measures max_depth,max_age
+```
+
+For full flag and option details, fetch [/docs/cli.md](https://hookdeck.com/docs/cli.md) or the per-command pages: [/docs/cli/source.md](https://hookdeck.com/docs/cli/source.md), [/docs/cli/destination.md](https://hookdeck.com/docs/cli/destination.md), [/docs/cli/transformation.md](https://hookdeck.com/docs/cli/transformation.md), [/docs/cli/request.md](https://hookdeck.com/docs/cli/request.md), [/docs/cli/event.md](https://hookdeck.com/docs/cli/event.md), [/docs/cli/attempt.md](https://hookdeck.com/docs/cli/attempt.md), [/docs/cli/metrics.md](https://hookdeck.com/docs/cli/metrics.md).
 
 ## Project Management
 
@@ -242,4 +249,4 @@ For the full project reference, fetch [/docs/cli/project.md](https://hookdeck.co
 - [Listen command](https://hookdeck.com/docs/cli/listen)
 - [Connection commands](https://hookdeck.com/docs/cli/connection)
 - [Project commands](https://hookdeck.com/docs/cli/project)
-- [Source](https://hookdeck.com/docs/cli/source) · [Destination](https://hookdeck.com/docs/cli/destination) · [Transformation](https://hookdeck.com/docs/cli/transformation) · [Request](https://hookdeck.com/docs/cli/request) · [Event](https://hookdeck.com/docs/cli/event) · [Attempt](https://hookdeck.com/docs/cli/attempt)
+- [Source](https://hookdeck.com/docs/cli/source) · [Destination](https://hookdeck.com/docs/cli/destination) · [Transformation](https://hookdeck.com/docs/cli/transformation) · [Request](https://hookdeck.com/docs/cli/request) · [Event](https://hookdeck.com/docs/cli/event) · [Attempt](https://hookdeck.com/docs/cli/attempt) · [Metrics](https://hookdeck.com/docs/cli/metrics)
diff --git a/skills/event-gateway/references/monitoring-debugging.md b/skills/event-gateway/references/monitoring-debugging.md
@@ -6,6 +6,7 @@
 - [Data Model](#data-model)
 - [Event Statuses](#event-statuses)
 - [Debugging Surfaces](#debugging-surfaces)
+- [CLI metrics](#cli-metrics)
 - [Troubleshooting Flowchart](#troubleshooting-flowchart)
 - [Issues and Notifications](#issues-and-notifications)
 - [Replay](#replay)
@@ -77,7 +78,63 @@ hookdeck gateway attempt get att_xxx
 
 See [Request commands](https://hookdeck.com/docs/cli/request.md), [Event commands](https://hookdeck.com/docs/cli/event.md), and [Attempt commands](https://hookdeck.com/docs/cli/attempt.md) for full options.
 
-**Metrics:** CLI metrics commands (e.g. request/event/attempt counts over time) may be added in a future release. Until then, use the [Dashboard](https://dashboard.hookdeck.com) or [Metrics API](https://hookdeck.com/docs/metrics).
+### CLI metrics {#cli-metrics}
+
+Metrics over time are available in the [Dashboard](https://dashboard.hookdeck.com) ([Metrics page](https://dashboard.hookdeck.com/metrics) and Source/Connection/Destination pages) and from the CLI via `hookdeck gateway metrics` and its subcommands. Every metrics subcommand requires a date range (`--start` and `--end`, both ISO 8601) and at least one `--measures` value. Optional flags include `--granularity` and `--dimensions`, plus the filters `--source-id`, `--destination-id`, `--connection-id`, and `--status`. See [Metrics](https://hookdeck.com/docs/metrics) and the [CLI metrics reference](https://hookdeck.com/docs/cli/metrics) for full details.
+
+| Subcommand | Purpose |
+|------------|---------|
+| `metrics events` | Event volume, success/failure counts, error rate over time |
+| `metrics requests` | Request acceptance vs rejection counts |
+| `metrics attempts` | Delivery latency and success/failure |
+| `metrics queue-depth` | Queue backlog per destination (e.g. `max_depth`, `max_age`) |
+| `metrics pending` | Pending events timeseries |
+| `metrics events-by-issue` | Events grouped by issue (debugging); requires an issue ID as an argument |
+| `metrics transformations` | Transformation run counts and error rate |
+
+**Example commands (use cases):**
+
+Event volume and failure rate over time:
+
+```sh
+hookdeck gateway metrics events --start 2026-02-01T00:00:00Z --end 2026-02-25T00:00:00Z --granularity 1d --measures count,failed_count,error_rate
+```
+
+Request acceptance vs rejection:
+
+```sh
+hookdeck gateway metrics requests --start 2026-02-01T00:00:00Z --end 2026-02-25T00:00:00Z --measures count,accepted_count,rejected_count
+```
+
+Delivery latency (attempts):
+
+```sh
+hookdeck gateway metrics attempts --start 2026-02-01T00:00:00Z --end 2026-02-25T00:00:00Z --measures response_latency_avg,response_latency_p95
+```
+
+Queue backlog per destination:
+
+```sh
+hookdeck gateway metrics queue-depth --start 2026-02-01T00:00:00Z --end 2026-02-25T00:00:00Z --measures max_depth,max_age --destination-id dest_xxx
+```
+
+Pending events over time:
+
+```sh
+hookdeck gateway metrics pending --start 2026-02-01T00:00:00Z --end 2026-02-25T00:00:00Z --granularity 1h --measures count
+```
+
+Events grouped by issue (debugging):
+
+```sh
+hookdeck gateway metrics events-by-issue iss_xxx --start 2026-02-01T00:00:00Z --end 2026-02-25T00:00:00Z --measures count
+```
+
+Transformation errors:
+
+```sh
+hookdeck gateway metrics transformations --start 2026-02-01T00:00:00Z --end 2026-02-25T00:00:00Z --measures count,failed_count,error_rate
+```
 
 ### REST API
 

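The example commands above hard-code a date range, while the `investigate-delivery-health` prompt talks about "the past week". A minimal sketch of deriving a rolling window follows. The helper names `toIsoSeconds` and `buildMetricsArgs` are hypothetical and not part of the repository; the only CLI flags used are the ones the diff documents (`--start`, `--end`, `--measures`). `events-by-issue` is left out because it also needs an issue ID argument.

```typescript
// Hypothetical helper (not in the repo): builds the argument list for
// `hookdeck gateway metrics <subcommand>` over a rolling window.
type MetricsSubcommand =
  | 'events'
  | 'requests'
  | 'attempts'
  | 'queue-depth'
  | 'pending'
  | 'transformations';

// Format a Date as ISO 8601 without milliseconds, e.g. 2026-02-25T00:00:00Z,
// matching the timestamp style used in the documented examples.
function toIsoSeconds(d: Date): string {
  return d.toISOString().replace(/\.\d{3}Z$/, 'Z');
}

function buildMetricsArgs(
  subcommand: MetricsSubcommand,
  measures: string[],
  windowEnd: Date,
  days = 7,
): string[] {
  // The docs state that at least one --measures value is required.
  if (measures.length === 0) {
    throw new Error('at least one measure is required');
  }
  const windowStart = new Date(windowEnd.getTime() - days * 24 * 60 * 60 * 1000);
  return [
    'hookdeck', 'gateway', 'metrics', subcommand,
    '--start', toIsoSeconds(windowStart),
    '--end', toIsoSeconds(windowEnd),
    '--measures', measures.join(','),
  ];
}

// Usage: a 7-day window ending at a fixed instant, so the output is reproducible.
const sampleEnd = new Date('2026-02-25T00:00:00Z');
console.log(
  buildMetricsArgs('events', ['count', 'failed_count', 'error_rate'], sampleEnd).join(' '),
);
// prints: hookdeck gateway metrics events --start 2026-02-18T00:00:00Z --end 2026-02-25T00:00:00Z --measures count,failed_count,error_rate
```

In a real script `windowEnd` would normally be `new Date()`; a fixed value is used here only to keep the printed command stable.
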
diff --git a/tools/agent-scenario-tester/src/assess.ts b/tools/agent-scenario-tester/src/assess.ts
@@ -155,6 +155,16 @@ function passesCheck(
     }
     return false;
   }
+  if (stage === 'Stage - Investigate delivery') {
+    if (index === 0) return /monitoring-debugging|metrics/i.test(doc);
+    if (index === 1) return /hookdeck gateway metrics|gateway metrics/i.test(doc);
+    if (index === 2) {
+      const hasTimeRange = /--start|--end|\bstart\b.*\bend\b/i.test(doc);
+      const hasMeasures = /--measures|metrics\s+(events|requests|attempts|queue-depth|pending|events-by-issue|transformations)/i.test(doc);
+      return hasTimeRange && hasMeasures;
+    }
+    return false;
+  }
   return false;
 }
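
To see how the three new checks behave, here is a standalone sketch that copies the regular expressions from the hunk above into a hypothetical function, `investigateDeliveryCheck`, and runs them against three sample documents. The function name and sample strings are illustrative only; the real `passesCheck` takes more parameters that this diff does not show.

```typescript
// Standalone copy of the 'Stage - Investigate delivery' regex checks.
// `investigateDeliveryCheck` is a hypothetical name used for illustration.
function investigateDeliveryCheck(index: number, doc: string): boolean {
  if (index === 0) return /monitoring-debugging|metrics/i.test(doc);
  if (index === 1) return /hookdeck gateway metrics|gateway metrics/i.test(doc);
  if (index === 2) {
    const hasTimeRange = /--start|--end|\bstart\b.*\bend\b/i.test(doc);
    const hasMeasures =
      /--measures|metrics\s+(events|requests|attempts|queue-depth|pending|events-by-issue|transformations)/i.test(doc);
    return hasTimeRange && hasMeasures;
  }
  return false;
}

const samples: Array<[string, string]> = [
  // A full command taken from the documented examples.
  ['concrete',
    'hookdeck gateway metrics events --start 2026-02-01T00:00:00Z --end 2026-02-25T00:00:00Z --measures count,failed_count,error_rate'],
  // Prose that names a subcommand but gives no flags.
  ['proseOnly',
    'Run hookdeck gateway metrics events from the start of the week to the end.'],
  // An answer that never touches the CLI.
  ['dashboardOnly', 'Open the Dashboard and look at the charts.'],
];

for (const [label, doc] of samples) {
  const results = [0, 1, 2].map((i) => investigateDeliveryCheck(i, doc));
  console.log(label, results.join(','));
}
// prints:
// concrete true,true,true
// proseOnly true,true,true
// dashboardOnly false,false,false
```

The `proseOnly` sample passes the third check without any `--start`, `--end`, or `--measures` flag, because a subcommand name satisfies `hasMeasures` and the words "start ... end" on one line satisfy `hasTimeRange`. Whether that leniency is intended is a question for review; the sketch only shows the behavior of the expressions as written.
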