feat(init): add init command for guided Sentry project setup#283

Open
betegon wants to merge 68 commits into main from feat/init-command

betegon commented Feb 23, 2026

Summary

Adds sentry init — an AI-powered wizard that walks users through adding Sentry to their project. It detects the platform, installs the SDK, instruments the code, and configures error monitoring, tracing, and session replay.

Changes

Core wizard

  • New init command backed by a Mastra AI workflow (hosted at getsentry/cli-init-api) that handles platform detection, SDK installation, and code instrumentation
  • ASCII banner, AI transparency note, and review reminder in the wizard UX
  • Tracing: unique trace IDs per wizard run with flattened span hierarchy
  • Python platforms use venv for isolated dependency installation
  • Magic values extracted into named constants (constants.ts)
  • Docs page added to cli.sentry.dev

Security hardening

  • Shell metacharacter blocklist: () subshell bypass, > < & redirection/background, $ ' " \ expansion/escaping, { } * ? glob/brace expansion, # shell comment
  • Environment variable injection blocking (VAR=value cmd pattern)
  • Remote-supplied cwd validation against project directory
  • Dangerous executable blocklist and path-traversal prevention
  • File descriptor cleanup on readSync failure
  • Cross-platform shell execution ({ shell: true } instead of hardcoded sh)
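The layered checks above can be illustrated with a minimal sketch. It is not the PR's actual `validateCommand`: the result type, the constant names, and the contents of `DANGEROUS_EXECUTABLES` are assumptions, and the character list contains only the characters named in this summary.

```typescript
// Hypothetical sketch of layered command validation; not the PR's real code.
type ValidationResult = { ok: true } | { ok: false; reason: string };

// Only the characters listed in the summary above. A production blocklist
// would likely need more (for example ; and |), which the summary does not list.
const BLOCKED_CHARS = ["(", ")", ">", "<", "&", "$", "'", '"', "\\", "{", "}", "*", "?", "#"];

// Matches a leading VAR=value token such as npm_config_registry=evil.com
const ENV_ASSIGNMENT = /^[A-Za-z_][A-Za-z0-9_]*=/;

// Assumed example entries; the real list is not shown in the PR description.
const DANGEROUS_EXECUTABLES = new Set(["rm", "sudo", "curl", "wget"]);

function validateCommand(command: string): ValidationResult {
  const trimmed = command.trim();
  if (trimmed === "") {
    return { ok: false, reason: "empty command" };
  }
  // Layer 1: reject shell metacharacters anywhere in the command.
  for (const ch of BLOCKED_CHARS) {
    if (trimmed.includes(ch)) {
      return { ok: false, reason: `blocked character: ${ch}` };
    }
  }
  const firstToken = trimmed.split(/\s+/)[0] ?? "";
  // Layer 2: reject environment variable injection (VAR=value cmd).
  if (ENV_ASSIGNMENT.test(firstToken)) {
    return { ok: false, reason: "environment variable assignment" };
  }
  // Layer 3: inspect the first token only for traversal and blocked executables.
  if (firstToken.includes("..")) {
    return { ok: false, reason: "path traversal in executable" };
  }
  const executable = firstToken.split("/").pop() ?? firstToken;
  if (DANGEROUS_EXECUTABLES.has(executable)) {
    return { ok: false, reason: `blocked executable: ${executable}` };
  }
  return { ok: true };
}
```

Because the last layer looks only at the first token, it cannot catch a dangerous executable passed as an argument; the metacharacter layer is what prevents chaining a second command.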

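Performance

  • Pre-computed directory listing sent with the first API call, so the server can skip its initial list-dir suspend (#307)
  • _prevPhases cross-phase caching: each resume payload carries results from prior phases of the same step (#307)
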
Testing & CI

  • Wizard-runner unit tests (coverage 5.94% → 99.42%)
  • Banner extracted to src/lib/banner.ts to break circular import
  • Mock isolation fixes (spyOn instead of mock.module where possible)
  • Simplified coverage pipeline (removed merge-lcov workaround)

Eval suite

The eval suite validates that the wizard produces correct, buildable Sentry instrumentation for each supported platform. It uses a 3-phase test architecture:

Phase 1: Wizard run

Each test scaffolds a fresh project from a platform template, then runs the full sentry init wizard against it. The wizard output (exit code, stdout/stderr, git diff, new files) is captured for the next phases.

Phase 2: Hard assertions (deterministic)

Five code-based pass/fail checks that run without any LLM:

  1. exit-code: wizard exits 0
  2. sdk-installed: the Sentry SDK package appears in the dependency file (package.json / requirements.txt)
  3. init-present: Sentry.init (or sentry_sdk.init) appears in changed or new files
  4. no-placeholder-dsn: no leftover placeholder DSNs (___PUBLIC_DSN___, YOUR_DSN_HERE, etc.)
  5. build-succeeds: npm run build (or the platform's equivalent) passes after the wizard's changes
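As a rough illustration of how the first four checks could be expressed as pure functions, here is a sketch. The `WizardRunCapture` shape, the `Check` type, and the function name are hypothetical, and build-succeeds is omitted because it needs to spawn a process.

```typescript
// Hypothetical shapes; the eval suite's real types are not shown in this PR.
interface WizardRunCapture {
  exitCode: number;
  dependencyFile: string; // contents of package.json or requirements.txt
  changedText: string; // git diff plus the contents of new files
}

type Check = { name: string; passed: boolean };

// The two placeholders named in the PR description; the real list has more.
const PLACEHOLDER_DSNS = ["___PUBLIC_DSN___", "YOUR_DSN_HERE"];

// Matches a JavaScript or Python SDK init call.
const INIT_PATTERN = /Sentry\.init\s*\(|sentry_sdk\.init\s*\(/;

function runHardAssertions(run: WizardRunCapture, sdkPackage: string): Check[] {
  return [
    { name: "exit-code", passed: run.exitCode === 0 },
    { name: "sdk-installed", passed: run.dependencyFile.includes(sdkPackage) },
    { name: "init-present", passed: INIT_PATTERN.test(run.changedText) },
    {
      name: "no-placeholder-dsn",
      passed: !PLACEHOLDER_DSNS.some((p) => run.changedText.includes(p)),
    },
  ];
}
```

Keeping these checks free of I/O means they can run on captured output without an LLM or a network connection, which matches the "deterministic" label on this phase.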

Phase 3: LLM judge (per-feature)

For each feature (errors, tracing, replay, logs, profiling, etc.), an LLM judge scores correctness:

  • Official Sentry docs are fetched as ground truth (URLs mapped in feature-docs.json)
  • GPT-4o evaluates the wizard's diff + new files against the docs on 4 criteria: feature-initialized, correct-imports, no-syntax-errors, follows-docs
  • Each criterion is scored pass/fail/unknown; the overall feature score must be >= 0.5
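The description does not say how the per-criterion verdicts combine into the feature score. One plausible reading, sketched below, is the share of passing criteria among those the judge could decide; both the exclusion of "unknown" verdicts and the all-unknown score of 0 are assumptions, as are the type and function names.

```typescript
type Verdict = "pass" | "fail" | "unknown";
type Criterion =
  | "feature-initialized"
  | "correct-imports"
  | "no-syntax-errors"
  | "follows-docs";
type FeatureVerdicts = Record<Criterion, Verdict>;

// Assumption: "unknown" verdicts are excluded from the denominator.
function featureScore(verdicts: FeatureVerdicts): number {
  const decided = Object.values(verdicts).filter((v) => v !== "unknown");
  if (decided.length === 0) {
    return 0; // assumption: nothing decidable counts as a failing score
  }
  const passes = decided.filter((v) => v === "pass").length;
  return passes / decided.length;
}

// Threshold taken from the PR description.
function featurePasses(verdicts: FeatureVerdicts): boolean {
  return featureScore(verdicts) >= 0.5;
}
```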

Platforms

6 platform templates are covered:

| Platform | Template | SDK |
| --- | --- | --- |
| Express | express/ | @sentry/node |
| Next.js | nextjs/ | @sentry/nextjs |
| SvelteKit | sveltekit/ | @sentry/sveltekit |
| React + Vite | react-vite/ | @sentry/react |
| Flask | python-flask/ | sentry-sdk |
| FastAPI | python-fastapi/ | sentry-sdk |

Running

bun run test:init-eval          # all platforms

Requires SENTRY_AUTH_TOKEN, SENTRY_ORG, SENTRY_PROJECT, and optionally OPENAI_API_KEY (LLM judge is skipped without it).

Test Plan

  • All init unit tests pass (124 tests across 7 files)
  • bun run lint and bun run typecheck pass
  • CI passes (unit tests, e2e, lint, typecheck, build)
  • CI workflow for eval is tracked separately in Run init evals on CI #290

🤖 Generated with Claude Code

betegon and others added 21 commits February 17, 2026 20:48
Adds `sentry init` wizard that walks users through project setup via
the Mastra API, handling DSN configuration, SDK installation prompts,
and local file operations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sends tags and metadata (CLI version, OS, arch, node version) with
startAsync and resumeAsync calls so workflow runs are visible and
filterable in Mastra Studio.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Import randomBytes and generate a hex trace ID so all
suspend/resume calls within a single wizard run share one trace.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
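A minimal sketch of the trace ID commit above, assuming a 16-byte (32 hex character) ID; the commit message states only that `randomBytes` produces a hex ID, so the length and the function name are assumptions.

```typescript
import { randomBytes } from "node:crypto";

// Assumption: 16 random bytes, rendered as 32 lowercase hex characters.
function createTraceId(): string {
  return randomBytes(16).toString("hex");
}

// Generate once per wizard run and reuse for every suspend/resume call.
const traceId = createTraceId();
```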
Add a synthetic parentSpanId to tracingOptions so all workflow run
spans become siblings under the same parent instead of nesting by
timestamp containment.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The parentSpanId was creating artificial nesting - let the workflow
engine handle span hierarchy naturally.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Display the branded SENTRY ASCII banner before the intro line for visual
consistency with `sentry --help`. Make the "errors" feature always enabled
in the feature multi-select so users cannot deselect error monitoring.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…pt, and source maps hint

Route success-with-exitCode results to formatError so the --force hint
is shown when Sentry is already installed. Fold the "Error Monitoring is
always included" note into the multiselect prompt. Use a more approachable
Source Maps hint.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Show a non-blocking info note about AI usage with a docs link before
the first network call, and a review reminder before the success outro.
Extract SENTRY_DOCS_URL constant to share between wizard-runner and
clack-utils cancel message.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add @anthropic-ai/sdk and openai as devDependencies for the LLM-as-judge
eval framework. Add opencode-lore dependency. Exclude test/init-eval/templates
from biome linting since they are fixture apps, not source code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add LLM-as-judge eval tests for the init wizard across all five
platforms (Express, Next.js, Flask, React+Vite, SvelteKit). Each test
runs the wizard end-to-end and asserts on SDK installation, Sentry.init
presence, build success, and documentation accuracy via an LLM judge.

Includes template apps, helper utilities (assertions, doc-fetcher,
judge, platform configs), and feature-docs.json mapping.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a separate workflow for running init-eval tests on demand. Supports
running a single platform or all platforms via matrix. Uses the init-eval
GitHub environment for MASTRA_API_URL and OPENAI_API_KEY secrets.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Store python-fastapi doc URLs as base paths (with trailing slash) like
other platforms, and convert to .md at fetch time. This mirrors the
pattern in cli-init-api and lets us return clean markdown directly
instead of stripping HTML tags.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add Sentry doc URLs for python-flask (getting-started, errors, tracing,
logs, profiling) and add the shared python/profiling page to both flask
and fastapi profiling entries.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add Sentry doc URLs for all nextjs features: getting-started, errors,
logs, tracing, session replay, metrics, and profiling (browser + node).
Sourcemaps left empty for now.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add Sentry doc URLs for sveltekit features and add missing logs,
metrics, and profiling features to the platform entry.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add Sentry doc URLs for react-vite features and add missing logs,
metrics, and profiling features to the platform entry.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Flask eval was using bare `pip install` which fails when pip isn't on
PATH. Use the same venv pattern as fastapi. Also remove accidental
opencode-lore runtime dependency.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions bot commented Feb 23, 2026

Semver Impact of This PR

🟡 Minor (new features)

📋 Changelog Preview

This is how your changes will appear in the changelog.
Entries from this PR are highlighted with a left border (blockquote style).


New Features ✨

Trace

Other

  • (api) Add --data/-d flag and auto-detect JSON body in fields by BYK in #320
  • (formatters) Render all terminal output as markdown by BYK in #297
  • (init) Add init command for guided Sentry project setup by betegon in #283
  • (install) Add Sentry error telemetry to install script by BYK in #334
  • (issue-list) Global limit with fair distribution, compound cursor, and richer progress by BYK in #306
  • (log-list) Add --trace flag to filter logs by trace ID by BYK in #329
  • (project) Add project create command by betegon in #237
  • (upgrade) Add binary delta patching via TRDIFF10/bsdiff by BYK in #327

Bug Fixes 🐛

Api

  • Use numeric project ID to avoid "not actively selected" error by betegon in #312
  • Use limit param for issues endpoint page size by BYK in #309
  • Auto-correct ':' to '=' in --field values with a warning by BYK in #302

Formatters

  • Expand streaming table to fill terminal width by betegon in #314
  • Fix HTML entities and escaped underscores in table output by betegon in #313

Setup

  • Suppress agent skills and welcome messages on upgrade by BYK in #328
  • Suppress shell completion messages on upgrade by BYK in #326

Other

  • (ci) Generate JUnit XML to silence codecov-action warnings by BYK in #300
  • (install) Fix nightly digest extraction on macOS by BYK in #331
  • (nightly) Push to GHCR from artifacts dir so layer titles are bare filenames by BYK in #301
  • (project create) Auto-correct dot-separated platform to hyphens by BYK in #336
  • (region) Resolve DSN org prefix at resolution layer by BYK in #316
  • (test) Handle 0/-0 in getComparator anti-symmetry property test by BYK in #308
  • (trace-logs) Timestamp_precise is a number, not a string by BYK in #323

Internal Changes 🔧

Api

  • Upgrade @sentry/api to 0.21.0, remove raw HTTP pagination workarounds by BYK in #321
  • Wire listIssuesPaginated through @sentry/api SDK for type safety by BYK in #310

Other

  • (craft) Add sentry-release-registry target by BYK in #325

🤖 This preview updates automatically when you update the PR.


github-actions bot commented Feb 23, 2026

Codecov Results 📊

100 passed | Total: 100 | Pass Rate: 100% | Execution Time: 0ms

📊 Comparison with Base Branch

| Metric | Change |
| --- | --- |
| Total Tests | 📉 -2467 |
| Passed Tests | 📉 -2467 |
| Failed Tests | |
| Skipped Tests | |

All tests are passing successfully.

✅ Patch coverage is 98.13%. Project has 3193 uncovered lines.
✅ Project coverage is 82.85%. Comparing base (base) to head (head).

Files with missing lines (4)
| File | Patch % | Lines |
| --- | --- | --- |
| app.ts | 81.74% | ⚠️ 21 Missing |
| wizard-runner.ts | 95.34% | ⚠️ 11 Missing |
| local-ops.ts | 98.11% | ⚠️ 7 Missing |
| help.ts | 96.81% | ⚠️ 3 Missing |
Coverage diff
@@            Coverage Diff             @@
##          main       #PR       +/-##
==========================================
+ Coverage    81.46%    82.85%    +1.39%
==========================================
  Files          125       133        +8
  Lines        17700     18617      +917
  Branches         0         0         —
==========================================
+ Hits         14418     15424     +1006
- Misses        3282      3193       -89
- Partials         0         0         —

Generated by Codecov Action

betegon and others added 3 commits February 23, 2026 22:16
Restrict GITHUB_TOKEN to contents:read as flagged by CodeQL.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update SvelteKit template with working deps (adapter-node, latest
svelte/vite) and add required src files (app.d.ts, app.html). Use
python3 instead of python for venv creation in Flask/FastAPI platforms.
Add --concurrency 6 to init-eval test runner.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add push/pull_request triggers so the eval runs automatically alongside
other CI checks. Keep workflow_dispatch for manual single-platform runs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
betegon and others added 2 commits March 3, 2026 10:37
Move verbose inline command documentation from SKILL.md into per-command
reference files under references/. Update generate-skill and check-skill
scripts to support the new structure, and refactor agent-skills loader
to resolve reference paths at runtime.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Isolated tests now run without coverage merging, matching the pattern on
main. Deletes script/merge-lcov.sh, reverts test:isolated to a plain
bun test invocation, and removes the multi-step coverage merge from CI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…code 1

The "handles missing suspend payload" test sets process.exitCode = 1 but
never resets it. When all isolated tests run in one process, the leaked
exitCode causes bun to exit 1 despite all 108 tests passing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
betegon and others added 3 commits March 3, 2026 12:00
Extract formatBanner into src/lib/banner.ts to break the circular import
chain (wizard-runner → help → app → init → wizard-runner), enabling
init.test.ts to use spyOn instead of mock.module (which leaked across
test files causing 14 false failures). Add 15 unit tests for
wizard-runner.ts covering success, error, TTY check, dry-run, and all
suspend/resume paths — raising its coverage from 5.94% to 99.42%.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix import ordering in wizard-runner.ts and init.test.ts, fix
noExportedImports/noBarrelFile lint by removing re-export from help.ts
and updating help.test.ts to import formatBanner from banner.js
directly. Fix formatting nit in mockImplementation call.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Guard against empty multiselect options when only errorMonitoring is available
- Use purpose field for example detection in handleConfirm with string fallback
- Block glob (*, ?) and brace ({, }) expansion in shell metacharacter validation
- Improve metacharacter ordering docs and add Unix shell scope comment

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
betegon and others added 3 commits March 3, 2026 13:46
Block `VAR=value cmd` patterns in validateCommand to prevent environment
variable injection (e.g. npm_config_registry=evil.com). Fix
extractSuspendPayload to return the actual step key found during fallback
iteration, so resumeAsync receives the correct step ID.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ry[] directly

Extract inline entry shape into a named DirEntry type and add
precomputeDirListing() that returns DirEntry[] instead of LocalOpResult,
so callers get the entries array directly without type assertions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ase caching (#307)

## Summary

Two optimizations to reduce round-trips during the init wizard:

1. **Pre-computed directory listing** — sends a pre-computed directory
listing with the first API call so the server can skip its initial
`list-dir` suspend. Saves one full HTTP round-trip in the
`discover-context` step.

2. **`_prevPhases` for cross-phase caching** — tracks per-step result
history (`stepHistory`) and sends `_prevPhases` with each resume
payload. This lets the server reuse results from earlier phases (e.g.
the `read-files` phase can reuse data from `analyze`) without
re-requesting them.

## Changes

- Exports `precomputeDirListing` from `local-ops.ts` — reuses the
existing `listDir` function with the same params the server would
request (recursive, maxDepth 3, maxEntries 500). The wizard runner calls
it before `startAsync` and includes the result as `dirListing` in
`inputData`.
- Adds a `stepHistory` map to track accumulated local-op results per
step. Each resume payload now includes `_prevPhases` containing results
from prior phases of the same step.

Companion server change: getsentry/cli-init-api#16

## Test plan

- [x] Init tests pass (`bun test test/lib/init/`)
- [x] Lint passes
- [ ] End-to-end with local dev server

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
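The `_prevPhases` bookkeeping described in #307 might look roughly like the sketch below. `stepHistory`, `_prevPhases`, and `LocalOpResult` are names from the commit message; the shape of `LocalOpResult`, the `createStepHistory` factory, and `buildResumePayload` are assumptions made for illustration.

```typescript
// Assumed shape; the real LocalOpResult type is not shown in this PR page.
type LocalOpResult = { ok: boolean; data?: unknown };
type ResumePayload = LocalOpResult & { _prevPhases: LocalOpResult[] };

// Hypothetical factory wrapping the per-step history map.
function createStepHistory() {
  const stepHistory = new Map<string, LocalOpResult[]>();
  return {
    // Attach results from earlier phases of the same step, then record this one.
    buildResumePayload(stepId: string, result: LocalOpResult): ResumePayload {
      const previous = stepHistory.get(stepId) ?? [];
      stepHistory.set(stepId, [...previous, result]);
      return { ...result, _prevPhases: previous };
    },
  };
}
```

History is keyed by step ID, so results from one step are never sent along with the resume payload of another.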
…rst-token limitation, unexport FEATURE_INFO

- Add `#` to shell metacharacter blocklist to prevent command truncation
  (e.g. `npm install evil-pkg # @sentry/node`)
- Document Layer 3's first-token-only limitation with explanatory comment
- Remove unnecessary `export` from `FEATURE_INFO` in clack-utils.ts
- Add test for shell comment character blocking

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…runtime assertions

- Add WizardOutput, SelectPayload, MultiSelectPayload, ConfirmPayload,
  SuspendPayload types; refactor InteractivePayload as discriminated union
- Remove ~20 unsafe casts across formatters, interactive, and wizard-runner
- Restructure runCommands to validate all commands before executing any
- Add assertWorkflowResult/assertSuspendPayload runtime validation for
  server responses
- Add tests for malformed responses, batch validation, and dry-run paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
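The "validate all commands before executing any" restructuring can be sketched as a two-pass loop. This is an illustration under assumed types, not the PR's `runCommands`; the validator and executor are injected so the sketch stays self-contained.

```typescript
type Validation = { ok: true } | { ok: false; reason: string };
type BatchResult = { ok: true; ran: string[] } | { ok: false; error: string };

// Hypothetical two-pass runner: nothing executes unless every command is valid.
function runCommandsSketch(
  commands: string[],
  validate: (command: string) => Validation,
  execute: (command: string) => void,
): BatchResult {
  // Pass 1: reject the whole batch on the first invalid command.
  for (const command of commands) {
    const verdict = validate(command);
    if (verdict.ok === false) {
      return { ok: false, error: `Refusing to run "${command}": ${verdict.reason}` };
    }
  }
  // Pass 2: only reached when every command passed validation.
  const ran: string[] = [];
  for (const command of commands) {
    execute(command);
    ran.push(command);
  }
  return { ok: true, ran };
}
```

Validating up front avoids a half-applied batch in which earlier commands have already modified the project before a later one is rejected.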
betegon and others added 2 commits March 3, 2026 21:00
- safePath() now resolves symlinks and rejects paths that escape the
  project directory via symlink (e.g. link → /etc)
- Add withTimeout() helper to race Mastra API calls against a deadline
- Bump API_TIMEOUT_MS to 120s to match DEFAULT_COMMAND_TIMEOUT_MS
- Delete duplicate isolated test file; consolidate tests in test/lib/
- Add JSDoc to safePath() and withTimeout()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
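A `withTimeout()` helper of the kind this commit describes is commonly built on `Promise.race`. The sketch below assumes a `(promise, ms, label)` signature and an error message format, neither of which is shown in the commit.

```typescript
// Assumed signature; the PR's actual helper may differ.
function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      reject(new Error(`${label} timed out after ${ms}ms`));
    }, ms);
  });
  // Whichever settles first wins; the timer is cleared either way so it
  // cannot keep the process alive after the call has finished.
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}
```

Note that racing does not cancel the underlying request; it only stops the caller from waiting on it.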
…or patchsets

Restore MASTRA_API_URL to production Cloudflare Worker with env var
override (was hardcoded to localhost:8787). Add upfront safePath()
validation in applyPatchset() so no files are written if any patch
targets an unsafe path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
betegon requested review from BYK and MathurAditya724 March 3, 2026 20:20
}

return { ok: true, data: { applied } };
}

Dry-run and real patchset diverge on unknown actions

Medium Severity

applyPatchsetDryRun records every patch as "applied" regardless of its action value, while the real applyPatchset silently skips unknown actions via default: break. This means a --dry-run will report patches as applied that would actually be silently dropped in a real run, producing misleading output. The two code paths need consistent handling of unrecognized patch actions.
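One way to keep the two paths consistent is to have both consume a single classification step, sketched below. The `Patch` shape, the set of known actions, and the `planPatchset` name are assumptions for illustration only.

```typescript
// Assumed patch shape and action names; the real types are not shown here.
interface Patch {
  path: string;
  action: string;
}

const KNOWN_ACTIONS = new Set(["create", "modify", "delete"]);

interface PatchPlan {
  applied: string[]; // paths whose action is recognized
  skipped: string[]; // paths dropped because the action is unknown
}

// Shared by the dry-run and the real run, so both report the same outcome.
function planPatchset(patches: Patch[]): PatchPlan {
  const plan: PatchPlan = { applied: [], skipped: [] };
  for (const patch of patches) {
    if (KNOWN_ACTIONS.has(patch.action)) {
      plan.applied.push(patch.path);
    } else {
      plan.skipped.push(patch.path);
    }
  }
  return plan;
}
```

With this split, a dry-run would print the plan as-is, while a real run would write only the entries in `applied` and could surface `skipped` instead of dropping them silently.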

spin.stop("Done");
formatResult(result);
}
}

Truthiness-based exit code check is fragile

Low Severity

handleFinalResult computes hasError as result.status !== "success" || result.result?.exitCode. This relies on JavaScript truthiness where exit code 0 is falsy (treated as "no error"). While correct for integer exit codes, the variable isn't a boolean—it could be a number or undefined—making the intent non-obvious and fragile if the exitCode type ever includes other falsy values. An explicit comparison like exitCode !== undefined && exitCode !== 0 would be clearer.
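The explicit comparison suggested in this comment could be written as follows. The `WorkflowResultLike` shape is an assumption inferred from the expression quoted above.

```typescript
// Assumed shape, based on the fields referenced in the review comment.
interface WorkflowResultLike {
  status: string;
  result?: { exitCode?: number };
}

// Always returns a boolean: true for a non-success status or a non-zero exit code.
function hasError(result: WorkflowResultLike): boolean {
  const exitCode = result.result?.exitCode;
  return result.status !== "success" || (exitCode !== undefined && exitCode !== 0);
}
```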

…llback (#333)

## Summary

Adds the `create-sentry-project` local operation so the remote workflow
can ask the CLI to create a Sentry project. Resolves the org via local
config / env vars first, falling back to listing orgs from the API
(auto-selects if only one, prompts interactively if multiple).

## Changes

- New `createSentryProject` handler in `local-ops.ts` with extracted
`resolveOrgSlug` helper that handles all org resolution paths (config,
single-org auto-select, multi-org interactive prompt, `--yes` guard)
- `CreateSentryProjectPayload` type added to `types.ts`
- Test suite covering success, single-org fallback, no-orgs, multi-org
`--yes`, interactive select, user cancel, API error, and missing DSN
paths
- Downstream mock setup extracted into `mockDownstreamSuccess` helper to
reduce test duplication

## Test Plan

```bash
bun test test/lib/init/local-ops.create-sentry-project.test.ts  # 8 pass
bun run lint  # clean
```

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

stderrChunks.push(chunk);
stderrLen += chunk.length;
}
});

Output buffer cap allows unbounded memory before truncation

Low Severity

In runSingleCommand, the stdout/stderr data handler stops pushing new chunks once the byte counter exceeds MAX_OUTPUT_BYTES, but any single chunk received just before the cap can be arbitrarily large. Since chunks are still pushed when stdoutLen < MAX_OUTPUT_BYTES regardless of chunk size, the total buffered data could significantly exceed the 64 KB cap. The final Buffer.concat then allocates a potentially oversized buffer before truncating via .slice().
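A collector that trims each chunk to the remaining budget would keep the buffered total at or below the cap. The sketch below is one possible shape, not the PR's code; `createCappedCollector` is a hypothetical name, and the 64 KB value is taken from the comment above.

```typescript
import { Buffer } from "node:buffer";

const MAX_OUTPUT_BYTES = 64 * 1024;

// Hypothetical helper: never buffers more than `limit` bytes in total.
function createCappedCollector(limit: number = MAX_OUTPUT_BYTES) {
  const chunks: Buffer[] = [];
  let length = 0;
  return {
    push(chunk: Buffer): void {
      const remaining = limit - length;
      if (remaining <= 0) {
        return; // cap reached, drop further output
      }
      // Copy only the bytes that fit so an oversized chunk is not retained.
      const kept =
        chunk.length > remaining ? Buffer.from(chunk.subarray(0, remaining)) : chunk;
      chunks.push(kept);
      length += kept.length;
    },
    get size(): number {
      return length;
    },
    toString(): string {
      return Buffer.concat(chunks, length).toString("utf8");
    },
  };
}
```

A collector like this would be fed from the stdout and stderr `data` handlers, one instance per stream. Trimming at a byte boundary can split a multi-byte UTF-8 character at the very end of the output.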

entry.name !== "node_modules"
) {
walk(path.join(dir, entry.name), depth + 1);
}

Directory walk doesn't filter __pycache__ and venv directories

Low Severity

The listDir walk function only skips dot-directories and node_modules when recursing, but doesn't skip __pycache__, venv, .venv, or dist directories. Since this is also used for Python projects (Flask, FastAPI), the pre-computed directory listing sent to the API can be bloated with irrelevant files from virtual environments and bytecode caches, potentially hitting the maxEntries cap before cataloguing meaningful project files.
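A skip list along the lines this comment suggests might look like the sketch below. The set contents come from the comment; `shouldDescend` and `listDirSketch` are hypothetical names, and whether the real walk records a skipped directory as an entry is an assumption.

```typescript
import { readdirSync } from "node:fs";
import { join } from "node:path";

// Directories that rarely contain project source. Dot-directories such as
// .venv and .git are covered by the startsWith(".") check below.
const SKIPPED_DIRS = new Set(["node_modules", "__pycache__", "venv", "dist"]);

function shouldDescend(name: string): boolean {
  return !name.startsWith(".") && !SKIPPED_DIRS.has(name);
}

// Limits mirror the values quoted in #307 (maxDepth 3, maxEntries 500).
function listDirSketch(root: string, maxDepth = 3, maxEntries = 500): string[] {
  const entries: string[] = [];
  const walk = (dir: string, depth: number): void => {
    for (const entry of readdirSync(dir, { withFileTypes: true })) {
      if (entries.length >= maxEntries) {
        return;
      }
      const full = join(dir, entry.name);
      // Assumption: a skipped directory is still listed, just not entered.
      entries.push(full);
      if (entry.isDirectory() && depth < maxDepth && shouldDescend(entry.name)) {
        walk(full, depth + 1);
      }
    }
  };
  walk(root, 0);
  return entries;
}
```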
