A research toolkit for studying Claude's behavioral patterns from inside Claude.ai artifacts.
Claude artifacts can call the Anthropic API directly without an API key. This creates a unique opportunity for self-referential experimentation — using Claude to study Claude's behavior under controlled conditions.
This probe doesn't focus on success or failure. It targets the gray zones — the subtle behavioral shifts that occur when the model is under constraint pressure but not obviously failing.
The probe detects 7 categories of subtle behavioral degradation:
| Behavior | What It Detects |
|---|---|
| Almost-Repeat | Same mechanism repackaged with different nouns |
| Evasive Abstraction | Backing away because novelty is expensive |
| Constraint Hallucination | Inventing plausible but empty distinctions |
| Latent Disengagement | Formal compliance, substantive withdrawal |
| Style-Content Inversion | One channel exhausts before the other |
| Self-Regularization | Settling into a narrow output band |
| Emergent Meta-Strategy | Discovering a survival strategy for the prompt |
See docs/gray-zones.md for the full taxonomy.
Per-response:
- Token counts (input/output)
- Latency
- Stop reason
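The per-response metrics above can be read straight off a Messages API response. The sketch below is illustrative only: `extractResponseMetrics` and its output field names are hypothetical, not taken from `src/BehaviorProbe.jsx`. It assumes the response carries `usage.input_tokens`, `usage.output_tokens`, and `stop_reason`, and that the caller recorded timestamps around the request.

```javascript
// Hypothetical helper (not from the repo): collects per-response metrics.
// Assumes `response` is a parsed Messages API response and that the caller
// captured millisecond timestamps before and after the request.
function extractResponseMetrics(response, startedAtMs, finishedAtMs) {
  const usage = response.usage || {};
  return {
    inputTokens: usage.input_tokens ?? null,   // null when the field is absent
    outputTokens: usage.output_tokens ?? null,
    latencyMs: finishedAtMs - startedAtMs,
    stopReason: response.stop_reason ?? null,
  };
}
```

Missing fields are kept as `null` rather than defaulted to zero, so a malformed response stays distinguishable from a genuinely empty one in the exported data.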
Derived behavioral:
- Novelty score — Trigram Jaccard similarity vs recent responses
- Hedging ratio — Hedge word density
- Abstraction ratio — Abstract vs concrete language
- Falsifiability — Presence of testable claims
- Disengagement score — Soft refusal signals
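As a rough illustration of how two of these derived metrics might be computed, the sketch below makes several assumptions that the probe itself may not share: trigrams are taken over words rather than characters, novelty is defined as one minus the highest Jaccard similarity against recent responses, and the hedge word list is a small placeholder. All function names are hypothetical.

```javascript
// Assumption: word-level trigrams; the probe may tokenize differently.
function trigrams(text) {
  const words = text.toLowerCase().match(/[a-z0-9']+/g) || [];
  const grams = new Set();
  for (let i = 0; i + 2 < words.length; i++) {
    grams.add(words.slice(i, i + 3).join(" "));
  }
  return grams;
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|. Two empty sets are treated as dissimilar.
function jaccard(a, b) {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  for (const gram of a) {
    if (b.has(gram)) shared++;
  }
  return shared / (a.size + b.size - shared);
}

// Assumption: novelty = 1 - (highest similarity to any recent response).
function noveltyScore(response, recentResponses) {
  const current = trigrams(response);
  let maxSimilarity = 0;
  for (const previous of recentResponses) {
    maxSimilarity = Math.max(maxSimilarity, jaccard(current, trigrams(previous)));
  }
  return 1 - maxSimilarity;
}

// Placeholder hedge list; the real list is not given in this README.
const HEDGE_WORDS = new Set(["might", "may", "perhaps", "possibly", "arguably", "seems", "somewhat"]);

// Hedge word density: hedge words divided by total words.
function hedgingRatio(text) {
  const words = text.toLowerCase().match(/[a-z']+/g) || [];
  if (words.length === 0) return 0;
  return words.filter((word) => HEDGE_WORDS.has(word)).length / words.length;
}
```

Under these definitions, a response identical to a recent one scores a novelty of 0, and a response with no recent history scores 1.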
- Cycles through 8 epistemological questions about understanding/comprehension
- Applies rotating constraints: `mechanism`, `behavioral`, `consequence`, `falsifiable`, `contrast`, `failure`
- Sends to Claude API (Sonnet 4 by default)
- Analyzes response for behavioral metrics
- Tracks trends over time with sparkline visualizations
- Exports data as JSON for further analysis
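The rotation step in the cycle above could be sketched as follows. This is a hedged illustration, not the component's actual logic: `buildProbe` and `buildRequestBody` are hypothetical names, the prompt wording and `max_tokens` value are invented for the example, and the model ID `claude-sonnet-4-20250514` is an assumption about what "Sonnet 4" resolves to. The eight questions are passed in by the caller because this README does not list them.

```javascript
// Constraint names as listed in this README.
const CONSTRAINTS = ["mechanism", "behavioral", "consequence", "falsifiable", "contrast", "failure"];

// Hypothetical rotation: step N picks question N mod |questions| and
// constraint N mod |constraints|.
function buildProbe(step, questions, constraints = CONSTRAINTS) {
  return {
    question: questions[step % questions.length],
    constraint: constraints[step % constraints.length],
  };
}

// Hypothetical request body for the Messages API.
// The model ID and prompt wording are assumptions, not taken from the repo.
function buildRequestBody(probe, model = "claude-sonnet-4-20250514") {
  return {
    model,
    max_tokens: 1024,
    messages: [
      { role: "user", content: `${probe.question}\n\nConstraint: ${probe.constraint}` },
    ],
  };
}
```

With 8 questions and 6 constraints, this simple modulo rotation repeats every 24 steps and visits only half of the 48 possible pairings. A design that needs full coverage would have to advance the two indices independently.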
- Create a new artifact in Claude.ai
- Copy the contents of `src/BehaviorProbe.jsx`
- Run the artifact
- Click "Start" to begin probing
- Use "Export Session" to get your data
- Published artifacts: Data persists via `window.storage`
- Draft artifacts: Data is session-only (lost on refresh)
- Export: JSON via textarea (CSP blocks clipboard/downloads)
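Because clipboard access and downloads are blocked, the export amounts to serializing the session and showing the text for manual copying. A minimal sketch, assuming a session is an array of per-response records; `exportSession` and the envelope fields (`exportedAt`, `count`, `records`) are hypothetical and may not match the probe's real export schema.

```javascript
// Hypothetical export helper: serializes session records to pretty-printed JSON.
// The envelope shape is an assumption made for this example.
function exportSession(records, exportedAt = new Date().toISOString()) {
  return JSON.stringify({ exportedAt, count: records.length, records }, null, 2);
}
```

In the artifact, the returned string would be set as the value of a textarea so the user can select and copy it by hand.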
The probe implements several research design principles:
- Rotating constraints push the model off default rails
- Falsifiability hooks prevent vapor answers
- Fixed output structure makes responses comparable
- Repetition detection catches almost-repeats
| Feature | Status |
|---|---|
| API calls to Anthropic | ✅ Works (no key needed) |
| `window.storage` | ✅ Works (published only) |
| Clipboard API | ❌ CSP blocked |
| Blob downloads | ❌ CSP blocked |
| External APIs | ❌ Only Anthropic whitelisted |
├── README.md # This file
├── docs/
│ └── gray-zones.md # Full theory and methodology
├── src/
│ └── BehaviorProbe.jsx # The artifact component
└── analysis/
    └── (your exported data)
- Response consistency analysis
- Constraint stress testing
- Prompt sensitivity mapping
- Model version drift detection
- Alignment behavior study
MIT