A multi-agent clinical trial design framework built on Claude Code. Pharma statisticians can invoke specialized AI agents to audit, simulate, and review trial designs -- then contribute new skills back via pull requests.
You provide a trial design proposal (YAML or PDF). The pipeline has two phases: design support agents build the first pass (assumptions, baseline computations, simulations), then auditor agents review the design from different perspectives. A final synthesizer merges everything into a decision-ready report.
FIRST PASS (Design Support)
===========================
Proposal ─► Assumption Retriever ─► Simulation Runner ─► Advanced Simulator
                                                                 │
AUDITOR REVIEW                                                   │
==============                                                   │
        ┌───────────────────────┬──────────────────────┬─────────┘
        ▼                       ▼                      ▼
  Conservative              Bayesian               Timeline
     Auditor                 Auditor                Auditor
        │                       │                      │
        └───────────────────────┼──────────────────────┘
                                ▼
                       Design Synthesizer
                                │
                                ▼
                    Final Report (.md + .html)
The user reviews and approves at each step (human-in-the-loop).
A complete worked example using the KEYNOTE-564 trial (pembrolizumab vs placebo, adjuvant RCC) is included in the repo:
- Example input: templates/examples/keynote564.yaml -- design assumptions for a Phase 3 adjuvant RCC trial (DFS primary, OS secondary, 95% power, Lan-DeMets OF spending)
- Example synthesized report: output/keynote564/synthesis_report.md -- final 6-section synthesis with decision matrix, priority ranking, and recommended modifications
- Step-by-step walkthrough: docs/examples/walkthrough-oncology-phase3.md -- detailed walkthrough of all 6 pipeline steps with actual results at each stage
The example pipeline produced 13 output files from the single YAML input, including
executable R scripts for every computation (rpact_design.R, gsdesign_design.R,
sensitivity_analysis.R, nph_simulation.R, crossover_simulation.R).
These agents build the initial design: gathering assumptions, computing the baseline, and stress-testing.
| Skill | Role | What It Does |
|---|---|---|
| /assumption-retriever | Evidence-based assumption benchmarker | Retrieves and benchmarks every numerical assumption against published trials. Produces a GREEN/YELLOW/RED risk ranking by Impact Score. Feeds into the simulation runner and auditors. |
| /simulation-runner | Computational statistician | Reproduces the design in rpact and gsDesign, cross-validates within 2% tolerance, and generates sensitivity grids and spending-function comparisons. Produces the authoritative baseline for auditors. |
| /advanced-simulator | Simulation specialist | Stress-tests under non-proportional hazards (delayed effect), treatment crossover (RPSFT/IPCW), and co-primary endpoint correlation. Compares logrank, weighted logrank, MaxCombo, and RMST. |
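The 2% cross-validation mentioned above can be sketched as a simple relative-difference check: two independent estimates (e.g. from rpact and gsDesign) must agree before a baseline is accepted. The function and field names below are illustrative, not the repo's actual API.

```typescript
// Hypothetical sketch of the simulation-runner's cross-validation gate.
interface CrossCheck {
  rpactN: number;     // sample size (or events) from rpact
  gsDesignN: number;  // same quantity from gsDesign
  relDiff: number;    // relative difference vs. the rpact value
  agree: boolean;     // true if within tolerance
}

function crossValidate(rpactN: number, gsDesignN: number, tol = 0.02): CrossCheck {
  const relDiff = Math.abs(rpactN - gsDesignN) / rpactN;
  return { rpactN, gsDesignN, relDiff, agree: relDiff <= tol };
}
```

For example, 994 vs. 1008 events differ by about 1.4%, so that baseline would pass the 2% gate.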
These agents review the proposed design from different perspectives, each producing an independent critique.
| Skill | Persona | What It Does |
|---|---|---|
| /auditor-conservative | "The Gatekeeper" -- 30-year regulatory veteran | Evaluates sample size adequacy, regulatory risk, and multiplicity complexity. Proposes a conservative alternative. Verdict: APPROVE / MODIFY / REVISE. |
| /auditor-bayesian | "The Bayesian" -- modernist borrowing advocate | Assesses historical data borrowing (RBesT, MAP priors) and computes Bayesian assurance vs. frequentist power. Recommendation: STRONG / MODERATE / WEAK. |
| /auditor-timeline | "The Accelerator" -- speed-focused strategist | Optimizes spending functions for early stopping; evaluates interim enrichment, gMCP alpha recycling, and timeline compression. Quantifies months saved. |
| Skill | Role | What It Does |
|---|---|---|
| /design-synthesizer | Committee chair | Merges all reviews using escalation rules (evidence-backed findings can override consensus). Produces a 6-section report with a decision matrix. |
pharma_claude_skills/
├── .claude/skills/ # 7 agent skills + contributor template
│ ├── assumption-retriever/
│ ├── simulation-runner/
│ ├── advanced-simulator/
│ ├── auditor-conservative/
│ ├── auditor-bayesian/
│ ├── auditor-timeline/
│ ├── design-synthesizer/
│ └── _template/
├── mcp-server/ # TypeScript MCP server (rpact/gsDesign bridge)
│ └── src/
│ ├── index.ts # Entry point
│ ├── r-bridge/executor.ts # JSON → Rscript → JSON
│ └── tools/ # rpact, gsDesign, precompute, simulation tools
├── r-scripts/ # R computation modules
│ ├── design_calculations.R # rpact/gsDesign sample size, spending, sensitivity
│ ├── precompute.R # Sensitivity grids, power perturbations
│ ├── reference_designs.R # Regulatory precedent database
│ ├── pdf_parser.R # PDF text extraction
│ ├── html_report.R # Styled HTML with floating TOC
│ ├── json_io.R # MCP bridge dispatcher
│ └── install_packages.R # One-time R dependency installer
├── templates/
│ ├── proposal_template.yaml # Blank template for new proposals
│ └── examples/
│ └── keynote564.yaml # Filled example (KEYNOTE-564 adjuvant RCC)
├── designs/ # Reference design database (YAML)
├── output/ # Generated trial-specific folders
├── docs/
│ ├── CONTRIBUTING.md # How to add new skills
│ ├── ARCHITECTURE.md # System design and data flow
│ └── examples/
│ └── walkthrough-oncology-phase3.md
├── CLAUDE.md # Project memory for Claude Code
├── .mcp.json # MCP server registration
└── LICENSE # MIT
- Claude Code CLI installed
- R >= 4.2.0 with the following packages: rpact, gsDesign, simtrial, jsonlite, yaml, dplyr, tidyr, glue, survival, mvtnorm, knitr, pdftools, rmarkdown
- Node.js >= 18 (for the MCP server)
# 1. Clone the repo
git clone <repo-url> pharma_claude_skills
cd pharma_claude_skills
# 2. Install R packages
Rscript r-scripts/install_packages.R
# 3. Build the MCP server
cd mcp-server
npm install
npm run build
cd ..

The MCP server is registered in .mcp.json and starts automatically when you open the project in Claude Code.
# Check R packages load
Rscript -e "library(rpact); library(gsDesign); cat('OK\n')"
# Check MCP server starts
node mcp-server/dist/index.js
# (should start without errors; Ctrl+C to stop)

# Open the project in Claude Code
claude
# Run each skill step-by-step:
/assumption-retriever "adjuvant RCC DFS Phase 3"
# → Review assumptions_audit.json
/simulation-runner templates/examples/keynote564.yaml
# → Review design_baseline.json
/advanced-simulator output/keynote564/
# → Review simulation_results.json
/auditor-conservative output/keynote564/
/auditor-bayesian output/keynote564/
/auditor-timeline output/keynote564/
# → Review all three auditor outputs
/design-synthesizer output/keynote564/
# → Final synthesis_report.md + .html
1. Copy the template:
   cp templates/proposal_template.yaml templates/examples/my_trial.yaml
2. Fill in your design parameters -- study identification, endpoints, sample size, interim analyses, assumptions, and review questions. See templates/examples/keynote564.yaml for a complete example.
3. Run the pipeline using the skills listed above, reviewing output at each step.
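A filled proposal might look roughly like the fragment below. The field names here are illustrative guesses, not the template's actual schema; the authoritative layout is templates/proposal_template.yaml.

```yaml
# Hypothetical proposal sketch -- field names are illustrative only.
study:
  name: MY-TRIAL-001
  phase: 3
  indication: adjuvant RCC
endpoints:
  primary: DFS
  secondary: [OS]
design:
  power: 0.95
  one_sided_alpha: 0.025
  assumed_hazard_ratio: 0.68
  interim_analyses: 2
  spending_function: Lan-DeMets OF
```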
If you have a protocol PDF instead of a YAML file, the /simulation-runner and /assumption-retriever skills accept PDF input as well: the pdf_parser.R module extracts the text, and the agents perform parameter extraction as a first step.
All intermediate and final artifacts are saved to output/<trial_name>/:
| File | Producer | Description |
|---|---|---|
| assumptions_audit.json | assumption-retriever | Risk-ranked assumptions with GREEN/YELLOW/RED flags |
| design_baseline.json | simulation-runner | Authoritative rpact/gsDesign cross-validated baseline |
| simulation_results.json | advanced-simulator | NPH, crossover, and co-primary power tables |
| review_conservative.json | auditor-conservative | Regulatory risk matrix with verdict |
| review_bayesian.json | auditor-bayesian | Prior analysis, borrowing recommendation |
| review_timeline.json | auditor-timeline | Spending optimization, time savings |
| synthesis_report.md | design-synthesizer | Final integrated report (6 sections + decision matrix) |
| synthesis_report.html | design-synthesizer | Styled HTML with floating table of contents |
Skills are markdown files (.claude/skills/*/SKILL.md) containing system prompts that define each agent's persona, analytical workflow, and output schema. Claude Code loads them and follows the instructions when you invoke /skill-name.
MCP Server is a TypeScript process that exposes R computation as tools Claude can call. When a skill needs to run rpact::getSampleSizeSurvival(), it calls the MCP tool, which writes JSON to a temp file, runs Rscript, and returns the result.
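The JSON → Rscript → JSON round trip described above can be sketched as follows. The request shape, dispatcher path, and function names are assumptions for illustration, not the repo's actual bridge API.

```typescript
// Hypothetical sketch of the r-bridge flow: serialize one tool call to a
// temp JSON file, invoke the R dispatcher, read the JSON result back.
import { execFileSync } from "node:child_process";
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Build the JSON payload a skill would send for one R computation.
function buildRequest(tool: string, params: Record<string, unknown>): string {
  return JSON.stringify({ tool, params });
}

// Write the request, shell out to Rscript, parse the response file.
function runRTool(tool: string, params: Record<string, unknown>): unknown {
  const dir = mkdtempSync(join(tmpdir(), "rbridge-"));
  const reqPath = join(dir, "request.json");
  const resPath = join(dir, "response.json");
  writeFileSync(reqPath, buildRequest(tool, params));
  // Assumed dispatcher: r-scripts/json_io.R routes the tool name to the
  // matching R function (e.g. rpact::getSampleSizeSurvival).
  execFileSync("Rscript", ["r-scripts/json_io.R", reqPath, resPath]);
  return JSON.parse(readFileSync(resPath, "utf8"));
}
```

Keeping the exchange on disk rather than stdin/stdout makes each R invocation stateless and easy to replay when debugging a skill.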
R Scripts contain the actual statistical computation -- sample size calculations, sensitivity grids, spending function comparisons, PDF parsing, and HTML report generation.
The design-synthesizer uses evidence-based escalation rules so that simulation findings aren't diluted by simple vote-counting:
| Priority | Trigger |
|---|---|
| CRITICAL | 3+ agents agree, OR simulation power loss >10pp, OR evidence-adjusted power <75% |
| HIGH | 2+ agents agree, OR simulation power loss 5-10pp, OR RED assumption with 3+ contradicting sources |
| MEDIUM | 1 agent with strong justification, OR YELLOW assumption in top 5 |
| LOW | 1 agent, minor impact |
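The trigger table above reduces to a short priority function. The finding fields below are hypothetical stand-ins for whatever the synthesizer actually extracts from the auditor JSON outputs.

```typescript
// Minimal sketch of the escalation rules, assuming these finding fields.
type Priority = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";

interface Finding {
  agentsAgreeing: number;          // how many auditors raised the issue
  powerLossPp?: number;            // simulated power loss, percentage points
  evidenceAdjustedPower?: number;  // e.g. 0.72 for 72%
  redAssumptionSources?: number;   // contradicting sources on a RED assumption
  strongJustification?: boolean;
  yellowInTop5?: boolean;          // YELLOW assumption ranked in the top 5
}

function escalate(f: Finding): Priority {
  const loss = f.powerLossPp ?? 0;
  if (f.agentsAgreeing >= 3 || loss > 10 ||
      (f.evidenceAdjustedPower !== undefined && f.evidenceAdjustedPower < 0.75))
    return "CRITICAL";
  // 5-10pp losses land here; >10pp was already caught above.
  if (f.agentsAgreeing >= 2 || loss >= 5 || (f.redAssumptionSources ?? 0) >= 3)
    return "HIGH";
  if ((f.agentsAgreeing === 1 && f.strongJustification) || f.yellowInTop5)
    return "MEDIUM";
  return "LOW";
}
```

Because each rule is an OR over independent triggers, a single simulation finding with a large power loss escalates even when only one agent raised it -- that is what prevents dilution by vote-counting.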
We welcome contributions from pharma statisticians. The simplest contribution is a new skill:
- Fork the repo
- Copy .claude/skills/_template/ to .claude/skills/<your-skill-name>/
- Write your SKILL.md with YAML frontmatter + system prompt + output schema
- Submit a PR with the skill file and at least one example
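As a rough sketch, a new SKILL.md might look like the fragment below. The frontmatter fields and section headings are illustrative; copy the real layout from .claude/skills/_template/.

```markdown
---
name: my-skill-name
description: One-line summary Claude Code uses to decide when to invoke the skill
---

# Persona
You are a <role> reviewing clinical trial design proposals.

# Workflow
1. Read the baseline from output/<trial_name>/design_baseline.json
2. Apply your analysis and cite evidence for every finding

# Output Schema
Write review_<name>.json with a verdict and a list of findings.
```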
See docs/CONTRIBUTING.md for full guidelines and docs/ARCHITECTURE.md for system design details.
- Adaptive enrichment designer -- biomarker-driven enrichment with sample size re-estimation
- Dose-finding reviewer -- evaluate Phase 1/2 dose-escalation designs (CRM, BOIN)
- Safety monitoring planner -- DSMB charter design, safety stopping rules
- Regulatory strategy advisor -- map design to FDA/EMA guidance, recommend meeting strategy
- Cost-effectiveness modeler -- expected trial cost under different design alternatives
MIT -- see LICENSE.