Add app-development benchmark config and CLI integration #9

Merged: mongodben merged 2 commits into main from EAI-1611 on Mar 18, 2026

Conversation

@mongodben
Collaborator

Jira: https://jira.mongodb.org/browse/EAI-1611

Changes

  • Add benchmark config for app_development and register it in the benchmark CLI
  • Load 104 eval cases from datasets/app-development.yml with dataset splits: all, mongodb_optimal, db_agnostic
  • Add system prompt variants in prompts.ts: none, generic_coding_assistant, mongodb_recommended, system_architect, stack_agnostic
  • Each prompt variant registers as a separate task in the CLI (e.g. simple_prompt_completion, prompt_system_architect)
  • Route the subject model through the Braintrust proxy; the judge model uses gpt-5.4
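
As a rough illustration of how each prompt variant could become its own CLI task, here is a minimal sketch. Only the variant names and the two example task names (simple_prompt_completion, prompt_system_architect) come from this PR; the `BenchmarkTask` shape, `placeholderPrompt`, `taskNameFor`, `buildTasks`, and the naming rule itself are assumptions made for the example and may not match the merged code.

```typescript
// Variant names come from the PR description. Everything else here
// (types, placeholder prompts, the naming rule) is a hypothetical illustration.
const PROMPT_VARIANTS = [
  "none",
  "generic_coding_assistant",
  "mongodb_recommended",
  "system_architect",
  "stack_agnostic",
] as const;

type PromptVariant = (typeof PROMPT_VARIANTS)[number];

interface BenchmarkTask {
  name: string;
  variant: PromptVariant;
  systemPrompt: string | null; // null means no system message is sent
}

// Placeholder text only; the real prompt wording lives in prompts.ts.
function placeholderPrompt(variant: PromptVariant): string | null {
  return variant === "none" ? null : `<${variant} system prompt>`;
}

// ASSUMPTION: "none" maps to simple_prompt_completion and every other
// variant maps to prompt_<variant>. The PR only shows two example names.
function taskNameFor(variant: PromptVariant): string {
  return variant === "none" ? "simple_prompt_completion" : `prompt_${variant}`;
}

// Build one task entry per variant, keyed by task name.
function buildTasks(): Map<string, BenchmarkTask> {
  const tasks = new Map<string, BenchmarkTask>();
  for (const variant of PROMPT_VARIANTS) {
    const name = taskNameFor(variant);
    tasks.set(name, { name, variant, systemPrompt: placeholderPrompt(variant) });
  }
  return tasks;
}
```

Keying the registry by task name keeps one entry per variant, so a CLI could look up a task directly from the name passed on the command line.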

Notes

  • The subject model uses .chat() through the Braintrust proxy because .responses() has translation issues for non-OpenAI providers (Claude, Gemini)
  • generic_coding_assistant prompt uses "production-ready" language to encourage models to include a real database
  • system_architect variant focuses on design reasoning over code, giving classifiers more signal to analyze
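
To make the first note concrete, the sketch below builds a Chat Completions style request of the kind an OpenAI-compatible proxy accepts. It is framework-agnostic and does not reproduce the PR's code: `buildChatRequest`, the interfaces, and the `PROXY_BASE_URL` value are assumptions for illustration, and the request is only constructed, never sent.

```typescript
// Hypothetical types describing a Chat Completions style request.
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

interface ChatRequest {
  url: string;
  headers: Record<string, string>;
  body: { model: string; messages: ChatMessage[] };
}

// ASSUMPTION: base URL of the proxy; check the project's config for the real value.
const PROXY_BASE_URL = "https://api.braintrust.dev/v1/proxy";

// Build (but do not send) a chat request for the subject model.
// A null system prompt corresponds to the "none" variant: no system message.
function buildChatRequest(
  model: string,
  systemPrompt: string | null,
  userPrompt: string,
  apiKey: string,
): ChatRequest {
  const messages: ChatMessage[] = [];
  if (systemPrompt !== null) {
    messages.push({ role: "system", content: systemPrompt });
  }
  messages.push({ role: "user", content: userPrompt });

  return {
    url: `${PROXY_BASE_URL}/chat/completions`,
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: { model, messages },
  };
}
```

The returned object could be handed to `fetch(request.url, { method: "POST", headers: request.headers, body: JSON.stringify(request.body) })`. Because the same message format is used for every provider, the subject model can be swapped by changing only the `model` string.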

Generated with Claude Code

@vercel

vercel bot commented Mar 18, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| ai-benchmarks | Error | Error | Mar 18, 2026 8:30pm |


@mongodben merged commit b3f241b into main on Mar 18, 2026
3 of 4 checks passed