Add GPT 5.4 (xhigh) eval results#71

Open: gaojude wants to merge 1 commit into main from jude/add-gpt-5.4-xhigh
gaojude commented Mar 21, 2026

Add eval results for GPT 5.4 with reasoningEffort=xhigh running through the Codex harness via vercel-ai-gateway/codex. Includes both the base run and the AGENTS.md variant.

The base run passes 20/21 evals (95%), failing only agent-030-app-router-migration-hard. With AGENTS.md, the model picks up agent-029-use-cache-directive and agent-040-unstable-instant, bringing the docs-assisted rate from 86% to 95%, a +10 delta.
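For reviewers checking the arithmetic, here is a minimal sketch of how the percentages above could be derived. It assumes both rates are computed over the same 21 evals, in which case 86% corresponds to 18/21 and 95% to 20/21. The helper names `passRate` and `deltaPoints` are hypothetical and are not taken from the repository.

```typescript
// Hypothetical helper: pass rate as a whole-number percentage.
function passRate(passed: number, total: number): number {
  if (total <= 0) {
    throw new Error("total must be positive");
  }
  return Math.round((passed / total) * 100);
}

// Hypothetical helper: delta in percentage points, computed from the
// unrounded rates and rounded once at the end.
function deltaPoints(before: number, after: number, total: number): number {
  if (total <= 0) {
    throw new Error("total must be positive");
  }
  return Math.round(((after - before) / total) * 100);
}

// Assumption: 21 evals in both cases; 18 passes is inferred from 86%.
const TOTAL_EVALS = 21;
console.log(passRate(20, TOTAL_EVALS)); // 20/21 is about 95.2
console.log(passRate(18, TOTAL_EVALS)); // 18/21 is about 85.7
console.log(deltaPoints(18, 20, TOTAL_EVALS)); // 2/21 is about 9.5 points
```

Under these assumptions, rounding the delta from the unrounded rates gives 10 rather than the 9 you would get by subtracting the already-rounded 86 from 95, which would be consistent with the reported +10.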

Also adds vercel-ai-gateway/codex to HARNESS_NAMES in export-results.ts and maps the new experiment names in MODEL_NAMES and the docs impact pairing config.
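As a rough illustration of the kind of mapping change described above, the sketch below shows one possible shape. Only the identifiers `HARNESS_NAMES`, `MODEL_NAMES`, and `vercel-ai-gateway/codex` come from this PR description. The record shapes, the experiment keys, the display labels, and the names `DOCS_IMPACT_PAIRS` and `displayName` are assumptions made for illustration, and the actual code in export-results.ts may look quite different.

```typescript
// Assumed shape: harness id -> display label. The label is a placeholder.
const HARNESS_NAMES: Record<string, string> = {
  "vercel-ai-gateway/codex": "Codex",
};

// Hypothetical experiment keys: the real experiment names are not given
// in the PR description.
const MODEL_NAMES: Record<string, string> = {
  "gpt-5.4-xhigh": "GPT 5.4 (xhigh)",
  "gpt-5.4-xhigh-agents-md": "GPT 5.4 (xhigh)",
};

// Hypothetical pairing config: a base run matched with its AGENTS.md variant.
interface DocsImpactPair {
  base: string;
  withDocs: string;
}

const DOCS_IMPACT_PAIRS: DocsImpactPair[] = [
  { base: "gpt-5.4-xhigh", withDocs: "gpt-5.4-xhigh-agents-md" },
];

// Resolve an experiment key to a display name, falling back to the raw key.
function displayName(experiment: string): string {
  return MODEL_NAMES[experiment] ?? experiment;
}

for (const pair of DOCS_IMPACT_PAIRS) {
  console.log(`${displayName(pair.base)}: ${pair.base} vs ${pair.withDocs}`);
}
```

Falling back to the raw key in `displayName` is one way to keep an export from failing when an experiment has not been mapped yet. Whether the repository does this is not stated in the PR.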
