Add GPT 5.4 (xhigh) eval results by gaojude · Pull Request #71 · vercel/next-evals-oss

gaojude · 2026-03-21T19:56:28Z

Add eval results for GPT 5.4 with reasoningEffort=xhigh running through the Codex harness via vercel-ai-gateway/codex. Includes both the base run and the AGENTS.md variant.

The base run passes 20/21 evals (95%), failing only agent-030-app-router-migration-hard. With AGENTS.md, the model picks up agent-029-use-cache-directive and agent-040-unstable-instant, bringing the docs-assisted rate from 86% to 95%, a delta of roughly +10 percentage points.
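For anyone checking the percentages, here is a minimal sketch of the arithmetic. It assumes the 86% figure corresponds to 18 of 21 evals; that count is inferred from the percentage and is not stated in the description. Only 20/21 is given explicitly. The function name `passRate` is illustrative and is not taken from the repository.

```typescript
// Total evals in the suite, taken from the "20/21" figure in the description.
const TOTAL_EVALS = 21;

// Pass rate as an unrounded percentage.
function passRate(passed: number, total: number): number {
  if (total <= 0) {
    throw new Error("total must be positive");
  }
  return (passed / total) * 100;
}

// Assumption: 86% corresponds to 18/21 (inferred, not stated in the PR).
const lowerRate = passRate(18, TOTAL_EVALS);
// Stated in the PR: 20/21.
const higherRate = passRate(20, TOTAL_EVALS);

// Difference in percentage points, computed before any rounding.
const delta = higherRate - lowerRate;

console.log(
  `${Math.round(lowerRate)}% -> ${Math.round(higherRate)}% (delta ${Math.round(delta)})`,
);
```

Under that assumption the unrounded difference is about 9.5 points, which rounds to 10. Subtracting the already-rounded figures (95 minus 86) would give 9, so the +10 appears to come from rounding the difference rather than the endpoints.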

Also adds vercel-ai-gateway/codex to HARNESS_NAMES in export-results.ts and maps the new experiment names in MODEL_NAMES and the docs impact pairing config.
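As a rough illustration of what that kind of mapping change can look like, here is a sketch. Everything in it other than the identifiers `HARNESS_NAMES`, `MODEL_NAMES`, and the harness key `vercel-ai-gateway/codex` is an assumption: the object shapes, the display labels, the experiment keys, and the helpers `displayName` and `unmappedExperiments` are hypothetical and may not match export-results.ts.

```typescript
// Illustrative only: shapes, labels, and experiment keys are assumptions,
// not copied from export-results.ts.
type DocsPair = { base: string; withDocs: string };

const HARNESS_NAMES: Record<string, string> = {
  "vercel-ai-gateway/codex": "Codex", // hypothetical display label
};

const MODEL_NAMES: Record<string, string> = {
  "gpt-5.4-xhigh": "GPT 5.4 (xhigh)", // hypothetical experiment key
  "gpt-5.4-xhigh-agents-md": "GPT 5.4 (xhigh)", // hypothetical experiment key
};

// Pairs a base experiment with its AGENTS.md variant for the docs impact view.
const DOCS_IMPACT_PAIRS: DocsPair[] = [
  { base: "gpt-5.4-xhigh", withDocs: "gpt-5.4-xhigh-agents-md" },
];

// Fall back to the raw key so an unmapped entry stays visible in the export.
function displayName(table: Record<string, string>, key: string): string {
  return table[key] ?? key;
}

// Returns the paired experiment keys that have no entry in MODEL_NAMES.
function unmappedExperiments(): string[] {
  return DOCS_IMPACT_PAIRS.flatMap((pair) => [pair.base, pair.withDocs]).filter(
    (name) => !(name in MODEL_NAMES),
  );
}

console.log(displayName(HARNESS_NAMES, "vercel-ai-gateway/codex"));
console.log(unmappedExperiments());
```

A check like `unmappedExperiments()` is one way to catch a pairing entry that was added without a matching model label.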

Add GPT 5.4 (xhigh) eval results

ac64135

gaojude wants to merge 1 commit into main from jude/add-gpt-5.4-xhigh
