Commit d3b3370
docs: nightly research report 2026-03-09
Covers oracle/task data integrity, judge schema compliance, metrics
extraction edge cases, and config script consistency. Key findings:
- 66 org tasks missing oracle text field (46% of org suite)
- Judge output violates its own JSON schema (additionalProperties: false)
- Set comparison bug in ir_metrics.py overestimates tt_all_r timing
- 13 malformed task.toml files across SDLC suites
- run_selected_tasks.sh runs 11 excluded tasks unintentionally
1 parent 1e50565 commit d3b3370
1 file changed, +488 -0 lines changed