Lines changed: 54 additions & 54 deletions
diff --git a/README.md b/README.md
@@ -75,37 +75,37 @@ Nine suites organized by software development lifecycle phase:
 
 | Suite | SDLC Phase | Tasks | Description |
 |-------|-----------|------:|-------------|
-| `csb_sdlc_fix` | Bug Repair | 26 | Diagnosing and fixing real bugs across production codebases |
 | `csb_sdlc_feature` | Feature Implementation | 23 | New features, interface implementation, big-code features |
-| `csb_sdlc_debug` | Debugging & Investigation | 18 | Root cause tracing, fault localization, provenance |
-| `csb_sdlc_test` | Testing & QA | 18 | Code review, performance testing, code search validation, test generation |
-| `csb_sdlc_refactor` | Cross-File Refactoring | 16 | Cross-file refactoring, enterprise dependency refactoring, rename refactoring |
-| `csb_sdlc_design` | Architecture & Design | 14 | Architecture analysis, dependency graphs, change impact |
-| `csb_sdlc_document` | Documentation | 13 | API references, architecture docs, migration guides, runbooks |
-| `csb_sdlc_secure` | Security & Compliance | 12 | CVE analysis, reachability, governance, access control |
-| `csb_sdlc_understand` | Requirements & Discovery | 10 | Codebase comprehension, onboarding, Q&A, knowledge recovery |
-| **Total** | | **150** | |
+| `csb_sdlc_fix` | Bug Repair | 19 | Diagnosing and fixing real bugs across production codebases |
+| `csb_sdlc_refactor` | Cross-File Refactoring | 18 | Cross-file refactoring, enterprise dependency refactoring, rename refactoring |
+| `csb_sdlc_debug` | Debugging & Investigation | 13 | Root cause tracing, fault localization, provenance |
+| `csb_sdlc_secure` | Security & Compliance | 13 | CVE analysis, reachability, governance, access control |
+| `csb_sdlc_test` | Testing & QA | 12 | Code review, performance testing, code search validation, test generation |
+| `csb_sdlc_design` | Architecture & Design | 11 | Architecture analysis, dependency graphs, change impact |
+| `csb_sdlc_document` | Documentation | 11 | API references, architecture docs, migration guides, runbooks |
+| `csb_sdlc_understand` | Requirements & Discovery | 11 | Codebase comprehension, onboarding, Q&A, knowledge recovery |
+| **Total** | | **131** | |
 
 ## CodeScaleBench-Org
 
 Eleven additional suites measure cross-repo discovery, symbol resolution, dependency tracing, and deep-search-driven investigation in polyrepo environments.
 
 | Suite | Category | Tasks | Description |
 |-------|----------|------:|-------------|
-| `csb_org_onboarding` | Onboarding & Comprehension | 28 | API consumption mapping, end-to-end flow, architecture maps |
-| `csb_org_migration` | Framework Migration | 26 | API migrations, breaking changes across repos |
-| `csb_org_security` | Vulnerability Remediation | 24 | CVE mapping, missing auth middleware across repos |
-| `csb_org_crossrepo_tracing` | Dependency Tracing | 22 | Cross-repo dependency chains, blast radius, symbol resolution |
-| `csb_org_domain` | Domain Lineage | 20 | Config propagation, architecture patterns, domain analysis |
-| `csb_org_incident` | Incident Debugging | 20 | Error-to-code-path tracing across microservices |
-| `csb_org_compliance` | Compliance | 18 | Standards adherence, audit, and provenance workflows |
-| `csb_org_platform` | Platform Knowledge | 18 | Service template discovery and tribal knowledge |
-| `csb_org_crossorg` | Cross-Org Discovery | 15 | Interface implementations and authoritative repo identification across orgs |
-| `csb_org_org` | Organizational Context | 15 | Agentic discovery, org-wide coding correctness |
-| `csb_org_crossrepo` | Cross-Repo Discovery | 14 | Cross-repo search, dependency discovery, impact analysis |
-| **Total** | | **220** | |
-
-**Combined canonical benchmark: 370 tasks** (150 SDLC across 9 suites + 220 Org across 11 suites). Suite sizes are DOE-driven (Neyman-optimal allocation) to maximize statistical power per suite rather than uniform 20-task sizing. An additional 28 backup tasks are archived in `benchmarks/backups/`.
+| `csb_org_migration` | Framework Migration | 25 | API migrations, breaking changes across repos |
+| `csb_org_compliance` | Compliance | 13 | Standards adherence, audit, and provenance workflows |
+| `csb_org_incident` | Incident Debugging | 13 | Error-to-code-path tracing across microservices |
+| `csb_org_platform` | Platform Knowledge | 13 | Service template discovery and tribal knowledge |
+| `csb_org_security` | Vulnerability Remediation | 13 | CVE mapping, missing auth middleware across repos |
+| `csb_org_crossorg` | Cross-Org Discovery | 12 | Interface implementations and authoritative repo identification across orgs |
+| `csb_org_crossrepo` | Cross-Repo Discovery | 11 | Cross-repo search, dependency discovery, impact analysis |
+| `csb_org_crossrepo_tracing` | Dependency Tracing | 11 | Cross-repo dependency chains, blast radius, symbol resolution |
+| `csb_org_domain` | Domain Lineage | 11 | Domain-specific lineage and analysis workflows |
+| `csb_org_onboarding` | Onboarding & Comprehension | 11 | API consumption mapping, end-to-end flow, architecture maps |
+| `csb_org_org` | Organizational Context | 11 | Agentic discovery, org-wide coding correctness |
+| **Total** | | **144** | |
+
+**Combined canonical benchmark: 275 tasks** (131 SDLC across 9 suites + 144 Org across 11 suites). Suite sizes are DOE-driven (Neyman-optimal allocation) to maximize statistical power per suite rather than uniform sizing. Non-canonical tasks are archived in `benchmarks/backups/`.
 
 Both baseline and MCP-Full agents have access to **all repos** in each task's fixture. The only difference is the method: baseline reads code locally, MCP-Full uses Sourcegraph MCP tools (local code is truncated). This ensures we measure whether MCP tools help agents work better — not whether MCP can access repos the baseline can't.
 
@@ -135,27 +135,27 @@ See [docs/reference/CONFIGS.md](docs/reference/CONFIGS.md) for the canonical con
 
 ```
 benchmarks/              # Task definitions organized by SDLC phase + Org
-  csb_sdlc_fix/          #   Bug Repair (26 tasks)
   csb_sdlc_feature/      #   Feature Implementation (23 tasks)
-  csb_sdlc_debug/        #   Debugging & Investigation (18 tasks)
-  csb_sdlc_test/         #   Testing & QA (18 tasks)
-  csb_sdlc_refactor/     #   Cross-File Refactoring (16 tasks)
-  csb_sdlc_design/       #   Architecture & Design (14 tasks)
-  csb_sdlc_document/     #   Documentation (13 tasks)
-  csb_sdlc_secure/       #   Security & Compliance (12 tasks)
-  csb_sdlc_understand/   #   Requirements & Discovery (10 tasks)
-  backups/               #   Archived backup tasks (28 total)
-  csb_org_onboarding/    #   Org: onboarding (28 tasks)
-  csb_org_migration/     #   Org: framework migration (26 tasks)
-  csb_org_security/      #   Org: vulnerability remediation (24 tasks)
-  csb_org_crossrepo_tracing/  #   Org: dependency tracing (22 tasks)
-  csb_org_domain/        #   Org: domain lineage (20 tasks)
-  csb_org_incident/      #   Org: incident debugging (20 tasks)
-  csb_org_compliance/    #   Org: compliance & audit (18 tasks)
-  csb_org_platform/      #   Org: platform knowledge (18 tasks)
-  csb_org_crossorg/      #   Org: cross-org discovery (15 tasks)
-  csb_org_org/           #   Org: org context (15 tasks)
-  csb_org_crossrepo/     #   Org: cross-repo discovery (14 tasks)
+  csb_sdlc_fix/          #   Bug Repair (19 tasks)
+  csb_sdlc_refactor/     #   Cross-File Refactoring (18 tasks)
+  csb_sdlc_debug/        #   Debugging & Investigation (13 tasks)
+  csb_sdlc_secure/       #   Security & Compliance (13 tasks)
+  csb_sdlc_test/         #   Testing & QA (12 tasks)
+  csb_sdlc_design/       #   Architecture & Design (11 tasks)
+  csb_sdlc_document/     #   Documentation (11 tasks)
+  csb_sdlc_understand/   #   Requirements & Discovery (11 tasks)
+  csb_org_migration/     #   Org: framework migration (25 tasks)
+  csb_org_compliance/    #   Org: compliance & audit (13 tasks)
+  csb_org_incident/      #   Org: incident debugging (13 tasks)
+  csb_org_platform/      #   Org: platform knowledge (13 tasks)
+  csb_org_security/      #   Org: vulnerability remediation (13 tasks)
+  csb_org_crossorg/      #   Org: cross-org discovery (12 tasks)
+  csb_org_crossrepo/     #   Org: cross-repo discovery (11 tasks)
+  csb_org_crossrepo_tracing/  #   Org: dependency tracing (11 tasks)
+  csb_org_domain/        #   Org: domain lineage (11 tasks)
+  csb_org_onboarding/    #   Org: onboarding (11 tasks)
+  csb_org_org/           #   Org: org context (11 tasks)
+  backups/               #   Archived non-canonical tasks
 configs/                 # Run configs and task selection
   _common.sh             #   Shared infra: token refresh, parallel execution, multi-account
   sdlc_suite_2config.sh  #   Generic SDLC runner (used by phase wrappers below)
@@ -169,7 +169,7 @@ configs/                 # Run configs and task selection
   test_2config.sh        #   Phase wrapper: Test (20 tasks)
   run_selected_tasks.sh  #   Unified runner for all tasks
   validate_one_per_benchmark.sh  # Pre-flight smoke (1 task per suite)
-  selected_benchmark_tasks.json  # Canonical task selection: 370 tasks (150 SDLC + 220 Org)
+  selected_benchmark_tasks.json  # Canonical task selection: 275 tasks (131 SDLC + 144 Org)
   use_case_registry.json #   100 GTM use cases (Org task source)
   archive/               #   Pre-SDLC migration scripts (preserved for history)
 scripts/                 # Metrics extraction, evaluation, and operational tooling
@@ -293,10 +293,10 @@ This section assumes Harbor is already installed and configured. If not, start w
 
 ### SDLC Tasks
 
-The unified runner executes all 370 canonical tasks across the 2-config matrix:
+The unified runner executes all 275 canonical tasks across the 2-config matrix:
 
 ```bash
-# Run all 370 tasks across 2 configs
+# Run all 275 tasks across 2 configs
 bash configs/run_selected_tasks.sh
 
 # Run only the baseline config
@@ -312,15 +312,15 @@ bash configs/run_selected_tasks.sh --dry-run
 Per-phase runners are also available:
 
 ```bash
-bash configs/fix_2config.sh              # 26 Bug Repair tasks
 bash configs/feature_2config.sh          # 23 Feature Implementation tasks
-bash configs/debug_2config.sh            # 18 Debugging & Investigation tasks
-bash configs/test_2config.sh             # 18 Testing & QA tasks
-bash configs/refactor_2config.sh         # 16 Cross-File Refactoring tasks
-bash configs/design_2config.sh           # 14 Architecture & Design tasks
-bash configs/document_2config.sh         # 13 Documentation tasks
-bash configs/secure_2config.sh           # 12 Security & Compliance tasks
-bash configs/understand_2config.sh       # 10 Requirements & Discovery tasks
+bash configs/fix_2config.sh              # 19 Bug Repair tasks
+bash configs/refactor_2config.sh         # 18 Cross-File Refactoring tasks
+bash configs/debug_2config.sh            # 13 Debugging & Investigation tasks
+bash configs/secure_2config.sh           # 13 Security & Compliance tasks
+bash configs/test_2config.sh             # 12 Testing & QA tasks
+bash configs/design_2config.sh           # 11 Architecture & Design tasks
+bash configs/document_2config.sh         # 11 Documentation tasks
+bash configs/understand_2config.sh       # 11 Requirements & Discovery tasks
 ```
 
 ### Filtering by Suite