Commit 114fbac

sjarmak and claude committed
fix: make MCP-unique instructions config-agnostic + separate artifact/direct workflows
- Replace "The local /workspace/ directory contains all repositories" with "Your ecosystem includes the following repositories" across all 20 MCP-unique task instructions — removes false claims about local repos that contradict the SG_full config's empty workspace
- Remove "MCP-only repositories" framing, "(available locally)" qualifiers, "Accessible via Sourcegraph MCP" local/MCP splits, and local path mapping notes that referenced /workspace/ paths
- Make V5 preamble workflow steps conditional: direct configs get "Edit locally + Verify locally", artifact configs get "Produce artifacts"
- Split system prompt: artifact_full no longer says "Run tests locally"
- Remove runtime find-delete hack from artifact_full agent startup — config layer now uses Dockerfile.sg_only (empty workspace) for remote+artifact
- Flip sdlc_suite_2config.sh remote+artifact Dockerfile preference: prefer Dockerfile.sg_only over Dockerfile.artifact_only (no repo clone needed)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent f675c73 commit 114fbac

22 files changed, +132 -135 lines changed

agents/claude_baseline_agent.py

Lines changed: 31 additions & 33 deletions
@@ -105,6 +105,7 @@
 # No truncation language — local source files simply aren't present.
 # 25% shorter than V4: removes Workflows, Output Formatting, Common Mistakes, Query Patterns.
 # {repo_scope} is replaced at runtime with the target repository filter.
+# {workflow_tail} is replaced with edit+test steps (direct) or produce-artifact step (artifact_full).
 V5_PREAMBLE_TEMPLATE = """# IMPORTANT: Source Code Access
 
 **Local source files are not present.** Your workspace does not contain source code. You **MUST** use Sourcegraph MCP tools to discover, read, and understand code before making any changes.
@@ -115,8 +116,7 @@
 
 1. **Search first** — Use MCP tools to find relevant files and understand existing patterns
 2. **Read remotely** — Use `sg_read_file` to read full file contents from Sourcegraph
-3. **Edit locally** — Use Edit, Write, and Bash to create or modify files in your working directory
-4. **Verify locally** — Run tests with Bash to check your changes
+{workflow_tail}
 
 ## Tool Selection
 
@@ -477,22 +477,28 @@ def create_run_agent_commands(self, instruction: str) -> list[ExecInput]:
                     "before searching.\n"
                 )
 
-            mcp_preamble = V5_PREAMBLE_TEMPLATE.format(repo_scope=repo_scope)
-            instruction = mcp_preamble + instruction
-
-            # Artifact-full: append guidance about expressing changes as diffs
+            # Workflow steps 3-4 vary by config: direct configs edit+test
+            # locally, artifact configs produce diffs as output artifacts.
             if mcp_type == "artifact_full":
-                artifact_guidance = """
-
-## Artifact-Only Evaluation
+                workflow_tail = (
+                    "3. **Produce artifacts** — Express all code changes as "
+                    "**unified diffs** in your output artifact (e.g., "
+                    "`fix_patch` fields in review.json, or a standalone "
+                    "`solution.patch` file). Do NOT edit source files directly "
+                    "— there are none in your workspace."
+                )
+            else:
+                workflow_tail = (
+                    "3. **Edit locally** — Use Edit, Write, and Bash to "
+                    "create or modify files in your working directory\n"
+                    "4. **Verify locally** — Run tests with Bash to check "
+                    "your changes"
+                )
 
-You are in **artifact-only mode**. Your workspace is empty — all code discovery
-must go through Sourcegraph MCP tools. Express all code changes as **unified
-diffs** in your output artifact (e.g., `fix_patch` fields in review.json, or a
-standalone `solution.patch` file). Do NOT attempt to edit source files directly
-— there are no source files in your workspace.
-"""
-                instruction = instruction + artifact_guidance
+            mcp_preamble = V5_PREAMBLE_TEMPLATE.format(
+                repo_scope=repo_scope, workflow_tail=workflow_tail
+            )
+            instruction = mcp_preamble + instruction
 
         elif mcp_type == "sourcegraph_isolated":
             # Isolated mode: agent has only the target package locally (via sparse checkout).
@@ -633,7 +639,12 @@ def create_run_agent_commands(self, instruction: str) -> list[ExecInput]:
             else:
                 repo_filter_system = "Use list_repos to discover available repositories first."
 
-            mcp_system_prompt = f"""IMPORTANT: Local source files are not present. You MUST use Sourcegraph MCP tools to discover and read code, then create or edit local files based on what you learn. Run tests locally to verify your changes.
+            if mcp_type == "artifact_full":
+                mcp_system_prompt = f"""IMPORTANT: Local source files are not present. You MUST use Sourcegraph MCP tools to discover and read code, then express your changes as unified diffs in your output artifact.
+
+{repo_filter_system}"""
+            else:
+                mcp_system_prompt = f"""IMPORTANT: Local source files are not present. You MUST use Sourcegraph MCP tools to discover and read code, then create or edit local files based on what you learn. Run tests locally to verify your changes.
 
 {repo_filter_system}"""
             system_prompt_append = EVALUATION_CONTEXT_PROMPT + "\n\n---\n\n" + mcp_system_prompt
@@ -847,22 +858,9 @@ def create_run_agent_commands(self, instruction: str) -> list[ExecInput]:
             'cd "$WORKDIR"',
         ]
 
-        # For artifact_full: delete source files so agent must use MCP.
-        # Baseline keeps source readable (mcp_type == "none").
-        # We delete rather than truncate to avoid agents wasting tokens
-        # trying to read visible-but-empty files.
-        if mcp_type == "artifact_full":
-            script_lines.extend([
-                '# Delete source files — agent must use MCP to read code',
-                'find /workspace -type f \\( '
-                '-name "*.cs" -o -name "*.py" -o -name "*.ts" -o -name "*.tsx" '
-                '-o -name "*.js" -o -name "*.jsx" -o -name "*.go" -o -name "*.rs" '
-                '-o -name "*.java" -o -name "*.c" -o -name "*.h" -o -name "*.cpp" '
-                '-o -name "*.rb" -o -name "*.php" -o -name "*.swift" -o -name "*.kt" '
-                '-o -name "*.scala" -o -name "*.vue" -o -name "*.svelte" '
-                '\\) -delete',
-                'echo "Source files deleted for artifact_full mode"',
-            ])
+        # Note: artifact_full no longer needs runtime source deletion.
+        # The config layer now uses Dockerfile.sg_only (empty workspace)
+        # for remote+artifact, so there are no source files to delete.
 
         # If system prompt exists, read it from file and pass via --append-system-prompt
         if _system_prompt_content:
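
The conditional workflow steps in this file can be exercised in isolation. The following is a minimal sketch, assuming a trimmed-down template: `PREAMBLE_TEMPLATE`, `build_workflow_tail`, and `build_preamble` are hypothetical names, and the step wording is abbreviated. Only the two placeholders, `{repo_scope}` and `{workflow_tail}`, mirror the diff.

```python
# Hypothetical, trimmed-down stand-in for V5_PREAMBLE_TEMPLATE. Only the two
# placeholders ({repo_scope}, {workflow_tail}) are taken from the diff.
PREAMBLE_TEMPLATE = (
    "**Local source files are not present.**\n"
    "{repo_scope}\n"
    "1. **Search first**\n"
    "2. **Read remotely**\n"
    "{workflow_tail}\n"
)


def build_workflow_tail(mcp_type: str) -> str:
    """Return the config-specific workflow steps (wording abbreviated)."""
    if mcp_type == "artifact_full":
        # Artifact configs have an empty workspace, so there is one step only.
        return "3. **Produce artifacts**: express changes as unified diffs"
    # Every other config edits and tests in the working directory.
    return (
        "3. **Edit locally**: modify files in your working directory\n"
        "4. **Verify locally**: run tests with Bash"
    )


def build_preamble(mcp_type: str, repo_scope: str) -> str:
    """Fill both placeholders in a single format() call."""
    return PREAMBLE_TEMPLATE.format(
        repo_scope=repo_scope,
        workflow_tail=build_workflow_tail(mcp_type),
    )
```

Because the template is filled with `str.format`, any call site that still passes only `repo_scope` would raise `KeyError` once the `{workflow_tail}` placeholder exists, so the template and its callers have to change together.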

benchmarks/ccb_mcp_compliance/ccx-compliance-051/instruction.md

Lines changed: 2 additions & 2 deletions
@@ -20,8 +20,8 @@ You are performing a compliance audit of the Prometheus monitoring stack. The go
 
 ## Available Resources
 
-The local `/workspace/` directory contains:
-- `prometheus/prometheus` at v3.2.1`/workspace/prometheus`
+Your ecosystem includes the following repositories:
+- `prometheus/prometheus` at v3.2.1
 
 ## Output Format
 

benchmarks/ccb_mcp_compliance/ccx-compliance-057-ds/instruction.md

Lines changed: 3 additions & 3 deletions
@@ -34,9 +34,9 @@ these 4 layers:
 
 ## Available Resources
 
-The local `/workspace/` directory contains all repositories:
-- `grafana/grafana` at v11.4.0`/workspace/grafana`
-- `grafana/loki` at v3.3.4`/workspace/loki`
+Your ecosystem includes the following repositories:
+- `grafana/grafana` at v11.4.0
+- `grafana/loki` at v3.3.4
 
 ## Output Format
 

benchmarks/ccb_mcp_crossorg/ccx-crossorg-061/instruction.md

Lines changed: 6 additions & 6 deletions
@@ -19,16 +19,16 @@ compile time that a type implements an interface. Finding all such declarations
 repos from different organizations reveals who has independently implemented the same
 storage abstraction — a key signal for platform compatibility audits.
 
-The search should be **exhaustive across all repos in the ecosystem**, not just the
-local repo. The interface is defined in the Kubernetes ecosystem but can be implemented
+The search should be **exhaustive across all repos in the ecosystem**, not just a
+single repo. The interface is defined in the Kubernetes ecosystem but can be implemented
 by projects from entirely different organizations.
 
 ## Available Resources
 
-The local `/workspace/` directory contains all repositories:
-- `kubernetes/kubernetes` at v1.32.0`/workspace/kubernetes`
-- `etcd-io/etcd` at v3.5.17`/workspace/etcd`
-- `grafana/grafana` at v11.4.0`/workspace/grafana`
+Your ecosystem includes the following repositories:
+- `kubernetes/kubernetes` at v1.32.0
+- `etcd-io/etcd` at v3.5.17
+- `grafana/grafana` at v11.4.0
 
 ## Output Format
 

benchmarks/ccb_mcp_crossorg/ccx-crossorg-066/instruction.md

Lines changed: 5 additions & 5 deletions
@@ -19,17 +19,17 @@ In Go, every module has a canonical `go.mod` file that declares the module path
 authoritative source. Other repos may vendor or depend on it but are NOT the source
 of truth.
 
-The `kubernetes/kubernetes` repo (available locally) vendors this module — you can
+The `kubernetes/kubernetes` repo vendors this module — you can
 see this at `vendor/go.etcd.io/etcd/client/v3/`. However, this is a vendored copy,
 not the authoritative source. Your task is to find where this module is authoritatively
 maintained.
 
 ## Available Resources
 
-The local `/workspace/` directory contains all repositories:
-- `kubernetes/kubernetes` at v1.32.0`/workspace/kubernetes`
-- `etcd-io/etcd` at v3.5.17`/workspace/etcd`
-- `grafana/grafana` at v11.4.0`/workspace/grafana`
+Your ecosystem includes the following repositories:
+- `kubernetes/kubernetes` at v1.32.0
+- `etcd-io/etcd` at v3.5.17
+- `grafana/grafana` at v11.4.0
 
 ## Output Format
 

benchmarks/ccb_mcp_crossrepo_tracing/ccx-config-trace-010/instruction.md

Lines changed: 7 additions & 7 deletions
@@ -10,7 +10,7 @@ k8s.io/client-go/rest.(*Config).DeepCopyInto(...)
 vendor/k8s.io/client-go/rest/config.go:87
 ```
 
-The developer only has access to the main `kubernetes/kubernetes` repository locally.
+The developer is starting from the main `kubernetes/kubernetes` repository.
 They need to find where `rest.Config` is actually defined (the authoritative source),
 not just a vendored copy.
 
@@ -26,11 +26,11 @@ directories, but the authoritative source lives in separate repositories accessi
 
 ## Available Resources
 
-The local `/workspace/` directory contains all repositories:
-- `kubernetes/kubernetes` at v1.32.0`/workspace/kubernetes`
-- `kubernetes/client-go` at v0.32.0`/workspace/client-go`
-- `kubernetes/api` at fa23dd3`/workspace/api`
-- `etcd-io/etcd` at v3.5.17`/workspace/etcd`
+Your ecosystem includes the following repositories:
+- `kubernetes/kubernetes` at v1.32.0
+- `kubernetes/client-go` at v0.32.0
+- `kubernetes/api` at fa23dd3
+- `etcd-io/etcd` at v3.5.17
 
 ## Output Format
 
@@ -45,7 +45,7 @@ Create a file at `/workspace/answer.json` with your findings in the following st
 }
 ```
 
-**Important**: The local `/workspace/client-go` directory contains the `kubernetes/client-go` source, but in Sourcegraph it is indexed as `sg-benchmarks/kubernetes-client-go`. Use `sg-benchmarks/kubernetes-client-go` as the `repo` value in your answer — the oracle checks for this exact identifier.
+**Important**: The `kubernetes/client-go` repository is indexed in Sourcegraph as `sg-benchmarks/kubernetes-client-go`. Use `sg-benchmarks/kubernetes-client-go` as the `repo` value in your answer — the oracle checks for this exact identifier.
 **Note**: Sourcegraph MCP tools return repo names with a `github.com/` prefix (e.g., `github.com/sg-benchmarks/kubernetes-client-go`). Strip this prefix in your answer — use `sg-benchmarks/kubernetes-client-go`, NOT `github.com/sg-benchmarks/kubernetes-client-go`.
 
 Your answer is evaluated against a closed-world oracle — the exact repo, path, and symbol name matter.
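
The prefix rule in the **Note** above is simple to get wrong by hand. As a rough illustration only, a helper along these lines would normalize a repo name before it is written to the answer; `normalize_repo` is a hypothetical name and is not part of this commit or of the benchmark harness.

```python
GITHUB_PREFIX = "github.com/"


def normalize_repo(name: str) -> str:
    """Drop a leading `github.com/` so the name matches the oracle's identifier.

    Names without the prefix are returned unchanged, so the call is idempotent.
    """
    if name.startswith(GITHUB_PREFIX):
        return name[len(GITHUB_PREFIX):]
    return name
```

Only a leading prefix is stripped, so a repo whose path merely contains `github.com/` further along is left untouched.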

benchmarks/ccb_mcp_crossrepo_tracing/ccx-dep-trace-001/instruction.md

Lines changed: 4 additions & 4 deletions
@@ -20,9 +20,9 @@ that directly imports it (not just subpackages) will be affected by a breaking A
 
 ## Available Resources
 
-The local `/workspace/` directory contains all repositories:
-- `kubernetes/kubernetes` at v1.32.0`/workspace/kubernetes`
-- `kubernetes/client-go` at v0.32.0`/workspace/client-go`
+Your ecosystem includes the following repositories:
+- `kubernetes/kubernetes` at v1.32.0
+- `kubernetes/client-go` at v0.32.0
 
 ## Output Format
 
@@ -37,7 +37,7 @@ Create a file at `/workspace/answer.json` with your findings in the following st
 }
 ```
 
-**Important**: Use `"repo": "sg-benchmarks/kubernetes-client-go"` exactly — this is the canonical repo identifier used by the evaluation oracle. The local checkout at `/workspace/client-go` corresponds to this repo.
+**Important**: Use `"repo": "sg-benchmarks/kubernetes-client-go"` exactly — this is the canonical repo identifier used by the evaluation oracle. The `kubernetes/client-go` repository corresponds to `sg-benchmarks/kubernetes-client-go` in Sourcegraph.
 **Note**: Sourcegraph MCP tools return repo names with a `github.com/` prefix (e.g., `github.com/sg-benchmarks/kubernetes-client-go`). Strip this prefix in your answer — use `sg-benchmarks/kubernetes-client-go`, NOT `github.com/sg-benchmarks/kubernetes-client-go`.
 
 Include only the `files` field. Your answer is evaluated against a closed-world oracle — completeness matters.

benchmarks/ccb_mcp_crossrepo_tracing/ccx-dep-trace-004/instruction.md

Lines changed: 4 additions & 4 deletions
@@ -22,9 +22,9 @@ adding observability, or extending the query pipeline.
 
 ## Available Resources
 
-The local `/workspace/` directory contains all repositories:
-- `grafana/grafana` at v11.4.0`/workspace/grafana`
-- `grafana/loki` at v3.3.4`/workspace/loki`
+Your ecosystem includes the following repositories:
+- `grafana/grafana` at v11.4.0
+- `grafana/loki` at v3.3.4
 
 ## Output Format
 
@@ -44,7 +44,7 @@ Create a file at `/workspace/answer.json` with your findings in the following st
 - For Loki: `"repo": "sg-benchmarks/grafana-loki"`
 **Note**: Sourcegraph MCP tools return repo names with a `github.com/` prefix (e.g., `github.com/sg-benchmarks/kubernetes-client-go`). Strip this prefix in your answer — use `sg-benchmarks/kubernetes-client-go`, NOT `github.com/sg-benchmarks/kubernetes-client-go`.
 
-The local checkout at `/workspace/loki` corresponds to `sg-benchmarks/grafana-loki`.
+The `grafana/loki` repository corresponds to `sg-benchmarks/grafana-loki` in Sourcegraph.
 
 List the chain steps in order from Grafana (caller) to Loki (callee). Your answer is evaluated
 against a closed-world oracle — precision matters.

benchmarks/ccb_mcp_incident/ccx-incident-031/instruction.md

Lines changed: 6 additions & 6 deletions
@@ -24,7 +24,7 @@ Specifically, find:
 
 ## Important: Avoiding Decoys
 
-The local `kubernetes/kubernetes` checkout at `/workspace/kubernetes` contains
+The `kubernetes/kubernetes` repository contains
 vendored copies of etcd code under `vendor/go.etcd.io/etcd/`. These vendored
 files look identical to the real source but are **not** the authoritative location.
 
@@ -33,14 +33,14 @@ The Kubernetes apiserver also has its own error-mapping layer at
 the etcd error into Kubernetes error types — this is also **not** the authoritative
 source of the original error.
 
-Your answer must cite the **upstream etcd repository** (accessible via Sourcegraph
-MCP tools), not the vendored copies or the Kubernetes error-mapping layer.
+Your answer must cite the **upstream etcd repository**, not the vendored copies
+or the Kubernetes error-mapping layer.
 
 ## Available Resources
 
-The local `/workspace/` directory contains all repositories:
-- `kubernetes/kubernetes` at v1.32.0`/workspace/kubernetes`
-- `etcd-io/etcd` at v3.5.17 `/workspace/etcd` (this is where the error originates)
+Your ecosystem includes the following repositories:
+- `kubernetes/kubernetes` at v1.32.0
+- `etcd-io/etcd` at v3.5.17 (this is where the error originates)
 
 ## Output Format
 

benchmarks/ccb_mcp_incident/ccx-incident-034/instruction.md

Lines changed: 3 additions & 3 deletions
@@ -22,9 +22,9 @@ definition files. Also include the HTTP fake/stub client used for testing.
 
 ## Available Resources
 
-The local `/workspace/` directory contains all repositories:
-- `grafana/grafana` at v11.4.0`/workspace/grafana`
-- `grafana/loki` at v3.3.4`/workspace/loki`
+Your ecosystem includes the following repositories:
+- `grafana/grafana` at v11.4.0
+- `grafana/loki` at v3.3.4
 
 ## Output Format
 
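
The same wording fix recurs in every instruction file, so a small lint could help keep the removed phrasing from creeping back. This is a sketch under stated assumptions, not part of this commit: `STALE_PATTERNS` and `find_stale_lines` are hypothetical names, and the patterns cover only the phrasings visible in the hunks above.

```python
import re

# Phrasings this commit removed. Illustrative only, not an exhaustive list.
STALE_PATTERNS = [
    re.compile(r"The local `/workspace/` directory"),
    re.compile(r"\(available locally\)"),
    re.compile(r"local checkout at `/workspace/"),
    # A backticked /workspace/<dir> path. `/workspace/answer.json` does not
    # match because the dot is outside the allowed character class.
    re.compile(r"`/workspace/[A-Za-z0-9_-]+`"),
]


def find_stale_lines(text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs that still claim repos exist locally."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in STALE_PATTERNS):
            hits.append((lineno, line))
    return hits
```

The output path `/workspace/answer.json` is deliberately left out of the patterns, since the instructions keep that path in every config.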
