
Commit 3fb43ea

sjarmak and claude committed
fix: block local search tools in OH MCP mode + fix 3 missing instruction_mcp.md

- OpenHands MCP runs now create wrapper scripts in /opt/mcp_blockers/ that block grep/rg/ag/ack/find/fd/tree with a message directing the agent to use Sourcegraph MCP tools instead. PATH is prepended only for the agent process; verifiers run with original PATH and are unaffected.
- Add SOURCEGRAPH_REPOS env var to 3 Dockerfiles missing it (ccx-crossorg-218, ccx-crossorg-219, ccx-vuln-remed-169) and generate their instruction_mcp.md.
- Add comment in generate_sgonly_dockerfiles.py documenting why container-level blocking is unsafe (verifiers use bare grep/find).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent bbac266 commit 3fb43ea

11 files changed, +419 -12 lines changed


.beads/backup/backup_state.json

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 {
-  "last_dolt_commit": "va5omkih1c4t5cg9s4v3h2kpqigekalh",
+  "last_dolt_commit": "dd0594s1h4afs6fr5t7ajnag90emfg0s",
   "last_event_id": 0,
-  "timestamp": "2026-03-11T18:24:16.353711768Z",
+  "timestamp": "2026-03-11T20:01:24.898819257Z",
   "counts": {
     "issues": 27,
     "events": 84,

agents/harnesses/openhands/agent.py

Lines changed: 21 additions & 0 deletions
@@ -152,6 +152,27 @@ def create_run_agent_commands(self, instruction: str):
             env=env,
         ))
 
+        # Block local search tools in MCP mode so the agent is forced to use
+        # Sourcegraph MCP tools (keyword_search, read_file, etc.) instead of
+        # grep/find on truncated local files. Wrappers go in /opt/mcp_blockers/
+        # and PATH is prepended in the agent's env. Verifiers run as separate
+        # processes with the original PATH and are unaffected.
+        if mcp_type in ("sourcegraph_full", "sourcegraph_base", "sourcegraph_isolated"):
+            blocked_cmds = "grep rg ag ack find fd tree"
+            blocker_script = (
+                "mkdir -p /opt/mcp_blockers && "
+                + " && ".join(
+                    f"printf '#!/bin/sh\\necho \"ERROR: {cmd} is disabled in MCP mode."
+                    f" Use Sourcegraph MCP tools (keyword_search, read_file, list_files)"
+                    f" instead.\" >&2\\nexit 1\\n' > /opt/mcp_blockers/{cmd}"
+                    f" && chmod +x /opt/mcp_blockers/{cmd}"
+                    for cmd in blocked_cmds.split()
+                )
+            )
+            exec_inputs.append(ExecInput(command=blocker_script, env=env))
+            # Prepend blocker dir to PATH so agent hits wrappers first
+            env["PATH"] = "/opt/mcp_blockers:" + env.get("PATH", "/usr/local/bin:/usr/bin:/bin")
+
         # Build a Python launcher script that reads the task from file,
         # runs openhands.core.main, pipes output to a log file, and cleans up
         # orphan daemons. Everything stays in Python — no shell quoting at all.
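
The hunk above relies on PATH shadowing: a directory of failing wrapper scripts is placed ahead of the real binaries, but only in the environment handed to the agent. The following is a minimal sketch of that idea, not the harness code. It writes the wrappers to a temporary directory rather than /opt/mcp_blockers/, shortens the error message, and the helper names `write_blockers` and `run_with_blockers` are hypothetical.

```python
import os
import stat
import subprocess
import tempfile

# Same command list as in the hunk above.
BLOCKED_CMDS = "grep rg ag ack find fd tree".split()


def write_blockers(directory, commands=BLOCKED_CMDS):
    """Write one /bin/sh wrapper per command; each prints an error and exits 1."""
    for cmd in commands:
        path = os.path.join(directory, cmd)
        with open(path, "w") as fh:
            fh.write(
                "#!/bin/sh\n"
                f'echo "ERROR: {cmd} is disabled in MCP mode." >&2\n'
                "exit 1\n"
            )
        # Mark the wrapper executable, as `chmod +x` does in the hunk.
        mode = os.stat(path).st_mode
        os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def run_with_blockers(shell_command, blocker_dir):
    """Run a shell command with blocker_dir prepended to PATH (hypothetical helper)."""
    env = dict(os.environ)  # copy, so the calling process keeps its own PATH
    original = env.get("PATH", "/usr/local/bin:/usr/bin:/bin")
    env["PATH"] = blocker_dir + os.pathsep + original
    return subprocess.run(
        ["/bin/sh", "-c", shell_command], env=env, capture_output=True, text=True
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_blockers(tmp)
        result = run_with_blockers("grep foo /dev/null", tmp)
        print(result.returncode, result.stderr.strip())
```

Because only the copied environment is modified, a process started with the original PATH still resolves the real `grep`, which is the property the commit message relies on for verifiers. Note that PATH shadowing only intercepts bare command names: an absolute path such as `/usr/bin/grep` still reaches the real binary.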

benchmarks/csb_org_crossorg/ccx-crossorg-218/environment/Dockerfile.sg_only

Lines changed: 8 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,11 @@
1-
# ccx-crossorg-218 — sg_only_env variant
2-
# No local repo clone — agent uses Sourcegraph MCP exclusively for code access.
1+
# ccx-crossorg-218 — sg_only_env variant (v2: clone-at-verify)
2+
# Empty workspace — agent uses Sourcegraph MCP for code access.
3+
# Verifier clones mirror(s) at verification time via clone manifest.
34

45
FROM ubuntu:22.04
56

67
ENV DEBIAN_FRONTEND=noninteractive
8+
ENV SOURCEGRAPH_REPOS="sg-evals/scikit-learn--cb7e82dd"
79

810
RUN apt-get update && apt-get install -y --no-install-recommends \
911
git \
@@ -21,7 +23,10 @@ RUN git init && \
2123

2224
RUN mkdir -p /logs/agent /logs/verifier
2325

24-
# Mark sg_only mode so verifiers can skip local-path checks
26+
# Clone manifest for verifier (clone-at-verify strategy)
27+
RUN echo '{"workdir":"/workspace","repos":[{"mirror":"sg-evals/scikit-learn--cb7e82dd","target_dir":"scikit-learn--cb7e82dd"}]}' > /tmp/.sg_only_clone_manifest.json
28+
29+
# Mark sg_only mode
2530
RUN touch /tmp/.sg_only_mode
2631

2732
# Pre-create claude user and set ownership at build time.
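
The Dockerfile above writes a clone manifest with a `workdir` and a list of `repos`, each carrying a `mirror` and a `target_dir`. The commit does not show the verifier side, so the following is only a sketch of how such a manifest might be turned into clone commands. The helper name `plan_clones` and the base URL are assumptions; the sketch builds the `git clone` argument lists without running them.

```python
import json

# Assumption: mirrors are hosted under this base URL. The commit does not show
# how the verifier resolves a mirror name to a clone URL.
ASSUMED_BASE_URL = "https://github.com/"


def plan_clones(manifest_text, base_url=ASSUMED_BASE_URL):
    """Return one `git clone` argv per repo in the manifest (hypothetical helper)."""
    manifest = json.loads(manifest_text)
    workdir = manifest["workdir"].rstrip("/")
    commands = []
    for repo in manifest["repos"]:
        url = f"{base_url}{repo['mirror']}.git"
        target = f"{workdir}/{repo['target_dir']}"
        commands.append(["git", "clone", url, target])
    return commands


if __name__ == "__main__":
    # Manifest content copied from the Dockerfile hunk above.
    manifest_text = (
        '{"workdir":"/workspace","repos":[{"mirror":"sg-evals/scikit-learn--cb7e82dd",'
        '"target_dir":"scikit-learn--cb7e82dd"}]}'
    )
    for argv in plan_clones(manifest_text):
        print(" ".join(argv))
```

Keeping the plan as argument lists rather than a shell string avoids quoting issues if a mirror name ever contains shell metacharacters.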
Lines changed: 118 additions & 0 deletions
@@ -0,0 +1,118 @@
+# IMPORTANT: Source Code Access
+
+**Local source files are not present.** Your workspace does not contain source code. You **MUST** use Sourcegraph MCP tools to discover, read, and understand code before making any changes.
+
+**Target Repositories (version-pinned mirrors):**
+
+- `github.com/sg-evals/scikit-learn--cb7e82dd` — use `repo:^github.com/sg-evals/scikit-learn--cb7e82dd$` filter
+
+Scope ALL keyword_search/nls_search queries to these repos.
+Use the repo name as the `repo` parameter for read_file/go_to_definition/find_references.
+
+
+## Required Workflow
+
+1. **Search first** — Use MCP tools to find relevant files and understand existing patterns
+2. **Read remotely** — Use `sg_read_file` to read full file contents from Sourcegraph
+3. **Edit locally** — Use Edit, Write, and Bash to create or modify files in your working directory
+4. **Verify locally** — Run tests with Bash to check your changes
+
+## Tool Selection
+
+| Goal | Tool |
+|------|------|
+| Exact symbol/string | `sg_keyword_search` |
+| Concepts/semantic search | `sg_nls_search` |
+| Trace usage/callers | `sg_find_references` |
+| See implementation | `sg_go_to_definition` |
+| Read full file | `sg_read_file` |
+| Browse structure | `sg_list_files` |
+| Find repos | `sg_list_repos` |
+| Search commits | `sg_commit_search` |
+| Track changes | `sg_diff_search` |
+| Compare versions | `sg_compare_revisions` |
+
+**Decision logic:**
+1. Know the exact symbol? → `sg_keyword_search`
+2. Know the concept, not the name? → `sg_nls_search`
+3. Need definition of a symbol? → `sg_go_to_definition`
+4. Need all callers/references? → `sg_find_references`
+5. Need full file content? → `sg_read_file`
+
+## Scoping (Always Do This)
+
+```
+repo:^github.com/ORG/REPO$ # Exact repo (preferred)
+repo:github.com/ORG/ # All repos in org
+file:.*\.ts$ # TypeScript only
+file:src/api/ # Specific directory
+```
+
+Start narrow. Expand only if results are empty.
+
+## Efficiency Rules
+
+- Chain searches logically: search → read → references → definition
+- Don't re-search for the same pattern; use results from prior calls
+- Prefer `sg_keyword_search` over `sg_nls_search` when you have exact terms
+- Read 2-3 related files before synthesising, rather than one at a time
+- Don't read 20+ remote files without writing code — once you understand the pattern, start implementing
+
+## If Stuck
+
+If MCP search returns no results:
+1. Broaden the search query (synonyms, partial identifiers)
+2. Try `sg_nls_search` for semantic matching
+3. Use `sg_list_files` to browse the directory structure
+4. Use `sg_list_repos` to verify the repository name
+
+---
+
+**Sourcegraph Repositories:** `github.com/sg-evals/scikit-learn--cb7e82dd`
+
+# Structured Logging Pattern in Python ML Repos
+
+## Your Task
+
+Find Python source files in scikit-learn/scikit-learn that implement structured logging and warning mechanisms: the ConvergenceWarning usage, the sklearn check_is_fitted warning, and the verbose parameter logging across estimators.
+
+## Context
+
+You are working on a codebase task involving repos from the crossorg domain.
+
+## Available Resources
+
+No local repositories are pre-checked out.
+
+
+## Output Format
+
+Use the published task contract:
+
+- `TASK_WORKDIR=/workspace`
+- `TASK_REPO_ROOT=/workspace`
+- `TASK_OUTPUT=/workspace/answer.json`
+
+Create a file at `TASK_OUTPUT` (`/workspace/answer.json`) with your findings in the following structure:
+
+```json
+{
+  "files": [
+    {"repo": "org/repo-name", "path": "relative/path/to/file.go"}
+  ],
+  "symbols": [
+    {"repo": "org/repo-name", "path": "relative/path/to/file.go", "symbol": "SymbolName"}
+  ],
+  "chain": [
+    {"repo": "org/repo-name", "path": "relative/path/to/file.go", "symbol": "FunctionName"}
+  ],
+  "text": "Narrative explanation of your findings, citing repos and file paths."
+}
+```
+
+Include only the fields relevant to this task. Your answer is evaluated against a closed-world oracle — completeness matters.
+
+## Evaluation
+
+Your answer will be scored on:
+- **File recall and precision**: Did you find all relevant files?
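
The Output Format section in the file above asks for an `answer.json` containing only the fields relevant to the task. As a rough sketch of how an agent-side script might assemble and write that file, the following uses two hypothetical helpers, `build_answer` and `write_answer`, with placeholder repo and path values. A temporary directory stands in for `/workspace` so the example runs anywhere.

```python
import json
import os
import tempfile


def build_answer(files=None, symbols=None, chain=None, text=None):
    """Assemble the answer dict, keeping only the fields that were provided."""
    candidate = {"files": files, "symbols": symbols, "chain": chain, "text": text}
    return {key: value for key, value in candidate.items() if value}


def write_answer(answer, output_path):
    """Write the answer as JSON to output_path (TASK_OUTPUT in the task contract)."""
    with open(output_path, "w") as fh:
        json.dump(answer, fh, indent=2)
        fh.write("\n")


if __name__ == "__main__":
    # Placeholder values only; a real answer would list actual findings.
    answer = build_answer(
        files=[{"repo": "org/repo-name", "path": "relative/path/to/file.py"}],
        text="Narrative explanation of the findings.",
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "answer.json")
        write_answer(answer, path)
        with open(path) as fh:
            print(fh.read())
```

Dropping empty fields follows the instruction to include only the relevant ones; whether the scorer treats an absent field differently from an empty list is not stated in the source.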

benchmarks/csb_org_crossorg/ccx-crossorg-219/environment/Dockerfile.sg_only

Lines changed: 8 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,11 @@
1-
# ccx-crossorg-219 — sg_only_env variant
2-
# No local repo clone — agent uses Sourcegraph MCP exclusively for code access.
1+
# ccx-crossorg-219 — sg_only_env variant (v2: clone-at-verify)
2+
# Empty workspace — agent uses Sourcegraph MCP for code access.
3+
# Verifier clones mirror(s) at verification time via clone manifest.
34

45
FROM ubuntu:22.04
56

67
ENV DEBIAN_FRONTEND=noninteractive
8+
ENV SOURCEGRAPH_REPOS="sg-evals/prometheus--ba14bc4,sg-evals/grafana--26d36ec"
79

810
RUN apt-get update && apt-get install -y --no-install-recommends \
911
git \
@@ -21,7 +23,10 @@ RUN git init && \
2123

2224
RUN mkdir -p /logs/agent /logs/verifier
2325

24-
# Mark sg_only mode so verifiers can skip local-path checks
26+
# Clone manifest for verifier (clone-at-verify strategy)
27+
RUN echo '{"workdir":"/workspace","repos":[{"mirror":"sg-evals/prometheus--ba14bc4","target_dir":"prometheus--ba14bc4"},{"mirror":"sg-evals/grafana--26d36ec","target_dir":"grafana--26d36ec"}]}' > /tmp/.sg_only_clone_manifest.json
28+
29+
# Mark sg_only mode
2530
RUN touch /tmp/.sg_only_mode
2631

2732
# Pre-create claude user and set ownership at build time.
Lines changed: 119 additions & 0 deletions
@@ -0,0 +1,119 @@
+# IMPORTANT: Source Code Access
+
+**Local source files are not present.** Your workspace does not contain source code. You **MUST** use Sourcegraph MCP tools to discover, read, and understand code before making any changes.
+
+**Target Repositories (version-pinned mirrors):**
+
+- `github.com/sg-evals/prometheus--ba14bc4` — use `repo:^github.com/sg-evals/prometheus--ba14bc4$` filter
+- `github.com/sg-evals/grafana--26d36ec` — use `repo:^github.com/sg-evals/grafana--26d36ec$` filter
+
+Scope ALL keyword_search/nls_search queries to these repos.
+Use the repo name as the `repo` parameter for read_file/go_to_definition/find_references.
+
+
+## Required Workflow
+
+1. **Search first** — Use MCP tools to find relevant files and understand existing patterns
+2. **Read remotely** — Use `sg_read_file` to read full file contents from Sourcegraph
+3. **Edit locally** — Use Edit, Write, and Bash to create or modify files in your working directory
+4. **Verify locally** — Run tests with Bash to check your changes
+
+## Tool Selection
+
+| Goal | Tool |
+|------|------|
+| Exact symbol/string | `sg_keyword_search` |
+| Concepts/semantic search | `sg_nls_search` |
+| Trace usage/callers | `sg_find_references` |
+| See implementation | `sg_go_to_definition` |
+| Read full file | `sg_read_file` |
+| Browse structure | `sg_list_files` |
+| Find repos | `sg_list_repos` |
+| Search commits | `sg_commit_search` |
+| Track changes | `sg_diff_search` |
+| Compare versions | `sg_compare_revisions` |
+
+**Decision logic:**
+1. Know the exact symbol? → `sg_keyword_search`
+2. Know the concept, not the name? → `sg_nls_search`
+3. Need definition of a symbol? → `sg_go_to_definition`
+4. Need all callers/references? → `sg_find_references`
+5. Need full file content? → `sg_read_file`
+
+## Scoping (Always Do This)
+
+```
+repo:^github.com/ORG/REPO$ # Exact repo (preferred)
+repo:github.com/ORG/ # All repos in org
+file:.*\.ts$ # TypeScript only
+file:src/api/ # Specific directory
+```
+
+Start narrow. Expand only if results are empty.
+
+## Efficiency Rules
+
+- Chain searches logically: search → read → references → definition
+- Don't re-search for the same pattern; use results from prior calls
+- Prefer `sg_keyword_search` over `sg_nls_search` when you have exact terms
+- Read 2-3 related files before synthesising, rather than one at a time
+- Don't read 20+ remote files without writing code — once you understand the pattern, start implementing
+
+## If Stuck
+
+If MCP search returns no results:
+1. Broaden the search query (synonyms, partial identifiers)
+2. Try `sg_nls_search` for semantic matching
+3. Use `sg_list_files` to browse the directory structure
+4. Use `sg_list_repos` to verify the repository name
+
+---
+
+**Sourcegraph Repositories:** `github.com/sg-evals/prometheus--ba14bc4`, `github.com/sg-evals/grafana--26d36ec`
+
+# OpenTelemetry Span Creation Across Monitoring Stack
+
+## Your Task
+
+Find Go source files in prometheus/prometheus and grafana/grafana that create and annotate OpenTelemetry spans: the tracer initialization, the span start/end calls, the span attribute setting, and the span error recording patterns used in both projects.
+
+## Context
+
+You are working on a codebase task involving repos from the crossorg domain.
+
+## Available Resources
+
+No local repositories are pre-checked out.
+
+
+## Output Format
+
+Use the published task contract:
+
+- `TASK_WORKDIR=/workspace`
+- `TASK_REPO_ROOT=/workspace`
+- `TASK_OUTPUT=/workspace/answer.json`
+
+Create a file at `TASK_OUTPUT` (`/workspace/answer.json`) with your findings in the following structure:
+
+```json
+{
+  "files": [
+    {"repo": "org/repo-name", "path": "relative/path/to/file.go"}
+  ],
+  "symbols": [
+    {"repo": "org/repo-name", "path": "relative/path/to/file.go", "symbol": "SymbolName"}
+  ],
+  "chain": [
+    {"repo": "org/repo-name", "path": "relative/path/to/file.go", "symbol": "FunctionName"}
+  ],
+  "text": "Narrative explanation of your findings, citing repos and file paths."
+}
+```
+
+Include only the fields relevant to this task. Your answer is evaluated against a closed-world oracle — completeness matters.
+
+## Evaluation
+
+Your answer will be scored on:
+- **File recall and precision**: Did you find all relevant files?
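
The file above pairs each target repository with an exact-match `repo:` filter, and the corresponding Dockerfile carries the same repositories in `SOURCEGRAPH_REPOS` as a comma-separated list. The generator that derives one from the other is not part of this diff, so the following is only a sketch of the mapping. The helper names `repo_filters` and `scoped_query`, the `github.com` host default, and the example search term are assumptions.

```python
def repo_filters(sourcegraph_repos, host="github.com"):
    """Turn a comma-separated SOURCEGRAPH_REPOS value into exact-match repo filters.

    Dots are left unescaped to match the filter form shown in the instruction file.
    """
    names = [name.strip() for name in sourcegraph_repos.split(",") if name.strip()]
    return [f"repo:^{host}/{name}$" for name in names]


def scoped_query(terms, repo_filter):
    """Prefix search terms with a repo filter so the query stays scoped."""
    return f"{repo_filter} {terms}"


if __name__ == "__main__":
    # Value copied from the ccx-crossorg-219 Dockerfile in this commit.
    value = "sg-evals/prometheus--ba14bc4,sg-evals/grafana--26d36ec"
    for repo_filter in repo_filters(value):
        # "tracer.Start" is an illustrative search term, not taken from the source.
        print(scoped_query("tracer.Start", repo_filter))
```

Issuing one scoped query per repository follows the file's advice to start narrow and expand only if results are empty.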

benchmarks/csb_org_security/ccx-vuln-remed-169/environment/Dockerfile.sg_only

Lines changed: 8 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,11 @@
1-
# ccx-vuln-remed-169 — sg_only_env variant
2-
# No local repo clone — agent uses Sourcegraph MCP exclusively for code access.
1+
# ccx-vuln-remed-169 — sg_only_env variant (v2: clone-at-verify)
2+
# Empty workspace — agent uses Sourcegraph MCP for code access.
3+
# Verifier clones mirror(s) at verification time via clone manifest.
34

45
FROM ubuntu:22.04
56

67
ENV DEBIAN_FRONTEND=noninteractive
8+
ENV SOURCEGRAPH_REPOS="sg-evals/node--v22.13.0"
79

810
RUN apt-get update && apt-get install -y --no-install-recommends \
911
git \
@@ -21,7 +23,10 @@ RUN git init && \
2123

2224
RUN mkdir -p /logs/agent /logs/verifier
2325

24-
# Mark sg_only mode so verifiers can skip local-path checks
26+
# Clone manifest for verifier (clone-at-verify strategy)
27+
RUN echo '{"workdir":"/workspace","repos":[{"mirror":"sg-evals/node--v22.13.0","target_dir":"node--v22.13.0"}]}' > /tmp/.sg_only_clone_manifest.json
28+
29+
# Mark sg_only mode
2530
RUN touch /tmp/.sg_only_mode
2631

2732
# Pre-create claude user and set ownership at build time.
