Commit 5969025

sjarmak and claude committed
feat: US-017 - Deep Search exploration tasks (2 additional tasks)
Add 2 Deep Search-specific exploration tasks designed for cross-repo synthesis and open-ended architectural discovery:

- CCX-explore-042-ds (E42, ccb_mcp_onboarding): Architecture map of scientific computing data flow across numpy/pandas/scipy. Oracle chain: numpy/_core/fromnumeric.py (mean) → pandas/core/arrays/numpy_.py (NumpyExtensionArray) → scipy/stats/_stats_py.py (pearsonr). eval.sh: dependency_chain + provenance.
- CCX-explore-091-ds (J91, ccb_mcp_platform): Service deployment pattern discovery across the Kubernetes ecosystem. Oracle files: kubernetes-api apps/v1/types.go + kubernetes-client-go deployment examples + README. eval.sh: file_set_match + keyword_presence.

Both tasks:
- Have tests/criteria.json with 4 AAA rubric criteria (supplementary)
- Are marked deepsearch_relevant=true in the selection file
- Pass the validity gate (gold=1.0, empty=0.0)
- Use open-ended exploratory framing designed for DS synthesis

Adds the ccb_mcp_platform suite directory (new). The selection file now has 12 tasks total, with 3 deepsearch_relevant=true tasks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 5aefbfd commit 5969025

File tree

21 files changed, +1766 -1 lines changed

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+FROM ubuntu:22.04
+
+ENV DEBIAN_FRONTEND=noninteractive
+
+# Base tools
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    git \
+    ca-certificates \
+    curl \
+    python3 \
+    python3-pip \
+    && rm -rf /var/lib/apt/lists/*
+
+WORKDIR /workspace
+
+# Clone local checkout repos (baseline config: agent has local access to scikit-learn/scikit-learn)
+RUN git clone --depth 1 --branch 1.6.1 https://github.com/scikit-learn/scikit-learn /workspace/scikit-learn
+
+# Initialize git identity for agent commits
+RUN git config --global user.email "agent@example.com" && \
+    git config --global user.name "Agent" && \
+    git config --global safe.directory '*'
+
+# Create log directories
+RUN mkdir -p /logs/agent /logs/verifier
+
+ENTRYPOINT []
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+FROM ubuntu:22.04
+
+ENV DEBIAN_FRONTEND=noninteractive
+
+# Base tools
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    git \
+    ca-certificates \
+    curl \
+    python3 \
+    python3-pip \
+    && rm -rf /var/lib/apt/lists/*
+
+WORKDIR /workspace
+
+# sg_only mode: no repo clones — agent must use Sourcegraph MCP to access all repos
+# Mark sg_only mode so eval.sh can detect it
+RUN touch /tmp/.sg_only_mode
+
+# Initialize git identity for agent commits
+RUN git config --global user.email "agent@example.com" && \
+    git config --global user.name "Agent" && \
+    git config --global safe.directory '*'
+
+# Create log directories
+RUN mkdir -p /logs/agent /logs/verifier
+
+ENTRYPOINT []
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
+# Architecture Map: Scientific Computing Data Flow
+
+## Your Task
+
+You are onboarding to a scientific computing team that uses the Python ML stack.
+A senior engineer has asked you to produce a technical map of how data flows from
+**raw array creation through scientific computation** across the three core libraries:
+numpy, pandas, and scipy.
+
+**Your question**: Map the data flow from raw array creation through scientific
+computation across these repos. Your explanation must trace through all three layers:
+
+1. **Array computation layer** — What function in `numpy/numpy` is the canonical
+   entry point for array-level aggregation on raw ndarray objects?
+2. **Data structure layer** — What class in `pandas-dev/pandas` wraps a NumPy ndarray
+   as a pandas extension array, enabling interoperability between the two libraries?
+3. **Scientific computation layer** — What function in `scipy/scipy` accepts numpy
+   arrays (or pandas Series) as inputs for statistical analysis?
+
+For each step, cite the specific repository, file path, and function/class name.
+
+## Context
+
+You are working with the Python ML stack in a cross-org environment:
+
+- Local checkout at `/workspace/scikit-learn/`: `scikit-learn/scikit-learn` (ML algorithms)
+- Accessible via Sourcegraph MCP:
+  - `numpy/numpy` (array-computing)
+  - `pandas-dev/pandas` (dataframe-library)
+  - `scipy/scipy` (scientific-computing)
+
+This question is specifically designed to benefit from cross-repo synthesis. The
+data flow spans multiple organizations and can only be fully understood by examining
+all three repos together.
+
+## Output Format
+
+Create a file at `/workspace/answer.json` with your findings:
+
+```json
+{
+  "chain": [
+    {
+      "repo": "org/repo-name",
+      "path": "relative/path/to/file.py",
+      "symbol": "FunctionOrClassName",
+      "description": "What role this plays in the data flow"
+    }
+  ],
+  "text": "Comprehensive narrative explaining how data flows from raw array creation through scientific computation, citing specific files and functions from each repo."
+}
```
+
+The `chain` should contain at least 3 steps representing the 3 layers described above.
+
+## Evaluation
+
+Your answer will be scored on:
+- **Flow coverage**: Does the chain include key steps from all 3 layers (array creation → pandas integration → scipy computation)?
+- **Technical accuracy**: Are the cited file paths and function/class names correct?
+- **Provenance**: Does your narrative reference all three repositories with specific file paths?
+- **Synthesis quality** (supplementary): Does the explanation connect the layers in a way that reveals the ecosystem architecture?
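The three-layer chain the README asks the agent to trace can be exercised directly. A minimal sketch, assuming numpy, pandas, and scipy are installed; the public entry points used here are the ones backed by the oracle files named in the commit message:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Layer 1: array computation. np.mean is the public alias for the `mean`
# function implemented in numpy/_core/fromnumeric.py.
x = np.array([1.0, 2.0, 3.0, 4.0])
print(np.mean(x))  # 2.5

# Layer 2: data structure. A numpy-backed Series stores the ndarray in a
# pandas extension array (named NumpyExtensionArray in recent pandas,
# PandasArray in older releases).
s = pd.Series(x)
print(type(s.array).__name__)

# Layer 3: scientific computation. scipy.stats.pearsonr accepts ndarrays
# or pandas Series as inputs.
y = np.array([1.1, 2.0, 2.9, 4.2])
r, p = stats.pearsonr(s, y)
print(f"r = {r:.3f}")
```

The same array object flows through all three calls unchanged, which is the interoperability point the task is probing.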
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+version = "1.0"
+
+[metadata]
+name = "CCX-explore-042-ds"
+description = "Architecture map: trace the data flow from raw array creation through scientific computation across numpy, pandas, and scipy (Deep Search variant)"
+license = "BSD-3-Clause"
+
+[task]
+id = "CCX-explore-042-ds"
+repo = "scikit-learn/scikit-learn"
+category = "onboarding-comprehension"
+language = "python"
+difficulty = "hard"
+time_limit_sec = 1200
+mcp_suite = "ccb_mcp_onboarding"
+use_case_id = 42
+repo_set_id = "python-ml-stack"
+mcp_unique = true
+deepsearch_relevant = true
+
+[verification]
+type = "eval"
+command = "bash /tests/eval.sh"
+
+reward_type = "score"
+description = "Architecture map: data flow from raw array creation through scientific computation across numpy, pandas, scipy (Deep Search variant)"
+
+[environment]
+build_timeout_sec = 600.0
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+[
+  {
+    "metric": "data_flow_tracing",
+    "description": "Accurate: Does the answer correctly trace the data flow from raw array creation (numpy) through pandas integration to scipy computation? Attributed: Does it cite specific files and symbols from each of the 3 repos? Actionable: Would a scientist understand which module to study for each step in the pipeline?",
+    "max_score": 3
+  },
+  {
+    "metric": "cross_repo_synthesis",
+    "description": "Accurate: Does the explanation describe how data moves between the three scientific Python libraries (numpy → pandas → scipy)? Attributed: Does the answer reference at least numpy, pandas, and scipy with specific file paths? Actionable: Could a developer follow the data flow to debug a scientific computation pipeline?",
+    "max_score": 3
+  },
+  {
+    "metric": "technical_accuracy",
+    "description": "Accurate: Are the cited file paths, function names, and class names correct (e.g., 'mean' in fromnumeric.py, 'NumpyExtensionArray', 'pearsonr')? Attributed: Do the cited file paths correspond to actual files in the repos? Actionable: Could a developer navigate to the code at the cited locations?",
+    "max_score": 2
+  },
+  {
+    "metric": "synthesis_quality",
+    "description": "Accurate: Does the explanation go beyond listing facts to explain WHY data flows this way in the scientific Python ecosystem? Attributed: Does it explain the design choices (e.g., why pandas uses NumpyExtensionArray as an integration layer)? Actionable: Would a new data scientist understand the ecosystem architecture and where to look for interoperability issues?",
+    "max_score": 2
+  }
+]
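The four criteria above form a 10-point rubric. A small sketch of how a judge's per-criterion scores could be normalized to [0, 1]; only the metric names and max_score values come from tests/criteria.json, the judge_scores values are invented for illustration:

```python
# Rubric aggregation sketch: sum earned points over total possible points.
criteria = [
    {"metric": "data_flow_tracing", "max_score": 3},
    {"metric": "cross_repo_synthesis", "max_score": 3},
    {"metric": "technical_accuracy", "max_score": 2},
    {"metric": "synthesis_quality", "max_score": 2},
]

# Hypothetical judge output for one answer (not from the diff).
judge_scores = {
    "data_flow_tracing": 2,
    "cross_repo_synthesis": 3,
    "technical_accuracy": 2,
    "synthesis_quality": 1,
}

total = sum(c["max_score"] for c in criteria)
earned = sum(judge_scores[c["metric"]] for c in criteria)
print(f"{earned}/{total} = {earned / total}")  # 8/10 = 0.8
```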
Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
+#!/bin/bash
+# eval.sh — MCP-unique benchmark evaluator for CCX-explore-042-ds
+# Exit-code-first (SWE-Factory pattern):
+#   exit 0 — agent produced useful output (composite score > 0)
+#   exit 1 — total failure (composite score == 0 or missing answer)
+#
+# Writes /logs/verifier/reward.txt with the composite score [0.0, 1.0]
+# Rubric judge (criteria.json) is supplementary — not used in this script.
+
+set -euo pipefail
+
+TASK_ID="CCX-explore-042-ds"
+ANSWER_PATH="/workspace/answer.json"
+TASK_SPEC_PATH="/tests/task_spec.json"
+ORACLE_CHECKS="/tests/oracle_checks.py"
+REWARD_PATH="/logs/verifier/reward.txt"
+
+mkdir -p /logs/verifier
+
+echo "=== CCX-explore-042-ds evaluator ==="
+echo "Task spec: $TASK_SPEC_PATH"
+echo "Answer: $ANSWER_PATH"
+echo ""
+
+# sg_only mode guard: restore full repo if verifier wrapper exists
+if [ -f /tmp/.sg_only_mode ] && [ -f /tests/sgonly_verifier_wrapper.sh ]; then
+    echo "sg_only mode: sourcing verifier wrapper..."
+    source /tests/sgonly_verifier_wrapper.sh
+fi
+
+# Verify answer file exists
+if [ ! -f "$ANSWER_PATH" ]; then
+    echo "ERROR: answer.json not found at $ANSWER_PATH"
+    echo "0.0" > "$REWARD_PATH"
+    exit 1
+fi
+
+# Validate answer is valid JSON
+if ! python3 -c "import json; json.load(open('$ANSWER_PATH'))" 2>/dev/null; then
+    echo "ERROR: answer.json is not valid JSON"
+    echo "0.0" > "$REWARD_PATH"
+    exit 1
+fi
+
+echo "answer.json found and valid JSON"
+
+# Run oracle checks
+if [ ! -f "$ORACLE_CHECKS" ]; then
+    echo "ERROR: oracle_checks.py not found at $ORACLE_CHECKS"
+    echo "0.0" > "$REWARD_PATH"
+    exit 1
+fi
+
+echo "Running oracle checks..."
+SCORE=$(python3 "$ORACLE_CHECKS" --answer "$ANSWER_PATH" --spec "$TASK_SPEC_PATH" --verbose 2>&1 | tee /dev/stderr | tail -1)
+
+# Validate score is a number
+if ! echo "$SCORE" | python3 -c "import sys; float(sys.stdin.read().strip())" 2>/dev/null; then
+    echo "ERROR: oracle_checks.py did not return a valid score: $SCORE"
+    echo "0.0" > "$REWARD_PATH"
+    exit 1
+fi
+
+echo ""
+echo "Composite score: $SCORE"
+echo "$SCORE" > "$REWARD_PATH"
+
+# Exit based on score (SWE-Factory exit-code-first pattern)
+python3 -c "import sys; sys.exit(0 if float('$SCORE') > 0 else 1)"
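The script delegates scoring to /tests/oracle_checks.py, which is not part of this diff. A hypothetical sketch of the dependency_chain oracle it invokes; the scoring rule here (fraction of expected chain steps present in the answer) is an assumption, chosen so that the validity gate stated in the commit message (gold=1.0, empty=0.0) holds:

```python
import json
import tempfile

def score_dependency_chain(answer_path, expected):
    """Return the fraction of expected (repo, path, symbol) steps found in the chain."""
    with open(answer_path) as f:
        chain = json.load(f).get("chain", [])
    found = {(s.get("repo"), s.get("path"), s.get("symbol")) for s in chain}
    return sum(1 for step in expected if step in found) / len(expected)

# Oracle chain from the commit message.
expected = [
    ("numpy/numpy", "numpy/_core/fromnumeric.py", "mean"),
    ("pandas-dev/pandas", "pandas/core/arrays/numpy_.py", "NumpyExtensionArray"),
    ("scipy/scipy", "scipy/stats/_stats_py.py", "pearsonr"),
]

# Gold answer scores 1.0 ...
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as gold:
    json.dump({"chain": [dict(zip(("repo", "path", "symbol"), s)) for s in expected]}, gold)
print(score_dependency_chain(gold.name, expected))  # 1.0

# ... and an empty answer scores 0.0.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as empty:
    json.dump({"chain": []}, empty)
print(score_dependency_chain(empty.name, expected))  # 0.0
```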
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+{
+  "chain": [
+    {
+      "repo": "numpy/numpy",
+      "path": "numpy/_core/fromnumeric.py",
+      "symbol": "mean",
+      "description": "NumPy's canonical array aggregation function. Accepts any array-like input (ndarray, lists, scalars) and computes the arithmetic mean. This is the primary entry point for reducing raw numpy arrays to scalar or lower-dimensional values."
+    },
+    {
+      "repo": "pandas-dev/pandas",
+      "path": "pandas/core/arrays/numpy_.py",
+      "symbol": "NumpyExtensionArray",
+      "description": "Pandas extension array class that wraps a NumPy ndarray as a pandas-compatible backing store. This is the integration layer between numpy arrays and pandas Series/DataFrame — when you pass a numpy array to pandas, NumpyExtensionArray bridges the two libraries."
+    },
+    {
+      "repo": "scipy/scipy",
+      "path": "scipy/stats/_stats_py.py",
+      "symbol": "pearsonr",
+      "description": "SciPy's Pearson correlation coefficient function. Accepts numpy arrays or pandas Series as x and y inputs, performing statistical analysis on array-like data. This is the scientific computation endpoint in the data flow chain."
+    }
+  ],
+  "text": "Scientific computing data flow across the Python ML stack: Step 1 — Array computation layer (numpy/numpy, numpy/_core/fromnumeric.py): The `mean()` function is the canonical entry point for array-level aggregation. It accepts array-like inputs and is dispatched via numpy's __array_function__ protocol to operate on ndarray objects. Step 2 — Data structure layer (pandas-dev/pandas, pandas/core/arrays/numpy_.py): NumpyExtensionArray is the bridge class that wraps a NumPy ndarray as a pandas extension array. When pandas Series or DataFrame receives a numpy array, NumpyExtensionArray._from_sequence() creates the backing store, enabling interoperability between numpy and pandas. Step 3 — Scientific computation layer (scipy/scipy, scipy/stats/_stats_py.py): scipy.stats.pearsonr() accepts numpy arrays or pandas Series as inputs and computes the Pearson correlation coefficient. This is the endpoint where raw array data is transformed into a statistical result. Together, these three components represent the canonical scientific Python data flow: numpy creates and aggregates arrays, pandas wraps them in labeled structures, and scipy performs scientific analysis on the data.",
+  "_metadata": {
+    "oracle_type": "dependency_chain",
+    "discovery_method": "sourcegraph_keyword_search",
+    "repos_searched": [
+      "github.com/numpy/numpy",
+      "github.com/pandas-dev/pandas",
+      "github.com/scipy/scipy"
+    ]
+  }
+}
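The gold answer above must satisfy the structural constraints stated in the task README (at least 3 chain steps, each citing repo, path, and symbol, plus a narrative text). A quick structural check; this validator is illustrative and not part of the committed test suite:

```python
def validate_answer(data):
    """Assert the README's structural constraints on an answer dict."""
    chain = data.get("chain", [])
    assert len(chain) >= 3, "chain must cover all 3 layers"
    for step in chain:
        for key in ("repo", "path", "symbol", "description"):
            assert key in step, f"chain step missing {key!r}"
    assert isinstance(data.get("text"), str) and data["text"].strip()
    return True

# Abbreviated version of the gold answer for demonstration.
gold = {
    "chain": [
        {"repo": "numpy/numpy", "path": "numpy/_core/fromnumeric.py",
         "symbol": "mean", "description": "array aggregation entry point"},
        {"repo": "pandas-dev/pandas", "path": "pandas/core/arrays/numpy_.py",
         "symbol": "NumpyExtensionArray", "description": "ndarray wrapper"},
        {"repo": "scipy/scipy", "path": "scipy/stats/_stats_py.py",
         "symbol": "pearsonr", "description": "statistical endpoint"},
    ],
    "text": "numpy aggregates arrays, pandas wraps them, scipy analyzes them.",
}
print(validate_answer(gold))  # True
```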
