Commit 5969025

sjarmak and claude committed
feat: US-017 - Deep Search exploration tasks (2 additional tasks)
Add 2 Deep Search-specific exploration tasks designed for cross-repo synthesis and open-ended architectural discovery:

- CCX-explore-042-ds (E42, ccb_mcp_onboarding): Architecture map of scientific computing data flow across numpy/pandas/scipy. Oracle chain: numpy/_core/fromnumeric.py (mean) → pandas/core/arrays/numpy_.py (NumpyExtensionArray) → scipy/stats/_stats_py.py (pearsonr). eval.sh: dependency_chain + provenance.
- CCX-explore-091-ds (J91, ccb_mcp_platform): Service deployment pattern discovery across the Kubernetes ecosystem. Oracle files: kubernetes-api apps/v1/types.go + kubernetes-client-go deployment examples + README. eval.sh: file_set_match + keyword_presence.

Both tasks:
- Have tests/criteria.json with 4 AAA rubric criteria (supplementary)
- Are marked deepsearch_relevant=true in the selection file
- Pass the validity gate (gold=1.0, empty=0.0)
- Use open-ended exploratory framing designed for DS synthesis

Adds the ccb_mcp_platform suite directory (new). The selection file now has 12 tasks total, with 3 deepsearch_relevant=true tasks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 5aefbfd commit 5969025

File tree

21 files changed, +1766 -1 lines changed

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+FROM ubuntu:22.04
+
+ENV DEBIAN_FRONTEND=noninteractive
+
+# Base tools
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    git \
+    ca-certificates \
+    curl \
+    python3 \
+    python3-pip \
+    && rm -rf /var/lib/apt/lists/*
+
+WORKDIR /workspace
+
+# Clone local checkout repos (baseline config: agent has local access to scikit-learn/scikit-learn)
+RUN git clone --depth 1 --branch 1.6.1 https://github.com/scikit-learn/scikit-learn /workspace/scikit-learn
+
+# Initialize git identity for agent commits
+RUN git config --global user.email "agent@example.com" && \
+    git config --global user.name "Agent" && \
+    git config --global safe.directory '*'
+
+# Create log directories
+RUN mkdir -p /logs/agent /logs/verifier
+
+ENTRYPOINT []
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+FROM ubuntu:22.04
+
+ENV DEBIAN_FRONTEND=noninteractive
+
+# Base tools
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    git \
+    ca-certificates \
+    curl \
+    python3 \
+    python3-pip \
+    && rm -rf /var/lib/apt/lists/*
+
+WORKDIR /workspace
+
+# sg_only mode: no repo clones — agent must use Sourcegraph MCP to access all repos
+# Mark sg_only mode so eval.sh can detect it
+RUN touch /tmp/.sg_only_mode
+
+# Initialize git identity for agent commits
+RUN git config --global user.email "agent@example.com" && \
+    git config --global user.name "Agent" && \
+    git config --global safe.directory '*'
+
+# Create log directories
+RUN mkdir -p /logs/agent /logs/verifier
+
+ENTRYPOINT []
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
+# Architecture Map: Scientific Computing Data Flow
+
+## Your Task
+
+You are onboarding to a scientific computing team that uses the Python ML stack.
+A senior engineer has asked you to produce a technical map of how data flows from
+**raw array creation through scientific computation** across the three core libraries:
+numpy, pandas, and scipy.
+
+**Your question**: Map the data flow from raw array creation through scientific
+computation across these repos. Your explanation must trace through all three layers:
+
+1. **Array computation layer** — What function in `numpy/numpy` is the canonical
+   entry point for array-level aggregation on raw ndarray objects?
+2. **Data structure layer** — What class in `pandas-dev/pandas` wraps a NumPy ndarray
+   as a pandas extension array, enabling interoperability between the two libraries?
+3. **Scientific computation layer** — What function in `scipy/scipy` accepts numpy
+   arrays (or pandas Series) as inputs for statistical analysis?
+
+For each step, cite the specific repository, file path, and function/class name.
+
+## Context
+
+You are working with the Python ML stack in a cross-org environment:
+
+- Local checkout at `/workspace/scikit-learn/`: `scikit-learn/scikit-learn` (ML algorithms)
+- Accessible via Sourcegraph MCP:
+  - `numpy/numpy` (array-computing)
+  - `pandas-dev/pandas` (dataframe-library)
+  - `scipy/scipy` (scientific-computing)
+
+This question is specifically designed to benefit from cross-repo synthesis. The
+data flow spans multiple organizations and can only be fully understood by examining
+all three repos together.
+
+## Output Format
+
+Create a file at `/workspace/answer.json` with your findings:
+
+```json
+{
+  "chain": [
+    {
+      "repo": "org/repo-name",
+      "path": "relative/path/to/file.py",
+      "symbol": "FunctionOrClassName",
+      "description": "What role this plays in the data flow"
+    }
+  ],
+  "text": "Comprehensive narrative explaining how data flows from raw array creation through scientific computation, citing specific files and functions from each repo."
+}
```
+
+The `chain` should contain at least 3 steps representing the 3 layers described above.
+
+## Evaluation
+
+Your answer will be scored on:
+- **Flow coverage**: Does the chain include key steps from all 3 layers (array creation → pandas integration → scipy computation)?
+- **Technical accuracy**: Are the cited file paths and function/class names correct?
+- **Provenance**: Does your narrative reference all three repositories with specific file paths?
+- **Synthesis quality** (supplementary): Does the explanation connect the layers in a way that reveals the ecosystem architecture?
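The three-layer chain the README asks the agent to trace can be exercised directly. A minimal sketch, assuming numpy, pandas, and scipy are installed; the public entry points used here are the ones backed by the oracle files named in the commit message:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Layer 1: array computation. np.mean is the public alias for the `mean`
# function implemented in numpy/_core/fromnumeric.py.
x = np.array([1.0, 2.0, 3.0, 4.0])
print(np.mean(x))  # 2.5

# Layer 2: data structure. A numpy-backed Series stores the ndarray in a
# pandas extension array (named NumpyExtensionArray in recent pandas,
# PandasArray in older releases).
s = pd.Series(x)
print(type(s.array).__name__)

# Layer 3: scientific computation. scipy.stats.pearsonr accepts ndarrays
# or pandas Series as inputs.
y = np.array([1.1, 2.0, 2.9, 4.2])
r, p = stats.pearsonr(s, y)
print(f"r = {r:.3f}")
```

The same array object flows through all three calls unchanged, which is the interoperability point the task is probing.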
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+version = "1.0"
+
+[metadata]
+name = "CCX-explore-042-ds"
+description = "Architecture map: trace the data flow from raw array creation through scientific computation across numpy, pandas, and scipy (Deep Search variant)"
+license = "BSD-3-Clause"
+
+[task]
+id = "CCX-explore-042-ds"
+repo = "scikit-learn/scikit-learn"
+category = "onboarding-comprehension"
+language = "python"
+difficulty = "hard"
+time_limit_sec = 1200
+mcp_suite = "ccb_mcp_onboarding"
+use_case_id = 42
+repo_set_id = "python-ml-stack"
+mcp_unique = true
+deepsearch_relevant = true
+
+[verification]
+type = "eval"
+command = "bash /tests/eval.sh"
+
+reward_type = "score"
+description = "Architecture map: data flow from raw array creation through scientific computation across numpy, pandas, scipy (Deep Search variant)"
+
+[environment]
+build_timeout_sec = 600.0
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+[
+  {
+    "metric": "data_flow_tracing",
+    "description": "Accurate: Does the answer correctly trace the data flow from raw array creation (numpy) through pandas integration to scipy computation? Attributed: Does it cite specific files and symbols from each of the 3 repos? Actionable: Would a scientist understand which module to study for each step in the pipeline?",
+    "max_score": 3
+  },
+  {
+    "metric": "cross_repo_synthesis",
+    "description": "Accurate: Does the explanation describe how data moves between the three scientific Python libraries (numpy → pandas → scipy)? Attributed: Does the answer reference at least numpy, pandas, and scipy with specific file paths? Actionable: Could a developer follow the data flow to debug a scientific computation pipeline?",
+    "max_score": 3
+  },
+  {
+    "metric": "technical_accuracy",
+    "description": "Accurate: Are the cited file paths, function names, and class names correct (e.g., 'mean' in fromnumeric.py, 'NumpyExtensionArray', 'pearsonr')? Attributed: Do the cited file paths correspond to actual files in the repos? Actionable: Could a developer navigate to the code at the cited locations?",
+    "max_score": 2
+  },
+  {
+    "metric": "synthesis_quality",
+    "description": "Accurate: Does the explanation go beyond listing facts to explain WHY data flows this way in the scientific Python ecosystem? Attributed: Does it explain the design choices (e.g., why pandas uses NumpyExtensionArray as an integration layer)? Actionable: Would a new data scientist understand the ecosystem architecture and where to look for interoperability issues?",
+    "max_score": 2
+  }
+]
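The four criteria above form a 10-point rubric. A small sketch of how a judge's per-criterion scores could be normalized to [0, 1]; only the metric names and max_score values come from tests/criteria.json, the judge_scores values are invented for illustration:

```python
# Rubric aggregation sketch: sum earned points over total possible points.
criteria = [
    {"metric": "data_flow_tracing", "max_score": 3},
    {"metric": "cross_repo_synthesis", "max_score": 3},
    {"metric": "technical_accuracy", "max_score": 2},
    {"metric": "synthesis_quality", "max_score": 2},
]

# Hypothetical judge output for one answer (not from the diff).
judge_scores = {
    "data_flow_tracing": 2,
    "cross_repo_synthesis": 3,
    "technical_accuracy": 2,
    "synthesis_quality": 1,
}

total = sum(c["max_score"] for c in criteria)
earned = sum(judge_scores[c["metric"]] for c in criteria)
print(f"{earned}/{total} = {earned / total}")  # 8/10 = 0.8
```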
Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
+#!/bin/bash
+# eval.sh — MCP-unique benchmark evaluator for CCX-explore-042-ds
+# Exit-code-first (SWE-Factory pattern):
+#   exit 0 — agent produced useful output (composite score > 0)
+#   exit 1 — total failure (composite score == 0 or missing answer)
+#
+# Writes /logs/verifier/reward.txt with the composite score [0.0, 1.0]
+# Rubric judge (criteria.json) is supplementary — not used in this script.
+
+set -euo pipefail
+
+TASK_ID="CCX-explore-042-ds"
+ANSWER_PATH="/workspace/answer.json"
+TASK_SPEC_PATH="/tests/task_spec.json"
+ORACLE_CHECKS="/tests/oracle_checks.py"
+REWARD_PATH="/logs/verifier/reward.txt"
+
+mkdir -p /logs/verifier
+
+echo "=== CCX-explore-042-ds evaluator ==="
+echo "Task spec: $TASK_SPEC_PATH"
+echo "Answer: $ANSWER_PATH"
+echo ""
+
+# sg_only mode guard: restore full repo if verifier wrapper exists
+if [ -f /tmp/.sg_only_mode ] && [ -f /tests/sgonly_verifier_wrapper.sh ]; then
+    echo "sg_only mode: sourcing verifier wrapper..."
+    source /tests/sgonly_verifier_wrapper.sh
+fi
+
+# Verify answer file exists
+if [ ! -f "$ANSWER_PATH" ]; then
+    echo "ERROR: answer.json not found at $ANSWER_PATH"
+    echo "0.0" > "$REWARD_PATH"
+    exit 1
+fi
+
+# Validate answer is valid JSON
+if ! python3 -c "import json; json.load(open('$ANSWER_PATH'))" 2>/dev/null; then
+    echo "ERROR: answer.json is not valid JSON"
+    echo "0.0" > "$REWARD_PATH"
+    exit 1
+fi
+
+echo "answer.json found and valid JSON"
+
+# Run oracle checks
+if [ ! -f "$ORACLE_CHECKS" ]; then
+    echo "ERROR: oracle_checks.py not found at $ORACLE_CHECKS"
+    echo "0.0" > "$REWARD_PATH"
+    exit 1
+fi
+
+echo "Running oracle checks..."
+SCORE=$(python3 "$ORACLE_CHECKS" --answer "$ANSWER_PATH" --spec "$TASK_SPEC_PATH" --verbose 2>&1 | tee /dev/stderr | tail -1)
+
+# Validate score is a number
+if ! echo "$SCORE" | python3 -c "import sys; float(sys.stdin.read().strip())" 2>/dev/null; then
+    echo "ERROR: oracle_checks.py did not return a valid score: $SCORE"
+    echo "0.0" > "$REWARD_PATH"
+    exit 1
+fi
+
+echo ""
+echo "Composite score: $SCORE"
+echo "$SCORE" > "$REWARD_PATH"
+
+# Exit based on score (SWE-Factory exit-code-first pattern)
+python3 -c "import sys; sys.exit(0 if float('$SCORE') > 0 else 1)"
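The script delegates scoring to /tests/oracle_checks.py, which is not part of this diff. A hypothetical sketch of the dependency_chain oracle it invokes; the scoring rule here (fraction of expected chain steps present in the answer) is an assumption, chosen so that the validity gate stated in the commit message (gold=1.0, empty=0.0) holds:

```python
import json
import tempfile

def score_dependency_chain(answer_path, expected):
    """Return the fraction of expected (repo, path, symbol) steps found in the chain."""
    with open(answer_path) as f:
        chain = json.load(f).get("chain", [])
    found = {(s.get("repo"), s.get("path"), s.get("symbol")) for s in chain}
    return sum(1 for step in expected if step in found) / len(expected)

# Oracle chain from the commit message.
expected = [
    ("numpy/numpy", "numpy/_core/fromnumeric.py", "mean"),
    ("pandas-dev/pandas", "pandas/core/arrays/numpy_.py", "NumpyExtensionArray"),
    ("scipy/scipy", "scipy/stats/_stats_py.py", "pearsonr"),
]

# Gold answer scores 1.0 ...
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as gold:
    json.dump({"chain": [dict(zip(("repo", "path", "symbol"), s)) for s in expected]}, gold)
print(score_dependency_chain(gold.name, expected))  # 1.0

# ... and an empty answer scores 0.0.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as empty:
    json.dump({"chain": []}, empty)
print(score_dependency_chain(empty.name, expected))  # 0.0
```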
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+{
+  "chain": [
+    {
+      "repo": "numpy/numpy",
+      "path": "numpy/_core/fromnumeric.py",
+      "symbol": "mean",
+      "description": "NumPy's canonical array aggregation function. Accepts any array-like input (ndarray, lists, scalars) and computes the arithmetic mean. This is the primary entry point for reducing raw numpy arrays to scalar or lower-dimensional values."
+    },
+    {
+      "repo": "pandas-dev/pandas",
+      "path": "pandas/core/arrays/numpy_.py",
+      "symbol": "NumpyExtensionArray",
+      "description": "Pandas extension array class that wraps a NumPy ndarray as a pandas-compatible backing store. This is the integration layer between numpy arrays and pandas Series/DataFrame — when you pass a numpy array to pandas, NumpyExtensionArray bridges the two libraries."
+    },
+    {
+      "repo": "scipy/scipy",
+      "path": "scipy/stats/_stats_py.py",
+      "symbol": "pearsonr",
+      "description": "SciPy's Pearson correlation coefficient function. Accepts numpy arrays or pandas Series as x and y inputs, performing statistical analysis on array-like data. This is the scientific computation endpoint in the data flow chain."
+    }
+  ],
+  "text": "Scientific computing data flow across the Python ML stack: Step 1 — Array computation layer (numpy/numpy, numpy/_core/fromnumeric.py): The `mean()` function is the canonical entry point for array-level aggregation. It accepts array-like inputs and is dispatched via numpy's __array_function__ protocol to operate on ndarray objects. Step 2 — Data structure layer (pandas-dev/pandas, pandas/core/arrays/numpy_.py): NumpyExtensionArray is the bridge class that wraps a NumPy ndarray as a pandas extension array. When pandas Series or DataFrame receives a numpy array, NumpyExtensionArray._from_sequence() creates the backing store, enabling interoperability between numpy and pandas. Step 3 — Scientific computation layer (scipy/scipy, scipy/stats/_stats_py.py): scipy.stats.pearsonr() accepts numpy arrays or pandas Series as inputs and computes the Pearson correlation coefficient. This is the endpoint where raw array data is transformed into a statistical result. Together, these three components represent the canonical scientific Python data flow: numpy creates and aggregates arrays, pandas wraps them in labeled structures, and scipy performs scientific analysis on the data.",
+  "_metadata": {
+    "oracle_type": "dependency_chain",
+    "discovery_method": "sourcegraph_keyword_search",
+    "repos_searched": [
+      "github.com/numpy/numpy",
+      "github.com/pandas-dev/pandas",
+      "github.com/scipy/scipy"
+    ]
+  }
+}
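The gold answer above must satisfy the structural constraints stated in the task README (at least 3 chain steps, each citing repo, path, and symbol, plus a narrative text). A quick structural check; this validator is illustrative and not part of the committed test suite:

```python
def validate_answer(data):
    """Assert the README's structural constraints on an answer dict."""
    chain = data.get("chain", [])
    assert len(chain) >= 3, "chain must cover all 3 layers"
    for step in chain:
        for key in ("repo", "path", "symbol", "description"):
            assert key in step, f"chain step missing {key!r}"
    assert isinstance(data.get("text"), str) and data["text"].strip()
    return True

# Abbreviated version of the gold answer for demonstration.
gold = {
    "chain": [
        {"repo": "numpy/numpy", "path": "numpy/_core/fromnumeric.py",
         "symbol": "mean", "description": "array aggregation entry point"},
        {"repo": "pandas-dev/pandas", "path": "pandas/core/arrays/numpy_.py",
         "symbol": "NumpyExtensionArray", "description": "ndarray wrapper"},
        {"repo": "scipy/scipy", "path": "scipy/stats/_stats_py.py",
         "symbol": "pearsonr", "description": "statistical endpoint"},
    ],
    "text": "numpy aggregates arrays, pandas wraps them, scipy analyzes them.",
}
print(validate_answer(gold))  # True
```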
