Adding SWE-Smith support #122

MarcCote · 2025-05-22T20:09:31Z

This pull request introduces several enhancements and bug fixes to the debug_gym repository, including the addition of a new agent class, support for a new environment, and various utility improvements. Key changes include the implementation of the AgentSolution class, the addition of the SWESmithEnv environment, and updates to configuration loading and container setup logic.

New Features

New Agent Class: Introduced AgentSolution, a new agent class in debug_gym/agents/solution_agent.py, which includes methods for debugging tasks, applying patches, and evaluating results.
New Environment: Added SWESmithEnv in debug_gym/gym/envs/swe_smith.py, which supports the SWE-Smith benchmark. This includes dataset loading, Docker image handling, and task setup logic.

Environment Enhancements

Environment Selection: Updated select_env in debug_gym/gym/envs/__init__.py to include SWESmithEnv as a selectable environment.
Dataset Loading: Modified dataset loading in SWEBenchEnv to support dynamic splits.
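The environment selection and configurable split described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the two classes are empty stand-ins, the `"swebench"` key and both default split names are assumptions, and only the `"swesmith"` key and the `select_env` name come from the change summary.

```python
from dataclasses import dataclass


@dataclass
class SWEBenchEnv:
    """Stand-in for the real environment; only the split option is modeled."""
    split: str = "test"  # assumed default, previously the hardcoded value


@dataclass
class SWESmithEnv:
    """Stand-in for the new environment added by this PR."""
    split: str = "train"  # assumed default, not confirmed by the PR


# Hypothetical registry; the "swebench" key is an assumption.
_ENVIRONMENTS = {
    "swebench": SWEBenchEnv,
    "swesmith": SWESmithEnv,
}


def select_env(env_type: str):
    """Return the environment class registered under env_type."""
    try:
        return _ENVIRONMENTS[env_type]
    except KeyError:
        raise ValueError(f"Unknown environment: {env_type!r}") from None


env_class = select_env("swesmith")
env = env_class(split="train")
```

Passing the split as a constructor argument keeps the dataset choice in the configuration file rather than in the environment code.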

Utility Improvements

Configuration Loading: Enhanced load_config in debug_gym/agents/utils.py to allow parameter extension using the extend action.
Container Setup: Updated container setup in debug_gym/gym/terminal.py to use network_mode="host" and simplified container renaming logic.
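For readers unfamiliar with argparse's `extend` action (available since Python 3.8), here is a small sketch of how repeated parameter overrides accumulate into one list. The `-p/--params` flag name and the `KEY=VALUE` format are assumptions made for illustration; the actual `load_config` arguments may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Hypothetical config loader CLI")
    # action="extend" concatenates the values of every occurrence of the flag
    # instead of keeping only the last one.
    parser.add_argument(
        "-p", "--params",
        nargs="+",
        action="extend",
        default=[],
        metavar="KEY=VALUE",
    )
    return parser


def parse_overrides(params):
    """Turn ["a.b=1", "c=2"] into {"a.b": "1", "c": "2"}."""
    overrides = {}
    for item in params:
        key, _, value = item.partition("=")
        overrides[key] = value
    return overrides


args = build_parser().parse_args(
    ["-p", "agent.max_steps=50", "-p", "env.split=train", "seed=1"]
)
overrides = parse_overrides(args.params)
```

With the default `store` action, the second `-p` would silently replace the first; `extend` keeps all three overrides.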

Bug Fixes

PDB Tool Output: Fixed output handling in use method of debug_gym/gym/tools/pdb.py to correctly adjust messages when the program exits via sys.exit().
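The PR text does not show how the fix works. As a rough sketch of the kind of check involved, the helper below looks for the message that CPython's pdb prints when the debugged program calls `sys.exit()`. The helper name is hypothetical, and the marker string should be verified against the Python version in use.

```python
# Message printed by CPython's pdb when the program exits via sys.exit();
# treat the exact wording as an assumption to verify.
SYS_EXIT_MARKER = "The program exited via sys.exit(). Exit status:"


def exit_status_from_pdb_output(output: str):
    """Return the exit status text if pdb reported a sys.exit(), else None."""
    for line in output.splitlines():
        if SYS_EXIT_MARKER in line:
            return line.split(SYS_EXIT_MARKER, 1)[1].strip()
    return None


sample_output = (
    "> /repo/main.py(1)<module>()\n"
    "The program exited via sys.exit(). Exit status: 1\n"
)
status = exit_status_from_pdb_output(sample_output)
```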

Miscellaneous

Manifest Update: Included YAML configuration files in the package manifest (MANIFEST.in).
Imports: Added missing imports for DEBUG_GYM_CONFIG_DIR in debug_gym/init_llm_config.py.

Copilot

Pull Request Overview

This PR introduces support for a new environment (SWESmithEnv) and a new agent (AgentSolution) aimed at debugging tasks for the SWE-Smith dataset while also updating existing environments and terminal configurations. Key changes include:

Adding new utility functions in scripts/run.py for grouping and printing problems.
Introducing SWESmithEnv with Docker‐based repository cloning, image pulling, and custom task initialization.
Adding a new agent (AgentSolution) that applies a gold patch and then evaluates the task.
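To illustrate the grouping utilities mentioned above, here is a minimal sketch that groups problem identifiers by repository and computes a shared prefix. The function name and the split-on-first-dot rule are assumptions rather than the code in scripts/run.py; the first identifier is the one quoted in the config comment, the other two are made up.

```python
import os
from collections import defaultdict


def group_problems(problems):
    """Group problem ids by the part before the first dot (assumed repo key)."""
    groups = defaultdict(list)
    for name in sorted(problems):
        repo = name.split(".", 1)[0]
        groups[repo].append(name)
    return dict(groups)


problems = [
    "spulec__freezegun.5f171db0.combine_file__f3rcc5ea",
    "spulec__freezegun.5f171db0.func_basic__aaaa0000",  # made-up id
    "other__repo.12345678.func_basic__bbbb1111",  # made-up id
]
groups = group_problems(problems)

# os.path.commonprefix compares character by character, so it works on any
# list of strings, not only on file paths.
prefix = os.path.commonprefix(groups["spulec__freezegun"])
```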

Reviewed Changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

File	Description
scripts/run.py	Added functions for common prefix, grouping, pretty-printing and updated dataset selection logic.
scripts/config_swesmith.yaml	New YAML configuration for the SWE-Smith environment with minor comments.
debug_gym/utils.py	New constants for config and cache directories created at startup.
debug_gym/init_llm_config.py	Switched to using the new DEBUG_GYM_CONFIG_DIR constant.
debug_gym/gym/terminal.py	Updated container setup by adding network_mode and commenting out container renaming.
debug_gym/gym/envs/swe_smith.py	New environment with repository cloning, image pulling, dataset splitting and task setup.
debug_gym/gym/envs/swe_bench.py	Updated to use configurable dataset split instead of hardcoded “test”.
debug_gym/gym/envs/env.py	Modified entrypoint preparation logic to handle specific cases (e.g. “uv run”).
debug_gym/gym/envs/configs/swe_smith.yaml	New config file for task splits in the SWE-Smith environment.
debug_gym/gym/envs/__init__.py	Added import and environment selection case for “swesmith”.
debug_gym/agents/solution_agent.py	New debugging agent that applies a “gold patch” and evaluates its effect.
debug_gym/agents/__init__.py	Included the new AgentSolution in the agent registry.
MANIFEST.in	Updated to include YAML configuration files for environments.
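The directory constants listed for debug_gym/utils.py could be created along these lines. This is only a sketch: the real locations are not stated in the PR, so the demo builds the directories under a temporary folder, and `ensure_dir` is a hypothetical helper name.

```python
import tempfile
from pathlib import Path


def ensure_dir(path: Path) -> Path:
    """Create the directory (and parents) if missing, then return it."""
    path.mkdir(parents=True, exist_ok=True)
    return path


# Demo under a temporary base so nothing is written to the home directory.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    config_dir = ensure_dir(base / ".config" / "debug_gym")  # assumed layout
    cache_dir = ensure_dir(base / ".cache" / "debug_gym")  # assumed layout
    ensure_dir(config_dir)  # calling it again is a no-op
    created = config_dir.is_dir() and cache_dir.is_dir()
```

Creating directories at import time is convenient, but it means that merely importing the module has a filesystem side effect.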

Comments suppressed due to low confidence (2)

scripts/config_swesmith.yaml:6

There is a spelling mistake in the comment ('lillst' should be 'list'). Fixing this improves clarity.

# problems: "all"  # lillst of problems, e.g., ["spulec__freezegun.5f171db0.combine_file__f3rcc5ea"]

debug_gym/agents/solution_agent.py:38

[nitpick] The f-string already interpolates the variables, so the appended .format() call is redundant and may confuse readers. Remove the .format(info.score) part.

self.logger.info(f"Score: {info.score}/{info.max_score} ({info.score/info.max_score:.1%}) [Best: {highscore}]".format(info.score))
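A quick way to see why the trailing `.format()` is redundant: once the f-string has been evaluated, no replacement fields are left for `.format()` to fill. The values below are made up for illustration.

```python
score, max_score, highscore = 3, 4, 3

# The f-string is fully interpolated at this point.
message = f"Score: {score}/{max_score} ({score / max_score:.1%}) [Best: {highscore}]"

# Calling .format() on the result changes nothing here, because the string
# no longer contains any braces.
unchanged = message.format(score) == message
```

The extra call is not always harmless: if an interpolated value ever contained a literal brace, `.format()` would try to interpret it and could raise an error.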

Copilot

Pull Request Overview

This PR adds support for the SWE-Smith benchmark by introducing a new environment and agent, along with various utility and configuration updates.

Implement AgentSolution to apply gold patches and evaluate directly
Add SWESmithEnv with dataset loading, Docker handling, and split support
Enhance scripts, container setup, and config loading logic; fix PDB exit handling

Reviewed Changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.

File	Description
scripts/run.py	Add problem‐grouping utilities; adjust dataset split logic
scripts/config_swesmith.yaml	New SWE-Smith config template
debug_gym/utils.py	Define and create cache/config directories at import
debug_gym/init_llm_config.py	Use DEBUG_GYM_CONFIG_DIR for default dest path
debug_gym/gym/tools/pdb.py	Fix PDB sys.exit() message handling
debug_gym/gym/terminal.py	Enable host networking; simplify container naming
debug_gym/gym/envs/swe_smith.py	New SWESmithEnv implementation
debug_gym/gym/envs/swe_bench.py	Update import paths and dynamic split support
debug_gym/gym/envs/env.py	Adjust entrypoint preparation for uv and python
debug_gym/gym/envs/__init__.py	Register SWESmithEnv in select_env
debug_gym/agents/utils.py	Allow argparse `extend` action for params
debug_gym/agents/solution_agent.py	Add solution agent to apply patches and eval
debug_gym/agents/__init__.py	Import AgentSolution
MANIFEST.in	Include YAML configs in package manifest

Comments suppressed due to low confidence (2)

debug_gym/gym/terminal.py:456

[nitpick] Remove commented-out code for renaming/reloading the container if it's no longer needed, to reduce clutter.

#        # container.rename(container_name)

debug_gym/agents/solution_agent.py:45

This f-string reuses double quotes inside the replacement field, which is a syntax error on Python versions before 3.12. Use single quotes inside the braces, for example:
f"git -C {self.env.working_dir} apply {getattr(self.env, 'git_apply_args', '')} -".

command = f"git -C {self.env.working_dir} apply {getattr(self.env, "git_apply_args", "")} -"
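One way to sidestep the quoting question on every Python version is to look up the optional attribute before building the string. The sketch below assumes an environment object with a `working_dir` attribute and an optional `git_apply_args` attribute, as in the flagged line; the function name and the `SimpleNamespace` stand-ins are illustrative only.

```python
from types import SimpleNamespace


def build_apply_command(env) -> str:
    """Build the git-apply command that reads a patch from stdin."""
    # Hoisting getattr out of the f-string avoids nested quotes entirely.
    extra_args = getattr(env, "git_apply_args", "")
    return f"git -C {env.working_dir} apply {extra_args} -"


plain_env = SimpleNamespace(working_dir="/tmp/repo")
custom_env = SimpleNamespace(working_dir="/tmp/repo", git_apply_args="--reject")

plain_command = build_apply_command(plain_env)
custom_command = build_apply_command(custom_env)
```

When `git_apply_args` is missing, the command contains two consecutive spaces, which a shell treats the same as one.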

Copy from #122, thanks Marc :)

* Pdb set_current_frame_file
* Removed unused PdbTool.pdb_obs
* Keep output message when program exit in PDBTool. Copy from #122, thanks Marc :)
* Clean up pytest path to not depend on the environment

Copy from #122, thanks Marc :)

* Pdb set_current_frame_file
* Removed unused PdbTool.pdb_obs
* Keep output message when program exit in PDBTool. Copy from #122, thanks Marc :)
* Add additional tests for the PdbTool
* Simplify pdb use logic
* Remove set_current_frame_file since it's already set when start_pdb is called
* Exec pdb commands then get breakpoints from pdb session
* Fix some pdb tests
* Fix tests
* Refactor PDBTool to improve breakpoint handling: first run commands, then get breakpoints from the pdb session
* Removed unused breakpoint_add_clear
* Added update_breakpoints docs
* Use regex to parse breakpoints from pdb session
* Add test for modifying breakpoints when none exist
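As an illustration of parsing breakpoints from a pdb session with a regex, the sketch below reads the table that pdb prints for the `b` command. The pattern and function name are assumptions, not the ones used in the referenced commits, and the sample text imitates pdb's listing format.

```python
import re

# Matches rows such as "1   breakpoint   keep yes   at /path/file.py:12".
BREAKPOINT_RE = re.compile(
    r"^\d+\s+breakpoint\s+(?:keep|del)\s+(?:yes|no)\s+at\s+"
    r"(?P<file>.+):(?P<line>\d+)[ \t]*$",
    re.MULTILINE,
)


def parse_breakpoints(pdb_output: str):
    """Return (file, line) pairs for every breakpoint row in the output."""
    return [(m["file"], int(m["line"])) for m in BREAKPOINT_RE.finditer(pdb_output)]


sample = (
    "Num Type         Disp Enb   Where\n"
    "1   breakpoint   keep yes   at /repo/pkg/core.py:12\n"
    "\tbreakpoint already hit 1 time\n"
    "2   breakpoint   keep no    at /repo/tests/test_core.py:40\n"
)
breakpoints = parse_breakpoints(sample)
```

Reading the list back from the session keeps the tool's view in sync with pdb itself, instead of tracking additions and removals separately.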

…patch

xingdi-eric-yuan · 2025-06-11T15:44:22Z

We need a short tutorial on how to run official SWE-Smith tasks and customized tasks generated from SWE-Smith. It should also cover the use of the solution agent.

MarcCote changed the title ~~Adding SWE-Smith support~~ WIP: Adding SWE-Smith support May 22, 2025

MarcCote force-pushed the macote/swe-smith branch from d82ec42 to 73ed0fa May 28, 2025 18:25

MarcCote mentioned this pull request May 28, 2025

Add support for SWESmith #131

Closed

MarcCote requested a review from Copilot May 28, 2025 18:37

Copilot AI reviewed May 28, 2025

debug_gym/agents/solution_agent.py (outdated)

debug_gym/gym/envs/env.py (outdated)

MarcCote force-pushed the macote/swe-smith branch 4 times, most recently from 0583f33 to 38fd4de May 29, 2025 20:27

MarcCote requested a review from Copilot May 30, 2025 13:05

MarcCote force-pushed the macote/swe-smith branch from 6a02e78 to bfcc2e6 May 30, 2025 13:05

Copilot AI reviewed May 30, 2025

scripts/run.py

MANIFEST.in

scripts/config_swesmith.yaml

debug_gym/utils.py (outdated)

debug_gym/agents/utils.py

matheper added a commit that referenced this pull request May 30, 2025

Keep output message when program exit in PDBTool.

405bcd8

Copy from #122, thanks Marc :)

matheper added a commit that referenced this pull request May 31, 2025

Keep output message when program exit in PDBTool.

3a30d50

Copy from #122, thanks Marc :)

MarcCote force-pushed the macote/swe-smith branch 2 times, most recently from db1e2d6 to e85aa65 June 6, 2025 13:52

MarcCote commented Jun 6, 2025

debug_gym/gym/terminal.py (outdated)

MarcCote commented Jun 6, 2025

debug_gym/gym/terminal.py

debug_gym/agents/solution_agent.py

MarcCote force-pushed the macote/swe-smith branch 2 times, most recently from 0ac6a0c to 19da9a5 June 6, 2025 14:43