Adding SWE-Smith support #122
Pull Request Overview
This PR introduces support for a new environment (SWESmithEnv) and a new agent (AgentSolution) aimed at debugging tasks for the SWE-Smith dataset while also updating existing environments and terminal configurations. Key changes include:
- Adding new utility functions in scripts/run.py for grouping and printing problems.
- Introducing SWESmithEnv with Docker-based repository cloning, image pulling, and custom task initialization.
- Adding a new agent (AgentSolution) that applies a gold patch and then evaluates the task.
Reviewed Changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| scripts/run.py | Added functions for common prefix, grouping, pretty-printing and updated dataset selection logic. |
| scripts/config_swesmith.yaml | New YAML configuration for the SWE-Smith environment with minor comments. |
| debug_gym/utils.py | New constants for config and cache directories created at startup. |
| debug_gym/init_llm_config.py | Switched to using the new DEBUG_GYM_CONFIG_DIR constant. |
| debug_gym/gym/terminal.py | Updated container setup by adding network_mode and commenting out container renaming. |
| debug_gym/gym/envs/swe_smith.py | New environment with repository cloning, image pulling, dataset splitting and task setup. |
| debug_gym/gym/envs/swe_bench.py | Updated to use configurable dataset split instead of hardcoded “test”. |
| debug_gym/gym/envs/env.py | Modified entrypoint preparation logic to handle specific cases (e.g. “uv run”). |
| debug_gym/gym/envs/configs/swe_smith.yaml | New config file for task splits in the SWE-Smith environment. |
| debug_gym/gym/envs/__init__.py | Added import and environment selection case for “swesmith”. |
| debug_gym/agents/solution_agent.py | New debugging agent that applies a “gold patch” and evaluates its effect. |
| debug_gym/agents/__init__.py | Included the new AgentSolution in the agent registry. |
| MANIFEST.in | Updated to include YAML configuration files for environments. |
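The problem-grouping utilities summarized for scripts/run.py are not shown in this conversation. As a rough sketch only, grouping task identifiers by repository and finding their shared prefix could look like the following. The function names `group_by_repo` and `common_prefix` are hypothetical, and every identifier except the freezegun one quoted in this review is made up for illustration.

```python
import os
from collections import defaultdict


def group_by_repo(problems):
    """Group task identifiers by the text before the first '.' (hypothetical helper)."""
    groups = defaultdict(list)
    for problem in problems:
        repo = problem.split(".", 1)[0]
        groups[repo].append(problem)
    return dict(groups)


def common_prefix(names):
    """Return the longest character-wise prefix shared by all names."""
    return os.path.commonprefix(list(names))


# Only the first identifier appears in the PR; the others are invented examples.
problems = [
    "spulec__freezegun.5f171db0.combine_file__f3rcc5ea",
    "spulec__freezegun.5f171db0.other_task",
    "some__repo.abc123.some_task",
]
groups = group_by_repo(problems)
for repo, tasks in groups.items():
    print(f"{repo}: {len(tasks)} task(s), shared prefix {common_prefix(tasks)!r}")
```

Note that `os.path.commonprefix` compares character by character, so the shared prefix can end in the middle of a name component.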
Comments suppressed due to low confidence (2)
scripts/config_swesmith.yaml:6
- There is a spelling mistake in the comment ('lillst' should be 'list'). Fixing this improves clarity.
# problems: "all" # lillst of problems, e.g., ["spulec__freezegun.5f171db0.combine_file__f3rcc5ea"]
debug_gym/agents/solution_agent.py:38
- [nitpick] The f-string already interpolates the variables, so the appended .format() call is redundant and may confuse readers. Remove the .format(info.score) part.
self.logger.info(f"Score: {info.score}/{info.max_score} ({info.score/info.max_score:.1%}) [Best: {highscore}]".format(info.score))
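A small standalone check of the nitpick above, assuming plain numeric scores (the function name `format_score` is hypothetical and not part of the PR): once the f-string has been evaluated, the result contains no replacement fields, so the trailing `.format()` call returns the same string.

```python
def format_score(score, max_score, highscore):
    """Build the score message with an f-string only (hypothetical helper)."""
    return f"Score: {score}/{max_score} ({score / max_score:.1%}) [Best: {highscore}]"


message = format_score(3, 4, 3)
print(message)

# Calling .format() on the already-interpolated text changes nothing here,
# because the string no longer contains any '{' or '}' characters.
assert message.format(3) == message
```

The extra call is not always harmless: if an interpolated value ever contained a literal brace, `.format()` would try to parse it as a replacement field, which is another reason to drop it.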
Pull Request Overview
This PR adds support for the SWE-Smith benchmark by introducing a new environment and agent, along with various utility and configuration updates.
- Implement AgentSolution to apply gold patches and evaluate directly
- Add SWESmithEnv with dataset loading, Docker handling, and split support
- Enhance scripts, container setup, and config loading logic; fix PDB exit handling
Reviewed Changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| scripts/run.py | Add problem-grouping utilities; adjust dataset split logic |
| scripts/config_swesmith.yaml | New SWE-Smith config template |
| debug_gym/utils.py | Define and create cache/config directories at import |
| debug_gym/init_llm_config.py | Use DEBUG_GYM_CONFIG_DIR for default dest path |
| debug_gym/gym/tools/pdb.py | Fix PDB sys.exit() message handling |
| debug_gym/gym/terminal.py | Enable host networking; simplify container naming |
| debug_gym/gym/envs/swe_smith.py | New SWESmithEnv implementation |
| debug_gym/gym/envs/swe_bench.py | Update import paths and dynamic split support |
| debug_gym/gym/envs/env.py | Adjust entrypoint preparation for uv and python |
| debug_gym/gym/envs/__init__.py | Register SWESmithEnv in select_env |
| debug_gym/agents/utils.py | Allow argparse extend action for params |
| debug_gym/agents/solution_agent.py | Add solution agent to apply patches and eval |
| debug_gym/agents/__init__.py | Import AgentSolution |
| MANIFEST.in | Include YAML configs in package manifest |
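The change to debug_gym/agents/utils.py is described only as allowing the argparse extend action for params. The actual flag names are not shown here, so the `-p`/`--params` option and the `KEY=VALUE` format below are assumptions. This sketch shows what `action="extend"` (available since Python 3.8) does: values from repeated uses of the flag are collected into one flat list.

```python
import argparse


def build_parser():
    """Parser with a repeatable, list-extending option (flag names are hypothetical)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-p",
        "--params",
        nargs="+",
        action="extend",  # append every value to one list instead of overwriting
        default=[],
        metavar="KEY=VALUE",
    )
    return parser


args = build_parser().parse_args(["-p", "a=1", "-p", "b=2", "c=3"])
print(args.params)
```

With the default `store` action, the second `-p` would replace the values from the first one; `extend` keeps both.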
Comments suppressed due to low confidence (2)
debug_gym/gym/terminal.py:456
- [nitpick] Remove commented-out code for renaming/reloading the container if it's no longer needed, to reduce clutter.
# # container.rename(container_name)
debug_gym/agents/solution_agent.py:45
- This f-string uses nested double quotes and will cause a syntax error. Use single quotes inside or escape the inner quotes, for example:
f"git -C {self.env.working_dir} apply {getattr(self.env, 'git_apply_args', '')} -".
command = f"git -C {self.env.working_dir} apply {getattr(self.env, "git_apply_args", "")} -"
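One way to sidestep the quoting question entirely is to resolve the optional attribute before building the string. The sketch below uses a `SimpleNamespace` as a stand-in for the environment object and a hypothetical helper name, `build_apply_command`; it is not the PR's code. For context, reusing the same quote character inside an f-string expression is accepted from Python 3.12 onward and is a syntax error on earlier versions, so the suggestion matters for any interpreter older than 3.12.

```python
from types import SimpleNamespace


def build_apply_command(env):
    """Build the git-apply command string (hypothetical helper, stand-in env object)."""
    # Resolve the optional attribute first so the f-string needs no nested quotes.
    apply_args = getattr(env, "git_apply_args", "")
    return f"git -C {env.working_dir} apply {apply_args} -"


env_with_args = SimpleNamespace(working_dir="/tmp/repo", git_apply_args="--reject")
env_without_args = SimpleNamespace(working_dir="/tmp/repo")

print(build_apply_command(env_with_args))
print(build_apply_command(env_without_args))
```

When the attribute is missing, the command contains two consecutive spaces, which a shell treats the same as one.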
Copy from #122, thanks Marc :)
- Pdb set_current_frame_file
- Removed unused PdbTool.pdb_obs
- Keep output message when program exit in PDBTool. Copy from #122, thanks Marc :)
- Clean up pytest path to not depend on the environment
Copy from #122, thanks Marc :)
- Pdb set_current_frame_file
- Removed unused PdbTool.pdb_obs
- Keep output message when program exit in PDBTool. Copy from #122, thanks Marc :)
- Add additional tests for the PdbTool
- Simplify pdb use logic
- Remove set_current_frame_file since it's already set when start_pdb is called
- Exec pdb commands then get breakpoints from pdb session
- Fix some pdb tests
- Fix tests
- Refactor PDBTool to improve breakpoint handling: first run commands, then get breakpoints from the pdb session
- Removed unused breakpoint_add_clear
- Added update_breakpoints docs
- Use regex to parse breakpoints from pdb session
- Add test for modifying breakpoints when none exist
We need a short tutorial on how to run official SWE-Smith tasks and customized tasks generated from SWE-Smith. It should include the use of the solution agent.
This pull request introduces several enhancements and bug fixes to the debug_gym repository, including the addition of a new agent class, support for a new environment, and various utility improvements. Key changes include the implementation of the AgentSolution class, the addition of the SWESmithEnv environment, and updates to configuration loading and container setup logic.
New Features
- Added AgentSolution, a new agent class in debug_gym/agents/solution_agent.py, which includes methods for debugging tasks, applying patches, and evaluating results.
- Added SWESmithEnv in debug_gym/gym/envs/swe_smith.py, which supports the SWE-Smith benchmark. This includes dataset loading, Docker image handling, and task setup logic.
Environment Enhancements
- Updated select_env in debug_gym/gym/envs/__init__.py to include SWESmithEnv as a selectable environment.
- Updated SWEBenchEnv to support dynamic splits.
Utility Improvements
- Updated load_config in debug_gym/agents/utils.py to allow parameter extension using the extend action.
- Updated debug_gym/gym/terminal.py to use network_mode="host" and simplified container renaming logic.
Bug Fixes
- Fixed the use method of debug_gym/gym/tools/pdb.py to correctly adjust messages when the program exits via sys.exit().
Miscellaneous
- Included YAML configuration files in the package manifest (MANIFEST.in).
- Used DEBUG_GYM_CONFIG_DIR for the default destination path in debug_gym/init_llm_config.py.