@MarcCote
Collaborator

@MarcCote MarcCote commented May 22, 2025

This pull request adds support for the SWE-Smith benchmark to the debug_gym repository. Key changes include the new AgentSolution agent class, the new SWESmithEnv environment, updates to configuration loading and container setup, and a fix to PDB tool output handling.

New Features

  • New Agent Class: Introduced AgentSolution in debug_gym/agents/solution_agent.py, an agent that applies the task's gold patch and then evaluates the result.
  • New Environment: Added SWESmithEnv in debug_gym/gym/envs/swe_smith.py to support the SWE-Smith benchmark, including dataset loading and splitting, Docker image pulling, repository cloning, and task setup.
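
The apply-then-evaluate flow described for AgentSolution can be sketched as below. This is a minimal illustration under assumptions, not the PR's implementation: the `PatchableEnv` protocol, its `gold_patch`, `apply_patch()` and `evaluate()` members, the `EvalInfo` container, and the full-score sanity check are all hypothetical.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class EvalInfo:
    # Hypothetical result container; mirrors the score/max_score fields
    # that the agent's log message refers to.
    score: int
    max_score: int


class PatchableEnv(Protocol):
    # Hypothetical environment interface assumed for this sketch.
    gold_patch: str

    def apply_patch(self, patch: str) -> bool: ...

    def evaluate(self) -> EvalInfo: ...


def run_solution_agent(env: PatchableEnv) -> EvalInfo:
    """Apply the reference (gold) patch, then evaluate the task."""
    if not env.apply_patch(env.gold_patch):
        raise RuntimeError("gold patch did not apply cleanly")
    info = env.evaluate()
    # Assumed sanity check: a correct gold patch should reach full score.
    if info.score != info.max_score:
        raise RuntimeError(f"gold patch scored {info.score}/{info.max_score}")
    return info
```

Because the agent never calls an LLM, a run like this mainly checks that a task's environment, patch and evaluation are wired up consistently.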

Environment Enhancements

  • Environment Selection: Updated select_env in debug_gym/gym/envs/__init__.py to include SWESmithEnv as a selectable environment.
  • Dataset Loading: SWEBenchEnv now loads a configurable dataset split instead of the hardcoded "test" split.
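
A minimal sketch of name-based environment selection combined with a configurable split. `SWEBenchEnv` and `SWESmithEnv` here are placeholder stubs, and the registry dict, the "swebench" key, the `split` keyword and its "test" default are assumptions for illustration. Only the "swesmith" name comes from the PR, and debug_gym's real `select_env` may be structured differently.

```python
from typing import Dict, Type


class _StubEnv:
    """Placeholder base class standing in for the real environments."""

    def __init__(self, split: str = "test", **kwargs):
        # The split is a parameter instead of a hardcoded "test".
        self.split = split
        self.kwargs = kwargs


class SWEBenchEnv(_StubEnv):
    """Placeholder stub; the real class lives in debug_gym/gym/envs/swe_bench.py."""


class SWESmithEnv(_StubEnv):
    """Placeholder stub; the real class lives in debug_gym/gym/envs/swe_smith.py."""


ENV_REGISTRY: Dict[str, Type[_StubEnv]] = {
    "swebench": SWEBenchEnv,  # assumed key
    "swesmith": SWESmithEnv,  # key named in the PR
}


def select_env(env_type: str) -> Type[_StubEnv]:
    """Return the environment class registered under `env_type`."""
    try:
        return ENV_REGISTRY[env_type]
    except KeyError:
        raise ValueError(f"Unknown environment: {env_type!r}") from None
```

Usage: `select_env("swesmith")(split="dev")` builds the stub SWE-Smith environment on a non-default split.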

Utility Improvements

  • Configuration Loading: Enhanced load_config in debug_gym/agents/utils.py so that parameters can be passed repeatedly, using argparse's extend action.
  • Container Setup: Updated container setup in debug_gym/gym/terminal.py to use network_mode="host" and simplified container renaming logic.
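
The `extend` action (available in argparse since Python 3.8) collects the values from every occurrence of a flag into one flat list. The sketch below pairs it with a hypothetical dotted `key=value` override format. The flag names and the override syntax are assumptions, not debug_gym's documented CLI.

```python
import argparse
from typing import Any, Dict, List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # action="extend" (Python 3.8+) flattens the values of every -p occurrence
    # into one list: `-p a=1 -p b=2 c=3` -> ["a=1", "b=2", "c=3"].
    parser.add_argument("-p", "--params", nargs="+", action="extend", default=[])
    return parser


def apply_overrides(config: Dict[str, Any], params: List[str]) -> Dict[str, Any]:
    """Apply hypothetical dotted `key=value` overrides to a nested dict."""
    for item in params:
        key, _, value = item.partition("=")
        *parents, leaf = key.split(".")
        node = config
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value  # values stay strings in this sketch
    return config
```

With the default `append` action, each `-p` would instead produce its own nested list, which callers would then have to flatten.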

Bug Fixes

  • PDB Tool Output: Fixed output handling in the use method of debug_gym/gym/tools/pdb.py so that the program's output message is kept when it exits via sys.exit().
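
One way to keep a debugged program's output when pdb reports a `sys.exit()` is to split the captured text at pdb's exit notice. The helper name and return shape below are hypothetical. The marker string is the notice that CPython's `pdb` prints when a script run under it calls `sys.exit()`. The PR's actual fix may work differently.

```python
from typing import Optional, Tuple

# Notice printed by CPython's `pdb` when the debugged script calls sys.exit().
EXIT_MARKER = "The program exited via sys.exit()."


def split_exit_output(output: str) -> Tuple[str, Optional[str]]:
    """Return (program output, pdb exit notice or None)."""
    idx = output.find(EXIT_MARKER)
    if idx == -1:
        return output, None
    # Keep everything the program printed before pdb's notice.
    program_output = output[:idx].rstrip("\n")
    exit_notice = output[idx:].splitlines()[0]
    return program_output, exit_notice
```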

Miscellaneous

  • Manifest Update: Included YAML configuration files in the package manifest (MANIFEST.in).
  • Imports: Added the missing import of DEBUG_GYM_CONFIG_DIR in debug_gym/init_llm_config.py.
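
Config and cache directories that are created when the package is imported might look like the sketch below. The `~/.config/debug_gym` and `~/.cache/debug_gym` locations and the `make_app_dirs()` helper are assumptions. The PR only states that the constants exist and that the directories are created at import time.

```python
from pathlib import Path
from typing import Optional, Tuple


def make_app_dirs(base: Optional[Path] = None) -> Tuple[Path, Path]:
    """Create (if needed) and return hypothetical config and cache directories."""
    base = Path.home() if base is None else base
    config_dir = base / ".config" / "debug_gym"
    cache_dir = base / ".cache" / "debug_gym"
    for directory in (config_dir, cache_dir):
        # parents=True builds missing intermediate folders; exist_ok=True
        # makes repeated imports harmless.
        directory.mkdir(parents=True, exist_ok=True)
    return config_dir, cache_dir


# At module level this would run once per process, on first import:
# DEBUG_GYM_CONFIG_DIR, DEBUG_GYM_CACHE_DIR = make_app_dirs()
```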

@MarcCote MarcCote changed the title Adding SWE-Smith support WIP: Adding SWE-Smith support May 22, 2025
@MarcCote MarcCote force-pushed the macote/swe-smith branch from d82ec42 to 73ed0fa May 28, 2025 18:25
@MarcCote MarcCote requested a review from Copilot May 28, 2025 18:37
Contributor

Copilot AI left a comment

Pull Request Overview

This PR introduces support for a new environment (SWESmithEnv) and a new agent (AgentSolution) aimed at debugging tasks for the SWE-Smith dataset while also updating existing environments and terminal configurations. Key changes include:

  • Adding new utility functions in scripts/run.py for grouping and printing problems.
  • Introducing SWESmithEnv with Docker‐based repository cloning, image pulling, and custom task initialization.
  • Adding a new agent (AgentSolution) that applies a gold patch and then evaluates the task.
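
SWE-Smith instance IDs such as `spulec__freezegun.5f171db0.combine_file__f3rcc5ea` appear to start with a repository-and-commit prefix, so helpers that group and pretty-print problems can key on that prefix. The following is a hypothetical sketch of such helpers. The names and the "everything before the last `.`" grouping rule are assumptions, not the code in scripts/run.py.

```python
import os
from collections import defaultdict
from typing import Dict, Iterable, List


def common_prefix(names: List[str]) -> str:
    # os.path.commonprefix compares character by character on any strings.
    return os.path.commonprefix(names)


def group_by_repo(problem_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Group IDs by everything before their last '.' (assumed repo prefix)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for pid in sorted(problem_ids):
        repo = pid.rpartition(".")[0] or pid
        groups[repo].append(pid)
    return dict(groups)


def format_groups(groups: Dict[str, List[str]]) -> str:
    lines = []
    for repo, ids in groups.items():
        lines.append(f"{repo} ({len(ids)})")
        for pid in ids:
            # Show only the part after the shared repo prefix.
            lines.append("  " + (pid[len(repo) + 1:] or pid))
    return "\n".join(lines)
```

Usage: `print(format_groups(group_by_repo(ids)))` prints one header per repository followed by its bug suffixes.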

Reviewed Changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| scripts/run.py | Added functions for computing common prefixes and for grouping and pretty-printing problems; updated dataset selection logic. |
| scripts/config_swesmith.yaml | New YAML configuration for the SWE-Smith environment, with minor comments. |
| debug_gym/utils.py | New constants for the config and cache directories, which are created at startup. |
| debug_gym/init_llm_config.py | Switched to using the new DEBUG_GYM_CONFIG_DIR constant. |
| debug_gym/gym/terminal.py | Updated container setup by adding network_mode and commenting out container renaming. |
| debug_gym/gym/envs/swe_smith.py | New environment with repository cloning, image pulling, dataset splitting and task setup. |
| debug_gym/gym/envs/swe_bench.py | Updated to use a configurable dataset split instead of the hardcoded “test”. |
| debug_gym/gym/envs/env.py | Modified entrypoint preparation logic to handle specific cases (e.g. “uv run”). |
| debug_gym/gym/envs/configs/swe_smith.yaml | New config file for task splits in the SWE-Smith environment. |
| debug_gym/gym/envs/__init__.py | Added import and environment selection case for “swesmith”. |
| debug_gym/agents/solution_agent.py | New debugging agent that applies a “gold patch” and evaluates its effect. |
| debug_gym/agents/__init__.py | Included the new AgentSolution in the agent registry. |
| MANIFEST.in | Updated to include YAML configuration files for environments. |
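
The env.py entry in the file summary mentions entrypoint handling for `uv run`, but the PR text does not show the logic. As a hypothetical illustration only, here is one way to wrap an entrypoint in `python -m pdb` while keeping a `uv run` launcher in front. The function name, the recognized interpreter names and the fallback rules are assumptions.

```python
import shlex

PYTHON_NAMES = ("python", "python3")


def add_pdb(entrypoint: str) -> str:
    """Hypothetical: rewrite an entrypoint so it runs under `python -m pdb`.

    A leading `uv run` launcher is preserved so the command still executes
    inside the project's uv-managed environment.
    """
    tokens = shlex.split(entrypoint)
    prefix = []
    if tokens[:2] == ["uv", "run"]:
        prefix, tokens = tokens[:2], tokens[2:]
    if not tokens:
        raise ValueError("entrypoint has no command to run")
    if tokens[0] in PYTHON_NAMES:
        interpreter, args = tokens[0], tokens[1:]
    elif tokens[0].endswith(".py"):
        interpreter, args = "python", tokens  # bare script path
    else:
        interpreter, args = "python", ["-m", *tokens]  # e.g. `pytest` -> `-m pytest`
    # `python -m pdb` accepts either a script path or `-m module` (3.7+).
    return shlex.join(prefix + [interpreter, "-m", "pdb", *args])
```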
Comments suppressed due to low confidence (2)

scripts/config_swesmith.yaml:6

  • There is a spelling mistake in the comment ('lillst' should be 'list'). Fixing this improves clarity.
# problems: "all"  # lillst of problems, e.g., ["spulec__freezegun.5f171db0.combine_file__f3rcc5ea"]
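
The commented example suggests that `problems` is either `"all"` or a list of SWE-Smith instance IDs. The sketch below resolves that setting against the available instances. The function name, the single-string convenience and the unknown-ID check are assumptions.

```python
from typing import Iterable, List, Union


def resolve_problems(problems: Union[str, List[str]], available: Iterable[str]) -> List[str]:
    """Expand a `problems` config value into concrete problem IDs."""
    available = list(available)
    if problems == "all":
        return sorted(available)
    if isinstance(problems, str):
        problems = [problems]  # allow a single ID as a convenience
    unknown = sorted(set(problems) - set(available))
    if unknown:
        raise ValueError(f"Unknown problems: {unknown}")
    return list(problems)
```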

debug_gym/agents/solution_agent.py:38

  • [nitpick] The f-string already interpolates the variables, so the appended .format() call is redundant and may confuse readers. Remove the .format(info.score) part.
self.logger.info(f"Score: {info.score}/{info.max_score} ({info.score/info.max_score:.1%}) [Best: {highscore}]".format(info.score))
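
With the trailing `.format()` removed, the log line becomes a plain f-string. The `format_score()` wrapper is a hypothetical helper that makes the message easy to test. The field names follow the quoted line.

```python
import logging

logger = logging.getLogger("solution_agent")


def format_score(score: int, max_score: int, highscore: int) -> str:
    # Every field is interpolated by the f-string itself; no .format() needed.
    return f"Score: {score}/{max_score} ({score / max_score:.1%}) [Best: {highscore}]"


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    logger.info(format_score(3, 4, 3))
```

The extra `.format(info.score)` happens to be harmless here because the interpolated values are numbers. If an interpolated value ever contained `{` or `}`, however, the second formatting pass could alter the message or raise an error.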

@MarcCote MarcCote force-pushed the macote/swe-smith branch 4 times, most recently from 0583f33 to 38fd4de May 29, 2025 20:27
@MarcCote MarcCote requested a review from Copilot May 30, 2025 13:05
@MarcCote MarcCote force-pushed the macote/swe-smith branch from 6a02e78 to bfcc2e6 May 30, 2025 13:05
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds support for the SWE-Smith benchmark by introducing a new environment and agent, along with various utility and configuration updates.

  • Implement AgentSolution to apply gold patches and evaluate directly
  • Add SWESmithEnv with dataset loading, Docker handling, and split support
  • Enhance scripts, container setup, and config loading logic; fix PDB exit handling

Reviewed Changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| scripts/run.py | Add problem‐grouping utilities; adjust dataset split logic |
| scripts/config_swesmith.yaml | New SWE-Smith config template |
| debug_gym/utils.py | Define and create cache/config directories at import |
| debug_gym/init_llm_config.py | Use DEBUG_GYM_CONFIG_DIR for default dest path |
| debug_gym/gym/tools/pdb.py | Fix PDB sys.exit() message handling |
| debug_gym/gym/terminal.py | Enable host networking; simplify container naming |
| debug_gym/gym/envs/swe_smith.py | New SWESmithEnv implementation |
| debug_gym/gym/envs/swe_bench.py | Update import paths and dynamic split support |
| debug_gym/gym/envs/env.py | Adjust entrypoint preparation for uv and python |
| debug_gym/gym/envs/__init__.py | Register SWESmithEnv in select_env |
| debug_gym/agents/utils.py | Allow argparse extend action for params |
| debug_gym/agents/solution_agent.py | Add solution agent to apply patches and eval |
| debug_gym/agents/__init__.py | Import AgentSolution |
| MANIFEST.in | Include YAML configs in package manifest |
Comments suppressed due to low confidence (2)

debug_gym/gym/terminal.py:456

  • [nitpick] Remove commented-out code for renaming/reloading the container if it's no longer needed, to reduce clutter.
#        # container.rename(container_name)
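
The review suggests dropping the commented-out rename. One way to make a rename unnecessary is to give the container its name when it is created. The following is a hedged sketch using the Docker SDK for Python. `name`, `network_mode`, `working_dir`, `command`, `detach`, `tty` and `stdin_open` are real `containers.run()` keyword arguments, but the helper names, the `sleep infinity` keep-alive command and the chosen values are assumptions, not terminal.py's actual setup.

```python
from typing import Any, Dict


def container_run_kwargs(name: str, working_dir: str) -> Dict[str, Any]:
    """Keyword arguments for docker-py's `client.containers.run()`."""
    return {
        "name": name,  # named at creation time, so no later rename is needed
        "network_mode": "host",  # share the host's network stack
        "working_dir": working_dir,
        "command": "sleep infinity",  # assumed keep-alive command
        "detach": True,  # return a Container object instead of blocking
        "tty": True,
        "stdin_open": True,
    }


def start_container(image: str, name: str, working_dir: str):
    import docker  # Docker SDK for Python (`pip install docker`)

    client = docker.from_env()
    return client.containers.run(image, **container_run_kwargs(name, working_dir))
```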

debug_gym/agents/solution_agent.py:45

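  • This f-string nests double quotes inside a double-quoted f-string, which is a syntax error on Python versions before 3.12 (PEP 701 lifted this restriction in 3.12). Use single quotes for the inner string literals instead (backslash escapes are not allowed inside f-string expressions before 3.12 either), for example:
    f"git -C {self.env.working_dir} apply {getattr(self.env, 'git_apply_args', '')} -".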
command = f"git -C {self.env.working_dir} apply {getattr(self.env, "git_apply_args", "")} -"
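
The quoting question can be sidestepped entirely by building the same `git apply` call as an argument list and running it without a shell. This sketch assumes the env exposes `working_dir` and an optional `git_apply_args` string, as the quoted line suggests. `build_apply_command()` and `apply_patch()` are hypothetical helpers, not the PR's code, and they run `git` locally through `subprocess`, whereas the agent presumably sends the command to the environment's terminal.

```python
import shlex
import subprocess
from typing import List, Tuple


def build_apply_command(working_dir: str, git_apply_args: str = "") -> List[str]:
    """Build a `git apply` call that reads the patch from stdin ("-")."""
    # A list avoids shell quoting issues; shlex.split turns an optional
    # extra-arguments string into separate tokens.
    return ["git", "-C", working_dir, "apply", *shlex.split(git_apply_args), "-"]


def apply_patch(working_dir: str, patch: str, git_apply_args: str = "") -> Tuple[bool, str]:
    """Apply `patch` inside `working_dir`; return (success, combined output)."""
    result = subprocess.run(
        build_apply_command(working_dir, git_apply_args),
        input=patch,
        text=True,
        capture_output=True,
    )
    return result.returncode == 0, result.stdout + result.stderr
```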

matheper added a commit that referenced this pull request May 30, 2025
matheper added a commit that referenced this pull request May 30, 2025
* Pdb set_current_frame_file

* Removed unused PdbTool.pdb_obs

* Keep output message when program exit in PDBTool.
Copy from #122, thanks Marc :)

* Clean up pytest path to not depend on the environment
matheper added a commit that referenced this pull request May 31, 2025
matheper added a commit that referenced this pull request Jun 2, 2025
* Pdb set_current_frame_file

* Removed unused PdbTool.pdb_obs

* Keep output message when program exit in PDBTool.
Copy from #122, thanks Marc :)

* Add additional tests for the PdbTool

* Simplify pdb use logic

* Remove set_current_frame_file since it's already set when start_pdb is called

* Exec pdb commands then get breakpoints from pdb session

* Fix some pdb tests

* Fix tests

* Refactor PDBTool to improve breakpoint handling
First run commands, then get breakpoints from the pdb session

* Removed unused breakpoint_add_clear

* Added update_breakpoints docs

* Use regex to parse breakpoints from pdb session

* Add test for modifying breakpoints when none exist
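
The final commits switch PdbTool to parsing breakpoints out of the pdb session with a regex. The sketch below assumes the listing format that CPython's `pdb` prints for a bare `break` command (one `Breakpoint.bpformat()` line per breakpoint). `parse_breakpoints()` is a hypothetical name, and the PR's actual pattern may differ.

```python
import re
from typing import List, Tuple

# One line per breakpoint, e.g. "1   breakpoint   keep yes   at /tmp/app.py:10"
BREAKPOINT_RE = re.compile(
    r"^(?P<num>\d+)\s+breakpoint\s+(?P<disp>keep|del)\s+(?P<enabled>yes|no)\s+"
    r"at\s+(?P<file>.+):(?P<line>\d+)[ \t]*$",
    re.MULTILINE,
)


def parse_breakpoints(listing: str) -> List[Tuple[int, str, int]]:
    """Extract (number, file, line) for each breakpoint in a pdb listing."""
    return [
        (int(m["num"]), m["file"], int(m["line"]))
        for m in BREAKPOINT_RE.finditer(listing)
    ]
```

The greedy `(?P<file>.+)` backtracks to the last `:` followed by digits, so paths that contain colons still parse. Header and "already hit" lines never match because they do not start with a breakpoint number.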
@MarcCote MarcCote force-pushed the macote/swe-smith branch 2 times, most recently from db1e2d6 to e85aa65 June 6, 2025 13:52
@MarcCote MarcCote force-pushed the macote/swe-smith branch 2 times, most recently from 0ac6a0c to 19da9a5 June 6, 2025 14:43
@MarcCote MarcCote force-pushed the macote/swe-smith branch 2 times, most recently from 1d222f5 to e8f070e June 6, 2025 16:51
@MarcCote MarcCote force-pushed the macote/swe-smith branch from dd567a1 to 6341af9 June 9, 2025 22:09
@xingdi-eric-yuan
Collaborator

We need a short tutorial on how to run both the official SWE-Smith tasks and custom tasks generated with SWE-Smith; it should cover using the solution agent.

@matheper matheper changed the title WIP: Adding SWE-Smith support Adding SWE-Smith support Jun 11, 2025
@xingdi-eric-yuan xingdi-eric-yuan merged commit 31558a3 into main Jun 11, 2025
6 checks passed
@xingdi-eric-yuan xingdi-eric-yuan deleted the macote/swe-smith branch June 11, 2025 17:46
4 participants