-
Notifications
You must be signed in to change notification settings - Fork 6
Resolve judge code smell #102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Refactors the judge CLI entrypoint into the judge package so it can be imported normally (improving testability and removing path-based import hacks).
Changes:
- Added
judge/cli.pycontainingget_parser()and asyncmain(args). - Converted root
judge.pyinto a thin entrypoint delegating tojudge.cli. - Updated pipeline + tests to import/patch
judge.clidirectly instead of loadingjudge.pyviaimportlib+ file paths.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
judge/cli.py |
New module that houses the judge CLI parser + async main logic. |
judge.py |
Now delegates to judge.cli and runs the async entrypoint. |
run_pipeline.py |
Imports judge.cli.main directly instead of loading judge.py by path. |
tests/unit/judge/test_judge_cli.py |
Uses normal imports from judge.cli and patches module objects directly. |
tests/integration/test_pipeline.py |
Patches judge.cli.main directly and removes importlib-based mocking. |
Comments suppressed due to low confidence (1)
run_pipeline.py:195
- The comment says imports are deferred "to allow --debug flag to be set", but
judge.cliis imported beforeset_debug(True)runs. Either move thegenerate_main/judge_mainimports to after the debug flag handling, or update the comment so it doesn’t claim behavior that isn’t true.
# Import generate and judge main functions
# We import here to avoid circular dependencies and to allow --debug flag to be set
from generate import main as generate_main
from judge.cli import main as judge_main
# Set debug mode if flag is provided
if args.debug:
from utils.debug import set_debug
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "Default: temperature=0 (unless overridden)" | ||
| ), | ||
| type=parse_key_value_list, | ||
| default={}, |
Copilot
AI
Feb 7, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--judge-model-extra-params uses a mutable dict (default={}) as the argparse default. Because LLMJudge mutates this dict (e.g., sets temperature), subsequent parse_args() calls in the same process can inherit mutated defaults. Use default=None (and normalize to {} in main) or ensure a fresh dict per parse (e.g., copy).
| default={}, | |
| default=None, |
|
|
||
| # Load rubric configuration once at startup | ||
| print("📚 Loading rubric configuration...") | ||
| rubric_config = await RubricConfig.load(rubric_folder="data") |
Copilot
AI
Feb 7, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CLI exposes --rubrics (and run_pipeline.py passes rubrics=args.rubrics), but main() always loads RubricConfig from the hard-coded rubric_folder="data" and never uses args.rubrics. This makes the flag ineffective and can lead to confusing behavior. Either wire args.rubrics into RubricConfig.load (e.g., derive folder/filename from the provided path) or remove/rename the flag if it’s not supported.
| rubric_config = await RubricConfig.load(rubric_folder="data") | |
| rubric_folder = getattr(args, "rubrics", None) or "data" | |
| rubric_config = await RubricConfig.load(rubric_folder=rubric_folder) |
Description
Move the judge CLI (get_parser, main) from the root script into the package so tests can import it normally.
Resolves the concern that tests were hacking script location; tests now rely on normal imports.