@jgieringer
Collaborator

Description

Move the judge CLI (get_parser, main) from the root script into the package so tests can import it normally.

  • Add judge/cli.py with get_parser() and async main(args) (unchanged behavior).
  • Slim judge.py to a thin entrypoint that delegates to judge.cli.
  • Update tests/unit/judge/test_judge_cli.py to use from judge.cli import get_parser, main and drop the importlib + Path(__file__).parents[3] hack.
  • Update run_pipeline.py to use from judge.cli import main as judge_main instead of loading the script by path.
  • Update tests/integration/test_pipeline.py to patch judge.cli.main instead of importlib.util.spec_from_file_location / module_from_spec.

Resolves the concern that tests were hacking script location; tests now rely on normal imports.
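
For readers skimming the description, the split it describes can be sketched roughly as follows. This is a minimal illustration collapsed into a single file, not the actual contents of judge/cli.py: the body of main, the choice to show only the --rubrics flag, and the run() helper name are all assumptions made for the example.

```python
import argparse
import asyncio


# --- would live in judge/cli.py ---
def get_parser() -> argparse.ArgumentParser:
    """Build the judge CLI parser (only one placeholder flag is shown)."""
    parser = argparse.ArgumentParser(prog="judge")
    parser.add_argument("--rubrics", default=None, help="placeholder flag")
    return parser


async def main(args: argparse.Namespace) -> int:
    """Async entrypoint. The real one runs the judge; this one only echoes."""
    print(f"judging with rubrics={args.rubrics}")
    return 0


# --- would live in the root judge.py ---
def run(argv=None) -> int:
    """Thin entrypoint (hypothetical name): parse, then delegate to main."""
    args = get_parser().parse_args(argv)
    return asyncio.run(main(args))
```

In a real script, the root judge.py would presumably call something like run() under an if __name__ == "__main__": guard, while tests import get_parser and main from the package directly.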

@jgieringer jgieringer changed the base branch from main to jgieringer/unit-testing February 7, 2026 01:28
@jgieringer jgieringer requested review from Copilot and removed request for Copilot February 7, 2026 01:28
@jgieringer jgieringer mentioned this pull request Feb 7, 2026
Base automatically changed from jgieringer/unit-testing to main February 7, 2026 01:29
Contributor

Copilot AI left a comment

Pull request overview

Refactors the judge CLI entrypoint into the judge package so it can be imported normally (improving testability and removing path-based import hacks).

Changes:

  • Added judge/cli.py containing get_parser() and async main(args).
  • Converted root judge.py into a thin entrypoint delegating to judge.cli.
  • Updated pipeline + tests to import/patch judge.cli directly instead of loading judge.py via importlib + file paths.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
| File | Description |
| --- | --- |
| judge/cli.py | New module that houses the judge CLI parser + async main logic. |
| judge.py | Now delegates to judge.cli and runs the async entrypoint. |
| run_pipeline.py | Imports judge.cli.main directly instead of loading judge.py by path. |
| tests/unit/judge/test_judge_cli.py | Uses normal imports from judge.cli and patches module objects directly. |
| tests/integration/test_pipeline.py | Patches judge.cli.main directly and removes importlib-based mocking. |
Comments suppressed due to low confidence (1)

run_pipeline.py:195

  • The comment says imports are deferred "to allow --debug flag to be set", but judge.cli is imported before set_debug(True) runs. Either move the generate_main/judge_main imports to after the debug flag handling, or update the comment so it doesn’t claim behavior that isn’t true.
    # Import generate and judge main functions
    # We import here to avoid circular dependencies and to allow --debug flag to be set
    from generate import main as generate_main
    from judge.cli import main as judge_main

    # Set debug mode if flag is provided
    if args.debug:
        from utils.debug import set_debug


"Default: temperature=0 (unless overridden)"
),
type=parse_key_value_list,
default={},

Copilot AI Feb 7, 2026

--judge-model-extra-params uses a mutable dict (default={}) as the argparse default. Because LLMJudge mutates this dict (e.g., sets temperature), subsequent parse_args() calls in the same process can inherit mutated defaults. Use default=None (and normalize to {} in main) or ensure a fresh dict per parse (e.g., copy).

Suggested change
- default={},
+ default=None,
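
A minimal sketch of the suggested pattern, assuming the normalization happens in a small helper: argparse places the very same default object on every namespace produced by a given parser, so default=None plus a fresh dict per call avoids sharing state. The parse_key_value_list body and the normalized_extra_params helper below are hypothetical stand-ins, not the project's code.

```python
import argparse


def parse_key_value_list(text: str) -> dict:
    """Hypothetical stand-in: parse "k=v,k2=v2" into a dict of strings."""
    return dict(pair.split("=", 1) for pair in text.split(",") if pair)


def get_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--judge-model-extra-params",
        type=parse_key_value_list,
        default=None,  # no shared mutable object stored on the parser
    )
    return parser


def normalized_extra_params(args: argparse.Namespace) -> dict:
    """Return a fresh dict on every call so callers may mutate it freely."""
    return dict(args.judge_model_extra_params or {})


parser = get_parser()
first = normalized_extra_params(parser.parse_args([]))
first["temperature"] = 0  # simulates a downstream mutation
second = normalized_extra_params(parser.parse_args([]))  # unaffected by it
```

Copying in the helper also protects against a caller mutating a dict that was parsed from the command line and then reusing the same namespace.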


# Load rubric configuration once at startup
print("📚 Loading rubric configuration...")
rubric_config = await RubricConfig.load(rubric_folder="data")

Copilot AI Feb 7, 2026

The CLI exposes --rubrics (and run_pipeline.py passes rubrics=args.rubrics), but main() always loads RubricConfig from the hard-coded rubric_folder="data" and never uses args.rubrics. This makes the flag ineffective and can lead to confusing behavior. Either wire args.rubrics into RubricConfig.load (e.g., derive folder/filename from the provided path) or remove/rename the flag if it’s not supported.

Suggested change
- rubric_config = await RubricConfig.load(rubric_folder="data")
+ rubric_folder = getattr(args, "rubrics", None) or "data"
+ rubric_config = await RubricConfig.load(rubric_folder=rubric_folder)
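
The fallback in the suggestion can be exercised in isolation with a stub. In the sketch below, RubricConfig is a hypothetical stand-in whose only known detail is the load(rubric_folder=...) call shown in the snippet, and load_rubrics is an illustrative helper name.

```python
import argparse
import asyncio


class RubricConfig:
    """Hypothetical stub; only the load(rubric_folder=...) call is from the PR."""

    def __init__(self, rubric_folder: str) -> None:
        self.rubric_folder = rubric_folder

    @classmethod
    async def load(cls, rubric_folder: str) -> "RubricConfig":
        return cls(rubric_folder)


async def load_rubrics(args: argparse.Namespace) -> RubricConfig:
    # Fall back to "data" when --rubrics is missing, None, or empty.
    rubric_folder = getattr(args, "rubrics", None) or "data"
    return await RubricConfig.load(rubric_folder=rubric_folder)


default_cfg = asyncio.run(load_rubrics(argparse.Namespace()))
custom_cfg = asyncio.run(load_rubrics(argparse.Namespace(rubrics="my_rubrics")))
```

Using getattr with a default keeps the helper tolerant of namespaces built by callers that never set a rubrics attribute.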

@jgieringer jgieringer mentioned this pull request Feb 7, 2026