Skip to content

Conversation

@lyarwood
Copy link
Contributor

@lyarwood lyarwood commented Nov 5, 2025

Adds support for using gemini-cli as an agent for evaluations with proper MCP tool call recording. The integration uses a wrapper script to dynamically configure gemini with the MCP proxy server, ensuring tool calls are properly intercepted and recorded for assertion validation.

Summary by CodeRabbit

  • New Features
    • Added Gemini CLI agent for running AI-powered evaluations
    • Introduced evaluation configuration for Kubernetes basic operations testing
    • Expanded agent ecosystem with support for Gemini CLI integration in evaluation workflows

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Nov 5, 2025

Walkthrough

Support for a Gemini CLI agent is added to the system. A new agent type GeminiCLIAgent is implemented, registered in the builtin agent registry, and configured with validation and default specification methods. An evaluation configuration file is added to demonstrate its use with Kubernetes MCP server tests.

Changes

Cohort / File(s) Change Summary
Configuration and Ignore Patterns
.gitignore
Added ignore patterns for Gemini CLI: .gemini directory and dedicated Gemini-specific section following existing agent entries.
Evaluation Configuration
examples/kube-mcp-server/gemini/eval.yaml
New evaluation job configuration for testing Kubernetes basic operations using the Gemini CLI agent with MCP server integration, task-set glob patterns, and assertion rules.
Agent Registry
pkg/agent/builtin.go
Registered new "gemini-cli" agent type mapping to GeminiCLIAgent in the builtinTypes registry without modifying existing type-retrieval logic.
Agent Implementation
pkg/agent/gemini_cli.go
Implemented GeminiCLIAgent type with exported methods: Name(), Description(), RequiresModel(), ValidateEnvironment() (checks for gemini binary in PATH), and GetDefaults() (returns AgentSpec with MCP server and tool invocation configuration).

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

  • Pay special attention to the shell pipeline construction in GetDefaults() method, particularly the temporary MCP server setup and cleanup logic.
  • Verify the YAML structure and field mappings in eval.yaml align with the evaluation job schema.
  • Confirm the ValidateEnvironment() check for the gemini binary is robust and provides clear error messages.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding gemini-cli as a new agent type with MCP integration, which is reflected in all modified files (.gitignore, new eval.yaml, builtin.go registry, and new gemini_cli.go implementation).
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@lyarwood lyarwood marked this pull request as draft November 5, 2025 18:25
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
examples/kube-mcp-server/gemini/gemini-agent-wrapper.sh (3)

13-13: Verify URL extraction handles JSON variations gracefully. The grep+sed regex assumes a specific JSON format with "url": "value". If the MCP config format varies (e.g., different whitespace, URL without quotes, or comments), extraction may silently fail or extract incorrect values. Consider using a JSON parser like jq if available, or add validation that the extracted URL is not empty before using it.

If a JSON parser is available in the environment, consider replacing the regex-based extraction:

-URL=$(grep -o '"url"[[:space:]]*:[[:space:]]*"[^"]*"' "$CONFIG_FILE" | head -1 | sed 's/.*"url"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/')
+URL=$(jq -r '.url' "$CONFIG_FILE" 2>/dev/null || grep -o '"url"[[:space:]]*:[[:space:]]*"[^"]*"' "$CONFIG_FILE" | head -1 | sed 's/.*"url"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/')

Otherwise, add validation that URL matches expected pattern:

-if [ -z "$URL" ]; then
+if [ -z "$URL" ] || ! [[ "$URL" =~ ^https?:// ]]; then

28-28: Suppress stderr explicitly when mcp add might fail silently. Line 28 redirects output to /dev/null 2>&1, which hides error messages. If the gemini command is missing or the MCP server is unreachable, users won't see helpful error context. Consider capturing stderr separately for debugging or logging failed attempts. At minimum, ensure the error message on line 16-18 is clear about what to check.

Consider logging the command execution:

-gemini mcp add "$SERVER_NAME" "$URL" --scope project --transport http --trust >/dev/null 2>&1
+if ! gemini mcp add "$SERVER_NAME" "$URL" --scope project --transport http --trust 2>/tmp/gemini-mcp-add-error.log; then
+  echo "Warning: Failed to add MCP server. Check /tmp/gemini-mcp-add-error.log" >&2
+fi

36-46: Reduce duplication in gemini invocation. The two code paths (lines 36-41 and 43-46) are nearly identical except for the --allowed-tools flag. Consider building the command as an array to reduce duplication and improve maintainability.

Apply this refactor to reduce duplication:

-if [ -n "$ALLOWED_TOOLS" ]; then
-	gemini --allowed-mcp-server-names "$SERVER_NAME" \
-		--allowed-tools "$ALLOWED_TOOLS" \
-		--approval-mode yolo \
-		--output-format text \
-		--prompt "$PROMPT"
-else
-	gemini --allowed-mcp-server-names "$SERVER_NAME" \
-		--approval-mode yolo \
-		--output-format text \
-		--prompt "$PROMPT"
-fi
+declare -a GEMINI_ARGS=(
+  --allowed-mcp-server-names "$SERVER_NAME"
+  --approval-mode yolo
+  --output-format text
+  --prompt "$PROMPT"
+)
+
+if [ -n "$ALLOWED_TOOLS" ]; then
+  GEMINI_ARGS+=(--allowed-tools "$ALLOWED_TOOLS")
+fi
+
+gemini "${GEMINI_ARGS[@]}"
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a3ee9bc and 1d6c0b5.

📒 Files selected for processing (4)
  • .gitignore (1 hunks)
  • examples/kube-mcp-server/gemini/agent.yaml (1 hunks)
  • examples/kube-mcp-server/gemini/eval.yaml (1 hunks)
  • examples/kube-mcp-server/gemini/gemini-agent-wrapper.sh (1 hunks)
🧰 Additional context used
🪛 Shellcheck (0.11.0)
examples/kube-mcp-server/gemini/gemini-agent-wrapper.sh

[warning] 31-31: Use single quotes, otherwise this expands now rather than when signalled.

(SC2064)

🔇 Additional comments (3)
.gitignore (1)

38-39: LGTM! The pattern correctly excludes Gemini CLI-generated artifacts from version control, consistent with the repository's existing structure.

examples/kube-mcp-server/gemini/eval.yaml (1)

6-6: Verify that referenced config files and task directories exist. Lines 6 and 13 reference ../mcp-config.yaml and ../tasks/*/*.yaml which should exist at runtime. Ensure these files are documented in setup instructions or created as part of the deployment.

Also applies to: 13-13

examples/kube-mcp-server/gemini/agent.yaml (1)

9-10: Document the required working directory context or refactor path resolution. The relative paths in runPrompt assume execution from the repository root. While the current implementation works when invoked as documented (./gevals eval examples/kube-mcp-server/gemini/eval.yaml), this creates a brittle implicit dependency on the working directory.

Recommendation: Either document that examples must run from the repository root, or refactor the framework to support working directory configuration/resolution that allows scripts to be run from any location. The suggested fix in the original comment (using $(dirname "$0")) won't work here since paths are embedded in YAML, not scripts. Consider making gevals change to the appropriate directory when loading configurations or providing an explicit working directory parameter.

gemini mcp add "$SERVER_NAME" "$URL" --scope project --transport http --trust >/dev/null 2>&1

# Ensure cleanup on exit (success or failure)
trap "gemini mcp remove '$SERVER_NAME' >/dev/null 2>&1 || true" EXIT
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix trap expansion timing per SC2064 warning. Use single quotes around the trap command to defer variable expansion until the trap fires, not when it's set. This prevents issues if the variable is modified after the trap is registered (though unlikely here).

Apply this diff to fix the trap expansion:

-trap "gemini mcp remove '$SERVER_NAME' >/dev/null 2>&1 || true" EXIT
+trap 'gemini mcp remove "$SERVER_NAME" >/dev/null 2>&1 || true' EXIT
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
trap "gemini mcp remove '$SERVER_NAME' >/dev/null 2>&1 || true" EXIT
trap 'gemini mcp remove "$SERVER_NAME" >/dev/null 2>&1 || true' EXIT
🧰 Tools
🪛 Shellcheck (0.11.0)

[warning] 31-31: Use single quotes, otherwise this expands now rather than when signalled.

(SC2064)

🤖 Prompt for AI Agents
In examples/kube-mcp-server/gemini/gemini-agent-wrapper.sh around line 31, the
trap command currently expands $SERVER_NAME when the trap is set; change the
trap to use single quotes so variable expansion is deferred until the trap runs.
Replace the double-quoted trap command with a single-quoted one (ensuring any
single quotes inside are escaped or avoided) so the shell stores the literal
command and expands $SERVER_NAME only at EXIT time.

@lyarwood
Copy link
Contributor Author

lyarwood commented Nov 5, 2025

I'll clean this up later in the week but I wonder if a generic ACP bridge would be better for the project?

@Cali0707
Copy link
Contributor

I'll clean this up later in the week but I wonder if a generic ACP bridge would be better for the project?

I love the idea of ACP @lyarwood! To start, maybe we can clean this up and get it in (or maybe include the gemini agent as one of the built in agents you are providing in #38 ), and then we can work to include ACP support after? WDYT?

ArgTemplateMcpServer: "{{ .URL }}",
ArgTemplateAllowedTools: "{{ .ToolName }}",
AllowedToolsJoinSeparator: &separator,
RunPrompt: `sh -c 'SERVER_NAME="mcp-eval-$$" && gemini mcp add "$SERVER_NAME" {{ .McpServerURLArgs }} --scope project --transport http --trust >/dev/null 2>&1 && trap "gemini mcp remove \"$SERVER_NAME\" >/dev/null 2>&1 || true" EXIT && gemini --allowed-mcp-server-names "$SERVER_NAME" --allowed-tools "{{ .AllowedToolArgs }}" --approval-mode yolo --output-format text --prompt "{{ .Prompt }}"'`,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awful so thoughts on improving this would be awful while we wait for the builtin PR to land.

Adds built-in support for gemini-cli as an agent type, eliminating the need
for wrapper scripts. The integration uses gemini's native MCP configuration
commands to dynamically set up and tear down MCP servers for each evaluation,
ensuring tool calls are properly intercepted and recorded.

This provides a cleaner, more maintainable approach compared to external
wrapper scripts while maintaining full MCP proxy functionality.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Lee Yarwood <lyarwood@redhat.com>
@lyarwood lyarwood marked this pull request as ready for review November 20, 2025 16:17
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
pkg/agent/gemini_cli.go (1)

22-27: Consider wrapping the exec.LookPath error for easier diagnostics

The environment check is correct, but returning a bare “not found” message drops the original error context. Wrapping the error would make debugging PATH/config issues easier.

You could do something like:

func (a *GeminiCLIAgent) ValidateEnvironment() error {
-	if _, err := exec.LookPath("gemini"); err != nil {
-		return fmt.Errorf("'gemini' binary not found in PATH")
-	}
+	if _, err := exec.LookPath("gemini"); err != nil {
+		return fmt.Errorf("'gemini' binary not found in PATH: %w", err)
+	}
 	return nil
}

Please confirm this still matches your BuiltinAgent interface expectations and Go version in use.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1d6c0b5 and 78bbf9e.

📒 Files selected for processing (4)
  • .gitignore (1 hunks)
  • examples/kube-mcp-server/gemini/eval.yaml (1 hunks)
  • pkg/agent/builtin.go (1 hunks)
  • pkg/agent/gemini_cli.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • examples/kube-mcp-server/gemini/eval.yaml
  • .gitignore
🧰 Additional context used
🧬 Code graph analysis (2)
pkg/agent/builtin.go (1)
pkg/agent/gemini_cli.go (1)
  • GeminiCLIAgent (8-8)
pkg/agent/gemini_cli.go (1)
pkg/agent/config.go (3)
  • AgentSpec (15-19)
  • AgentMetadata (36-42)
  • AgentCommands (44-71)
🔇 Additional comments (2)
pkg/agent/builtin.go (1)

3-7: Builtin registration for gemini-cli looks consistent

The new "gemini-cli" entry matches the existing pattern for builtin agents and aligns with GeminiCLIAgent’s Name(); no functional issues here.

pkg/agent/gemini_cli.go (1)

8-20: Agent identity and basic metadata are coherent

GeminiCLIAgent’s Name, Description, and RequiresModel methods are straightforward and align with how a self‑managed CLI should behave; nothing blocking here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants