fix: harden URL encoding and input validation for AI/LLM callers#21
fix: harden URL encoding and input validation for AI/LLM callers#21
Conversation
🦋 Changeset detectedLatest commit: 494bfb4 The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request refactors how URL query parameters are handled across the application. By migrating from custom, manual URL encoding functions to the Highlights
Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request refactors URL encoding by replacing manual, incomplete encoding with reqwest's .query() builder, significantly improving code safety and maintainability. However, the removal of the manual urlencoded function has introduced regressions where path parameters (like calendar_id and cal.id) are now formatted directly into URL strings without encoding. This introduces potential path traversal and injection vulnerabilities. Specifically, two instances of path parameters are now inserted without encoding, which could cause errors for certain inputs, such as calendar IDs that are email addresses.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request is a great refactoring that replaces manual, incomplete URL encoding with reqwest's .query() builder for query parameters and the percent-encoding crate for path segments. This improves the code's correctness and maintainability, and also fixes several latent bugs where path parameters were not being encoded. The changes are well-implemented across calendar.rs and workflows.rs. I've added a few suggestions to improve error handling in cases where API call failures are currently being silently ignored, which can improve the user experience by providing more context when things go wrong.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request significantly improves URL handling correctness and security by refactoring to use reqwest's query builder and the percent-encoding crate, introducing encode_path_segment and validate_resource_name for better separation of concerns. However, the validate_resource_name function introduces a vulnerability by failing to perform percent-encoding for multi-segment resource names, allowing URL injection via special characters like ? and #. This is critical for LLM-intended tools, as it could lead to prompt manipulation. To address this, validate_resource_name should be made more robust by disallowing these special URL characters.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request refactors manual URL encoding to use reqwest's query builder, which is a significant improvement for safety and robustness. It also introduces and applies input validation helpers in src/validate.rs, hardening the CLI against path traversal and injection attacks, and includes helpful documentation updates in AGENTS.md. However, a critical security vulnerability remains: several user-provided resource names (project, subscription, topic) in the events +subscribe and gmail +watch commands are still used directly in URL construction without proper validation. These unvalidated inputs could lead to path traversal or other injection attacks. It is recommended to validate these resource names using the crate::helpers::validate_resource_name helper. Additionally, there is a minor suggestion to remove a redundant validation check.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request significantly hardens the CLI against malformed and adversarial inputs, which is an excellent improvement, especially given its intended use with AI/LLM agents. The move from manual URL encoding to using reqwest::query() and the new percent-encoding crate is a major step up in security and correctness. The introduction of a centralized validate.rs module with path and input validation helpers is also a great architectural choice. My review focuses on ensuring these new validation mechanisms are used safely and consistently.
There was a problem hiding this comment.
Code Review
This pull request significantly hardens the CLI against adversarial inputs by improving URL encoding and adding robust input validation. It introduces shared helpers for URL encoding and path validation, refactors existing code to use them, and adds comprehensive documentation and tests for these new security features. My review focuses on further strengthening these security measures and improving code clarity. I've suggested a more robust path traversal check, pointed out an opportunity to improve documentation consistency, and identified a duplicated struct that should be removed to avoid confusion.
Note: Security Review did not run due to the size of the PR.
83b35d2 to
1b8f527
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request significantly hardens the CLI against adversarial inputs, which is a crucial improvement given its intended use with AI/LLM agents. The changes are comprehensive, covering URL encoding, path validation, and resource name validation, including the move from manual URL encoding to reqwest.query() and the introduction of shared helpers like encode_path_segment and validate_resource_name within a dedicated validate.rs module. However, two medium-severity vulnerabilities were identified: a potential path traversal in the Gmail watch output handler due to unvalidated message IDs, and a bypass of the resource name validation logic via URL encoding. Additionally, there is one minor suggestion to improve the accuracy of the documentation. Addressing these points will further strengthen the application's resilience against injection-style attacks.
Remove duplicate hand-rolled urlencoded() functions from workflows.rs and calendar.rs. All query parameters are now passed via reqwest's .query() API, which handles percent-encoding correctly and completely.
Use percent_encoding::utf8_percent_encode for calendar_id, cal.id, message_id, and file_id before interpolating into URL path segments. Addresses code review feedback on security regression.
Add encode_path_segment() for single-segment IDs and validate_resource_name() for multi-segment resource names. encode_path_segment: percent-encodes all non-alphanumeric chars, used for calendar IDs, file IDs, and message IDs. validate_resource_name: rejects path traversal (..) and control chars while preserving intentional / structure, used for Chat space names, task list IDs, and subscription names. Returns clear error messages for LLM callers.
Cover query/fragment injection, double-encoding, unicode, spaces, path traversal via encoding, control chars (CR/tab), and clear error message assertions for LLM callers.
- Daily briefing calendar events fetch - Daily briefing tasks fetch - Daily summary calendar events fetch - Daily summary unread email count fetch Addresses PR review feedback about confusing silent failures, especially for LLM callers that cannot see visual cues.
- Add src/validate.rs with validate_safe_output_dir, validate_msg_format, and validate_safe_dir_path helpers - Validate --output-dir against path traversal in gmail +watch and events +subscribe - Validate --msg-format against allowlist in gmail +watch - Validate --dir against path traversal in script +push - Add clap value_parser constraint for --msg-format - Document input validation patterns in AGENTS.md Closes #23
…and CONTRIBUTING.md
…ate subscription arg, remove redundant validate_msg_format
…cribeConfig Addresses review comments: - Store validated PathBuf from validate_safe_output_dir instead of discarding it (output_dir is now Option<PathBuf>) - Remove duplicate SubscribeConfig from events/mod.rs - Delete unused validate_msg_format (clap value_parser handles this) - Remove all #[allow(dead_code)] annotations
2d1c8bc to
494bfb4
Compare
This PR hardens the CLI against injection-style attacks by strengthening URL encoding, path traversal prevention, and resource name validation. It also centralizes all input validation logic.