Add Job Application Automation Agent #219
Walkthrough
Adds a new job-automation agent package implementing a YAML-driven, multi-agent resume/job workflow: models, tools, flow orchestration, Stagehand browser automation, config/tasks, utilities, and a small CLI entrypoint plus packaging and environment example.
Sequence Diagram(s)
sequenceDiagram
participant CLI as CLI / main.py
participant Flow as ResumeOptimizerFlow
participant Parser as Parser Agent
participant Fetcher as Job Fetcher Agent
participant Analyzer as Resume Analyzer Agent
participant Optimizer as Resume Optimizer Agent
participant Generator as Resume Text Generator
participant Submitter as Job Submitter Agent
participant Tools as Tools (Docx, Scraper, Stagehand)
CLI->>Flow: kickoff_async()
activate Flow
par Parse & Fetch
Flow->>Parser: parse_resume task
Parser->>Tools: read_docx_file()
Tools-->>Parser: ParsedResume
Parser-->>Flow: store parsed_resume
and
Flow->>Fetcher: fetch_job_description task
Fetcher->>Tools: scrape website
Tools-->>Fetcher: JobDescription
Fetcher-->>Flow: store job_description
end
Flow->>Analyzer: analyze_resume task (uses parsed_resume + job_description)
Analyzer-->>Flow: ResumeAnalysis
Flow->>Optimizer: optimize_resume task (uses analysis)
Optimizer-->>Flow: OptimizedResume
Flow->>Generator: generate_text_resume task
Generator-->>Flow: resume_text (file)
Flow->>Submitter: submit_job_application task
Submitter->>Tools: stagehand_browser_automation(website, profile, resume_text)
Tools-->>Submitter: submission_result
Submitter-->>Flow: store submission_result
Flow-->>CLI: ResumeOptimizerState (complete)
deactivate Flow
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 4
🧹 Nitpick comments (10)
job-automation-agent/.env.example (1)
1-3: Minor: Fix linting issues for consistency.
The static analysis tool flagged two minor issues: keys should be alphabetically ordered and a trailing newline is missing.
🔎 Proposed fix
-MODEL_API_KEY=
 GOOGLE_API_KEY=
+MODEL_API_KEY=
 OPENAI_API_KEY=
+
job-automation-agent/util.py (1)
7-11: Add error handling and fix return type.
The function lacks error handling for missing files or YAML parse errors. Additionally, yaml.safe_load returns None for empty files, which doesn't match the Dict[str, Any] return type.
🔎 Proposed fix
+from typing import Any, Dict, Optional
-from typing import Any, Dict
 import yaml

-def load_yaml_config(config_path: str) -> Dict[str, Any]:
+def load_yaml_config(config_path: str) -> Optional[Dict[str, Any]]:
     """Load YAML configuration file."""
     config_file = Path(__file__).parent / config_path
+    if not config_file.exists():
+        raise FileNotFoundError(f"Config file not found: {config_file}")
     with open(config_file, encoding="utf-8") as f:
         return yaml.safe_load(f)
job-automation-agent/main.py (1)
10-11: Use explicit Optional type annotation.
Per PEP 484, use explicit Optional[list] or list | None instead of implicit optional.
🔎 Proposed fix
+from typing import Optional
+
 async def optimize_resume_async(
     resume_path: str,
     job_url: str,
     output_dir: str = "output",
     tone: str = "neutral",
-    must_keep_sections: list = None,
+    must_keep_sections: Optional[list] = None,
 ) -> ResumeOptimizerState:
job-automation-agent/agent_tools.py (2)
20-27: Error handling pattern is acceptable for agent tools.
While Ruff flags the broad Exception catch, returning an error string to the agent is a reasonable pattern here: it allows the LLM to understand and potentially recover from failures. Consider logging the error for debugging purposes.
🔎 Optional: Add logging for debugging
+import logging
+
+logger = logging.getLogger(__name__)
+
 @tool("Read DOCX Resume File")
 def read_docx_file(file_path: str) -> str:
     ...
     try:
         doc = docx.Document(file_path)
         full_text = []
         for paragraph in doc.paragraphs:
             full_text.append(paragraph.text)
         return "\n".join(full_text)
     except Exception as e:
+        logger.exception("Failed to read DOCX file: %s", file_path)
         return f"Error reading docx file: {e}"
30-43: Add return type documentation to docstring.
The function has a return type annotation but the docstring is missing a Returns: section for completeness.
job-automation-agent/config/tasks.yaml (1)
169-192: Consider adding error recovery and retry guidance for browser automation.
The submit_job_application_task relies on automated browser interaction, which is inherently fragile. Consider:
- Adding guidance for handling common failures (CAPTCHA, session timeouts, rate limiting)
- Specifying what constitutes a successful submission vs. partial completion
- Adding instructions to capture screenshots or logs for debugging failed submissions
Additionally, the {profile} variable is referenced but its expected structure isn't documented in the task description. Ensure the caller provides adequate profile data.
job-automation-agent/flow.py (3)
52-56: Silent tool filtering may hide configuration errors.
Tools referenced in YAML but not present in TOOLS_MAP are silently ignored. This could mask configuration mistakes.
🔎 Proposed fix to warn about missing tools
 tools = [
     TOOLS_MAP[tool_name]
     for tool_name in agent_config.get("tools", [])
     if tool_name in TOOLS_MAP
 ]
+missing_tools = [
+    tool_name
+    for tool_name in agent_config.get("tools", [])
+    if tool_name not in TOOLS_MAP
+]
+if missing_tools:
+    print(f"⚠️ Warning: Unknown tools for agent '{agent_name}': {missing_tools}")
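The same check can be written as a small helper that logs instead of printing. This is a minimal sketch, assuming TOOLS_MAP is a plain dict from tool name to tool object; `resolve_tools` and the sample map entry are hypothetical names used for illustration and are not part of the PR.

```python
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger(__name__)

# Stand-in for the real TOOLS_MAP in flow.py (assumed shape: name -> tool).
TOOLS_MAP: Dict[str, Callable[..., Any]] = {
    "read_docx_file": lambda path: f"contents of {path}",
}


def resolve_tools(agent_name: str, agent_config: Dict[str, Any]) -> List[Callable[..., Any]]:
    """Return the known tools for an agent and warn about unknown names."""
    requested = agent_config.get("tools", [])
    missing = [name for name in requested if name not in TOOLS_MAP]
    if missing:
        # Surface the misconfiguration instead of dropping it silently.
        logger.warning("Unknown tools for agent %r: %s", agent_name, missing)
    return [TOOLS_MAP[name] for name in requested if name in TOOLS_MAP]
```

Whether an unknown tool name should only warn or should abort startup is a design choice for the maintainers; raising would catch YAML typos earlier.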
212-222: Use TypeError for type validation and simplify redundant check.
Per static analysis (TRY004), TypeError is more appropriate for invalid type checks. Also, the conditional on lines 218-222 is redundant since line 213 already raises if analysis_data isn't a BaseModel.
🔎 Proposed fix
-if not isinstance(self.state.analysis_data, BaseModel):
-    raise ValueError(
+if not isinstance(self.state.analysis_data, ResumeAnalysis):
+    raise TypeError(
         f"analysis_data is not a valid ResumeAnalysis model. "
         f"Got type: {type(self.state.analysis_data)}, value: {self.state.analysis_data}"
     )
-analysis_str = (
-    self.state.analysis_data.model_dump_json()
-    if self.state.analysis_data
-    else "{}"
-)
+analysis_str = self.state.analysis_data.model_dump_json()
106-151: Consider extracting error messages to improve maintainability.
The static analysis flagged multiple instances of long exception messages inline (TRY003). While the current approach works, extracting these to helper functions or constants would improve maintainability.
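One way to address TRY003 and TRY004 together is to move the message into a dedicated exception class. The sketch below is illustrative only: `InvalidStateTypeError` and `require_type` are hypothetical names, not code from the PR, and the message format is an assumption.

```python
class InvalidStateTypeError(TypeError):
    """Raised when a flow state field holds a value of the wrong type."""

    def __init__(self, field: str, expected: type, actual: object) -> None:
        self.field = field
        self.expected = expected
        self.actual = actual
        # The message is built here, so call sites stay short.
        super().__init__(
            f"{field} must be {expected.__name__}, got {type(actual).__name__}"
        )


def require_type(field: str, value: object, expected: type) -> None:
    """Raise InvalidStateTypeError unless value is an instance of expected."""
    if not isinstance(value, expected):
        raise InvalidStateTypeError(field, expected, value)
```

A call site would then reduce to something like `require_type("analysis_data", self.state.analysis_data, ResumeAnalysis)`.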
job-automation-agent/models.py (1)
98-107: Consider using inheritance to reduce duplication.
OptimizedResume has identical fields to ParsedResume. Using inheritance would reduce duplication and ensure consistency.
🔎 Proposed refactor
-class OptimizedResume(BaseModel):
-    """Optimized resume data model (same structure as ParsedResume)."""
-
-    contact_info: ContactInfo = Field(default_factory=ContactInfo)
-    summary: Optional[str] = None
-    skills: List[str] = Field(default_factory=list)
-    experience: List[WorkExperience] = Field(default_factory=list)
-    education: List[Education] = Field(default_factory=list)
-    projects: List[Project] = Field(default_factory=list)
-    certifications: List[Certification] = Field(default_factory=list)
+class OptimizedResume(ParsedResume):
+    """Optimized resume data model (same structure as ParsedResume)."""
+    pass
Alternatively, if you anticipate OptimizedResume diverging in the future (e.g., adding optimization metadata), keeping them separate is reasonable; just document that intent.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
job-automation-agent/software-engineer-resume.docx is excluded by !**/*.docx
job-automation-agent/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
job-automation-agent/.env.example
job-automation-agent/agent_tools.py
job-automation-agent/config/agents.yaml
job-automation-agent/config/tasks.yaml
job-automation-agent/flow.py
job-automation-agent/job_automation.py
job-automation-agent/main.py
job-automation-agent/models.py
job-automation-agent/pyproject.toml
job-automation-agent/util.py
🧰 Additional context used
🧬 Code graph analysis (3)
job-automation-agent/agent_tools.py (1)
job-automation-agent/job_automation.py (1)
stagehand_browser_automation (10-78)
job-automation-agent/flow.py (3)
job-automation-agent/agent_tools.py (2)
job_automation (31-43), read_docx_file (8-27)
job-automation-agent/models.py (4)
JobDescription (65-77), OptimizedResume (98-107), ParsedResume (53-62), ResumeAnalysis (87-95)
job-automation-agent/util.py (1)
load_yaml_config (7-11)
job-automation-agent/main.py (1)
job-automation-agent/flow.py (2)
ResumeOptimizerFlow (82-296), ResumeOptimizerState (22-46)
🪛 dotenv-linter (4.0.0)
job-automation-agent/.env.example
[warning] 2-2: [UnorderedKey] The GOOGLE_API_KEY key should go before the MODEL_API_KEY key
(UnorderedKey)
[warning] 3-3: [EndingBlankLine] No blank line at the end of the file
(EndingBlankLine)
🪛 Ruff (0.14.8)
job-automation-agent/agent_tools.py
26-26: Do not catch blind exception: Exception
(BLE001)
job-automation-agent/job_automation.py
66-66: Consider moving this statement to an else block
(TRY300)
69-69: Do not catch blind exception: Exception
(BLE001)
70-70: Use explicit conversion flag
Replace with conversion flag
(RUF010)
job-automation-agent/flow.py
45-45: Avoid specifying long messages outside the exception class
(TRY003)
131-133: Abstract raise to an inner function
(TRY301)
131-133: Avoid specifying long messages outside the exception class
(TRY003)
141-141: Abstract raise to an inner function
(TRY301)
149-149: Use explicit conversion flag
Replace with conversion flag
(RUF010)
213-216: Prefer TypeError exception for invalid type
(TRY004)
213-216: Avoid specifying long messages outside the exception class
(TRY003)
job-automation-agent/main.py
10-10: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
🔇 Additional comments (9)
job-automation-agent/pyproject.toml (2)
1-6: LGTM! Project metadata is well-structured.
The project metadata follows PEP 621 standards correctly. The Python 3.12+ requirement is modern and appropriate for a new project.
7-13: Consider version pinning for production stability.
All specified dependency versions exist and are stable. The supported Python versions are >=3.10 and <=3.13, so Python 3.12 is compatible. nest-asyncio 1.6.0 has no known security issues, and python-docx requires Python >=3.9. python-docx 1.2.0 fixed the lxml pin that breaks Python 3.12 install. stagehand 0.5.7 supports Python 3.
While these minimum version constraints (>=) work for development, consider adding upper bounds or pinning specific versions for production environments to avoid unexpected breaking changes from transitive dependency updates.
job-automation-agent/config/agents.yaml (1)
1-75: LGTM!
The agent configurations are well-structured with clear roles, goals, and backstories. Each agent has appropriate tool assignments and consistent settings.
job-automation-agent/job_automation.py (3)
50-56: Consider sanitizing profile data in prompt.
The profile dict and resume_description are directly interpolated into the agent instruction. If these contain adversarial content, it could potentially manipulate the agent's behavior. While this is an internal tool, consider the source of this data.
66-70: Error handling is acceptable, but consider structured errors.
The broad exception catch and string return is fine for agent tool responses. The Ruff suggestions (TRY300, RUF010) are stylistic improvements that don't affect functionality.
21-21: No action needed: openai/gpt-4.1 is a valid model identifier.
The model name follows Stagehand's standard format of provider/model-name and is documented as a valid OpenAI example in Stagehand's configuration guide. GPT-4.1 is a legitimate OpenAI model that excels at instruction following and tool calling.
job-automation-agent/config/tasks.yaml (1)
1-22: Well-structured task with appropriate safety guardrails.
The parse_resume_task includes important rules against fabricating information and handling missing data correctly with null values. This is good practice for AI-driven document parsing.
job-automation-agent/flow.py (1)
82-104: Flow initialization and agent setup looks well-structured.
The agent mapping pattern cleanly separates YAML configuration from runtime agent instances. The use of setattr allows dynamic agent creation while maintaining type safety through the state model.
job-automation-agent/models.py (1)
1-107: Well-designed Pydantic models with sensible defaults.
The models appropriately use:
- Optional for fields that may be absent
- Field(default_factory=list) for mutable list defaults
- Required fields only where truly essential (company/title, institution, name)
This design allows graceful handling of incomplete resume data.
with open("parsed_resume.json", "w") as f:
    json.dump(self.state.parsed_resume_data.model_dump(), f, indent=4)
Hardcoded debug file path creates potential issues.
Writing to a hardcoded parsed_resume.json in the current directory:
- May cause conflicts if multiple flows run concurrently
- Appears to be debug/development code left in production
Consider removing this or writing to the configured output_dir.
🔎 Proposed fix
 self._run_agent_task(
     self.parser_agent,
     "parse_resume_task",
     "parsed_resume_data",
     response_format=ParsedResume,
     resume_path=self.state.resume_path,
 )
-with open("parsed_resume.json", "w") as f:
-    json.dump(self.state.parsed_resume_data.model_dump(), f, indent=4)
+# Optionally save parsed resume for debugging
+if self.state.output_dir:
+    debug_path = Path(self.state.output_dir) / "parsed_resume.json"
+    debug_path.parent.mkdir(parents=True, exist_ok=True)
+    with open(debug_path, "w", encoding="utf-8") as f:
+        json.dump(self.state.parsed_resume_data.model_dump(), f, indent=4)
🤖 Prompt for AI Agents
In job-automation-agent/flow.py around lines 164-165, the code writes a
hardcoded debug file "parsed_resume.json" to the current directory; replace this
debug write with writing into the configured output location (e.g.,
self.output_dir or self.config.output_dir) or remove it entirely. If keeping the
write, construct a path using the configured output_dir, create the directory if
missing, and use a unique filename (timestamp or UUID) to avoid concurrent run
conflicts; also wrap the write in a try/except and log any filesystem errors.
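The steps in that prompt (configured directory, unique filename, guarded write) can be sketched as a standalone helper. This is a minimal illustration under assumptions: `write_debug_json` is a hypothetical name, and the UUID suffix is one possible way to avoid collisions between concurrent runs.

```python
import json
import logging
import uuid
from pathlib import Path
from typing import Any, Dict, Optional

logger = logging.getLogger(__name__)


def write_debug_json(
    data: Dict[str, Any], output_dir: str, stem: str = "parsed_resume"
) -> Optional[Path]:
    """Write data as JSON under output_dir with a unique name.

    Returns the written path, or None if the filesystem write failed.
    """
    path = Path(output_dir) / f"{stem}-{uuid.uuid4().hex}.json"
    try:
        path.parent.mkdir(parents=True, exist_ok=True)
        with open(path, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=4)
    except OSError:
        # A debug artifact should not abort the flow; log and continue.
        logger.exception("Could not write debug file: %s", path)
        return None
    return path
```

In the flow this would be called with something like `write_debug_json(self.state.parsed_resume_data.model_dump(), self.state.output_dir)`.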
self._run_agent_task(
    self.submitter_agent,
    "submit_job_application_task",
    "submission_result",
    response_format=None,  # This task returns a status message, not structured
    application_url=self.state.job_url,
    profile=self.state.parsed_resume_data.model_dump_json(),
    resume_text=resume_text,
)
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, let's locate and examine the job_automation tool definition
find job-automation-agent -type f -name "*.py" | head -20
Repository: patchy631/ai-engineering-hub
Length of output: 264
🏁 Script executed:
#!/bin/bash
# Check the structure and find the files mentioned in the review
ls -la job-automation-agent/ 2>/dev/null || echo "Directory not found, listing current directory"
find . -maxdepth 3 -name "agent_tools.py" -o -name "tasks.yaml" -o -name "flow.py" 2>/dev/null | head -20
Repository: patchy631/ai-engineering-hub
Length of output: 1094
🏁 Script executed:
#!/bin/bash
# Check how profile is used in the job_automation tool and stagehand integration
rg -n -A 10 "def job_automation" job-automation-agent/
rg -n "stagehand_browser_automation" job-automation-agent/
Repository: patchy631/ai-engineering-hub
Length of output: 1283
🏁 Script executed:
#!/bin/bash
# Check flow.py around lines 288-296
sed -n '280,305p' job-automation-agent/flow.py
# Check tasks.yaml around lines 176-177
sed -n '170,185p' job-automation-agent/config/tasks.yaml
# Check the full agent_tools.py file
cat job-automation-agent/agent_tools.py
# Check the stagehand_browser_automation function
sed -n '1,50p' job-automation-agent/job_automation.py
Repository: patchy631/ai-engineering-hub
Length of output: 4650
🏁 Script executed:
#!/bin/bash
# Get full job_automation.py to see how profile is used
cat job-automation-agent/job_automation.py
Repository: patchy631/ai-engineering-hub
Length of output: 3596
Fix type mismatch: profile parameter is passed as JSON string but function signature expects dict.
In flow.py line 289, profile is passed as self.state.parsed_resume_data.model_dump_json() (a JSON string). However, both agent_tools.py line 32 and job_automation.py line 11 declare profile: dict[str, Any]. The parameter is used directly in an f-string (line 49 of job_automation.py) without deserializing from JSON, which works at runtime but violates the type contract. Either remove model_dump_json() in flow.py to pass the dict directly, or update the type hints to accept str.
🤖 Prompt for AI Agents
In job-automation-agent/flow.py around lines 288–296, the profile argument is
passed as a JSON string via model_dump_json() which mismatches the expected
dict[str, Any] type; replace model_dump_json() with a dict-returning call (e.g.,
model_dump()) so profile is passed as a dict, or alternatively update the
downstream type hints (agent_tools.py and job_automation.py) to accept str if
you intentionally want to pass JSON — pick the first option (use model_dump() /
return a dict) to satisfy existing type annotations.
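If the tool boundary cannot guarantee a dict (for example, when an agent passes tool arguments as strings), a defensive alternative to the recommended fix is to normalize the value on entry. This is a sketch of that alternative rather than the reviewer's first option; `coerce_profile` is a hypothetical helper that does not exist in the PR.

```python
import json
from typing import Any, Dict, Union


def coerce_profile(profile: Union[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Return profile as a dict, decoding it first if it arrives as a JSON string."""
    if isinstance(profile, str):
        decoded = json.loads(profile)
        if not isinstance(decoded, dict):
            raise TypeError(
                f"profile JSON must decode to an object, got {type(decoded).__name__}"
            )
        return decoded
    if isinstance(profile, dict):
        return profile
    raise TypeError(f"profile must be dict or JSON str, got {type(profile).__name__}")
```

With this in place the downstream annotation would need to be widened to `str | dict[str, Any]`, so it trades a stricter contract for tolerance of both call styles.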
config = StagehandConfig(
    env="LOCAL",
    model_name="openai/gpt-4.1",
    system_prompt="You are a helpful assistant that can use a web browser to help users fill out job application forms.",
    self_heal=True,
    model_client_options={"apiKey": os.getenv("MODEL_API_KEY")},
    verbose=2,  # 0 = errors only, 1 = info, 2 = debug
)
Validate API keys before use.
os.getenv("MODEL_API_KEY") returns None if the environment variable is unset, which may cause cryptic errors during Stagehand initialization. Consider validating required environment variables early.
🔎 Proposed fix
 def stagehand_browser_automation(
     website_url: str, profile: dict[str, Any], resume_description: str
 ) -> str:
     """Used to perform job application tasks powered by Stagehand capabilities."""
+
+    model_api_key = os.getenv("MODEL_API_KEY")
+    google_api_key = os.getenv("GOOGLE_API_KEY")
+    if not model_api_key or not google_api_key:
+        return "Error: Missing required API keys (MODEL_API_KEY or GOOGLE_API_KEY)"
     async def _execute_automation():
         ...
         config = StagehandConfig(
             ...
-            model_client_options={"apiKey": os.getenv("MODEL_API_KEY")},
+            model_client_options={"apiKey": model_api_key},
             ...
         )
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In job-automation-agent/job_automation.py around lines 19 to 26, the code passes
os.getenv("MODEL_API_KEY") directly into StagehandConfig which can be None if
the env var is missing; retrieve the API key into a local variable first,
validate it (if not set, log a clear error and raise an exception or call
sys.exit(1) with a descriptive message), and only pass a non-empty key into
model_client_options so StagehandConfig is not initialized with a None value; do
not print the key value in logs, just an error indicating the missing
configuration.
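The early validation described above can be isolated in a small helper. A minimal sketch, assuming a raised exception is acceptable at this point in the tool; `require_env` is a hypothetical name, not part of the PR.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if unset or empty."""
    value = os.getenv(name)
    if not value:
        # Report only the variable name, never the value.
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

The config would then read the key once, for example `model_api_key = require_env("MODEL_API_KEY")`, before `StagehandConfig` is constructed.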
flow.state.resume_path = resume_path
flow.state.job_url = job_url
flow.state.output_dir = output_dir
flow.state.output_path = Path(output_dir) / "optimized_resume.txt"
Convert Path to str for state assignment.
ResumeOptimizerState.output_path is typed as str, but Path(output_dir) / "optimized_resume.txt" returns a Path object. This may cause type validation issues with Pydantic.
🔎 Proposed fix
-flow.state.output_path = Path(output_dir) / "optimized_resume.txt"
+flow.state.output_path = str(Path(output_dir) / "optimized_resume.txt")
🤖 Prompt for AI Agents
In job-automation-agent/main.py around line 20, assign a string to
flow.state.output_path instead of a Path object: convert Path(output_dir) /
"optimized_resume.txt" to a str (e.g., wrap with str(...) or call .as_posix())
before assigning so it matches the ResumeOptimizerState typing and avoids
Pydantic validation issues.
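The conversion can also live in one place so every caller gets a string. A small sketch; `build_output_path` is a hypothetical helper name, and the default filename is taken from the line under review.

```python
from pathlib import Path


def build_output_path(output_dir: str, filename: str = "optimized_resume.txt") -> str:
    """Join output_dir and filename with pathlib, then return a plain str."""
    return str(Path(output_dir) / filename)
```

Usage would be `flow.state.output_path = build_output_path(output_dir)`.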
Actionable comments posted: 0
🧹 Nitpick comments (3)
job-automation-agent/README.md (3)
58-58: Add hyphen to "AI-powered" adjective.
When using multiple words as a compound adjective before a noun, use a hyphen to join them.
🔎 Proposed fix
-Running this command will start the CrewAI agentic workflow, which will handle the multi-agent orchestration for job application automation tasks using Stagehand as AI powered browser automation.
+Running this command will start the CrewAI agentic workflow, which will handle the multi-agent orchestration for job application automation tasks using Stagehand as AI-powered browser automation.
30-30: Document the purpose of GOOGLE_API_KEY.
The .env file requests a GOOGLE_API_KEY, but its purpose is not explained. Clarify what this key is used for in the workflow (e.g., job fetching, search, or other operations).
🔎 Proposed enhancement
Consider adding a brief comment or documentation explaining which agent/tool uses this key, e.g.:
OPENAI_API_KEY=<your_openai_api_key>
MODEL_API_KEY=<your_openai_api_key>
GOOGLE_API_KEY=<your_google_api_key> # Used for job description fetching
Alternatively, add a note in the setup section explaining when/where this key is needed.
66-68: Refine the contribution section for a more professional tone.
The contribution section contains multiple exclamation marks and uses a very common phrase. Consider toning down the enthusiasm and using more precise language.
🔎 Proposed refinement
-## Contribution
+## Contributing

-Contributions are welcome! Feel free to fork this repository and submit pull requests with your improvements.
+Contributions are welcome. To contribute, fork this repository and submit a pull request with your improvements.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
job-automation-agent/README.md
🧰 Additional context used
🪛 LanguageTool
job-automation-agent/README.md
[style] ~11-~11: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ng it with the job requirements.
5. Resume Generation: A Generator Agent for...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[grammar] ~58-~58: Use a hyphen to join words.
Context: ...n automation tasks using Stagehand as AI powered browser automation.
📬 Sta...
(QB_NEW_EN_HYPHEN)
[style] ~68-~68: Using many exclamation marks might seem excessive (in this case: 4 exclamation marks for a text that’s 2453 characters long)
Context: ...ontribution
Contributions are welcome! Feel free to fork this repository and s...
(EN_EXCESSIVE_EXCLAMATION)
[style] ~68-~68: Consider using a less common alternative to make your writing sound more unique and professional.
Context: ...tribution
Contributions are welcome! Feel free to fork this repository and submit pull re...
(FEEL_FREE_TO_STYLE_ME)
Summary by CodeRabbit
New Features
Documentation
Chores