Add configurable default tool registry and built-in web_search tool #46
Conversation
Pull request overview
This pull request adds a configurable default tool registry system and implements a comprehensive built-in web_search tool for Automa-AI agents. The PR introduces a declarative configuration approach for tools that can be enabled via config (not MCP), with the web_search tool supporting multiple providers (Serper, DuckDuckGo), optional web scraping (Firecrawl or open-source), and result reranking (Jina, Cohere, or BM25).
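For illustration, the declarative shape might look like the following in Python form; only the `tools: [{type, config}]` structure comes from this PR, and the keys inside `config` are hypothetical:

```python
# Hypothetical tools declaration; the provider/scrape/rerank keys are
# illustrative placeholders, not the PR's actual schema.
tools_config = {
    "tools": [
        {
            "type": "web_search",
            "config": {
                "provider": "serper",   # DuckDuckGo serves as the open-source fallback
                "scrape": "firecrawl",  # optional; an OSS scraper is the fallback
                "rerank": "bm25",       # jina, cohere, or the BM25 fallback
            },
        }
    ]
}
```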
Changes:
- Adds configuration models (`ToolSpec`, `ToolsConfig`) for declarative tool configuration
- Implements extensible tool infrastructure with `BaseDefaultTool`, `RuntimeDeps`, `ToolRegistry`, and a LangChain adapter
- Adds a fully featured `web_search` tool with provider fallbacks, scraping, and reranking capabilities
- Integrates tools into `AgentFactory` and `GenericLangGraphChatAgent` with duplicate name detection
- Includes comprehensive documentation, an example script, unit tests, and optional dependencies
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| automa_ai/config/__init__.py | New module init exporting ToolSpec and ToolsConfig |
| automa_ai/config/tools.py | Configuration models for tool specifications |
| automa_ai/tools/__init__.py | Registry initialization and tool exports |
| automa_ai/tools/base.py | Base tool interface and LangChain adapter |
| automa_ai/tools/registry.py | Extensible tool registry and builder system |
| automa_ai/tools/web_search/__init__.py | Web search tool module exports |
| automa_ai/tools/web_search/config.py | Configuration models for web search providers |
| automa_ai/tools/web_search/tool.py | Main web search tool implementation |
| automa_ai/tools/web_search/providers.py | Search providers (Serper, DuckDuckGo, Firecrawl) |
| automa_ai/tools/web_search/scraper.py | Open-source scraper using trafilatura/BeautifulSoup |
| automa_ai/tools/web_search/rerank.py | Reranking implementations (Jina, Cohere, BM25) |
| automa_ai/agents/agent_factory.py | Tool config coercion and agent integration; Black formatting |
| automa_ai/agents/langgraph_chatagent.py | Tool building and binding with duplicate detection; Black formatting |
| tests/test_tools_config.py | Unit tests for configuration validation |
| tests/test_web_search_tool.py | Unit tests for web search tool behavior |
| pyproject.toml | Optional dependencies for web and rerank features |
| docs/tools.md | Comprehensive documentation for default tools |
| README.md | Configuration example and reference to docs |
| examples/web_search_demo.py | Runnable example demonstrating tool usage |
```python
async def oss_scrape(client: httpx.AsyncClient, url: str, max_chars: int) -> str:
    response = await client.get(url, follow_redirects=True)
```
The oss_scrape function does not check the HTTP response status code before attempting to extract content. If the server returns an error status (4xx or 5xx), the function will still try to process the response text. This could lead to unexpected behavior or errors when trying to extract text from error pages. Consider adding status code validation and raising an appropriate exception for error responses.
Suggested change:

```python
response = await client.get(url, follow_redirects=True)
response.raise_for_status()
```
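For context, a minimal sketch of how the suggested check would sit inside `oss_scrape`; the extraction step is elided and the trailing truncation is illustrative:

```python
import httpx

async def oss_scrape(client: httpx.AsyncClient, url: str, max_chars: int) -> str:
    response = await client.get(url, follow_redirects=True)
    # Surface 4xx/5xx errors instead of extracting text from error pages.
    response.raise_for_status()
    # ...content extraction (trafilatura/BeautifulSoup) would follow here...
    return response.text[:max_chars]
```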
| "documents": docs, | ||
| "top_n": top_k, | ||
| }, | ||
| ) |
The jina_rerank function should handle 401 status codes explicitly with a ValueError (authentication failure) before the general error check, similar to how serper_search handles authentication failures. This provides clearer error messages to users when API keys are invalid or missing.
Suggested change:

```python
)
if resp.status_code == 401:
    raise ValueError("Jina authentication failed: invalid or missing API key.")
```
| "documents": docs, | ||
| "top_n": top_k, | ||
| }, | ||
| ) |
The cohere_rerank function should handle 401 status codes explicitly with a ValueError (authentication failure) before the general error check, similar to how serper_search handles authentication failures. This provides clearer error messages to users when API keys are invalid or missing.
Suggested change:

```python
)
if resp.status_code == 401:
    raise ValueError(
        "Cohere rerank authentication failed (HTTP 401). "
        "Please check that your Cohere API key is valid and provided correctly."
    )
```
pyproject.toml (outdated)

```toml
"httpx>=0.28.1",
"bs4>=0.0.2",
```
The packages httpx and bs4 are already listed in the main dependencies (lines 12 and 17), so they don't need to be repeated in the web optional dependencies. This redundancy should be removed to keep the dependency specifications clean.
| "httpx>=0.28.1", | |
| "bs4>=0.0.2", |
README.md (outdated)

```markdown
### Retriever configuration
```
The "### Retriever configuration" heading appears without any content before the next section begins. Either add the missing retriever configuration content or remove this empty heading to avoid confusion.
Suggested change: remove the empty heading.
```python
config = ToolsConfig.from_dict(_load_config(args.config))
tools = build_langchain_tools(config.tools)
web_search = next(t for t in tools if t.name == "web_search")
```
The next() generator expression will raise a StopIteration exception if no tool named "web_search" is found. Consider adding a default value or explicit error handling to provide a clearer error message to the user, for example: next((t for t in tools if t.name == "web_search"), None) followed by a check and clear error message.
Suggested change:

```python
web_search = next((t for t in tools if t.name == "web_search"), None)
if web_search is None:
    raise ValueError(
        "No tool named 'web_search' was found. "
        "Please ensure your tools configuration includes a 'web_search' tool."
    )
```
```python
class SerperConfig(BaseModel):
    api_key: str | None = None
    endpoint: str = "https://google.serper.dev/search"
```
The SerperConfig.endpoint field is defined but never used. The serper_search function hardcodes the URL to "https://google.serper.dev/search" instead of using the configurable endpoint. Either use the endpoint from config or remove the unused field to avoid confusion.
| endpoint: str = "https://google.serper.dev/search" |
```python
class FirecrawlConfig(BaseModel):
    api_key: str | None = None
    endpoint: str = "https://api.firecrawl.dev/v1/scrape"
```
The FirecrawlConfig.endpoint field is defined but never used. The firecrawl_scrape function hardcodes the URL to "https://api.firecrawl.dev/v1/scrape" instead of using the configurable endpoint. Either use the endpoint from config or remove the unused field to avoid confusion.
| endpoint: str = "https://api.firecrawl.dev/v1/scrape" |
```python
try:
    DEFAULT_TOOL_REGISTRY.register("web_search", build_web_search_tool)
except ValueError:
    pass
```
'except' clause does nothing but pass and there is no explanatory comment.
Suggested change:

```python
import logging

try:
    DEFAULT_TOOL_REGISTRY.register("web_search", build_web_search_tool)
except ValueError as exc:
    logging.getLogger(__name__).debug(
        "Ignoring ValueError while registering 'web_search' tool: %s",
        exc,
    )
```
```python
        text = await asyncio.to_thread(trafilatura.extract, html)
        if text:
            return text
    except Exception:
```
'except' clause does nothing but pass and there is no explanatory comment.
Suggested change:

```python
except Exception:
    # Fall back to BeautifulSoup if trafilatura is unavailable or extraction fails.
```
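Taken together, a self-contained sketch of the trafilatura-then-BeautifulSoup fallback this snippet implements (the function name and separator handling are illustrative):

```python
import asyncio

import trafilatura
from bs4 import BeautifulSoup

async def extract_text(html: str) -> str:
    try:
        # Run the blocking extractor off the event loop.
        text = await asyncio.to_thread(trafilatura.extract, html)
        if text:
            return text
    except Exception:
        # Fall back to BeautifulSoup if trafilatura is unavailable or extraction fails.
        pass
    return BeautifulSoup(html, "html.parser").get_text(" ", strip=True)
```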
Motivation
- Provide a built-in `web_search` that supports paid providers, optional rerankers, and a robust open-source fallback.

Description
- Adds `ToolSpec` and `ToolsConfig` to declare `tools: [{type, config}]`, plus a small coercion helper (`automa_ai/config/tools.py`); see the sketch after this list.
- Adds `BaseDefaultTool`, `RuntimeDeps`, `ToolRegistry`, and `build_langchain_tools` to map `type -> builder` and adapt tools to LangChain usage (new `automa_ai/tools/*`).
- Adds a `web_search` tool with provider selection (Serper, or an open-source DuckDuckGo fallback), optional Firecrawl scraping with an OSS scraper fallback (trafilatura/BeautifulSoup), and reranking options (`jina`, `cohere`, or a BM25 open-source fallback), including input/output schemas and meta timings/warnings (`automa_ai/tools/web_search/*`).
- `AgentFactory` accepts `tools_config` and coerces it into tool specs, and `GenericLangGraphChatAgent` constructs and binds LangChain tools only when `tools_config` is present, checking for duplicate tool names.
- Includes documentation (`docs/tools.md`), a README snippet, an example script (`examples/web_search_demo.py`), optional extras in `pyproject.toml` (`[project.optional-dependencies]`: `web`/`rerank`), and unit tests for config validation, provider selection, dedupe/max-pages scraping behavior, and deterministic BM25 ordering (`tests/test_tools_config.py`, `tests/test_web_search_tool.py`).
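For orientation, a rough end-to-end sketch of the flow these pieces enable, assuming the import paths implied by the file table (the `ToolsConfig.from_dict` and `build_langchain_tools` calls mirror the example script reviewed above):

```python
from automa_ai.config import ToolsConfig
from automa_ai.tools import build_langchain_tools

# Declare tools via config rather than MCP, then adapt them for LangChain.
config = ToolsConfig.from_dict({"tools": [{"type": "web_search", "config": {}}]})
tools = build_langchain_tools(config.tools)
web_search = next((t for t in tools if t.name == "web_search"), None)
```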
Testing

- Ran `black` on modified files successfully; reformatting completed without errors.
- `pytest` collection failed in this environment because the runtime dependency `pydantic` was not available and network access to install packages was blocked, so tests could not be executed here.
- Attempted to install the missing dependencies (`pydantic`, `httpx`, `pytest-asyncio`), but the environment's network/proxy prevented package installation, blocking the test run.
- A byte-compilation check (`python -m compileall`) surfaced an unrelated pre-existing syntax issue in `automa_ai/common/chunk.py` that prevented full byte-compilation in this environment; this is not introduced by these changes.

Codex Task