Add configurable default tool registry and built-in web_search tool #46

Open

weilixu wants to merge 2 commits into develop from codex/implement-default-tool-system-with-web_search

Conversation

weilixu (Contributor) commented on Feb 6, 2026

Motivation

  • Provide a first-class, declarative default tool system so agents can enable common tools via config (not MCP).
  • Ship a practical built-in tool web_search that supports paid providers, optional rerankers, and a robust open-source fallback.
  • Keep the base install lightweight and ensure tools are only constructed/bound when enabled in config.

Description

  • Add configuration models ToolSpec and ToolsConfig to declare tools as a tools: [{type, config}] list, plus a small coercion helper (automa_ai/config/tools.py); a short sketch follows this list.
  • Implement an extensible tool interface and registry with BaseDefaultTool, RuntimeDeps, ToolRegistry, and build_langchain_tools, which maps each type to its builder and adapts tools for LangChain use (new automa_ai/tools/*).
  • Implement the web_search tool with provider selection (Serper or open-source DuckDuckGo fallback), optional Firecrawl scraping with OSS scraper fallback (trafilatura/BeautifulSoup), and reranking options (jina, cohere, or BM25 open-source fallback), including input/output schemas and meta timings/warnings (automa_ai/tools/web_search/*).
  • Integrate into runtime: AgentFactory accepts tools_config and coerces it into tool specs, and GenericLangGraphChatAgent constructs and binds LangChain tools only when tools_config is present, checking for duplicate tool names.
  • Add docs (docs/tools.md), a README snippet, an example script (examples/web_search_demo.py), optional extras in pyproject.toml ([project.optional-dependencies] web/rerank), and unit tests for config validation, provider selection, dedupe/max-pages scraping behavior, and BM25 deterministic ordering (tests/test_tools_config.py, tests/test_web_search_tool.py).
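
For illustration, a minimal sketch of declaring and building such a config. The from_dict/build_langchain_tools calls mirror the example script; the import paths and the keys inside "config" are assumptions, not the PR's exact schema:

```python
# Minimal sketch, assuming the declarative shape described above.
# ToolsConfig.from_dict and build_langchain_tools mirror the demo script;
# import paths and the keys inside "config" are illustrative assumptions.
from automa_ai.config import ToolsConfig
from automa_ai.tools import build_langchain_tools

raw = {
    "tools": [
        {
            "type": "web_search",
            "config": {
                "provider": "serper",  # assumed key; DuckDuckGo is the OSS fallback
                "rerank": "bm25",      # assumed key; jina/cohere are also described
            },
        }
    ]
}

config = ToolsConfig.from_dict(raw)
tools = build_langchain_tools(config.tools)  # LangChain-ready tool objects
```

Because tools are only constructed and bound when declared, an omitted or empty tools list leaves the base install untouched.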

Testing

  • Formatting: ran black on the modified files; the reformat completed without errors.
  • Unit tests were added, but pytest collection failed in this environment because the runtime dependency pydantic was unavailable and network access to install packages was blocked, so the tests could not be executed here.
  • Attempted to install the missing deps (pydantic, httpx, pytest-asyncio), but the environment's network/proxy blocked package installation.
  • A static compile check (python -m compileall) surfaced an unrelated, pre-existing syntax issue in automa_ai/common/chunk.py that prevented full byte-compilation in this environment; it was not introduced by these changes.
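
For context on the BM25 deterministic-ordering tests mentioned in the description, a fallback reranker along these lines would be deterministic. This is a hypothetical sketch using the rank-bm25 package; the PR's actual automa_ai/tools/web_search/rerank.py is not reproduced here:

```python
# Hypothetical sketch of a deterministic BM25 fallback reranker.
# Uses the rank-bm25 package; the PR's real helper names may differ.
from rank_bm25 import BM25Okapi

def bm25_rerank(query: str, docs: list[str], top_k: int) -> list[str]:
    bm25 = BM25Okapi([d.lower().split() for d in docs])
    scores = bm25.get_scores(query.lower().split())
    # Sort by score descending, breaking ties by input position so the
    # output order is deterministic and therefore unit-testable.
    order = sorted(range(len(docs)), key=lambda i: (-scores[i], i))
    return [docs[i] for i in order[:top_k]]
```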

Codex Task

Copilot AI left a comment

Pull request overview

This pull request adds a configurable default tool registry system and implements a comprehensive built-in web_search tool for Automa-AI agents. The PR introduces a declarative configuration approach for tools that can be enabled via config (not MCP), with the web_search tool supporting multiple providers (Serper, DuckDuckGo), optional web scraping (Firecrawl or open-source), and result reranking (Jina, Cohere, or BM25).

Changes:

  • Adds configuration models (ToolSpec, ToolsConfig) for declarative tool configuration
  • Implements extensible tool infrastructure with BaseDefaultTool, RuntimeDeps, ToolRegistry, and LangChain adapter
  • Adds a fully-featured web_search tool with provider fallbacks, scraping, and reranking capabilities
  • Integrates tools into AgentFactory and GenericLangGraphChatAgent with duplicate name detection
  • Includes comprehensive documentation, example script, unit tests, and optional dependencies

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 13 comments.

Summary per file:

  • automa_ai/config/__init__.py – New module init exporting ToolSpec and ToolsConfig
  • automa_ai/config/tools.py – Configuration models for tool specifications
  • automa_ai/tools/__init__.py – Registry initialization and tool exports
  • automa_ai/tools/base.py – Base tool interface and LangChain adapter
  • automa_ai/tools/registry.py – Extensible tool registry and builder system
  • automa_ai/tools/web_search/__init__.py – Web search tool module exports
  • automa_ai/tools/web_search/config.py – Configuration models for web search providers
  • automa_ai/tools/web_search/tool.py – Main web search tool implementation
  • automa_ai/tools/web_search/providers.py – Search providers (Serper, DuckDuckGo, Firecrawl)
  • automa_ai/tools/web_search/scraper.py – Open-source scraper using trafilatura/BeautifulSoup
  • automa_ai/tools/web_search/rerank.py – Reranking implementations (Jina, Cohere, BM25)
  • automa_ai/agents/agent_factory.py – Tool config coercion and agent integration; Black formatting
  • automa_ai/agents/langgraph_chatagent.py – Tool building and binding with duplicate detection; Black formatting
  • tests/test_tools_config.py – Unit tests for configuration validation
  • tests/test_web_search_tool.py – Unit tests for web search tool behavior
  • pyproject.toml – Optional dependencies for web and rerank features
  • docs/tools.md – Comprehensive documentation for default tools
  • README.md – Configuration example and reference to docs
  • examples/web_search_demo.py – Runnable example demonstrating tool usage

```python
async def oss_scrape(client: httpx.AsyncClient, url: str, max_chars: int) -> str:
    response = await client.get(url, follow_redirects=True)
```

Copilot AI commented on Feb 6, 2026:

The oss_scrape function does not check the HTTP response status code before attempting to extract content. If the server returns an error status (4xx or 5xx), the function will still try to process the response text. This could lead to unexpected behavior or errors when trying to extract text from error pages. Consider adding status code validation and raising an appropriate exception for error responses.

Suggested change:

```python
    response = await client.get(url, follow_redirects=True)
    response.raise_for_status()
```

"documents": docs,
"top_n": top_k,
},
)

Copilot AI commented on Feb 6, 2026:

The jina_rerank function should handle 401 status codes explicitly with a ValueError (authentication failure) before the general error check, similar to how serper_search handles authentication failures. This provides clearer error messages to users when API keys are invalid or missing.

Suggested change:

```python
    )
    if resp.status_code == 401:
        raise ValueError("Jina authentication failed: invalid or missing API key.")
```

"documents": docs,
"top_n": top_k,
},
)

Copilot AI commented on Feb 6, 2026:

The cohere_rerank function should handle 401 status codes explicitly with a ValueError (authentication failure) before the general error check, similar to how serper_search handles authentication failures. This provides clearer error messages to users when API keys are invalid or missing.

Suggested change:

```python
    )
    if resp.status_code == 401:
        raise ValueError(
            "Cohere rerank authentication failed (HTTP 401). "
            "Please check that your Cohere API key is valid and provided correctly."
        )
```

pyproject.toml (outdated)
Comment on lines 45 to 46
"httpx>=0.28.1",
"bs4>=0.0.2",

Copilot AI commented on Feb 6, 2026:

The packages httpx and bs4 are already listed in the main dependencies (lines 12 and 17), so they don't need to be repeated in the web optional dependencies. This redundancy should be removed to keep the dependency specifications clean.

Suggested change: remove the duplicate entries

```toml
"httpx>=0.28.1",
"bs4>=0.0.2",
```

README.md (outdated)
Comment on lines 106 to 107
```markdown
### Retriever configuration
```

Copilot AI commented on Feb 6, 2026:

The "### Retriever configuration" heading appears without any content before the next section begins. Either add the missing retriever configuration content or remove this empty heading to avoid confusion.

Suggested change: remove the empty heading

```markdown
### Retriever configuration
```

```python
config = ToolsConfig.from_dict(_load_config(args.config))
tools = build_langchain_tools(config.tools)
web_search = next(t for t in tools if t.name == "web_search")
```

Copilot AI commented on Feb 6, 2026:

The next() generator expression will raise a StopIteration exception if no tool named "web_search" is found. Consider adding a default value or explicit error handling to provide a clearer error message to the user, for example: next((t for t in tools if t.name == "web_search"), None) followed by a check and clear error message.

Suggested change:

```python
web_search = next((t for t in tools if t.name == "web_search"), None)
if web_search is None:
    raise ValueError(
        "No tool named 'web_search' was found. "
        "Please ensure your tools configuration includes a 'web_search' tool."
    )
```


```python
class SerperConfig(BaseModel):
    api_key: str | None = None
    endpoint: str = "https://google.serper.dev/search"
```

Copilot AI commented on Feb 6, 2026:

The SerperConfig.endpoint field is defined but never used. The serper_search function hardcodes the URL to "https://google.serper.dev/search" instead of using the configurable endpoint. Either use the endpoint from config or remove the unused field to avoid confusion.

Suggested change: remove the unused field

```python
    endpoint: str = "https://google.serper.dev/search"
```


```python
class FirecrawlConfig(BaseModel):
    api_key: str | None = None
    endpoint: str = "https://api.firecrawl.dev/v1/scrape"
```

Copilot AI commented on Feb 6, 2026:

The FirecrawlConfig.endpoint field is defined but never used. The firecrawl_scrape function hardcodes the URL to "https://api.firecrawl.dev/v1/scrape" instead of using the configurable endpoint. Either use the endpoint from config or remove the unused field to avoid confusion.

Suggested change: remove the unused field

```python
    endpoint: str = "https://api.firecrawl.dev/v1/scrape"
```

Comment on lines 5 to 9

```python
try:
    DEFAULT_TOOL_REGISTRY.register("web_search", build_web_search_tool)
except ValueError:
    pass
```

Copilot AI commented on Feb 6, 2026:

'except' clause does nothing but pass and there is no explanatory comment.

Suggested change
try:
DEFAULT_TOOL_REGISTRY.register("web_search", build_web_search_tool)
except ValueError:
pass
import logging
try:
DEFAULT_TOOL_REGISTRY.register("web_search", build_web_search_tool)
except ValueError as exc:
logging.getLogger(__name__).debug(
"Ignoring ValueError while registering 'web_search' tool: %s",
exc,
)

```python
        text = await asyncio.to_thread(trafilatura.extract, html)
        if text:
            return text
    except Exception:
```

Copilot AI commented on Feb 6, 2026:

'except' clause does nothing but pass and there is no explanatory comment.

Suggested change:

```python
    except Exception:
        # Fall back to BeautifulSoup if trafilatura is unavailable or extraction fails.
```
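
For context, the full fallback chain might look roughly like this. Only the quoted fragment above is from the PR; the surrounding structure and function name are assumptions:

```python
# Sketch of the trafilatura -> BeautifulSoup fallback chain; only the quoted
# fragment is from the PR, the surrounding structure is an assumption.
import asyncio

import trafilatura
from bs4 import BeautifulSoup

async def extract_text(html: str) -> str:
    try:
        text = await asyncio.to_thread(trafilatura.extract, html)
        if text:
            return text
    except Exception:
        # Fall back to BeautifulSoup if trafilatura is unavailable or fails.
        pass
    soup = BeautifulSoup(html, "html.parser")
    return soup.get_text(separator=" ", strip=True)
```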
