Add configurable default tool registry and built-in web_search tool #46

Open

weilixu wants to merge 2 commits into develop from codex/implement-default-tool-system-with-web_search

Conversation

weilixu (Contributor) commented on Feb 6, 2026

Motivation

  • Provide a first-class, declarative default tool system so agents can enable common tools via config (not MCP).
  • Ship a practical built-in tool web_search that supports paid providers, optional rerankers, and a robust open-source fallback.
  • Keep the base install lightweight and ensure tools are only constructed/bound when enabled in config.

Description

  • Add configuration models ToolSpec and ToolsConfig to declare tools as a tools: [{type, config}] list, plus a small coercion helper (automa_ai/config/tools.py); a short sketch follows this list.
  • Implement an extensible tool interface and registry with BaseDefaultTool, RuntimeDeps, ToolRegistry, and build_langchain_tools, which maps each type to its builder and adapts tools for LangChain use (new automa_ai/tools/*).
  • Implement the web_search tool with provider selection (Serper or open-source DuckDuckGo fallback), optional Firecrawl scraping with OSS scraper fallback (trafilatura/BeautifulSoup), and reranking options (jina, cohere, or BM25 open-source fallback), including input/output schemas and meta timings/warnings (automa_ai/tools/web_search/*).
  • Integrate into runtime: AgentFactory accepts tools_config and coerces it into tool specs, and GenericLangGraphChatAgent constructs and binds LangChain tools only when tools_config is present, checking for duplicate tool names.
  • Add docs (docs/tools.md), a README snippet, an example script (examples/web_search_demo.py), optional extras in pyproject.toml ([project.optional-dependencies] web/rerank), and unit tests for config validation, provider selection, dedupe/max-pages scraping behavior, and BM25 deterministic ordering (tests/test_tools_config.py, tests/test_web_search_tool.py).
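
For illustration, a minimal sketch of declaring and building such a config. The from_dict/build_langchain_tools calls mirror the example script; the import paths and the keys inside "config" are assumptions, not the PR's exact schema:

```python
# Minimal sketch, assuming the declarative shape described above.
# ToolsConfig.from_dict and build_langchain_tools mirror the demo script;
# import paths and the keys inside "config" are illustrative assumptions.
from automa_ai.config import ToolsConfig
from automa_ai.tools import build_langchain_tools

raw = {
    "tools": [
        {
            "type": "web_search",
            "config": {
                "provider": "serper",  # assumed key; DuckDuckGo is the OSS fallback
                "rerank": "bm25",      # assumed key; jina/cohere are also described
            },
        }
    ]
}

config = ToolsConfig.from_dict(raw)
tools = build_langchain_tools(config.tools)  # LangChain-ready tool objects
```

Because tools are only constructed and bound when declared, an omitted or empty tools list leaves the base install untouched.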

Testing

  • Formatting: ran black on the modified files; the reformat completed without errors.
  • Unit tests were added, but pytest collection failed in this environment because the runtime dependency pydantic was unavailable and network access to install packages was blocked, so the tests could not be executed here.
  • Attempted to install the missing deps (pydantic, httpx, pytest-asyncio), but the environment's network/proxy blocked package installation.
  • A static compile check (python -m compileall) surfaced an unrelated, pre-existing syntax issue in automa_ai/common/chunk.py that prevented full byte-compilation in this environment; it was not introduced by these changes.
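
For context on the BM25 deterministic-ordering tests mentioned in the description, a fallback reranker along these lines would be deterministic. This is a hypothetical sketch using the rank-bm25 package; the PR's actual automa_ai/tools/web_search/rerank.py is not reproduced here:

```python
# Hypothetical sketch of a deterministic BM25 fallback reranker.
# Uses the rank-bm25 package; the PR's real helper names may differ.
from rank_bm25 import BM25Okapi

def bm25_rerank(query: str, docs: list[str], top_k: int) -> list[str]:
    bm25 = BM25Okapi([d.lower().split() for d in docs])
    scores = bm25.get_scores(query.lower().split())
    # Sort by score descending, breaking ties by input position so the
    # output order is deterministic and therefore unit-testable.
    order = sorted(range(len(docs)), key=lambda i: (-scores[i], i))
    return [docs[i] for i in order[:top_k]]
```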

Codex Task

Copilot AI left a comment

Pull request overview

This pull request adds a configurable default tool registry system and implements a comprehensive built-in web_search tool for Automa-AI agents. The PR introduces a declarative configuration approach for tools that can be enabled via config (not MCP), with the web_search tool supporting multiple providers (Serper, DuckDuckGo), optional web scraping (Firecrawl or open-source), and result reranking (Jina, Cohere, or BM25).

Changes:

  • Adds configuration models (ToolSpec, ToolsConfig) for declarative tool configuration
  • Implements extensible tool infrastructure with BaseDefaultTool, RuntimeDeps, ToolRegistry, and LangChain adapter
  • Adds a fully-featured web_search tool with provider fallbacks, scraping, and reranking capabilities
  • Integrates tools into AgentFactory and GenericLangGraphChatAgent with duplicate name detection
  • Includes comprehensive documentation, example script, unit tests, and optional dependencies

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 13 comments.

Summary per file:

  • automa_ai/config/__init__.py – New module init exporting ToolSpec and ToolsConfig
  • automa_ai/config/tools.py – Configuration models for tool specifications
  • automa_ai/tools/__init__.py – Registry initialization and tool exports
  • automa_ai/tools/base.py – Base tool interface and LangChain adapter
  • automa_ai/tools/registry.py – Extensible tool registry and builder system
  • automa_ai/tools/web_search/__init__.py – Web search tool module exports
  • automa_ai/tools/web_search/config.py – Configuration models for web search providers
  • automa_ai/tools/web_search/tool.py – Main web search tool implementation
  • automa_ai/tools/web_search/providers.py – Search providers (Serper, DuckDuckGo, Firecrawl)
  • automa_ai/tools/web_search/scraper.py – Open-source scraper using trafilatura/BeautifulSoup
  • automa_ai/tools/web_search/rerank.py – Reranking implementations (Jina, Cohere, BM25)
  • automa_ai/agents/agent_factory.py – Tool config coercion and agent integration; Black formatting
  • automa_ai/agents/langgraph_chatagent.py – Tool building and binding with duplicate detection; Black formatting
  • tests/test_tools_config.py – Unit tests for configuration validation
  • tests/test_web_search_tool.py – Unit tests for web search tool behavior
  • pyproject.toml – Optional dependencies for web and rerank features
  • docs/tools.md – Comprehensive documentation for default tools
  • README.md – Configuration example and reference to docs
  • examples/web_search_demo.py – Runnable example demonstrating tool usage

```python
async def oss_scrape(client: httpx.AsyncClient, url: str, max_chars: int) -> str:
    response = await client.get(url, follow_redirects=True)
```

Copilot AI commented on Feb 6, 2026:

The oss_scrape function does not check the HTTP response status code before attempting to extract content. If the server returns an error status (4xx or 5xx), the function will still try to process the response text. This could lead to unexpected behavior or errors when trying to extract text from error pages. Consider adding status code validation and raising an appropriate exception for error responses.

Suggested change:

```python
    response = await client.get(url, follow_redirects=True)
    response.raise_for_status()
```

"documents": docs,
"top_n": top_k,
},
)

Copilot AI commented on Feb 6, 2026:

The jina_rerank function should handle 401 status codes explicitly with a ValueError (authentication failure) before the general error check, similar to how serper_search handles authentication failures. This provides clearer error messages to users when API keys are invalid or missing.

Suggested change:

```python
    )
    if resp.status_code == 401:
        raise ValueError("Jina authentication failed: invalid or missing API key.")
```

"documents": docs,
"top_n": top_k,
},
)

Copilot AI commented on Feb 6, 2026:

The cohere_rerank function should handle 401 status codes explicitly with a ValueError (authentication failure) before the general error check, similar to how serper_search handles authentication failures. This provides clearer error messages to users when API keys are invalid or missing.

Suggested change:

```python
    )
    if resp.status_code == 401:
        raise ValueError(
            "Cohere rerank authentication failed (HTTP 401). "
            "Please check that your Cohere API key is valid and provided correctly."
        )
```

pyproject.toml (outdated)
Comment on lines 45 to 46
"httpx>=0.28.1",
"bs4>=0.0.2",

Copilot AI commented on Feb 6, 2026:

The packages httpx and bs4 are already listed in the main dependencies (lines 12 and 17), so they don't need to be repeated in the web optional dependencies. This redundancy should be removed to keep the dependency specifications clean.

Suggested change: remove the duplicate entries

```toml
"httpx>=0.28.1",
"bs4>=0.0.2",
```

README.md (outdated)
Comment on lines 106 to 107
```markdown
### Retriever configuration
```

Copilot AI commented on Feb 6, 2026:

The "### Retriever configuration" heading appears without any content before the next section begins. Either add the missing retriever configuration content or remove this empty heading to avoid confusion.

Suggested change: remove the empty heading

```markdown
### Retriever configuration
```

```python
config = ToolsConfig.from_dict(_load_config(args.config))
tools = build_langchain_tools(config.tools)
web_search = next(t for t in tools if t.name == "web_search")
```

Copilot AI commented on Feb 6, 2026:

The next() generator expression will raise a StopIteration exception if no tool named "web_search" is found. Consider adding a default value or explicit error handling to provide a clearer error message to the user, for example: next((t for t in tools if t.name == "web_search"), None) followed by a check and clear error message.

Suggested change:

```python
web_search = next((t for t in tools if t.name == "web_search"), None)
if web_search is None:
    raise ValueError(
        "No tool named 'web_search' was found. "
        "Please ensure your tools configuration includes a 'web_search' tool."
    )
```


```python
class SerperConfig(BaseModel):
    api_key: str | None = None
    endpoint: str = "https://google.serper.dev/search"
```

Copilot AI commented on Feb 6, 2026:

The SerperConfig.endpoint field is defined but never used. The serper_search function hardcodes the URL to "https://google.serper.dev/search" instead of using the configurable endpoint. Either use the endpoint from config or remove the unused field to avoid confusion.

Suggested change: remove the unused field

```python
    endpoint: str = "https://google.serper.dev/search"
```


```python
class FirecrawlConfig(BaseModel):
    api_key: str | None = None
    endpoint: str = "https://api.firecrawl.dev/v1/scrape"
```

Copilot AI commented on Feb 6, 2026:

The FirecrawlConfig.endpoint field is defined but never used. The firecrawl_scrape function hardcodes the URL to "https://api.firecrawl.dev/v1/scrape" instead of using the configurable endpoint. Either use the endpoint from config or remove the unused field to avoid confusion.

Suggested change: remove the unused field

```python
    endpoint: str = "https://api.firecrawl.dev/v1/scrape"
```

Comment on lines 5 to 9

```python
try:
    DEFAULT_TOOL_REGISTRY.register("web_search", build_web_search_tool)
except ValueError:
    pass
```

Copilot AI commented on Feb 6, 2026:

'except' clause does nothing but pass and there is no explanatory comment.

Suggested change
try:
DEFAULT_TOOL_REGISTRY.register("web_search", build_web_search_tool)
except ValueError:
pass
import logging
try:
DEFAULT_TOOL_REGISTRY.register("web_search", build_web_search_tool)
except ValueError as exc:
logging.getLogger(__name__).debug(
"Ignoring ValueError while registering 'web_search' tool: %s",
exc,
)

```python
        text = await asyncio.to_thread(trafilatura.extract, html)
        if text:
            return text
    except Exception:
```

Copilot AI commented on Feb 6, 2026:

'except' clause does nothing but pass and there is no explanatory comment.

Suggested change:

```python
    except Exception:
        # Fall back to BeautifulSoup if trafilatura is unavailable or extraction fails.
```
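
For context, the full fallback chain might look roughly like this. Only the quoted fragment above is from the PR; the surrounding structure and function name are assumptions:

```python
# Sketch of the trafilatura -> BeautifulSoup fallback chain; only the quoted
# fragment is from the PR, the surrounding structure is an assumption.
import asyncio

import trafilatura
from bs4 import BeautifulSoup

async def extract_text(html: str) -> str:
    try:
        text = await asyncio.to_thread(trafilatura.extract, html)
        if text:
            return text
    except Exception:
        # Fall back to BeautifulSoup if trafilatura is unavailable or fails.
        pass
    soup = BeautifulSoup(html, "html.parser")
    return soup.get_text(separator=" ", strip=True)
```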
