Langchain Vector Search Tool uses MCP under the hood #295
nisha2003 wants to merge 11 commits into mcp-migration
Conversation
aravind-segu left a comment
Couple of comments, but looks good overall. Will stamp after E2E testing.
```python
) -> Dict[str, Any]:
    """Build input for MCP tool invocation."""
    mcp_input = self._build_mcp_params(filters, **kwargs)
    mcp_input["query"] = query
```
nit: why special-case this? Can we pass it into `_build_mcp_params`?
```python
query: str,
filters: Optional[Union[Dict[str, Any], List[FilterItem]]] = None,
**kwargs,
) -> List[Document]:
```
Was just curious here: before, `_run` returned a `str`, and now it returns a `List[Document]`. Did we change something, or was the previous return type wrong?
I think it may have been wrong? Previously we just returned `self._vector_store.similarity_search(**kwargs)`, for which the return type is `List[Document]`.
```python
        return filters
    return {item.model_dump()["key"]: item.model_dump()["value"] for item in filters}

def _build_mcp_meta(
```
nit: use _build_mcp_params directly?
```python
    """Build metadata dict for MCP tool invocation."""
    return self._build_mcp_params(filters, **kwargs)

def _parse_mcp_response(self, mcp_response: str) -> List[Dict]:
```
nit: same thing here — use `_parse_mcp_response_to_dicts` directly?
```python
self,
query: str,
filters: Optional[Union[Dict[str, Any], List[FilterItem]]] = None,
openai_client: OpenAI = None,
```
Can we use the DatabricksOpenAI here to automatically authenticate with the WorkspaceClient?
```python
def _validate_mcp_tools(self, tools: list) -> None:
    """Validate that MCP tools were returned."""
    if not tools:
        raise ValueError(f"No MCP tools found for index {self.index_name}")
```
Can we add the other validation that the tools list has exactly length 1? We do that in the OpenAI path right now.
```python
        return [{"page_content": mcp_response, "metadata": {}}]

    if not isinstance(parsed, list):
        if strict:
```
nit: It looks like strict is always True, do we need this parameter?
```python
results = []
for item in parsed:
    if isinstance(item, dict):
        page_content = item.get(text_col, str(item))
```
Here, if `text_col` is not in the item, we stringify the whole item. Maybe we should throw a good error here instead? Are there cases where the text column won't exist?
```python
params: Dict[str, Any] = {}

if query is not None:
    params["query"] = query
```
`query` is required in the MCP params. Let's not accept a `None` query.
```python
if query is not None:
    params["query"] = query

num_results = kwargs.pop("num_results", kwargs.pop("k", self.num_results))
```
Migrate the Langchain Vector Search Tool to use MCP adapters. We still preserve the direct-API path for self-managed embeddings, for which there is no MCP support.
Refactored duplicated MCP-path code shared between the OpenAI and Langchain tools into the base mixin class, and moved tests for that shared functionality to the mixin class as well.
Manual tests (https://eng-ml-inference.staging.cloud.databricks.com/editor/notebooks/1465545330011655?o=1653573648247579) are in the Langchain Migration section.
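The shared-mixin refactor described above might look roughly like this sketch; the class names are hypothetical, and only `_parse_mcp_response_to_dicts` appears in the diff:

```python
import json


class VectorSearchMcpMixin:
    """Shared MCP plumbing reused by the framework-specific tools."""

    def _parse_mcp_response_to_dicts(self, mcp_response: str) -> list:
        """Parse the MCP tool's JSON response into a list of result dicts."""
        parsed = json.loads(mcp_response)
        if not isinstance(parsed, list):
            raise ValueError("Expected a JSON list from the MCP tool")
        return parsed


class LangchainVectorSearchSketch(VectorSearchMcpMixin):
    """Each framework tool converts the shared dicts to its own document
    type (e.g. Langchain's Document); here we just return the dicts."""

    def run(self, mcp_response: str) -> list:
        return self._parse_mcp_response_to_dicts(mcp_response)
```

Keeping parsing and validation in the mixin means the OpenAI and Langchain tools only differ in how they wrap results, which is what lets the tests move to the mixin as well.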