feat(langchain): Update Gemini + Chroma RAG example notebook #1074
Ashitpatel001 wants to merge 11 commits into google-gemini:main
Adds a new notebook demonstrating how to build a RAG pipeline using Google Gemini, LangChain (LCEL), and ChromaDB.

Key features:
- Uses modern LangChain v0.1+ imports (langchain-core, langchain-community).
- Securely handles API keys, mapping `GEMINI_API_KEY` to `GOOGLE_API_KEY`.
- Replaces legacy chains with LCEL syntax for the retrieval pipeline.
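The retrieve-then-generate flow described above can be illustrated without any of the libraries involved. The sketch below is a dependency-free stand-in, not the notebook's code: it scores documents by shared words instead of Gemini embeddings in Chroma, and it stops at building the prompt rather than calling a model. The names `tokenize`, `retrieve`, `build_prompt`, and `DOCS` are all hypothetical.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, k=2):
    """Return the k documents that share the most words with the question.

    Word overlap is a toy substitute for embedding similarity search.
    """
    query_tokens = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_tokens & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, context_docs):
    """Format retrieved documents and the question into a single prompt string."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Illustrative corpus (made up for this sketch).
DOCS = [
    "Chroma is a vector store used to index document embeddings.",
    "LCEL composes runnables with the pipe operator.",
    "Colab Secrets store API keys outside the notebook.",
]

question = "What is Chroma used for?"
prompt = build_prompt(question, retrieve(question, DOCS, k=1))
print(prompt)
```

In the notebook itself, the retrieval step would presumably be a Chroma retriever and the final step a Gemini chat model; the sketch only shows how the pieces hand data to each other.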
Summary of Changes (posted by Gemini Code Assist)

This pull request introduces a comprehensive new example notebook that showcases a Retrieval Augmented Generation (RAG) pipeline. It leverages Google Gemini for language generation, LangChain for orchestrating the RAG process, and ChromaDB as the vector store for efficient document retrieval. The update modernizes the example by adopting the latest LangChain Expression Language (LCEL) and includes robust API key management, offering a valuable alternative to existing DeepLake-based RAG examples.
Code Review
This pull request updates the RAG example notebook to use ChromaDB and modern LangChain (LCEL) syntax, which is a valuable update. The code is cleaner and uses more current practices.
However, my review identified two critical issues that will prevent the notebook from running correctly: a missing beautifulsoup4 dependency and incorrect API key handling that will lead to authentication failure. I have also provided several medium-severity suggestions to align the notebook with the repository's style guide, covering topics like redundant UI elements, correct use of @param, and removing unused code. Addressing these points will significantly improve the notebook's quality and reliability.
    - "%pip install --quiet -U langchain-community==0.0.20\n",
    - "%pip install --quiet chromadb\n",
    - "%pip install --quiet bs4"
    + "%pip --quiet install langchain langchain-community chromadb langchain-core langchain-google-genai"
WebBaseLoader depends on the beautifulsoup4 library for parsing web pages. This dependency was present in the old installation commands but is missing from this new consolidated command, which will cause a runtime ImportError. Please add it back.
%pip --quiet install langchain langchain-community chromadb langchain-core langchain-google-genai beautifulsoup4
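One way to surface a missing dependency like this before the loader runs is to check that each module can be located. The following is a minimal sketch under stated assumptions, not part of the PR: the `missing_modules` helper and the `REQUIRED` list are hypothetical, and the list uses import names rather than pip package names (`beautifulsoup4` is imported as `bs4`).

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be located for import."""
    return [
        name
        for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# Import names, not pip package names (assumed list for this notebook).
REQUIRED = ["bs4", "chromadb", "langchain_community", "langchain_google_genai"]

missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

A check like this only confirms that a module can be found; it does not verify versions or that the import succeeds without side errors.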
    "try:\n",
    " # Get the key from Colab Secrets (named 'GEMINI_API_KEY' per instructions)\n",
    " gemini_key = userdata.get('GEMINI_API_KEY')\n",
    " os.environ[\"GEMINI_API_KEY\"] = gemini_key\n",
    " print(\"API Key loaded successfully.\")\n",
    "except Exception as e:\n",
    " print(\"Error: Please make sure you have created a Secret named 'GEMINI_API_KEY'.\")"
    ]
This change will cause authentication to fail. The langchain-google-genai library expects the API key to be in the GOOGLE_API_KEY environment variable, not GEMINI_API_KEY. The instructions in the markdown cell just above this one, as well as the Authentication.ipynb quickstart, also state that the secret should be named GOOGLE_API_KEY. Please use GOOGLE_API_KEY for both the secret name and the environment variable.
try:
    # Get the key from Colab Secrets (named 'GOOGLE_API_KEY' per instructions)
    api_key = userdata.get('GOOGLE_API_KEY')
    os.environ["GOOGLE_API_KEY"] = api_key
    print("API Key loaded successfully.")
except Exception as e:
    print("Error: Please make sure you have created a Secret named 'GOOGLE_API_KEY'.")
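If the Colab secret keeps the name `GEMINI_API_KEY`, as the PR description intends, its value can still be exported under the variable name the library reads. The sketch below illustrates that mapping under assumptions: `export_api_key` and `fake_secrets` are hypothetical names, and a plain dictionary stands in for `google.colab.userdata` so the code runs outside Colab. It is one possible approach, not the reviewer's suggestion, which uses `GOOGLE_API_KEY` for both names.

```python
import os


def export_api_key(get_secret, secret_name="GOOGLE_API_KEY", env_name="GOOGLE_API_KEY"):
    """Copy a secret into the environment variable the client library reads.

    get_secret is any callable that returns the secret value for a name,
    for example google.colab.userdata.get when running inside Colab.
    """
    value = get_secret(secret_name)
    if not value:
        raise ValueError(f"Secret {secret_name!r} is missing or empty.")
    os.environ[env_name] = value
    return env_name


# Stand-in for Colab Secrets with a placeholder value (not a real key).
fake_secrets = {"GEMINI_API_KEY": "placeholder-key"}

# The secret is stored as GEMINI_API_KEY but exported as GOOGLE_API_KEY.
export_api_key(fake_secrets.get, secret_name="GEMINI_API_KEY")
print("Exported:", "GOOGLE_API_KEY" in os.environ)
```

Raising on an empty value, rather than printing a success message unconditionally, keeps a missing secret from going unnoticed until the first API call.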
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a target=\"_blank\" href=\"https://colab.research.google.com/github/google-gemini/cookbook/blob/main/examples/langchain/Gemini_LangChain_QA_Chroma_WebLoad.ipynb\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" height=30/></a>"
      ]
    },
This 'Open in Colab' badge is redundant and misplaced. According to the repository style guide (line 50), the badge should be placed immediately after the H1 title. Another badge already exists in the correct location (at line 59 of the file). Please remove this newly added cell to avoid confusion.
References
- The style guide specifies that the 'Open in Colab' badge should be placed immediately after the H1 header for consistency across notebooks. (link)
      "from langchain_google_genai import GoogleGenerativeAIEmbeddings\n",
      "\n",
    - "gemini_embeddings = GoogleGenerativeAIEmbeddings(model=\"models/gemini-embedding-001\")"
    + "gemini_embeddings = GoogleGenerativeAIEmbeddings(model=\"models/gemini-embedding-001\") # @param"
The # @param comment here is not used correctly. It will turn the entire line into an editable text field in Colab, which is not the intended use for a non-configurable value. The style guide (lines 72-77) reserves @param for user-configurable values, typically with a dropdown. Since the embedding model is fixed in this notebook, please remove the # @param.
gemini_embeddings = GoogleGenerativeAIEmbeddings(model="models/gemini-embedding-001")
References
- The style guide demonstrates using @param to create interactive form elements like dropdowns for model selection, not for static code lines. (link)
      },
      "outputs": [],
      "source": [
      "from ast import Param\n",

      "# \"temperature\" parameter directly.\n",
      "\n",
    - "llm = ChatGoogleGenerativeAI(model=\"gemini-2.5-flash\")"
    + "model = \"gemini-2.0-flash\" # @param [\"gemini-2.5-flash\", \"gemini-2.0-flash\"]\n",
There are a few issues with this model selection parameter that violate the repository style guide:
- Naming: The variable should be named MODEL_ID for consistency with the style guide (line 137 for constants, line 73 for an example).
- Parameter format: The @param is missing the JSON configuration {"allow-input":true, "isTemplate": true}, which is recommended by the style guide (line 73) for easier maintenance.
- Model names: The model names gemini-2.0-flash and gemini-2.5-flash appear to be internal or outdated. Please use current, publicly available model names like gemini-1.5-flash-latest to ensure the notebook works for external users.
MODEL_ID = "gemini-1.5-flash-latest" # @param ["gemini-1.5-flash-latest", "gemini-1.5-pro-latest"] {"allow-input": true, "isTemplate": true}
References
- The style guide provides a specific format for model selection parameters, including the variable name MODEL_ID and a JSON configuration to make it a user-editable dropdown. (link)
      {
      "cell_type": "code",
    - "execution_count": 13,
    + "execution_count": null,
While cleaning up the notebook is good, the repository style guide (line 42) recommends keeping cell outputs so users can see the expected results without running the code. Since the outputs here are not excessively large, please consider re-running the notebook and committing it with the outputs included.
References
- The style guide states a preference for including cell outputs in committed notebooks to allow readers to see the results without executing the code themselves, unless the outputs are very large. (link)
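A quick way to see which code cells would be committed without outputs is to inspect the notebook JSON directly. This is a minimal sketch, assuming the standard nbformat 4 layout in which each code cell carries an `outputs` list; the `code_cells_without_outputs` helper and the `sample` notebook are hypothetical, and a real file would be loaded with `json.load` instead.

```python
def code_cells_without_outputs(notebook):
    """Return indexes of code cells whose "outputs" list is empty or absent."""
    return [
        index
        for index, cell in enumerate(notebook.get("cells", []))
        if cell.get("cell_type") == "code" and not cell.get("outputs")
    ]


# Minimal in-memory notebook in the nbformat 4 layout (illustrative data only).
sample = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print(1)"],
        },
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["1\n"]}],
            "source": ["print(1)"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

print(code_cells_without_outputs(sample))  # prints [1]
```

Cells that legitimately produce no output, such as plain assignments, would also be listed, so the result is a prompt for a manual look rather than a pass/fail check.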
Removed unnecessary metadata and links from markdown cells.
Removed Colab badge and associated metadata from the notebook.
Fixes #1065
This PR adds a new notebook example (examples/langchain/Gemini_LangChain_QA_Chroma_WebLoad.ipynb) demonstrating how to build a Retrieval Augmented Generation (RAG) pipeline using Google Gemini, LangChain, and ChromaDB.

This serves as a Chroma-based alternative to the existing DeepLake examples, utilizing modern LangChain (LCEL) syntax and the latest langchain-google-genai integration.

Key Changes
- gemini-2.5-flash (via a Colab parameter dropdown for easy selection).
- GEMINI_API_KEY (user secret) to GEMINI_API_KEY (library requirement).

Checklist
- langchain-chroma and google-generativeai.
- google.colab.userdata.