
feat(langchain): Update Gemini + Chroma RAG example notebook#1074

Open
Ashitpatel001 wants to merge 11 commits into google-gemini:main from Ashitpatel001:Chroma_using_gemini_with_langchain


Ashitpatel001 commented on Dec 19, 2025

Fixes #1065

This PR adds a new notebook example (examples/langchain/Gemini_LangChain_QA_Chroma_WebLoad.ipynb) demonstrating how to build a Retrieval Augmented Generation (RAG) pipeline using Google Gemini, LangChain, and ChromaDB.

This serves as a Chroma-based alternative to the existing DeepLake examples, utilizing modern LangChain (LCEL) syntax and the latest langchain-google-genai integration.

Key Changes

  • Model: Implements gemini-2.5-flash (via a Colab parameter dropdown for easy selection).
  • Vector Store: Integrated ChromaDB for document storage and retrieval.
  • Security: Implemented secure API key handling, mapping the GEMINI_API_KEY user secret to the GOOGLE_API_KEY environment variable that the library requires.
  • Modern Syntax: Uses LangChain Expression Language (LCEL) for the retrieval chain logic.
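The retrieval flow these changes describe (embed documents, store them, fetch the closest ones for a question, and hand them to the model as context) can be sketched without any dependencies. This is a toy illustration, not code from the notebook: a hypothetical word-count `embed()` stands in for `GoogleGenerativeAIEmbeddings`, a hypothetical `InMemoryStore` class stands in for the Chroma collection, and the final call to the Gemini chat model is left out. In the notebook, the LCEL chain presumably wires the real retriever, prompt, and model together with the `|` operator rather than plain function calls.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for GoogleGenerativeAIEmbeddings: word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class InMemoryStore:
    """Hypothetical stand-in for a Chroma collection: keeps (text, vector) pairs."""

    def __init__(self, texts):
        self.items = [(text, embed(text)) for text in texts]

    def search(self, query: str, k: int = 2) -> list:
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(question: str, store: InMemoryStore, k: int = 2) -> str:
    """Retrieve context, join it into one string, and fill a QA prompt template."""
    context = "\n\n".join(store.search(question, k))
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Real embeddings capture meaning rather than shared words, so a Chroma retriever backed by Gemini embeddings would also match paraphrases this toy misses; the returned prompt string is what would then be passed to the chat model.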

Checklist

  • The notebook runs successfully from start to finish.
  • Output cells are included (to demonstrate results).
  • Verified compatibility with langchain-chroma and google-generativeai.
  • API keys are handled securely using google.colab.userdata.

Adds a new notebook demonstrating how to build a RAG pipeline using Google Gemini, LangChain (LCEL), and ChromaDB.

Key features:
- Uses modern LangChain v0.1+ imports (langchain-core, langchain-community).
- Securely handles API keys by mapping `GEMINI_API_KEY` to `GOOGLE_API_KEY`.
- Replaces legacy chains with LCEL syntax for the retrieval pipeline.

github-actions bot added the labels status:awaiting review (PR awaiting review from a maintainer) and component:examples (Issues/PR referencing examples folder) on Dec 19, 2025
gemini-code-assist bot (Contributor) commented
Summary of Changes

Hello @Ashitpatel001, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a comprehensive new example notebook that showcases a Retrieval Augmented Generation (RAG) pipeline. It leverages Google Gemini for language generation, LangChain for orchestrating the RAG process, and ChromaDB as the vector store for efficient document retrieval. The update modernizes the example by adopting the latest LangChain Expression Language (LCEL) and includes robust API key management, offering a valuable alternative to existing DeepLake-based RAG examples.

Highlights

  • New RAG Example Notebook: Introduced a new notebook example demonstrating a Retrieval Augmented Generation (RAG) pipeline using Google Gemini, LangChain, and ChromaDB, serving as a Chroma-based alternative to existing DeepLake examples.
  • ChromaDB Integration: Integrated ChromaDB as the vector store for efficient document storage and retrieval within the RAG pipeline.
  • Modern LangChain Syntax (LCEL): The example now utilizes the latest LangChain Expression Language (LCEL) for building the retrieval chain logic, aligning with modern LangChain practices.
  • Flexible Gemini Model Selection: Added a Colab parameter dropdown that lets users choose the Gemini model (gemini-2.5-flash or gemini-2.0-flash).
  • Enhanced API Key Handling: Improved secure API key handling by mapping GEMINI_API_KEY from Colab user secrets to the library requirement, including a try-except block for robust error management.
  • Notebook Refinements: Consolidated package installations, updated import statements to use langchain_core and langchain_community explicitly, and cleared execution outputs for a cleaner notebook state.


Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP 8 for Python) by adding files to a .gemini/ folder at the root of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.


Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist bot (Contributor) left a review


Code Review

This pull request updates the RAG example notebook to use ChromaDB and modern LangChain (LCEL) syntax, which is a valuable update. The code is cleaner and uses more current practices.

However, my review identified two critical issues that will prevent the notebook from running correctly: a missing beautifulsoup4 dependency and incorrect API key handling that will lead to authentication failure. I have also provided several medium-severity suggestions to align the notebook with the repository's style guide, covering topics like redundant UI elements, correct use of @param, and removing unused code. Addressing these points will significantly improve the notebook's quality and reliability.

- "%pip install --quiet -U langchain-community==0.0.20\n",
- "%pip install --quiet chromadb\n",
- "%pip install --quiet bs4"
+ "%pip --quiet install langchain langchain-community chromadb langchain-core langchain-google-genai"

critical

WebBaseLoader depends on the beautifulsoup4 library for parsing web pages. This dependency was present in the old installation commands but is missing from this new consolidated command, which will cause a runtime ImportError. Please add it back.

%pip --quiet install langchain langchain-community chromadb langchain-core langchain-google-genai beautifulsoup4
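A quick preflight check can catch this class of problem before `WebBaseLoader` fails at import time. The sketch below is an illustrative addition, not part of the PR: the package list mirrors the suggested install command, and the mapping from pip names to import names (for example, `beautifulsoup4` provides the `bs4` module) is exactly why a missing dependency is easy to overlook.

```python
import importlib.util

# pip distribution name -> top-level module it provides (the two often differ)
REQUIRED = {
    "langchain": "langchain",
    "langchain-community": "langchain_community",
    "langchain-core": "langchain_core",
    "langchain-google-genai": "langchain_google_genai",
    "chromadb": "chromadb",
    "beautifulsoup4": "bs4",  # parser needed by WebBaseLoader
}


def missing_packages(required: dict = REQUIRED) -> list:
    """Return the pip names whose top-level module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]
```

For example, computing `missing = missing_packages()` and printing `"%pip install " + " ".join(missing)` when the list is non-empty would tell the reader exactly what to install.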

Comment on lines +168 to 175
"try:\n",
" # Get the key from Colab Secrets (named 'GEMINI_API_KEY' per instructions)\n",
" gemini_key = userdata.get('GEMINI_API_KEY')\n",
" os.environ[\"GEMINI_API_KEY\"] = gemini_key\n",
" print(\"API Key loaded successfully.\")\n",
"except Exception as e:\n",
" print(\"Error: Please make sure you have created a Secret named 'GEMINI_API_KEY'.\")"
]

critical

This change will cause authentication to fail. The langchain-google-genai library expects the API key to be in the GOOGLE_API_KEY environment variable, not GEMINI_API_KEY. The instructions in the markdown cell just above this one, as well as the Authentication.ipynb quickstart, also state that the secret should be named GOOGLE_API_KEY. Please use GOOGLE_API_KEY for both the secret name and the environment variable.

try:
    # Get the key from Colab Secrets (named 'GOOGLE_API_KEY' per instructions)
    api_key = userdata.get('GOOGLE_API_KEY')
    os.environ["GOOGLE_API_KEY"] = api_key
    print("API Key loaded successfully.")
except Exception as e:
    print("Error: Please make sure you have created a Secret named 'GOOGLE_API_KEY'.")
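A slightly more defensive variant of the suggested cell is sketched below. It is a hypothetical helper (`load_google_api_key` is not in the notebook or any library): it reads the key from Colab Secrets when running in Colab, falls back to an environment variable of the same name elsewhere, and exports the result as `GOOGLE_API_KEY`, the variable the reviewer says `langchain-google-genai` expects.

```python
import os
from typing import Optional


def load_google_api_key(secret_name: str = "GOOGLE_API_KEY") -> Optional[str]:
    """Read an API key and export it as GOOGLE_API_KEY.

    Tries Colab Secrets first, then an environment variable with the same name.
    Returns the key, or None if neither source has it.
    """
    key = None
    try:
        from google.colab import userdata  # importable only inside Colab
        key = userdata.get(secret_name)
    except ImportError:
        pass  # not running in Colab; fall back to the environment below
    except Exception as err:  # e.g. secret missing or notebook access not granted
        print(f"Could not read Colab secret {secret_name!r}: {err}")
    if not key:
        key = os.environ.get(secret_name)
    if key:
        os.environ["GOOGLE_API_KEY"] = key  # the variable the review says the library reads
    else:
        print(f"Error: create a Colab secret or environment variable named {secret_name!r}.")
    return key
```

With the default argument it follows the reviewer's suggestion of one name for both the secret and the variable; calling `load_google_api_key("GEMINI_API_KEY")` would instead perform the mapping the PR description promises, reading a `GEMINI_API_KEY` secret and exporting it as `GOOGLE_API_KEY`.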

Comment on lines +3 to +12
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a target=\"_blank\" href=\"https://colab.research.google.com/github/google-gemini/cookbook/blob/main/examples/langchain/Gemini_LangChain_QA_Chroma_WebLoad.ipynb\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" height=30/></a>"
]
},

medium

This 'Open in Colab' badge is redundant and misplaced. According to the repository style guide (line 50), the badge should be placed immediately after the H1 title. Another badge already exists in the correct location (at line 59 of the file). Please remove this newly added cell to avoid confusion.

References
  1. The style guide specifies that the 'Open in Colab' badge should be placed immediately after the H1 header for consistency across notebooks. (link)
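Badge placement can be checked mechanically, since an `.ipynb` file is plain JSON with a `cells` list. The sketch below is a hypothetical lint, not a repository tool; it assumes the H1 title is a markdown cell starting with `# ` and treats a badge in the same cell as the title, or in the next cell, as correctly placed.

```python
BADGE_MARKER = "colab-badge.svg"


def cell_text(cell: dict) -> str:
    """Notebook cell sources may be stored as a string or a list of lines."""
    source = cell.get("source", "")
    return "".join(source) if isinstance(source, list) else source


def badge_problems(nb: dict) -> list:
    """Check for exactly one Colab badge, placed with or right after the H1 title."""
    cells = nb.get("cells", [])
    badges = [i for i, c in enumerate(cells) if BADGE_MARKER in cell_text(c)]
    h1 = next(
        (
            i
            for i, c in enumerate(cells)
            if c.get("cell_type") == "markdown"
            and cell_text(c).lstrip().startswith("# ")
        ),
        None,
    )
    problems = []
    if not badges:
        problems.append("no Colab badge found")
    if len(badges) > 1:
        problems.append(f"{len(badges)} badge cells found at indices {badges}")
    if h1 is None:
        problems.append("no H1 title cell found")
    elif badges and badges[0] not in (h1, h1 + 1):
        problems.append(f"first badge is in cell {badges[0]}, expected {h1 + 1}")
    return problems
```

Run against a notebook with an extra badge cell at the top, as in this diff, it reports both the duplicate and the misplaced first badge.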

  "from langchain_google_genai import GoogleGenerativeAIEmbeddings\n",
  "\n",
- "gemini_embeddings = GoogleGenerativeAIEmbeddings(model=\"models/gemini-embedding-001\")"
+ "gemini_embeddings = GoogleGenerativeAIEmbeddings(model=\"models/gemini-embedding-001\") # @param"

medium

The # @param comment here is not used correctly. It will turn the entire line into an editable text field in Colab, which is not the intended use for a non-configurable value. The style guide (lines 72-77) reserves @param for user-configurable values, typically with a dropdown. Since the embedding model is fixed in this notebook, please remove the # @param.

gemini_embeddings = GoogleGenerativeAIEmbeddings(model="models/gemini-embedding-001")
References
  1. The style guide demonstrates using @param to create interactive form elements like dropdowns for model selection, not for static code lines. (link)
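The distinction the reviewer draws (form fields for literal, user-editable values; plain code for everything else) can be expressed as a small check. This is an illustrative sketch under the assumption that a valid form field is a simple assignment of a Python literal; it is not how Colab itself parses forms.

```python
import ast
import re

PARAM_MARKER = re.compile(r"#\s*@param")


def misused_params(source: str) -> list:
    """Return 1-based line numbers where '# @param' annotates a non-literal value."""
    flagged = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = PARAM_MARKER.search(line)
        if not match:
            continue
        code = line[: match.start()].strip()  # the statement before the comment
        try:
            stmt = ast.parse(code).body[0]
            if not isinstance(stmt, ast.Assign):
                raise ValueError("not an assignment")
            ast.literal_eval(stmt.value)  # raises ValueError for calls, names, etc.
        except (SyntaxError, IndexError, ValueError, TypeError):
            flagged.append(lineno)
    return flagged
```

Run on the two lines discussed in this review, it flags the embeddings line but accepts a `MODEL_ID` dropdown.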

},
"outputs": [],
"source": [
"from ast import Param\n",

medium

This import of Param from ast is not used anywhere in the notebook. Please remove it to keep the code clean and avoid confusion.
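Unused imports like this one can be found with Python's standard `ast` module. The helper below is a hypothetical sketch, not a tool used by the repository. Because a notebook import may be used in a later cell, it is meant to be run on all code cells joined together, and it drops `%` and `!` lines first so that magics such as `%pip` do not break parsing. It only counts bare-name references, so a name used solely inside a string would be reported as unused.

```python
import ast


def unused_imports(source: str) -> list:
    """Return names bound by import statements but never referenced."""
    # Drop IPython magics and shell commands so the rest parses as plain Python.
    python_only = "\n".join(
        line for line in source.splitlines()
        if not line.lstrip().startswith(("%", "!"))
    )
    tree = ast.parse(python_only)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if alias.name == "*":
                    continue  # star imports cannot be checked this way
                # `import a.b` binds `a`; `from x import y as z` binds `z`
                imported.add(alias.asname or alias.name.split(".")[0])
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(imported - used)
```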

  "# \"temperature\" parameter directly.\n",
  "\n",
- "llm = ChatGoogleGenerativeAI(model=\"gemini-2.5-flash\")"
+ "model = \"gemini-2.0-flash\" # @param [\"gemini-2.5-flash\", \"gemini-2.0-flash\"]\n",

medium

There are a few issues with this model selection parameter that violate the repository style guide:

  • Naming: The variable should be named MODEL_ID for consistency with the style guide (line 137 for constants, line 73 for an example).
  • Parameter format: The @param is missing the JSON configuration {"allow-input":true, "isTemplate": true} which is recommended by the style guide (line 73) for easier maintenance.
  • Model Names: gemini-2.5-flash and gemini-2.0-flash are publicly available model IDs, so they can stay in the dropdown. However, the default is gemini-2.0-flash while the PR description says the notebook uses gemini-2.5-flash; consider making gemini-2.5-flash the default so the two agree.
MODEL_ID = "gemini-2.5-flash"  # @param ["gemini-2.5-flash", "gemini-2.0-flash"] {"allow-input": true, "isTemplate": true}
References
  1. The style guide provides a specific format for model selection parameters, including the variable name MODEL_ID and a JSON configuration to make it a user-editable dropdown. (link)

  {
  "cell_type": "code",
- "execution_count": 13,
+ "execution_count": null,

medium

While cleaning up the notebook is good, the repository style guide (line 42) recommends keeping cell outputs so users can see the expected results without running the code. Since the outputs here are not excessively large, please consider re-running the notebook and committing it with the outputs included.

References
  1. The style guide states a preference for including cell outputs in committed notebooks to allow readers to see the results without executing the code themselves, unless the outputs are very large. (link)
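Before committing, it is easy to check whether a notebook was saved with its outputs cleared. The sketch below is a hypothetical helper, not part of the PR or the style guide: it loads the `.ipynb` JSON and lists code cells that have neither an execution count nor saved outputs, which is the state the `"execution_count": null` change in this hunk indicates.

```python
import json


def load_notebook(path: str) -> dict:
    """Parse an .ipynb file (it is plain JSON)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def unexecuted_code_cells(nb: dict) -> list:
    """Indices of code cells with no execution count and no saved outputs."""
    return [
        i
        for i, cell in enumerate(nb.get("cells", []))
        if cell.get("cell_type") == "code"
        and cell.get("execution_count") is None
        and not cell.get("outputs")
    ]
```

For example, `unexecuted_code_cells(load_notebook("examples/langchain/Gemini_LangChain_QA_Chroma_WebLoad.ipynb"))` returning a non-empty list would suggest re-running the notebook before committing.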


Labels

  • component:examples: Issues/PR referencing examples folder
  • status:awaiting review: PR awaiting review from a maintainer


Development

Successfully merging this pull request may close these issues.

Maintenance: Migrate deprecated LangChain imports across multiple notebooks

1 participant