|
9 | 9 | "This notebook demonstrates how to build a Retrieval Augmented Generation (RAG) system using:\n", |
10 | 10 | "- The TMDB movie dataset\n", |
11 | 11 | "- Couchbase as the vector store\n", |
| 12 | + "- Couchbase Search index to enable semantic search \n", |
12 | 13 | "- Haystack framework for the RAG pipeline\n", |
13 | | - "- Capella AI for embeddings and text generation\n", |
| 14 | + "- Capella Model Services for embeddings and text generation\n", |
14 | 15 | "\n", |
15 | 16 | "The system allows users to ask questions about movies and get AI-generated answers based on the movie descriptions." |
16 | 17 | ] |
|
49 | 50 | "outputs": [], |
50 | 51 | "source": [ |
51 | 52 | "import logging\n", |
52 | | - "import base64\n", |
53 | 53 | "import pandas as pd\n", |
54 | 54 | "from datasets import load_dataset\n", |
55 | 55 | "from haystack import Pipeline, GeneratedAnswer\n", |
|
98 | 98 | "\n", |
99 | 99 | "### Deploy Models\n", |
100 | 100 | "\n", |
101 | | - "To create the RAG application, use an embedding model for Vector Search and an LLM for generating responses. \n", |
102 | | - " \n", |
103 | | - "Capella Model Service lets you create both models in the same VPC as your database. It offers the Llama 3.1 Instruct model (8 Billion parameters) for LLM and the mistral model for embeddings. \n", |
| 101 | + "In order to create the RAG application, we need an embedding model to ingest the documents for Vector Search and a large language model (LLM) for generating the responses based on the context. \n", |
| 102 | + "\n", |
| 103 | + "Capella Model Service allows you to create both the embedding model and the LLM in the same VPC as your database. There are multiple options for both the Embedding & Large Language Models, along with Value Adds to the models.\n", |
| 104 | + "\n", |
| 105 | + "Create the models using the Capella Model Services interface. While creating the model, it is possible to cache the responses (both standard and semantic cache) and apply guardrails to the LLM responses.\n", |
104 | 106 | "\n", |
105 | | - "Use the Capella AI Services interface to create these models. You can cache responses and set guardrails for LLM outputs.\n", |
| 107 | + "For more details, please refer to the [documentation](https://docs.couchbase.com/ai/build/model-service/model-service.html). These models are compatible with the [Haystack OpenAI integration](https://haystack.deepset.ai/integrations/openai).\n", |
106 | 108 | "\n", |
107 | | - "For more details, see the [documentation](https://preview2.docs-test.couchbase.com/ai/get-started/about-ai-services.html#model). These models work with [Haystack OpenAI integration](https://haystack.deepset.ai/integrations/openai)." |
| 109 | + "After the models are deployed, please create the API keys for them and whitelist the keys on the IP on which the tutorial is being run. For more details, please refer to the documentation on [generating the API keys](https://docs.couchbase.com/ai/api-guide/api-start.html#model-service-keys)." |
108 | 110 | ] |
109 | 111 | }, |
110 | 112 | { |
|
115 | 117 | "\n", |
116 | 118 | "Enter your Couchbase and Capella AI credentials:\n", |
117 | 119 | "\n", |
118 | | - "CAPELLA_AI_ENDPOINT is the Capella AI Services endpoint found in the models section.\n", |
| 120 | + "CAPELLA_MODEL_SERVICES_ENDPOINT is the Capella Model Services Endpoint found in the models section.\n", |
119 | 121 | "\n", |
120 | | - "> Note that the Capella AI Endpoint requires an additional `/v1` from the endpoint shown on the UI if it is not shown on the UI." |
| 122 | + "> Note that the Capella Model Services Endpoint requires an additional `/v1` from the endpoint shown on the UI if it is not shown on the UI." |
121 | 123 | ] |
122 | 124 | }, |
123 | 125 | { |
124 | 126 | "cell_type": "code", |
125 | | - "execution_count": 2, |
| 127 | + "execution_count": null, |
126 | 128 | "metadata": {}, |
127 | 129 | "outputs": [], |
128 | 130 | "source": [ |
|
135 | 137 | "CB_BUCKET = input(\"Couchbase Bucket: \") \n", |
136 | 138 | "CB_SCOPE = input(\"Couchbase Scope: \")\n", |
137 | 139 | "CB_COLLECTION = input(\"Couchbase Collection: \")\n", |
138 | | - "INDEX_NAME = input(\"Vector Search Index: \")\n", |
| 140 | + "INDEX_NAME = \"vector_search\" # need to be matched with the search index name in the search_index.json file\n", |
139 | 141 | "\n", |
140 | 142 | "# Get Capella AI endpoint\n", |
141 | | - "CB_AI_ENDPOINT = input(\"Capella AI Services Endpoint\")\n", |
142 | | - "CB_AI_ENDPOINT_PASSWORD = base64.b64encode(f\"{CB_USERNAME}:{CB_PASSWORD}\".encode(\"utf-8\")).decode(\"utf-8\")" |
| 143 | + "CAPELLA_MODEL_SERVICES_ENDPOINT = input(\"Enter your Capella Model Services Endpoint: \")\n", |
| 144 | + "LLM_MODEL_NAME = input(\"Enter the LLM name\")\n", |
| 145 | + "LLM_API_KEY = getpass.getpass(\"Enter your Couchbase LLM API Key: \")\n", |
| 146 | + "EMBEDDING_MODEL_NAME = input(\"Enter the Embedding Model name:\")\n", |
| 147 | + "EMBEDDING_API_KEY = getpass.getpass(\"Enter your Couchbase Embedding Model API Key: \")" |
143 | 148 | ] |
144 | 149 | }, |
145 | 150 | { |
|
194 | 199 | " print(f\"Collection '{CB_COLLECTION}' created successfully.\")\n", |
195 | 200 | "\n", |
196 | 201 | "# Create search index from search_index.json file at scope level\n", |
197 | | - "with open('fts_index.json', 'r') as search_file:\n", |
| 202 | + "with open('search_index.json', 'r') as search_file:\n", |
198 | 203 | " search_index_definition = SearchIndex.from_json(json.load(search_file))\n", |
199 | 204 | " \n", |
200 | 205 | " # Update search index definition with user inputs\n", |
|
216 | 221 | " existing_index = scope_search_manager.get_index(search_index_name)\n", |
217 | 222 | " print(f\"Search index '{search_index_name}' already exists at scope level.\")\n", |
218 | 223 | " except Exception as e:\n", |
219 | | - " print(f\"Search index '{search_index_name}' does not exist at scope level. Creating search index from fts_index.json...\")\n", |
220 | | - " with open('fts_index.json', 'r') as search_file:\n", |
| 224 | + " print(f\"Search index '{search_index_name}' does not exist at scope level. Creating search index from search_index.json...\")\n", |
| 225 | + " with open('search_index.json', 'r') as search_file:\n", |
221 | 226 | " search_index_definition = SearchIndex.from_json(json.load(search_file))\n", |
222 | 227 | " scope_search_manager.upsert_index(search_index_definition)\n", |
223 | 228 | " print(f\"Search index '{search_index_name}' created successfully at scope level.\")" |
|
320 | 325 | }, |
321 | 326 | { |
322 | 327 | "cell_type": "code", |
323 | | - "execution_count": 6, |
| 328 | + "execution_count": null, |
324 | 329 | "metadata": {}, |
325 | 330 | "outputs": [], |
326 | 331 | "source": [ |
327 | 332 | "embedder = OpenAIDocumentEmbedder(\n", |
328 | | - " api_base_url=CB_AI_ENDPOINT,\n", |
329 | | - " api_key=Secret.from_token(CB_AI_ENDPOINT_PASSWORD),\n", |
330 | | - " model=\"intfloat/e5-mistral-7b-instruct\",\n", |
| 333 | + " api_base_url=CAPELLA_MODEL_SERVICES_ENDPOINT,\n", |
| 334 | + " api_key=Secret.from_token(EMBEDDING_API_KEY),\n", |
| 335 | + " model=EMBEDDING_MODEL_NAME,\n", |
331 | 336 | ")\n", |
332 | 337 | "\n", |
333 | 338 | "rag_embedder = OpenAITextEmbedder(\n", |
334 | | - " api_base_url=CB_AI_ENDPOINT,\n", |
335 | | - " api_key=Secret.from_token(CB_AI_ENDPOINT_PASSWORD),\n", |
336 | | - " model=\"intfloat/e5-mistral-7b-instruct\",\n", |
| 339 | + " api_base_url=CAPELLA_MODEL_SERVICES_ENDPOINT,\n", |
| 340 | + " api_key=Secret.from_token(EMBEDDING_API_KEY),\n", |
| 341 | + " model=EMBEDDING_MODEL_NAME,\n", |
337 | 342 | ")\n" |
338 | 343 | ] |
339 | 344 | }, |
|
342 | 347 | "metadata": {}, |
343 | 348 | "source": [ |
344 | 349 | "# Initialize LLM Generator\n", |
345 | | - "Configure the LLM generator using Capella AI's endpoint and Llama 3.1 model. This component will generate natural language responses based on the retrieved documents.\n" |
| 350 | + "Configure the LLM generator using Capella Model Services endpoint and LLM model name. This component will generate natural language responses based on the retrieved documents.\n" |
346 | 351 | ] |
347 | 352 | }, |
348 | 353 | { |
349 | 354 | "cell_type": "code", |
350 | | - "execution_count": 7, |
| 355 | + "execution_count": null, |
351 | 356 | "metadata": {}, |
352 | 357 | "outputs": [], |
353 | 358 | "source": [ |
354 | 359 | "llm = OpenAIGenerator(\n", |
355 | | - " api_base_url=CB_AI_ENDPOINT,\n", |
356 | | - " api_key=Secret.from_token(CB_AI_ENDPOINT_PASSWORD),\n", |
357 | | - " model=\"meta-llama/Llama-3.1-8B-Instruct\",\n", |
| 360 | + " api_base_url=CAPELLA_MODEL_SERVICES_ENDPOINT,\n", |
| 361 | + " api_key=Secret.from_token(LLM_API_KEY),\n", |
| 362 | + " model=LLM_MODEL_NAME,\n", |
358 | 363 | ")" |
359 | 364 | ] |
360 | 365 | }, |
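| | + { |
| | + "cell_type": "markdown", |
| | + "metadata": {}, |
| | + "source": [ |
| | + "Optionally, run a quick round trip to verify that both deployed models are reachable before building the full pipeline. This is a minimal sketch: it assumes the endpoint, model names, and API keys entered above are valid, and it simply calls the Haystack components configured in the previous cells." |
| | + ] |
| | + }, |
| | + { |
| | + "cell_type": "code", |
| | + "execution_count": null, |
| | + "metadata": {}, |
| | + "outputs": [], |
| | + "source": [ |
| | + "# Sanity check (optional): embed a short string and generate a short reply\n", |
| | + "emb_check = rag_embedder.run(text=\"test query\")\n", |
| | + "print(f\"Embedding dimension: {len(emb_check['embedding'])}\")\n", |
| | + "\n", |
| | + "llm_check = llm.run(prompt=\"Reply with the single word OK.\")\n", |
| | + "print(llm_check[\"replies\"][0])" |
| | + ] |
| | + }, |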
|
509 | 514 | "cell_type": "markdown", |
510 | 515 | "metadata": {}, |
511 | 516 | "source": [ |
512 | | - "## Caching in Capella AI Services\n", |
| 517 | + "## Caching in Capella Model Services\n", |
513 | 518 | "\n", |
514 | | - "To optimize performance and reduce costs, Capella AI services employ two caching mechanisms:\n", |
| 519 | + "To optimize performance and reduce costs, Capella Model Services employ two caching mechanisms:\n", |
515 | 520 | "\n", |
516 | 521 | "1. Semantic Cache\n", |
517 | 522 | "\n", |
518 | | - " Capella AI’s semantic caching system stores both query embeddings and their corresponding LLM responses. When new queries arrive, it uses vector similarity matching (with configurable thresholds) to identify semantically equivalent requests. This prevents redundant processing by:\n", |
| 523 | + " Capella Model Services’ semantic caching system stores both query embeddings and their corresponding LLM responses. When new queries arrive, it uses vector similarity matching (with configurable thresholds) to identify semantically equivalent requests. This prevents redundant processing by:\n", |
519 | 524 | " - Avoiding duplicate embedding generation API calls for similar queries\n", |
520 | 525 | " - Skipping repeated LLM processing for equivalent queries\n", |
521 | 526 | " - Maintaining cached results with automatic freshness checks\n", |
|
569 | 574 | "cell_type": "markdown", |
570 | 575 | "metadata": {}, |
571 | 576 | "source": [ |
572 | | - "## LLM Guardrails in Capella AI Services\n", |
573 | | - "\n", |
574 | | - "Capella AI services also provide input and response moderation using configurable LLM guardrails. These services can integrate with the LlamaGuard3-8B model from Meta.\n", |
575 | | - "- Categories to be blocked can be configured during the model creation process.\n", |
576 | | - "- Helps prevent unsafe or undesirable interactions with the LLM.\n", |
577 | | - "\n", |
578 | | - "By implementing caching and moderation mechanisms, Capella AI services ensure an efficient, cost-effective, and responsible approach to AI-powered recommendations." |
| 577 | + "# LLM Guardrails in Capella Model Services\n", |
| 578 | + "Capella Model services also have the ability to moderate the user inputs and the responses generated by the LLM. Capella Model Services can be configured to use the [Llama 3.1 NemoGuard 8B safety model](https://build.nvidia.com/nvidia/llama-3_1-nemoguard-8b-content-safety/modelcard) guardrails model from Meta. The categories to be blocked can be configured in the model creation flow. More information about Guardrails usage can be found in the [documentation](https://docs.couchbase.com/ai/build/model-service/configure-guardrails-security.html#guardrails).\n", |
| 579 | + " \n", |
| 580 | + "Here is an example of the Guardrails in action" |
579 | 581 | ] |
580 | 582 | }, |
581 | 583 | { |
|