|
5 | 5 | "id": "4c60986a", |
6 | 6 | "metadata": {}, |
7 | 7 | "source": [ |
8 | | - "# Introduction\n", |
| 8 | + "## Introduction\n", |
9 | 9 | "\n", |
10 | | - "In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database, [Hugging Face](https://huggingface.co/) as the AI-powered embedding Model. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system from scratch. Alternatively, if you want to perform semantic search using the GSI index, please take a look at [this.](https://developer.couchbase.com//tutorial-huggingface-couchbase-vector-search-with-global-secondary-index)" |
| 10 | + "In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database and [Hugging Face](https://huggingface.co/) as the AI-powered embedding model. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval.\n", |
| 11 | + "\n", |
| 12 | + "This tutorial uses Couchbase's **Search Vector Index** for vector similarity search. For more information on vector indexes, see the [Couchbase Vector Index Documentation](https://docs.couchbase.com/cloud/vector-index/use-vector-indexes.html).\n", |
| 13 | + "\n", |
| 14 | + "This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system from scratch. Alternatively, if you want to perform semantic search using Hyperscale or Composite Vector Indexes, please take a look at [this tutorial](https://developer.couchbase.com/tutorial-huggingface-couchbase-vector-search-with-hyperscale-or-composite-vector-index)." |
11 | 15 | ] |
12 | 16 | }, |
13 | 17 | { |
14 | 18 | "cell_type": "markdown", |
15 | 19 | "id": "6178e6b3", |
16 | 20 | "metadata": {}, |
17 | 21 | "source": [ |
18 | | - "# How to run this tutorial\n", |
| 22 | + "## How to Run This Tutorial\n", |
19 | 23 | "\n", |
20 | | - "This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/huggingface/fts/hugging_face.ipynb).\n", |
| 24 | + "This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/huggingface/search_based/hugging_face.ipynb).\n", |
21 | 25 | "\n", |
22 | 26 | "You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment." |
23 | 27 | ] |
|
27 | 31 | "id": "ef73d80c", |
28 | 32 | "metadata": {}, |
29 | 33 | "source": [ |
30 | | - "# Before you start\n", |
| 34 | + "## Before You Start\n", |
31 | 35 | "\n", |
32 | | - "## Create and Deploy Your Free Tier Operational cluster on Capella\n", |
| 36 | + "### Create and Deploy Your Free Tier Operational Cluster on Capella\n", |
33 | 37 | "\n", |
34 | 38 | "To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraints.\n", |
35 | 39 | "\n", |
|
48 | 52 | "id": "77308721", |
49 | 53 | "metadata": {}, |
50 | 54 | "source": [ |
51 | | - "# Install necessary libraries" |
| 55 | + "## Install Necessary Libraries" |
52 | 56 | ] |
53 | 57 | }, |
54 | 58 | { |
55 | 59 | "cell_type": "code", |
56 | | - "execution_count": 1, |
| 60 | + "execution_count": null, |
57 | 61 | "id": "208a54a1", |
58 | 62 | "metadata": {}, |
59 | 63 | "outputs": [], |
|
66 | 70 | "id": "9470f9e3-311b-45c8-81c3-baa5fe0995d2", |
67 | 71 | "metadata": {}, |
68 | 72 | "source": [ |
69 | | - "# Imports" |
| 73 | + "## Imports" |
70 | 74 | ] |
71 | 75 | }, |
72 | 76 | { |
|
98 | 102 | "id": "041a3edf-f5f7-43e1-99b9-b775e94fbfe6", |
99 | 103 | "metadata": {}, |
100 | 104 | "source": [ |
101 | | - "# Prerequisites\n", |
102 | | - "In order to run this tutorial, you will need access to a Couchbase Cluster with Full Text Search service either through Couchbase Capella or by running it locally and have credentials to acces a collection on that cluster:" |
| 105 | + "## Prerequisites\n", |
| 106 | + "\n", |
| 107 | + "To run this tutorial, you will need access to a Couchbase cluster with the Search Service enabled, either through Couchbase Capella or by running it locally, and credentials to access a collection on that cluster:" |
103 | 108 | ] |
104 | 109 | }, |
105 | 110 | { |
|
126 | 131 | "id": "15edfec2-64bd-4ba1-b072-4fadacddb01a", |
127 | 132 | "metadata": {}, |
128 | 133 | "source": [ |
129 | | - "# Couchbase Connection\n", |
| 134 | + "## Couchbase Connection\n", |
| 135 | + "\n", |
130 | 136 | "In this section, we first need to create a `PasswordAuthenticator` object that holds our Couchbase credentials:" |
131 | 137 | ] |
132 | 138 | }, |
|
182 | 188 | "id": "625881d5-39e2-44ed-bbca-0db67e98f765", |
183 | 189 | "metadata": {}, |
184 | 190 | "source": [ |
185 | | - "# Creating Couchbase Vector Search Index\n", |
186 | | - "In order to store generated with Hugging Face embeddings onto a Couchbase Cluster, a vector search index needs to be created first. We included a sample index definition that will work with this tutorial in a file named `huggingface_index.json` located in the folder with this tutorial. The definition can be used to create a vector index using Couchbase server web console, on more information on vector indexes, please read [Create a Vector Search Index with the Server Web Console](https://docs.couchbase.com/server/current/vector-search/create-vector-search-index-ui.html). Please note that the index is configured for documents from bucket `hugginface`, scope `_default` and collection `huggingface` and you will have to edit `source` and document type name in the index definition file if your collection, scope or bucket names are different.\n", |
| 191 | + "## Creating Couchbase Search Vector Index\n", |
| 192 | + "\n", |
| 193 | + "To store embeddings generated with Hugging Face in a Couchbase cluster, a Search Vector Index needs to be created first. We have included a sample index definition that works with this tutorial in a file named `huggingface_index.json`, located in the same folder as this tutorial.\n", |
| 194 | + "\n", |
| 195 | + "The definition can be used to create a Search Vector Index using the Couchbase Server Web Console. For more information on vector indexes, please read [Create a Vector Search Index with the Server Web Console](https://docs.couchbase.com/server/current/vector-search/create-vector-search-index-ui.html).\n", |
| 196 | + "\n", |
| 197 | + "Please note that the index is configured for documents from bucket `huggingface`, scope `_default` and collection `huggingface`. You will need to edit the `source` and document type name in the index definition file if your collection, scope, or bucket names are different.\n", |
187 | 198 | "\n", |
188 | 199 | "Here, our code verifies that the index exists and throws an exception if it is not found:" |
189 | 200 | ] |
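
The note in this hunk about editing `source` and the document type name can be sketched as a small helper. This is a minimal sketch, not the notebook's code: it assumes the index definition keeps the bucket under `sourceName` and a single `"<scope>.<collection>"` key under `params.mapping.types`. The contents of `huggingface_index.json` are not shown in this diff, so the sample dictionary and the function name `retarget_index_definition` are hypothetical.

```python
import copy


def retarget_index_definition(definition, bucket, scope, collection):
    """Return a copy of an index definition pointed at another keyspace.

    Assumes (hypothetically) one type mapping keyed "<scope>.<collection>".
    """
    updated = copy.deepcopy(definition)
    updated["sourceName"] = bucket

    types = updated.get("params", {}).get("mapping", {}).get("types", {})
    if len(types) != 1:
        raise ValueError("expected exactly one type mapping in the definition")

    # Move the single existing mapping under the new "<scope>.<collection>" key.
    (_, mapping), = types.items()
    updated["params"]["mapping"]["types"] = {f"{scope}.{collection}": mapping}
    return updated


# Hypothetical stand-in for the contents of the index definition file.
sample = {
    "name": "huggingface",
    "sourceName": "huggingface",
    "params": {"mapping": {"types": {"_default.huggingface": {"enabled": True}}}},
}

retargeted = retarget_index_definition(sample, "travel", "inventory", "articles")
print(retargeted["sourceName"])
print(list(retargeted["params"]["mapping"]["types"]))
```

Working on a deep copy leaves the definition loaded from disk untouched, so the same file can be reused for several keyspaces.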
|
213 | 224 | "id": "d71a7207-54d1-44fd-aa9d-d361b42d2c96", |
214 | 225 | "metadata": {}, |
215 | 226 | "source": [ |
216 | | - "# Hugging Face Initialization" |
| 227 | + "## Hugging Face Initialization" |
217 | 228 | ] |
218 | 229 | }, |
219 | 230 | { |
|
240 | 251 | "id": "c0d8e261-d670-4c40-8037-3d4e3084c360", |
241 | 252 | "metadata": {}, |
242 | 253 | "source": [ |
243 | | - "# Embedding Documents\n", |
244 | | - "After initializing Hugging Face transformers library, it can be used to generate vector embeddings for user input or predefined set of phrases. Here, we're generating 2 embeddings for contained in the array strings:" |
| 254 | + "## Embedding Documents\n", |
| 255 | + "\n", |
| 256 | + "After initializing the Hugging Face transformers library, it can be used to generate vector embeddings for user input or a predefined set of phrases. Here, we're generating embeddings for the strings contained in the array:" |
245 | 257 | ] |
246 | 258 | }, |
247 | 259 | { |
|
266 | 278 | "id": "80814e90-699f-4201-8cd3-7ef8adab9966", |
267 | 279 | "metadata": {}, |
268 | 280 | "source": [ |
269 | | - "# Storing Embeddings in Couchbase\n", |
270 | | - "Generated embeddings are then stored as vector fields inside documents that can contain additional information about the vector, including the original text. The documents are then upserted onto the couchbase cluster:" |
| 281 | + "## Storing Embeddings in Couchbase\n", |
| 282 | + "\n", |
| 283 | + "Generated embeddings are then stored as vector fields inside documents that can contain additional information about the vector, including the original text. The documents are then upserted into the Couchbase cluster:" |
271 | 284 | ] |
272 | 285 | }, |
273 | 286 | { |
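
The document shape described in this hunk (a vector field stored next to the original text) might look like the sketch below. It assumes hypothetical field names `text` and `embedding` and uses a deterministic stand-in, `fake_embed`, instead of a real Hugging Face model so that it runs without any dependencies. The notebook's actual field names and key scheme are not visible in this diff.

```python
import hashlib
import uuid


def fake_embed(text, dims=8):
    """Stand-in for a real embedding model: deterministic floats in [0, 1]."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [digest[i] / 255.0 for i in range(dims)]


def build_documents(texts, embed=fake_embed):
    """Map a fresh UUID key to a document holding the text and its vector."""
    docs = {}
    for text in texts:
        docs[str(uuid.uuid4())] = {"text": text, "embedding": embed(text)}
    return docs


docs = build_documents([
    "Couchbase Server is a multipurpose, distributed database.",
    "Semantic search matches meaning rather than keywords.",
])

# With a connected SDK collection object, each entry could then be written
# with collection.upsert(key, doc); that step needs a running cluster and is
# omitted here.
for key, doc in docs.items():
    print(key, len(doc["embedding"]))
```

Keeping the original text in the same document means a search hit can return readable content without a second lookup.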
|
291 | 304 | "id": "f11a0d98-bcf5-4fe4-b602-6e8a23edf95e", |
292 | 305 | "metadata": {}, |
293 | 306 | "source": [ |
294 | | - "# Searching For Embeddings\n", |
295 | | - "After the documents are upserted onto the cluster, their vector fields will be added into previously imported vector index. Later, new embeddings can be added or used to perform a similarity search on the previously added documents:" |
| 307 | + "## Searching For Embeddings\n", |
| 308 | + "\n", |
| 309 | + "After the documents are upserted onto the cluster, their vector fields will be added to the previously imported Search Vector Index. Later, new embeddings can be added or used to perform a similarity search on the previously added documents:" |
296 | 310 | ] |
297 | 311 | }, |
298 | 312 | { |
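
To make the similarity search in this hunk concrete, here is a brute-force sketch of what a nearest-neighbour lookup computes conceptually. It is not how the Search Vector Index works internally, and it assumes cosine similarity, whereas the real metric is whatever the index definition configures. The keys and two-dimensional vectors are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=1):
    """Return the k (score, key) pairs most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), key) for key, vec in stored.items()]
    scored.sort(reverse=True)
    return scored[:k]


# Hypothetical stored vectors keyed by document ID.
stored = {
    "doc-a": [1.0, 0.0],
    "doc-b": [0.0, 1.0],
    "doc-c": [0.7, 0.7],
}

results = top_k([0.9, 0.1], stored, k=2)
for score, key in results:
    print(key, round(score, 3))
```

A vector index avoids this linear scan over every stored vector, which is what makes similarity search practical on large collections.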
|