diff --git a/cloud-service-providers/aws/sagemaker/deployment_notebooks/llama-3-2-nemoretriever-500m-rerank-v2.ipynb b/cloud-service-providers/aws/sagemaker/deployment_notebooks/llama-3-2-nemoretriever-500m-rerank-v2.ipynb
new file mode 100644
index 00000000..e7287a9b
--- /dev/null
+++ b/cloud-service-providers/aws/sagemaker/deployment_notebooks/llama-3-2-nemoretriever-500m-rerank-v2.ipynb
@@ -0,0 +1,701 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "f0488cae-f2b2-4d0f-9e42-7cd4faae07d8",
+ "metadata": {},
+ "source": [
+ "# Deploy NVIDIA NIM on Amazon SageMaker"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f3cf191c-ac98-4d56-a6e5-a43e6de87b13",
+ "metadata": {},
+ "source": [
+ "NVIDIA NIM, a component of NVIDIA AI Enterprise, enhances your applications with the power of state-of-the-art large language models (LLMs), providing unmatched natural language processing and understanding capabilities. Whether you're developing chatbots, content analyzers, or any application that needs to understand and generate human language, NVIDIA NIM has you covered."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d1048b66-14f3-4f47-91fd-e653979c7cd5",
+ "metadata": {},
+ "source": [
+ "In this example we show how to deploy the `Llama 3.2 NeMo Retriever Reranking 500M` model from AWS Marketplace on Amazon SageMaker. The model is optimized to produce a logit score that represents how relevant each document is to a given query. It was fine-tuned for multilingual and cross-lingual text question-answering retrieval, with support for long documents (up to 8192 tokens). It was evaluated on 26 languages: English, Arabic, Bengali, Chinese, Czech, Danish, Dutch, Finnish, French, German, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, and Turkish.\n",
+ "\n",
+ "The reranking model is one component of a text retrieval system and improves its overall accuracy. A text retrieval system often uses an embedding model (dense) or a lexical search index (sparse) to return relevant text passages for a given input. A reranking model can then rerank the candidates into a final order. Because the reranking model takes question-passage pairs as input, it can apply cross-attention between the words of the question and the passage. Applying a reranking model to every document in the knowledge base is not feasible, so reranking models are usually deployed in combination with embedding models.\n",
+ "\n",
+ "This 500M version is pruned from the 1B version: it shares the same overall architecture but is smaller and faster. Users should expect 90-95% of the accuracy of the 1B version, with lower latency (as much as a 2-3x improvement) and reduced memory usage."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c1599941-1c76-4352-b1c3-eca6f4a65aaa",
+ "metadata": {},
+ "source": [
+ "**IMPORTANT:** To run NIM on SageMaker you need an NGC API key, which is required to access NGC resources. See the NGC documentation to learn how to generate one."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8ee1f3df-a66e-490e-b4dc-7aa7b3a0ed6e",
+ "metadata": {},
+ "source": [
+ "Please check out the [NeMo Retriever NIM docs](https://docs.nvidia.com/nim/index.html#nemo-retriever) for more information."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "fa383fca-0ffb-45f9-a6cf-1849d117a386",
+ "metadata": {},
+ "source": [
+ "## Setup"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3686a50f-24d5-4778-a02d-28efc31373b7",
+ "metadata": {},
+ "source": [
+ "Import the dependencies and set up the session, execution role, and clients required to create the SageMaker endpoint."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "id": "7578a7de-7ed3-4105-bec7-e5d3b04cd4bd",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "import boto3, json, sagemaker, time, os\n",
+ "from sagemaker import get_execution_role\n",
+ "from pathlib import Path\n",
+ "\n",
+ "sess = boto3.Session()\n",
+ "sm = sess.client(\"sagemaker\")\n",
+ "sagemaker_session = sagemaker.Session(boto_session=sess)\n",
+ "role = get_execution_role()\n",
+ "client = boto3.client(\"sagemaker-runtime\")\n",
+ "region = sess.region_name\n",
+ "sts_client = sess.client('sts')\n",
+ "account_id = sts_client.get_caller_identity()['Account']"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e1d7acbd-edad-4e3e-9386-1e587879b2a5",
+ "metadata": {},
+ "source": [
+ "### Define Arguments"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "id": "68f64791-1c45-4b84-9b1a-1e3ebc60d2de",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "\n",
+ "# Llama 3.2 NeMo Retriever Reranking 500M v2\n",
+ "public_nim_image = \"public.ecr.aws/nvidia/nim:llama-3-2-nemoretriever-500m-rerank-v2-1.6.0\"\n",
+ "nim_model = \"llama-3-2-nemoretriever-500m-rerank-v2\"\n",
+ "sm_model_name = \"llama-3-2-nemoretriever-500m-rerank-v2\"\n",
+ "instance_type = \"ml.g5.12xlarge\"\n",
+ "payload_model = \"nvidia/llama-3.2-nemoretriever-500m-rerank-v2\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6fb05462",
+ "metadata": {},
+ "source": [
+ "### NIM Container"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d851abe8-9ca4-403b-be7f-aef56dfa4b9c",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "We first pull the NIM image from public ECR and then push it to a private ECR repository in your account so that it can be deployed on a SageMaker endpoint. Note:\n",
+ " - The NIM ECR image is currently available only in the `us-east-1` region\n",
+ " - Your execution role must have `ecr:CreateRepository` and the appropriate push permissions"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "id": "944110e5-15bc-4a94-a731-7d4ea9344e9e",
+ "metadata": {
+ "scrolled": true,
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "AWS account ID: 492681118881\n",
+ "Public NIM Image: public.ecr.aws/nvidia/nim:llama-3-2-nemoretriever-500m-rerank-v2-1.6.0\n",
+ "llama-3-2-nemoretriever-500m-rerank-v2-1.6.0: Pulling from nvidia/nim\n",
+ "Digest: sha256:c3d0ca76ae96f06b6eb07a438677c0e8956445dd609b9803b46fcabb8959c3a6\n",
+ "Status: Image is up to date for public.ecr.aws/nvidia/nim:llama-3-2-nemoretriever-500m-rerank-v2-1.6.0\n",
+ "public.ecr.aws/nvidia/nim:llama-3-2-nemoretriever-500m-rerank-v2-1.6.0\n",
+ "Resolved account: 492681118881\n",
+ "Resolved region: us-east-1\n",
+ "Login Succeeded\n",
+ "Using default tag: latest\n",
+ "The push refers to repository [492681118881.dkr.ecr.us-east-1.amazonaws.com/llama-3-2-nemoretriever-500m-rerank-v2]\n",
+ "6ee27010cad9: Preparing\n",
+ "8525239f4ea8: Preparing\n",
+ "772f76626436: Preparing\n",
+ "446c02e27c4d: Preparing\n",
+ "e494f39fdc09: Preparing\n",
+ "56758ff4f861: Preparing\n",
+ "dd1c58662dd4: Preparing\n",
+ "5f70bf18a086: Preparing\n",
+ "9598fb9be911: Preparing\n",
+ "6052358f2c6c: Preparing\n",
+ "f441b212cc4d: Preparing\n",
+ "5f70bf18a086: Preparing\n",
+ "e5f25d95affb: Preparing\n",
+ "8220f6c4d434: Preparing\n",
+ "538bd85bf731: Preparing\n",
+ "d9d2f8a46aec: Preparing\n",
+ "c20ee6959aec: Preparing\n",
+ "599b03a98743: Preparing\n",
+ "7d80fd208f60: Preparing\n",
+ "82f50da27ab9: Preparing\n",
+ "13b3cbacf553: Preparing\n",
+ "ed7318f3eb58: Preparing\n",
+ "dd1c58662dd4: Waiting\n",
+ "c8f2126101b8: Preparing\n",
+ "24fc26d9261c: Preparing\n",
+ "13a63c76bc48: Preparing\n",
+ "12f8c0e7b702: Preparing\n",
+ "5f70bf18a086: Waiting\n",
+ "ce8fd3c032d4: Preparing\n",
+ "e0df98ae68c6: Preparing\n",
+ "9598fb9be911: Waiting\n",
+ "3026dc321b9e: Preparing\n",
+ "a46b9fb7d21a: Preparing\n",
+ "e5f25d95affb: Waiting\n",
+ "6052358f2c6c: Waiting\n",
+ "a5d5a67dc250: Preparing\n",
+ "f9e4fa7f058f: Preparing\n",
+ "f441b212cc4d: Waiting\n",
+ "8220f6c4d434: Waiting\n",
+ "82f50da27ab9: Waiting\n",
+ "24fc26d9261c: Waiting\n",
+ "40854793d4e9: Preparing\n",
+ "0d3c37d1c2fb: Preparing\n",
+ "538bd85bf731: Waiting\n",
+ "98d78d8fec15: Preparing\n",
+ "13b3cbacf553: Waiting\n",
+ "13a63c76bc48: Waiting\n",
+ "d2b6e2c3ac87: Preparing\n",
+ "12f8c0e7b702: Waiting\n",
+ "d9d2f8a46aec: Waiting\n",
+ "e912e7e35f2a: Preparing\n",
+ "c20ee6959aec: Waiting\n",
+ "8ee5bafb8f25: Preparing\n",
+ "7d80fd208f60: Waiting\n",
+ "599b03a98743: Waiting\n",
+ "ed7318f3eb58: Waiting\n",
+ "db10bea02ea2: Preparing\n",
+ "ce8fd3c032d4: Waiting\n",
+ "5f70bf18a086: Preparing\n",
+ "926faa4052e6: Preparing\n",
+ "3abdd8a5e7a8: Preparing\n",
+ "3026dc321b9e: Waiting\n",
+ "e0df98ae68c6: Waiting\n",
+ "a5d5a67dc250: Waiting\n",
+ "f9e4fa7f058f: Waiting\n",
+ "56758ff4f861: Waiting\n",
+ "8ee5bafb8f25: Waiting\n",
+ "40854793d4e9: Waiting\n",
+ "0d3c37d1c2fb: Waiting\n",
+ "a46b9fb7d21a: Waiting\n",
+ "e912e7e35f2a: Waiting\n",
+ "3abdd8a5e7a8: Waiting\n",
+ "db10bea02ea2: Waiting\n",
+ "d2b6e2c3ac87: Waiting\n",
+ "926faa4052e6: Waiting\n",
+ "98d78d8fec15: Waiting\n",
+ "8525239f4ea8: Layer already exists\n",
+ "e494f39fdc09: Layer already exists\n",
+ "446c02e27c4d: Layer already exists\n",
+ "772f76626436: Layer already exists\n",
+ "6ee27010cad9: Layer already exists\n",
+ "9598fb9be911: Layer already exists\n",
+ "56758ff4f861: Layer already exists\n",
+ "dd1c58662dd4: Layer already exists\n",
+ "5f70bf18a086: Layer already exists\n",
+ "6052358f2c6c: Layer already exists\n",
+ "f441b212cc4d: Layer already exists\n",
+ "d9d2f8a46aec: Layer already exists\n",
+ "538bd85bf731: Layer already exists\n",
+ "e5f25d95affb: Layer already exists\n",
+ "8220f6c4d434: Layer already exists\n",
+ "c20ee6959aec: Layer already exists\n",
+ "599b03a98743: Layer already exists\n",
+ "13b3cbacf553: Layer already exists\n",
+ "7d80fd208f60: Layer already exists\n",
+ "82f50da27ab9: Layer already exists\n",
+ "ed7318f3eb58: Layer already exists\n",
+ "13a63c76bc48: Layer already exists\n",
+ "12f8c0e7b702: Layer already exists\n",
+ "24fc26d9261c: Layer already exists\n",
+ "c8f2126101b8: Layer already exists\n",
+ "ce8fd3c032d4: Layer already exists\n",
+ "3026dc321b9e: Layer already exists\n",
+ "e0df98ae68c6: Layer already exists\n",
+ "a46b9fb7d21a: Layer already exists\n",
+ "a5d5a67dc250: Layer already exists\n",
+ "f9e4fa7f058f: Layer already exists\n",
+ "98d78d8fec15: Layer already exists\n",
+ "40854793d4e9: Layer already exists\n",
+ "0d3c37d1c2fb: Layer already exists\n",
+ "d2b6e2c3ac87: Layer already exists\n",
+ "e912e7e35f2a: Layer already exists\n",
+ "db10bea02ea2: Layer already exists\n",
+ "8ee5bafb8f25: Layer already exists\n",
+ "926faa4052e6: Layer already exists\n",
+ "3abdd8a5e7a8: Layer already exists\n",
+ "latest: digest: sha256:c3d0ca76ae96f06b6eb07a438677c0e8956445dd609b9803b46fcabb8959c3a6 size: 9114\n",
+ "492681118881.dkr.ecr.us-east-1.amazonaws.com/llama-3-2-nemoretriever-500m-rerank-v2\n",
+ "Errors: WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json.\n",
+ "Configure a credential helper to remove this warning. See\n",
+ "https://docs.docker.com/engine/reference/commandline/login/#credentials-store\n",
+ "\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "import subprocess\n",
+ "\n",
+ "# Get AWS account ID\n",
+ "result = subprocess.run(['aws', 'sts', 'get-caller-identity', '--query', 'Account', '--output', 'text'], stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)\n",
+ "\n",
+ "# Stop here on failure: `account` is required to build the image URI below\n",
+ "if result.returncode != 0:\n",
+ "    raise RuntimeError(f\"Error getting AWS account ID: {result.stderr}\")\n",
+ "\n",
+ "account = result.stdout.strip()\n",
+ "print(f\"AWS account ID: {account}\")\n",
+ "\n",
+ "bash_script = f\"\"\"\n",
+ "echo \"Public NIM Image: {public_nim_image}\"\n",
+ "docker pull {public_nim_image}\n",
+ "\n",
+ "\n",
+ "echo \"Resolved account: {account}\"\n",
+ "echo \"Resolved region: {region}\"\n",
+ "\n",
+ "nim_image=\"{account}.dkr.ecr.{region}.amazonaws.com/{nim_model}\"\n",
+ "\n",
+ "# Ensure the repository name adheres to AWS constraints\n",
+ "repository_name=$(echo \"{nim_model}\" | tr '[:upper:]' '[:lower:]' | tr -cd '[:alnum:]._/-')\n",
+ "\n",
+ "# If the repository doesn't exist in ECR, create it.\n",
+ "aws ecr describe-repositories --repository-names \"$repository_name\" > /dev/null 2>&1\n",
+ "\n",
+ "if [ $? -ne 0 ]\n",
+ "then\n",
+ " aws ecr create-repository --repository-name \"$repository_name\" > /dev/null\n",
+ "fi\n",
+ "\n",
+ "# Get the login command from ECR and execute it directly\n",
+ "aws ecr get-login-password --region {region} | docker login --username AWS --password-stdin \"{account}.dkr.ecr.{region}.amazonaws.com\"\n",
+ "\n",
+ "docker tag {public_nim_image} $nim_image\n",
+ "docker push $nim_image\n",
+ "echo -n $nim_image\n",
+ "\"\"\"\n",
+ "nim_image=f\"{account}.dkr.ecr.{region}.amazonaws.com/{nim_model}\"\n",
+ "# Run the bash script and capture real-time output\n",
+ "process = subprocess.Popen(bash_script, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)\n",
+ "\n",
+ "while True:\n",
+ "    output = process.stdout.readline()\n",
+ "    if output == b'' and process.poll() is not None:\n",
+ "        break\n",
+ "    if output:\n",
+ "        print(output.decode().strip())\n",
+ "\n",
+ "stderr = process.stderr.read().decode()\n",
+ "if stderr:\n",
+ " print(\"Errors:\", stderr)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "18863fd4-0893-4022-9ab0-38e1af1512d4",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "We print the private ECR NIM image in your account that will be used for the SageMaker deployment.\n",
+ "- It should look similar to `\"<account_id>.dkr.ecr.<region>.amazonaws.com/<nim_model>:latest\"` (the `latest` tag is the default when no tag is given)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "id": "79bb63ac-2a7a-4beb-b0dd-77a4473e1a67",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "492681118881.dkr.ecr.us-east-1.amazonaws.com/llama-3-2-nemoretriever-500m-rerank-v2\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(nim_image)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2518de4e-dcad-4944-9025-484878edb00b",
+ "metadata": {},
+ "source": [
+ "### Create SageMaker Endpoint"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5f9efc86-0cf2-403b-9502-fd294acb4cb8",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "**Before proceeding further, please set your NGC API Key.**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "id": "a0f1f264-ebd8-4c6a-9926-7a21afd89ea6",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "NGC_API_KEY = None #Set your API key here"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 26,
+ "id": "7bb9b93a-3fdf-49a4-8f89-494238008a77",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "assert NGC_API_KEY is not None, \"NGC API KEY is not set. Please set the NGC_API_KEY variable. It's required for running NIM.\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7bb5bce6-3807-43c3-866f-80543cfdedbf",
+ "metadata": {},
+ "source": [
+ "We define the SageMaker model from the NIM container, making sure to pass in **NGC_API_KEY**."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 27,
+ "id": "1b784149-2ec3-4e29-a7cf-3636843dee8b",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Model Arn: arn:aws:sagemaker:us-east-1:492681118881:model/llama-3-2-nemoretriever-500m-rerank-v2\n"
+ ]
+ }
+ ],
+ "source": [
+ "container = {\n",
+ " \"Image\": nim_image,\n",
+ " \"Environment\": {\"NGC_API_KEY\": NGC_API_KEY}\n",
+ "}\n",
+ "create_model_response = sm.create_model(\n",
+ " ModelName=sm_model_name, ExecutionRoleArn=role, PrimaryContainer=container\n",
+ ")\n",
+ "\n",
+ "print(\"Model Arn: \" + create_model_response[\"ModelArn\"])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "79e2f0e6-0377-4a13-9cc3-c345adf08c86",
+ "metadata": {},
+ "source": [
+ "Next we create the endpoint configuration, which deploys the NIM on the specified instance type."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 28,
+ "id": "c0af8b7c-9347-4203-aea5-f44392449f4e",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Endpoint Config Arn: arn:aws:sagemaker:us-east-1:492681118881:endpoint-config/llama-3-2-nemoretriever-500m-rerank-v2\n"
+ ]
+ }
+ ],
+ "source": [
+ "endpoint_config_name = sm_model_name\n",
+ "\n",
+ "create_endpoint_config_response = sm.create_endpoint_config(\n",
+ " EndpointConfigName=endpoint_config_name,\n",
+ " ProductionVariants=[\n",
+ " {\n",
+ " \"InstanceType\": instance_type,\n",
+ " \"InitialVariantWeight\": 1,\n",
+ " \"InitialInstanceCount\": 1,\n",
+ " \"ModelName\": sm_model_name,\n",
+ " \"VariantName\": \"AllTraffic\",\n",
+ " \"ContainerStartupHealthCheckTimeoutInSeconds\": 1800,\n",
+ " \"InferenceAmiVersion\": \"al2-ami-sagemaker-inference-gpu-2\"\n",
+ " }\n",
+ " ],\n",
+ ")\n",
+ "\n",
+ "print(\"Endpoint Config Arn: \" + create_endpoint_config_response[\"EndpointConfigArn\"])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6e51121a-a662-4078-a0c6-b163cda0a718",
+ "metadata": {},
+ "source": [
+ "Using the endpoint configuration above, we create a new SageMaker endpoint and wait for the deployment to finish. The status changes to `InService` once the deployment succeeds."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 29,
+ "id": "75add3d0-100f-4740-b326-6f54af7e9c0d",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Endpoint Arn: arn:aws:sagemaker:us-east-1:492681118881:endpoint/llama-3-2-nemoretriever-500m-rerank-v2\n"
+ ]
+ }
+ ],
+ "source": [
+ "endpoint_name = sm_model_name\n",
+ "\n",
+ "create_endpoint_response = sm.create_endpoint(\n",
+ " EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name\n",
+ ")\n",
+ "\n",
+ "print(\"Endpoint Arn: \" + create_endpoint_response[\"EndpointArn\"])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 30,
+ "id": "ec2d4bc4-b77b-4137-930e-7517295a041c",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Status: Creating\n",
+ "Status: Creating\n",
+ "Status: Creating\n",
+ "Status: Creating\n",
+ "Status: Creating\n",
+ "Status: InService\n",
+ "Arn: arn:aws:sagemaker:us-east-1:492681118881:endpoint/llama-3-2-nemoretriever-500m-rerank-v2\n",
+ "Status: InService\n"
+ ]
+ }
+ ],
+ "source": [
+ "resp = sm.describe_endpoint(EndpointName=endpoint_name)\n",
+ "status = resp[\"EndpointStatus\"]\n",
+ "print(\"Status: \" + status)\n",
+ "\n",
+ "while status == \"Creating\":\n",
+ "    time.sleep(60)\n",
+ "    resp = sm.describe_endpoint(EndpointName=endpoint_name)\n",
+ "    status = resp[\"EndpointStatus\"]\n",
+ "    print(\"Status: \" + status)\n",
+ "\n",
+ "print(\"Arn: \" + resp[\"EndpointArn\"])\n",
+ "print(\"Status: \" + status)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c2a97e4c-6dd8-4d9a-841c-e443b7c1583f",
+ "metadata": {},
+ "source": [
+ "### Run Inference"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c9146f44-b85c-4125-b842-fceaf5c3cfa8",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "Once the model is deployed and the endpoint's status is `InService`, we can send an inference request with sample text. For the inference request format and supported parameters, see [this link](https://docs.api.nvidia.com/nim/reference/nvidia-llama-3_2-nv-rerankqa-1b-v1-infer)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d0d36583-d6b0-4fdf-a659-c088f913034a",
+ "metadata": {},
+ "source": [
+ "**IMPORTANT:** The model name in the inference request payload must be the name of the NIM model. Please DON'T change it below."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 31,
+ "id": "a57265e9-98bb-4255-ad7d-143e3aeaf9d4",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "{\n",
+ " \"rankings\": [\n",
+ " {\n",
+ " \"index\": 0,\n",
+ " \"logit\": -2.171875\n",
+ " },\n",
+ " {\n",
+ " \"index\": 3,\n",
+ " \"logit\": -2.83203125\n",
+ " },\n",
+ " {\n",
+ " \"index\": 1,\n",
+ " \"logit\": -3.091796875\n",
+ " },\n",
+ " {\n",
+ " \"index\": 2,\n",
+ " \"logit\": -3.1328125\n",
+ " }\n",
+ " ],\n",
+ " \"usage\": {\n",
+ " \"prompt_tokens\": 220,\n",
+ " \"total_tokens\": 220\n",
+ " }\n",
+ "}\n"
+ ]
+ }
+ ],
+ "source": [
+ "query = {\"text\": \"which way did the traveler go?\"}\n",
+ "messages = [\n",
+ " {\"text\": \"two roads diverged in a yellow wood, and sorry i could not travel both and be one traveler, long i stood and looked down one as far as i could to where it bent in the undergrowth;\"},\n",
+ " {\"text\": \"then took the other, as just as fair, and having perhaps the better claim because it was grassy and wanted wear, though as for that the passing there had worn them really about the same,\"},\n",
+ " {\"text\": \"and both that morning equally lay in leaves no step had trodden black. oh, i marked the first for another day! yet knowing how way leads on to way i doubted if i should ever come back.\"},\n",
+ " {\"text\": \"i shall be telling this with a sigh somewhere ages and ages hense: two roads diverged in a wood, and i, i took the one less traveled by, and that has made all the difference.\"}\n",
+ " ]\n",
+ "payload = {\n",
+ " \"model\": payload_model,\n",
+ " \"query\": query,\n",
+ " \"passages\": messages,\n",
+ " \"truncate\": \"END\"\n",
+ "}\n",
+ "\n",
+ "\n",
+ "response = client.invoke_endpoint(\n",
+ " EndpointName=endpoint_name, ContentType=\"application/json\", Body=json.dumps(payload)\n",
+ ")\n",
+ "\n",
+ "output = json.loads(response[\"Body\"].read().decode(\"utf8\"))\n",
+ "print(json.dumps(output, indent=2))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a19063f6-b6c0-4de2-a193-e482f26f7406",
+ "metadata": {},
+ "source": [
+ "### Terminate endpoint and clean up artifacts"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e5db083f-4705-4c68-a488-f82da961be4b",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "# Delete the endpoint first, then the configuration and model it depends on\n",
+ "sm.delete_endpoint(EndpointName=endpoint_name)\n",
+ "sm.delete_endpoint_config(EndpointConfigName=endpoint_config_name)\n",
+ "sm.delete_model(ModelName=sm_model_name)"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "conda_python3",
+ "language": "python",
+ "name": "conda_python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.18"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}