
![Azure_Cloud_Shell.png](imgs/AIQonAzureFoundry.png)
## Introduction
This workshop guides you through deploying a complete AI research platform on Azure Kubernetes Service (AKS). You'll deploy both the NVIDIA [Enterprise RAG Blueprint](https://build.nvidia.com/nvidia/build-an-enterprise-rag-pipeline) and the NVIDIA [Enterprise Research Assistant](https://build.nvidia.com/nvidia/ai-research-assistant) Blueprint (also called AI-Q) to create a powerful system for document Q&A and automated research report generation.

The platform combines document understanding (RAG) with intelligent research capabilities (AI-Q) to enable:
* **Document Q&A**: Chat with your documents using state-of-the-art RAG technology
### **NVIDIA RAG Blueprint**
A production-ready Retrieval Augmented Generation pipeline that enables Q&A over your documents.
### **NVIDIA AI-Q Research Assistant**
An intelligent research platform that generates comprehensive reports by querying multiple sources, synthesizing findings, and presenting them in editable, human-friendly formats.

### **NVIDIA NIM Microservices**
[NVIDIA NIM](https://developer.nvidia.com/nim) is a set of easy-to-use inference microservices for accelerating the deployment of foundation models on any cloud or data center while helping to keep your data secure.

This workshop uses:
- **Nemotron Super 49B**: Advanced reasoning, chain-of-thought, Q&A, and report synthesis (shared by both RAG and AI-Q). Deployed on Azure AI Foundry, or accessed through the build.nvidia.com API
- **NeMo Retriever Embedding 1B**: High-quality text embeddings
- **NeMo Retriever Reranking 1B**: Result reranking for improved accuracy
An open-source observability platform providing distributed tracing and performance monitoring.
NIM microservices are natively supported on Azure AI Foundry, enabling developers to quickly create a streamlined path for deployment. The microservices are running on Azure’s managed compute, removing the complexity of setting up and maintaining GPU infrastructure while ensuring high availability and scalability, even for highly demanding workloads. This enables teams to move quickly from model selection to production use.

## Prerequisites
- Azure account with access to one A100 GPU (Standard_NC96ads_A100_v4)
- Azure CLI configured and authenticated
- kubectl installed
- Helm 3.x installed
Execute the command below to download the values.yaml file:
```bash
wget -O values.yaml https://tinyurl.com/rag23values
```
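A quick, optional check (a sketch) that the download succeeded before installing:

```shell
# Optional sanity check: confirm values.yaml downloaded and skim the top.
if test -s values.yaml; then
  head -n 20 values.yaml
else
  echo "values.yaml missing or empty -- re-run the wget above" >&2
fi
```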

Install the RAG Blueprint, version 2.3, with these NIM microservices deployed on our A100 GPU node:
- llama-32-nv-embedqa-1b
- llama-32-nv-rerankqa-1b
- nemoretriever-page-elements-v2
- nemoretriever-table-structure-v1

For Nemotron Super 49B, we point to the build.nvidia.com API:

```bash
helm upgrade --install rag --create-namespace -n rag \
```

In order to test the RAG capabilities of this application, we need to upload a document:

* Click "New Collection" at the bottom left corner and give it a name
* Upload a document by clicking the square under "Source Files", selecting a PDF or text file, and clicking "Create Collection"
  * Here is an [example document about the NVIDIA Nemotron 3 family of models](https://research.nvidia.com/labs/nemotron/files/NVIDIA-Nemotron-3-White-Paper.pdf)

![upload_popup.png](imgs/upload-rag23.png)


```bash
export NVIDIA_API_URL="https://integrate.api.nvidia.com/v1"
export NVIDIA_API_KEY="<YOUR NGC API KEY>"
```
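Optionally, sanity-check the key with a direct call before deploying. This is a sketch: the endpoint is OpenAI-compatible, and the model name shown is one example from the build.nvidia.com catalog.

```shell
# Sketch: verify the API key against the hosted endpoint.
# Assumes NVIDIA_API_URL and NVIDIA_API_KEY are exported as above.
curl -s "${NVIDIA_API_URL}/chat/completions" \
  -H "Authorization: Bearer ${NVIDIA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"model": "nvidia/llama-3.3-nemotron-super-49b-v1",
       "messages": [{"role": "user", "content": "Say OK"}],
       "max_tokens": 16}' | head -c 400
```

A short JSON chat-completion response indicates the key is valid.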
-------
Set **Tavily API Key** ([Sign up here](https://tavily.com) - Free tier available)
```
NAME                   TYPE           CLUSTER-IP   EXTERNAL-IP    PORT(S)        AGE
aiq-aira-frontend-lb   LoadBalancer   10.0.13.61   50.85.148.12   80:30369/TCP   21m
```
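If you prefer to script it, the external IP can be captured directly (a sketch; the service name and namespace are taken from the output above, so adjust if yours differ):

```shell
# Sketch: capture the AI-Q frontend's external IP into a variable.
AIQ_IP=$(kubectl get svc aiq-aira-frontend-lb -n aira \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo "AI-Q frontend: http://${AIQ_IP}"
```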

### Before using the AI-Q app, verify that all pods are running:

```bash
kubectl get pods -n aira
```
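Rather than re-running the command until everything settles, you can block until the pods are Ready (a sketch; adjust the timeout to taste):

```shell
# Sketch: wait until every pod in the aira namespace reports Ready,
# or give up after 10 minutes.
kubectl wait --for=condition=Ready pods --all -n aira --timeout=600s \
  || echo "some pods are not Ready yet -- inspect with: kubectl describe pods -n aira"
```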
---

Wait until the model has been fully deployed on the Azure AI Foundry endpoint:
![Screenshot 2025-11-03 at 18.29.33.png](imgs/Screenshot%202025-11-03%20at%2018.29.33.png)


> [!NOTE]
> Make sure you've cloned this repository or downloaded the files in the "sample_files" folder before moving on to the next steps

1. Open a terminal on your computer

2. Execute the command below to navigate to the sample files folder
and this prompt for the user role:



# Congratulations!

You've successfully deployed an NVIDIA Cosmos Reason NIM on Azure AI Foundry! Explore further by implementing robotics and physical AI applications with NIM microservices, experimenting with different GPU types, and scaling your deployments today. Happy modeling!

NVIDIA offers NIM microservices with enterprise support through our Azure Marketplace listing, [NVIDIA AI Enterprise](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/nvidia.nvidia-ai-enterprise?tab=Overview).
---

**cloud-service-providers/azure/workshops/nim-aifoundry/README.md**
To learn more about NVIDIA NIM on Azure AI Foundry:
https://developer.nvidia.com/blog/accelerated-ai-inference-with-nvidia-nim-on-azure-ai-foundry/

## What you will learn

By the end of this workshop, you will have hands-on experience with:

1. Creating an Azure AI Foundry Hub and Project
2. Exploring the NVIDIA collection of NIM microservices that are integrated natively in Azure AI Foundry
3. Deploying the NVIDIA Nemotron NIM on an Azure AI Foundry endpoint and using it


## Learn the Components

### **NVIDIA Nemotron**

[NVIDIA Nemotron](https://www.nvidia.com/en-us/ai-data-science/foundation-models/nemotron/)
is a family of open models, datasets, and technologies that empower you to build efficient, accurate, and specialized agentic AI systems. Designed for advanced reasoning, coding, visual understanding, agentic tasks, safety, and information retrieval, Nemotron models are openly available and integrated across the AI ecosystem so they can be deployed anywhere—from edge to cloud.

### **NVIDIA NIM microservices**

[NVIDIA NIM](https://developer.nvidia.com/nim) is a set of easy-to-use inference microservices for accelerating the deployment of foundation models on any cloud or data center while helping to keep your data secure.

### **Azure AI Foundry**

[Azure AI Foundry](https://ai.azure.com/?cid=learnDocs&tid=43083d15-7273-40c1-b7db-39efd9ccc17a) is a unified Azure platform-as-a-service offering for enterprise AI operations, model builders, and application development. This foundation combines production-grade infrastructure with friendly interfaces, enabling developers to focus on building applications rather than managing infrastructure.


## What you need

To complete this lab, you need:

* Access to a standard internet browser
* Access to an Azure subscription with quota for an Azure A100 GPU




# Task 1: Create an Azure AI Foundry Hub

Output should look like this:
![Screenshot 2025-10-30 at 17.20.05.png](imgs/curl.png)


> [!NOTE]
> If you want to test the API with **reasoning OFF**, replace "/think" with "/no_think" in the request
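For example, a request with reasoning off might look like the sketch below; `ENDPOINT_URL`, `ENDPOINT_KEY`, and the model name are placeholders for your own Foundry endpoint values.

```shell
# Sketch: chat completion with reasoning off via the "/no_think" system prompt.
curl -s "${ENDPOINT_URL}/v1/chat/completions" \
  -H "Authorization: Bearer ${ENDPOINT_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"model": "nemotron",
       "messages": [
         {"role": "system", "content": "/no_think"},
         {"role": "user", "content": "What is 2+2?"}
       ]}' | head -c 400
```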



# Congratulations!
You've successfully deployed an NVIDIA NIM on Azure AI Foundry! Explore further by implementing RAG patterns with NIM microservices, experimenting with different GPU types, and scaling your deployments today. Happy modeling!

NVIDIA offers NIM microservices with enterprise support through our Azure Marketplace listing, [NVIDIA AI Enterprise](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/nvidia.nvidia-ai-enterprise?tab=Overview).
---

**cloud-service-providers/azure/workshops/rag-blueprint-aks/README.md**

The NVIDIA RAG blueprint serves as a reference solution for a foundational Retrieval Augmented Generation (RAG) pipeline.
One of the key use cases in Generative AI is enabling users to ask questions and receive answers based on their enterprise data corpus.
This blueprint demonstrates how to set up a RAG solution that uses NVIDIA NIM and GPU-accelerated components.

![Project Preview](imgs/RAG_diagram.jpg)

## Key Features

- Multimodal PDF data extraction support with text, tables, charts, and infographics
- Support for audio file ingestion
- OpenAI-compatible APIs
- Decomposable and customizable

## What you will learn

By the end of this workshop, you will have hands-on experience with:
1. Deploying a RAG pipeline on AKS: Learn to deploy a complete RAG pipeline, including LLM, embedding, and retriever microservices, onto your AKS cluster using NVIDIA NIM microservices.
2. Integrating with Milvus vector database: Understand how to connect your RAG pipeline to a Milvus vector store for efficient storage and retrieval of embeddings.
3. Utilizing the NVIDIA LangChain wrapper: Gain familiarity with the NVIDIA LangChain wrapper for seamless interaction with deployed NIM microservices.
4. Managing and scaling your RAG deployment: Explore techniques for managing, monitoring, and scaling your RAG pipeline using Kubernetes features to ensure optimal performance and resource utilization.

## Learn the Components
### **NVIDIA RAG Blueprint**
A production-ready Retrieval Augmented Generation pipeline that enables Q&A over your documents. Includes document ingestion, embedding, vector search, reranking, and LLM-powered response generation with citations.

### **NVIDIA NIM Microservices**
[NVIDIA NIM](https://developer.nvidia.com/nim) is a set of easy-to-use inference microservices for accelerating the deployment of foundation models on any cloud or data center while helping to keep your data secure.

# Prerequisites

- NVIDIA Account and API Key (follow [these instructions](https://nvdam.widen.net/s/tvgjgxrspd/create-build-account-and-api-key) to create an account and generate an API Key)
```bash
az extension update --name aks-preview
```

### 2. Configure NVIDIA API Key

As part of the RAG blueprint, several NVIDIA NIM microservices will be deployed. In order to get started with NIM, we'll need to make sure we have access to an [NVIDIA API key](https://build.nvidia.com/settings/api-keys). We can export this key to be used as an environment variable:

```bash
export NGC_API_KEY="<YOUR NGC API KEY>"
```

We need to wait until all pods are in "Running" status and their "Ready" column shows all pods ready (e.g. 1/1, 2/2, etc.).
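One way to spot stragglers (a sketch) is to filter for pods that are not yet Running:

```shell
# Sketch: list pods across all namespaces that are NOT Running/Completed;
# no rows printed means everything has settled.
kubectl get pods --all-namespaces --no-headers 2>/dev/null \
  | awk '$4 != "Running" && $4 != "Completed"'
```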

# Task 3: NVIDIA Blueprint Deployment

### 1. Install the RAG blueprint Helm chart

Ensure that all pods from the previous command are in "Running" status and their "Ready" column shows all pods ready (e.g. 1/1, 2/2, etc.).


Note: in order to save GPU resources, we will be deploying the text-only ingestion blueprint.

```bash
helm upgrade --install rag --create-namespace --namespace $NAMESPACE https://helm.ngc.nvidia.com/nvidia/blueprint/charts/nvidia-blueprint-rag-v2.3.0.tgz \
--set imagePullSecret.password=$NGC_API_KEY \
--set ngcApiSecret.password=$NGC_API_KEY \
--set nim-llm.resources.limits."nvidia\.com/gpu"=2 \
--set nim-llm.resources.requests."nvidia\.com/gpu"=2 \
--set nv-ingest.milvus.image.all.repository=docker.io/milvusdb/milvus \
--set nv-ingest.milvus.image.tools.repository=docker.io/milvusdb/milvus-config-tool \
--set nv-ingest.milvus.standalone.resources.limits."nvidia\.com/gpu"=0 \
--set nv-ingest.milvus.standalone.resources.requests."nvidia\.com/gpu"=0 \
--set nv-ingest.milvus.minio.image.repository=docker.io/minio/minio \
--set ingestor-server.envVars.APP_VECTORSTORE_ENABLEGPUINDEX=False \
--set ingestor-server.envVars.APP_VECTORSTORE_ENABLEGPUSEARCH=False \
--set nv-ingest.nemoretriever-graphic-elements-v1.deployed=false \
--set nv-ingest.nemoretriever-table-structure-v1.deployed=false \
--set nv-ingest.paddleocr-nim.deployed=false \
--set nv-ingest.nemoretriever-ocr.deployed=false \
--set nvidia-nim-llama-32-nv-rerankqa-1b-v2.enabled=false \
--set frontend.service.type=LoadBalancer \
--set ingestor-server.envVars.APP_NVINGEST_EXTRACTTEXT=True \
--set ingestor-server.envVars.APP_NVINGEST_EXTRACTINFOGRAPHICS=False \
--set ingestor-server.envVars.APP_NVINGEST_EXTRACTTABLES=False \
--set ingestor-server.envVars.APP_NVINGEST_EXTRACTCHARTS=False
```
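After the install returns, a quick way (a sketch) to confirm the release and watch its objects come up:

```shell
# Sketch: confirm the Helm release exists, then list what it created.
helm list -n "${NAMESPACE:-rag}" || echo "helm not found or release missing"
kubectl get all -n "${NAMESPACE:-rag}" 2>/dev/null | head -n 30
```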

### 2. Verify that the pods are running
```
rag-nvidia-nim-llama-32-nv-embedqa-1b-v2-576fdc44bb-hmx6j 1/1 Running 0 13m
rag-redis-master-0 1/1 Running 0 13m
rag-redis-replicas-0 1/1 Running 4 (115s ago) 13m
rag-server-64dd5c74c9-zclj9 1/1 Running 0 13m
```


In order to test the RAG capabilities of this application, we need to upload a document:

* Click "New Collection" at the bottom left corner and give it a name
* Upload a document by clicking the square under "Source Files", selecting a PDF or text file, and clicking "Create Collection"
  * Here is an [example document about the NVIDIA Nemotron 3 family of models](https://research.nvidia.com/labs/nemotron/files/NVIDIA-Nemotron-3-White-Paper.pdf)

![upload_popup.png](imgs/upload_popup.png)
* Wait for "Collection Created successfully" notification