diff --git a/README.md b/README.md
index 55488fd5..b5cc346e 100644
--- a/README.md
+++ b/README.md
@@ -1,41 +1,42 @@
## Introduction

-This repo showcases different ways NVIDIA NIMs can be deployed. This repo contains reference implementations, example documents, and architecture guides that can be used as a starting point to deploy multiple NIMs and other NVIDIA microservices into Kubernetes and other production deployment environments.
+This repo is intended to aggregate and showcase different ways NVIDIA NIMs can be deployed. It contains reference implementations, deployment guides, examples, and architecture guidance that can be used as a starting point to deploy multiple NIMs and other NVIDIA microservices into Kubernetes and other production deployment environments. Many of the most common NIM deployment and lifecycle scenarios covered here may eventually be handled by capabilities of the [NVIDIA NIM Operator](https://github.com/NVIDIA/k8s-nim-operator) as it matures.

> **Note**
> The content in this repository is designed to provide reference architectures and best practices for production-grade deployments and product integrations; however, the code is not validated on all platforms and does not come with any level of enterprise support. While the deployments should perform well, please treat this codebase as experimental and a collaborative sandbox. For long-term production deployments that require enterprise support from NVIDIA, look to the official releases on [NVIDIA NGC](https://ngc.nvidia.com/), which are based on the code in this repo.
# Deployment Options

-| Category | Deployment Option | Description |
+**Tools & Guides**
+
+| Category | Type | Description |
|------------------------------------|-------------------------------------------------------------|-------------|
-| **On-premise Deployments** | **Helm** | |
-| | | [LLM NIM](https://github.com/NVIDIA/nim-deploy/tree/main/helm/nim-llm) | |
-| | | LLM NIM on OpenShift Container Platform (coming soon) | |
-| | **Open Source Platforms** | |
-| | | [KServe](https://github.com/NVIDIA/nim-deploy/tree/main/kserve) | |
-| | **Independent Software Vendors** | |
-| | | Run.ai (coming soon) | |
-| **Cloud Service Provider Deployments** | **Azure** | |
-| | | [AKS Managed Kubernetes](https://github.com/NVIDIA/nim-deploy/tree/main/cloud-service-providers/azure/aks) | |
-| | | [Azure ML](https://github.com/NVIDIA/nim-deploy/tree/main/cloud-service-providers/azure/azureml) | |
-| | | [Azure prompt flow](https://github.com/NVIDIA/nim-deploy/tree/main/cloud-service-providers/azure/promptflow) | |
-| | **Amazon Web Services** | |
-| | | [EKS Managed Kubernetes](https://github.com/NVIDIA/nim-deploy/tree/main/cloud-service-providers/aws/eks) | |
-| | | [Amazon SageMaker](https://github.com/NVIDIA/nim-deploy/tree/main/cloud-service-providers/aws/sagemaker) | |
-| | **Google Cloud Platform** | |
-| | | [GKE Managed Kubernetes](https://github.com/NVIDIA/nim-deploy/tree/main/cloud-service-providers/google-cloud/gke) | |
-| | | [Google Cloud Vertex AI](https://github.com/NVIDIA/nim-deploy/tree/main/cloud-service-providers/google-cloud/vertexai/python) | |
-| | | [Cloud Run](https://github.com/NVIDIA/nim-deploy/tree/main/cloud-service-providers/google-cloud/cloudrun) | |
-| | **NVIDIA DGX Cloud** | |
-| | | [NVIDIA Cloud Functions](https://github.com/NVIDIA/nim-deploy/tree/main/cloud-service-providers/nvidia/nvcf) | |
-| **Documents** | **Deployment Guide** | |
-| | | [Hugging Face NIM Deployment](https://github.com/NVIDIA/nim-deploy/tree/main/docs/hugging-face-nim-deployment) | |
+| Open Source | Helm Chart(s) | [LLM NIM](https://github.com/NVIDIA/nim-deploy/tree/main/helm/nim-llm) |
+| Open Source Platform | Deployment Guide | [KServe](https://github.com/NVIDIA/nim-deploy/tree/main/kserve) |
+| Commercial Platform | Deployment Guide | [Run.ai](https://github.com/NVIDIA/nim-deploy/tree/main/docs/runai) |
+| Commercial Platform | Deployment Guide | [Hugging Face NIM Deployment](https://github.com/NVIDIA/nim-deploy/tree/main/docs/hugging-face-nim-deployment) |
+| | | LLM NIM on OpenShift Container Platform (coming soon) |
+
+**Managed Cloud Services**
+
+| Service | Type | Description |
+|------------------------------------|-------------------------------------------------------------|-------------|
+| Microsoft Azure | Deployment Guide | [AKS Managed Kubernetes](https://github.com/NVIDIA/nim-deploy/tree/main/cloud-service-providers/azure/aks) |
+| Microsoft Azure | Deployment Guide | [Azure ML](https://github.com/NVIDIA/nim-deploy/tree/main/cloud-service-providers/azure/azureml) |
+| Microsoft Azure | Deployment Guide | [Azure prompt flow](https://github.com/NVIDIA/nim-deploy/tree/main/cloud-service-providers/azure/promptflow) |
+| | | |
+| Amazon Web Services | Deployment Guide | [EKS Managed Kubernetes](https://github.com/NVIDIA/nim-deploy/tree/main/cloud-service-providers/aws/eks) |
+| Amazon Web Services | Deployment Guide | [Amazon SageMaker](https://github.com/NVIDIA/nim-deploy/tree/main/cloud-service-providers/aws/sagemaker) |
+| | | |
+| Google Cloud Platform | Deployment Guide | [GKE Managed Kubernetes](https://github.com/NVIDIA/nim-deploy/tree/main/cloud-service-providers/google-cloud/gke) |
+| Google Cloud Platform | Deployment Guide | [Google Cloud Vertex AI](https://github.com/NVIDIA/nim-deploy/tree/main/cloud-service-providers/google-cloud/vertexai/python) |
+| Google Cloud Platform | Deployment Guide | [Cloud Run](https://github.com/NVIDIA/nim-deploy/tree/main/cloud-service-providers/google-cloud/cloudrun) |
+| | | |
+| NVIDIA DGX Cloud | Deployment Guide | [NVIDIA Cloud Functions](https://github.com/NVIDIA/nim-deploy/tree/main/cloud-service-providers/nvidia/nvcf) |

## Contributions

Contributions are welcome. Developers can contribute by opening a [pull request](https://help.github.com/en/articles/about-pull-requests) and agreeing to the terms in [CONTRIBUTING.MD](CONTRIBUTING.MD).
-
## Support and Getting Help

Please open an issue on the GitHub project for any questions. All feedback is appreciated: issues, feature requests, and new deployment scenarios included.

diff --git a/docs/runai/README.md b/docs/runai/README.md
new file mode 100644
index 00000000..31391860
--- /dev/null
+++ b/docs/runai/README.md
@@ -0,0 +1,81 @@
+# Deploy NVIDIA NIM microservices on RunAI
+
+This document describes the procedure for deploying a NIM microservice using Helm on a RunAI cluster.
+
+## Prerequisites
+1. A conformant Kubernetes cluster ([RunAI K8s requirements](https://docs.run.ai/latest/admin/overview-administrator/))
+2. RunAI installed (version >= 2.18)
+3. [NVIDIA GPU Operator](https://github.com/NVIDIA/gpu-operator) installed
+4. General NIM requirements: [NIM Prerequisites](https://docs.nvidia.com/nim/large-language-models/latest/getting-started.html#prerequisites)
+5. [Helm](https://helm.sh/docs/) installed locally
+
+## Integration features
+
+| Feature | Supported |
+|------------------------------------|--------------------|
+| Deploy through Helm CLI | :white_check_mark: |
+| Engine capabilities (Scheduling) | :white_check_mark: |
+| Visibility (UI + CLI) | :white_check_mark: |
+| Submit through RunAI Workload API | |
+| Submit through RunAI UI | |
+
+## Preparation
+
+The following initial steps are required:
+
+### RunAI
+
+1. Create or select an existing project to deploy the NIM within, for example `team-a`.
+2. 
Enforce the RunAI scheduler in the project's namespace: `kubectl annotate ns runai-team-a runai/enforce-scheduler-name=true`. For additional background, see the [RunAI Documentation](https://docs.run.ai/v2.18/admin/runai-setup/config/default-scheduler/).
+
+### NVIDIA NGC
+
+1. Create an API key: follow the guidance in the [NVIDIA NIM Getting Started](https://docs.nvidia.com/nim/large-language-models/latest/getting-started.html#option-2-from-ngc) documentation to generate a properly scoped API key if you haven't already. For illustration purposes, the generated key is shown as `XXXYYYZZZ` below.
+2. Add the NIM Helm repository to deploy NIM charts: `helm repo add nemo-ms "https://helm.ngc.nvidia.com/ohlfw0olaadg/ea-participants" --username='$oauthtoken' --password=XXXYYYZZZ`
+3. Create a Docker registry secret to pull NIM images: `kubectl create secret docker-registry -n runai-team-a registry-secret --docker-server=nvcr.io --docker-username='$oauthtoken' --docker-password=XXXYYYZZZ`
+4. Create a generic secret used to download models: `kubectl create secret generic ngc-api -n runai-team-a --from-literal=NGC_CLI_API_KEY=XXXYYYZZZ`
+
+## Deployment
+
+For any NIM you want to deploy, prepare a `values.yaml` file, changing the values as needed:
+```yaml
+initContainers:
+  ngcInit:
+    imageName: nvcr.io/ohlfw0olaadg/ea-participants/nim_llm
+    imageTag: 24.06
+    secretName: ngc-api
+    env:
+      STORE_MOUNT_PATH: /model-store
+      NGC_CLI_ORG: ohlfw0olaadg
+      NGC_CLI_TEAM: ea-participants
+      NGC_MODEL_NAME: llama2-13b-chat
+      NGC_MODEL_VERSION: a100x2_fp16_24.06
+      NGC_EXE: ngc
+      DOWNLOAD_NGC_CLI: "true"
+      NGC_CLI_VERSION: "3.34.1"
+      MODEL_NAME: llama2-13b-chat
+
+image:
+  repository: nvcr.io/ohlfw0olaadg/ea-participants/nim_llm
+  tag: 24.06
+
+imagePullSecrets:
+  - name: registry-secret
+
+model:
+  numGpus: 2
+  name: llama2-13b-chat
+  openai_port: 9999
+```
+
+Run the following command:
+```shell
+helm -n runai-team-a install llama2-13b-chat-nim nemo-ms/nemollm-inference -f values.yaml
+```
+
+> [!IMPORTANT]
+> - The namespace the helm chart is deployed into must be the RunAI project namespace (`runai-team-a`).
+> - For other models, consult the [NVIDIA NIM Supported Models](https://docs.nvidia.com/nim/large-language-models/latest/support-matrix.html#supported-models) matrix.
+
+## View the model within the RunAI UI
+
+![nim on runai screenshot](runai_nim.png)
+
diff --git a/docs/runai/runai_nim.png b/docs/runai/runai_nim.png
new file mode 100644
index 00000000..f409e97b
Binary files /dev/null and b/docs/runai/runai_nim.png differ
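
As a quick sanity check for the Run:ai deployment guide above, you can port-forward the chart's service and query the NIM's OpenAI-compatible endpoint on the `openai_port` (9999) configured in `values.yaml`. This is a sketch only: the service name below is illustrative (the actual name depends on the chart's naming conventions), so list the services in the project namespace first.

```shell
# List services created by the chart to find the real service name
kubectl -n runai-team-a get svc

# Forward the OpenAI-compatible port locally (service name is illustrative)
kubectl -n runai-team-a port-forward svc/llama2-13b-chat-nim 9999:9999 &
sleep 2

# Query the OpenAI-compatible completions endpoint with the model name from values.yaml
curl -s http://localhost:9999/v1/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "llama2-13b-chat", "prompt": "What is a NIM?", "max_tokens": 32}'
```

A JSON response with a `choices` array indicates the model is up and serving requests.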