Deploying NIM on k8s: with this custom-value.yaml, the Llama 3.1 8B model can be deployed, but the 70B model fails #92

@SarielMa

Description

We followed the steps here: https://docs.nvidia.com/nim/large-language-models/latest/deploy-helm.html

After running helm install ...:

kubectl logs my-nim-01 --previous

...
{"level": "None", "time": "None", "file_name": "None", "file_path": "None", "line_number": "-1", "message": "[09-25 19:03:45.989 ERROR nim_sdk::hub::repo rust/nim-sdk/src/hub/repo.rs:119] error sending request for url (https://api.ngc.nvidia.com/v2/org/nim/team/meta/models/llama-3_1-70b-instruct/hf-1d54af3-nim1.2/files)", "exc_info": "None", "stack_info": "None"}
{"level": "ERROR", "time": "None", "file_name": "None", "file_path": "None", "line_number": "-1", "message": "", "exc_info": "Traceback (most recent call last):\n File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main\n return _run_code(code, main_globals, None,\n File "/usr/lib/python3.10/runpy.py", line 86, in _run_code\n exec(code, run_globals)\n File "/opt/nim/llm/vllm_nvext/entrypoints/launch.py", line 99, in \n main()\n File "/opt/nim/llm/vllm_nvext/entrypoints/launch.py", line 42, in main\n inference_env = prepare_environment()\n File "/opt/nim/llm/vllm_nvext/entrypoints/args.py", line 155, in prepare_environment\n engine_args, extracted_name = inject_ngc_hub(engine_args)\n File "/opt/nim/llm/vllm_nvext/hub/ngc_injector.py", line 247, in inject_ngc_hub\n cached = repo.get_all()\nException: error sending request for url (https://api.ngc.nvidia.com/v2/org/nim/team/meta/models/llama-3_1-70b-instruct/hf-1d54af3-nim1.2/files)", "stack_info": "None"}
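The container emits JSON-lines logs, so the relevant errors can be filtered out programmatically instead of eyeballed. A small sketch (field names taken from the log lines above; the helper name is mine) that pulls out the messages of records whose level field is ERROR:

```python
import json

# Filter ERROR-level records out of NIM's JSON-lines container logs.
# Field names ("level", "message") match the log output shown above.
def error_messages(log_text: str) -> list[str]:
    msgs = []
    for line in log_text.splitlines():
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip any non-JSON lines in the output
        if rec.get("level") == "ERROR":
            msgs.append(rec.get("message", ""))
    return msgs

sample = '{"level": "ERROR", "time": "None", "message": "error sending request for url (...)"}'
print(error_messages(sample))
```

Feeding it the output of `kubectl logs my-nim-01 --previous` would surface just the failing request to api.ngc.nvidia.com.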

kubectl describe pod my-nim-01
...
Events:
  Type     Reason   Age                    From     Message
  ----     ------   ----                   ----     -------
  Warning  BackOff  5m3s (x90 over 102m)   kubelet  Back-off restarting failed container nim-llm in pod my-nim-0_default(ce8f1e3a-f0e6-4a95-9086-2901091b7a57)
  Normal   Pulled   4m52s (x15 over 116m)  kubelet  Container image "nvcr.io/nim/meta/llama-3.1-70b-instruct:latest" already present on machine

kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
default my-nim-0 0/1 Running 14 (6m46s ago) 117m
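For completeness, the values file below references two cluster secrets: ngc-api (the NGC API key the pod uses to download model files) and ngc-secret (a kubernetes.io/dockerconfigjson pull secret for nvcr.io, normally created with kubectl create secret docker-registry per the linked Kubernetes docs). A hypothetical declarative form of the first one, assuming the key layout the NIM Helm docs describe, might look like:

```yaml
# Hypothetical Secret manifest for model.ngcAPISecret (key name per the NIM docs).
apiVersion: v1
kind: Secret
metadata:
  name: ngc-api
type: Opaque
stringData:
  NGC_API_KEY: "<your NGC API key>"
```

Since the 8B deployment works with the same secrets, an invalid key is less likely here than a network/egress problem reaching api.ngc.nvidia.com from the pod.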

vim custom-value.yaml
image:
  repository: "nvcr.io/nim/meta/llama-3.1-70b-instruct" # container location
  tag: latest # NIM version you want to deploy
model:
  ngcAPISecret: ngc-api # name of a secret in the cluster that includes a key named NGC_API_KEY and is an NGC API key
imagePullSecrets:
  - name: ngc-secret # name of a secret used to pull nvcr.io images, see https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
persistence:
  enabled: true
  size: 800Gi
  accessMode: ReadWriteMany
  storageClass: ""
  annotations:
    helm.sh/resource-policy: "keep"
livenessProbe:
  initialDelaySeconds: 600
  periodSeconds: 60
  timeoutSeconds: 10
startupProbe:
  initialDelaySeconds: 600
  periodSeconds: 60
  timeoutSeconds: 10
  failureThreshold: 1500
resources:
  limits:
    nvidia.com/gpu: 4 # much more GPU memory is required
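As a rough sanity check on the nvidia.com/gpu: 4 request (a back-of-the-envelope sketch with my own assumptions, not NIM's official sizing guidance): fp16 weights take 2 bytes per parameter, so the 70B model's weights alone already need roughly two 80 GiB GPUs before KV cache and activations are counted, while the 8B model fits comfortably on one. That asymmetry is consistent with the 8B values file working while 70B needs a much larger GPU budget.

```python
import math

# Back-of-the-envelope minimum GPU count for serving an LLM.
# Assumptions (mine, not from the NIM docs): fp16 weights (2 bytes/param)
# plus a flat ~20% overhead for KV cache and activations.
def min_gpus(params_billion: float, gpu_mem_gib: float, overhead: float = 0.2) -> int:
    weights_gib = params_billion * 1e9 * 2 / 2**30
    total_gib = weights_gib * (1 + overhead)
    return math.ceil(total_gib / gpu_mem_gib)

print(min_gpus(70, 80))  # 70B on 80 GiB GPUs -> 2 (weights + small overhead only)
print(min_gpus(8, 80))   # 8B fits on a single GPU -> 1
print(min_gpus(70, 40))  # 70B on 40 GiB GPUs -> 4
```

The real requirement is higher than this floor because KV cache grows with batch size and context length, which is presumably why 4 GPUs are requested here.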
