I am using the nvcr.io/nim/meta/llama-3.1-8b-instruct:1.1.2 image to create an InferenceService following the provided guide, and the service runs properly. Pod creation and API calls work fine, but I run into a problem when I try to delete the Pod.
It seems that the termination signal is not reaching the NIM server when I delete the InferenceService or the Pod. There are no KILL signals in the internal logs either, and the Pod is only forcefully deleted once the terminationGracePeriodSeconds: 300 timeout expires.
Do I need to pass any additional options when starting the NIM server, or is this a known issue?
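To help narrow this down, below is a rough sketch of how the same sequence (send SIGTERM, wait for a grace period, then force-kill) could be reproduced against an arbitrary start command outside of Kubernetes. It is not NIM-specific: the function name `exits_on_sigterm`, the timeouts, and the command passed on the command line are all illustrative assumptions, and it only shows whether a given process exits when it receives SIGTERM directly.

```python
import signal
import subprocess
import sys
import time


def exits_on_sigterm(cmd, grace_seconds=5.0, startup_seconds=1.0):
    """Start cmd, send it SIGTERM, and report whether it exits in time.

    Returns True if the process exits within grace_seconds after SIGTERM,
    False if it is still running (it is then force-killed, similar to what
    happens once a Pod's termination grace period runs out).
    """
    proc = subprocess.Popen(cmd)
    try:
        # Give the process a moment to start and install signal handlers.
        time.sleep(startup_seconds)
        proc.send_signal(signal.SIGTERM)
        try:
            proc.wait(timeout=grace_seconds)
            return True
        except subprocess.TimeoutExpired:
            return False
    finally:
        # Clean up if the process ignored SIGTERM.
        if proc.poll() is None:
            proc.kill()
            proc.wait()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_sigterm.py <start command and args>
    ok = exits_on_sigterm(sys.argv[1:])
    print("exited on SIGTERM" if ok else "still running after grace period")
```

If a command exits promptly here but the Pod still waits the full 300 seconds, that would suggest the signal is not being forwarded to the server process inside the container (for example by a wrapper script running as PID 1), rather than the server ignoring it.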