I managed to deploy to GCP Cloud Run using your example. Question: what kind of payload does the model accept? I can see traffic arriving and activity on the Cloud Run side, but I get no response to this request:
curl -X 'POST' \
  'https://lxxxx.us-central1.run.app/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "nvidia/llama-3-8b-instruct-l4:1.0",
    "messages": [{"role":"user", "content":"Write a limerick about the wonders of GPU computing."}],
    "max_tokens": 64
  }'
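For anyone trying to reproduce this, here is a rough Python sketch of the same request, in case it helps to see the status code and body instead of a silent hang. It assumes the service exposes an OpenAI-style `/v1/chat/completions` route, as the curl command above implies. The helper names `build_chat_request` and `send` and the 120-second timeout are my own choices for illustration, not something taken from the example repo.

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt, max_tokens=64):
    """Build a POST request carrying the same JSON body as the curl command.

    Hypothetical helper: the payload shape is copied from the curl call,
    not from any documented schema of the deployed service.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=120):
    """Send the request and return (status, parsed JSON body).

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses and
    a timeout error if nothing comes back within `timeout` seconds
    (120 is an assumed value, not a recommendation).
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, json.loads(response.read().decode("utf-8"))


# Usage (performs a real network call, so it is left commented out):
# req = build_chat_request(
#     "https://lxxxx.us-central1.run.app",
#     "nvidia/llama-3-8b-instruct-l4:1.0",
#     "Write a limerick about the wonders of GPU computing.",
# )
# status, body = send(req)
# print(status, body)
```

An explicit timeout at least separates "the server rejected the payload" (an `HTTPError` with a status code) from "the server never answered" (a timeout), which is the distinction this issue is asking about.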