Deploying on Google Cloud run

Managed to deploy to GCP cloud run using your example. Question what kind of payload does the model accept? I can see traffic going and things happening in the cloud run side but no response to this:

curl -X 'POST' \
'https://lxxxx.us-central1.run.app/v1/chat/completions' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
    "model": "nvidia/llama-3-8b-instruct-l4:1.0",
    "messages": [{"role":"user", "content":"Write a limerick about the wonders of GPU computing."}],
    "max_tokens": 64

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Deploying on Google Cloud run #97

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Deploying on Google Cloud run #97

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions