
chore(pricing): Update vertex-ai pricing #550

Open
siddharthsambharia-portkey wants to merge 22 commits into main from pricing-update/vertex-ai

siddharthsambharia-portkey (Collaborator) commented Mar 17, 2026

🔄 Pricing Update: vertex-ai

📊 Summary (complete_diff mode)

| Change Type | Count |
|---|---|
| ➕ Models added | 8 |
| 🔄 Models updated (merged) | 27 |

➕ New Models

  • llama3
  • llama3_1
  • llama3-2
  • llama3-3
  • llama4
  • llama2
  • llama-2-quantized
  • codellama-7b-hf

🔄 Updated Models

  • gemini-2.5-pro
  • gemini-2.5-computer-use-preview-10-2025
  • gemini-3-pro-image-preview
  • gemini-3-flash-preview
  • gemini-3.1-pro-preview
  • gemini-3.1-flash-image-preview
  • gemini-3.1-flash-lite-preview
  • gemini-embedding-2-preview
  • multimodalembedding
  • veo-2.0-generate-001
  • veo-3.0-generate-001
  • veo-3.0-fast-generate-001
  • veo-3.0-generate-preview
  • veo-3.0-fast-generate-preview
  • veo-3.1-generate-001
  • veo-3.1-fast-generate-001
  • veo-3.1-generate-preview
  • veo-3.1-fast-generate-preview
  • claude-opus-4-6
  • claude-sonnet-4-6
  • claude-haiku-4-5@20251001
  • deepseek-r1-0528-maas
  • deepseek-v3.1-maas
  • deepseek-v3.2-maas
  • qwen3-coder-480b-a35b-instruct-maas
  • qwen3-next-80b-a3b-instruct-maas
  • qwen3-next-80b-a3b-thinking-maas

Model-to-Pricing-Page Mapping

| Model ID | Publisher / Section | Source | Notes |
|---|---|---|---|
| gemini-2.5-pro | Google – Gemini 2.5 | API | Standard: $1.25/$10, cache_read $0.13, batch $0.625/$5, web_search $35/1k, enterprise $45/1k |
| gemini-2.5-flash | Google – Gemini 2.5 | API | Standard: $0.30/$2.50, cache_read $0.03, batch $0.15/$1.25, web_search $35/1k, enterprise $45/1k |
| gemini-2.5-flash-lite | Google – Gemini 2.5 | API | Standard: $0.10/$0.40, cache_read $0.01, batch $0.05/$0.20, web_search $35/1k, enterprise $45/1k |
| gemini-2.5-flash-image | Google – Gemini 2.5 | API | $0.30/$2.50, image_token $30/1M, batch $0.15/$1.25, web_search $35/1k |
| gemini-2.5-flash-preview-09-2025 | Google – Gemini 2.5 | API | Preview alias for gemini-2.5-flash; same pricing |
| gemini-2.5-flash-lite-preview-09-2025 | Google – Gemini 2.5 | API | Preview alias for gemini-2.5-flash-lite; same pricing |
| gemini-2.5-computer-use-preview-10-2025 | Google – Gemini 2.5 | API | Same pricing as gemini-2.5-pro |
| gemini-2.0-flash-001 | Google – Gemini 2.0 | API | $0.15/$0.60, batch $0.075/$0.30, no cache, web_search $35/1k, enterprise $45/1k |
| gemini-2.0-flash-lite-001 | Google – Gemini 2.0 | API | $0.075/$0.30, batch $0.0375/$0.15, no cache, web_search $35/1k, enterprise $45/1k |
| gemini-3-pro-preview | Google – Gemini 3 | API | $2/$12, cache_read $0.20, batch $1/$6, image_token $120/1M, web_search $14/1k, enterprise $14/1k |
| gemini-3-pro-image-preview | Google – Gemini 3 | API | Same as gemini-3-pro-preview pricing |
| gemini-3-flash-preview | Google – Gemini 3 | API | $0.50/$3, cache_read $0.05, batch $0.25/$1.50, web_search $14/1k, enterprise $14/1k |
| gemini-3.1-pro-preview | Google – Gemini 3 | API | $2/$12, cache_read $0.20, batch $1/$6, image_token $60/1M, web_search $14/1k, enterprise $14/1k |
| gemini-3.1-flash-image-preview | Google – Gemini 3 | API | $0.50/$3, batch $0.25/$1.50, image_token $30/1M, web_search $14/1k, enterprise $14/1k |
| gemini-3.1-flash-lite-preview | Google – Gemini 3 | API | $0.25/$1.50, cache_read $0.03, batch $0.13/$0.75, web_search $14/1k, enterprise $14/1k |
| gemini-embedding-001 | Google – Embedding | API | $0.00015/1K tokens |
| gemini-embedding-2-preview | Google – Embedding | API | $0.00015/1K tokens (same as gemini-embedding-001) |
| text-embedding-005 | Google – Embedding | API | $0.000025/1K tokens |
| text-multilingual-embedding-002 | Google – Embedding | API | $0.000025/1K tokens |
| text-embedding-large-exp-03-07 | Google – Embedding | API | $0.000025/1K tokens (same as text-embedding-005) |
| multimodalembedding | Google – Embedding | API | $0.0002/1K tokens, per-image $0.0001, per-video+ $0.0005 |
| textembedding-gecko | Google – Embedding | API – price not found | Legacy model, no pricing row found; price 0 |
| imagen-4.0-generate-001 | Google – Imagen 4 | API | $0.04/image; matched via lookup_variant imagen-4.0-generate |
| imagen-4.0-fast-generate-001 | Google – Imagen 4 | API | $0.02/image; matched via lookup_variant imagen-4.0-fast-generate |
| imagen-4.0-ultra-generate-001 | Google – Imagen 4 | API | $0.06/image; matched via lookup_variant imagen-4.0-ultra-generate |
| imagen-3.0-generate-002 | Google – Imagen 3 | API | $0.04/image; matched via lookup_variant imagen-3.0-generate |
| imagen-3.0-capability-001 | Google – Imagen 3 | API | Capability model; uses equivalent generate pricing $0.04/image |
| imagen-3.0-capability-002 | Google – Imagen 3 | API | Capability model; uses equivalent generate pricing $0.04/image |
| veo-2.0-generate-001 | Google – Veo 2.0 | API | 50 cents/sec, duration 1, count 1 |
| veo-3.0-generate-001 | Google – Veo 3.0 | API | 40 cents/sec, duration 1, count 1 |
| veo-3.0-fast-generate-001 | Google – Veo 3.0 Fast | API | 15 cents/sec, duration 1, count 1 |
| veo-3.0-generate-preview | Google – Veo 3.0 | API | Preview alias; same pricing as veo-3.0-generate-001 |
| veo-3.0-fast-generate-preview | Google – Veo 3.0 Fast | API | Preview alias; same pricing as veo-3.0-fast-generate-001 |
| veo-3.1-generate-001 | Google – Veo 3.1 | API | 40 cents/sec, duration 1, count 1 |
| veo-3.1-fast-generate-001 | Google – Veo 3.1 Fast | API | 35 cents/sec, duration 1, count 1 |
| veo-3.1-generate-preview | Google – Veo 3.1 | API | Preview alias; same pricing as veo-3.1-generate-001 |
| veo-3.1-fast-generate-preview | Google – Veo 3.1 Fast | API | Preview alias; same pricing as veo-3.1-fast-generate-001 |
| claude-opus-4-6 | Anthropic – Claude | API | $5.50/$27.50, cache_write $6.875, cache_read $0.55, batch $2.75/$13.75; @default stripped |
| claude-sonnet-4-6 | Anthropic – Claude | API | $3.30/$16.50, cache_write $4.13, cache_read $0.33, batch $1.65/$8.25; @default stripped |
| claude-sonnet-4-5@20250929 | Anthropic – Claude | API | $3/$15, cache_write $3.75, cache_read $0.30, batch $1.50/$7.50 |
| claude-sonnet-4@20250514 | Anthropic – Claude | API | $3/$15, cache_write $3.75, cache_read $0.30, batch $1.50/$7.50 |
| claude-opus-4-5@20251101 | Anthropic – Claude | API | $5/$25, cache_write $6.25, cache_read $0.50, batch $2.50/$12.50 |
| claude-opus-4-1@20250805 | Anthropic – Claude | API | $15/$75, cache_write $18.75, cache_read $1.50, batch $7.50/$37.50 |
| claude-opus-4@20250514 | Anthropic – Claude | API | $15/$75, cache_write $18.75, cache_read $1.50, batch $7.50/$37.50 |
| claude-haiku-4-5@20251001 | Anthropic – Claude | API | $1.10/$5.50, cache_write $1.375, cache_read $0.11, batch $0.55/$2.75 |
| llama-3.3-70b-instruct-maas | Meta – Llama | API | $0.72/$0.72, batch $0.36/$0.36 |
| llama-4-maverick-17b-128e-instruct-maas | Meta – Llama 4 | API | $0.35/$1.15, batch $0.175/$0.575 |
| llama3 | Meta – Llama | API – price not found | Self-deploy only (has_deploy: true, no -maas); price 0 |
| llama3_1 | Meta – Llama | API – price not found | Self-deploy only; price 0 |
| llama3-2 | Meta – Llama | API – price not found | Self-deploy only; price 0 |
| llama3-3 | Meta – Llama | API – price not found | Self-deploy only; price 0 |
| llama4 | Meta – Llama | API – price not found | Self-deploy only; price 0 |
| llama2 | Meta – Llama | API – price not found | Self-deploy only; price 0 |
| llama-2-quantized | Meta – Llama | API – price not found | Self-deploy only; price 0 |
| codellama-7b-hf | Meta – Llama | API – price not found | Self-deploy only; price 0 |
| mistral-small-2503 | Mistral – Mistral Small | API | $0.10/$0.30 |
| mistral-medium-3 | Mistral – Mistral Medium | API | $0.40/$2.00 |
| codestral-2 | Mistral – Codestral | API | $0.30/$0.90 |
| gpt-oss-120b-maas | OpenAI – GPT OSS | API | $0.09/$0.36, batch $0.045/$0.18 |
| deepseek-r1-0528-maas | DeepSeek – R1 | API | $1.35/$5.40, cache_write $0.675, cache_read $0.06, batch $0.675/$2.70 |
| deepseek-v3.1-maas | DeepSeek – V3.1 | API | $0.60/$1.70, cache_write $0.30, cache_read $0.06, batch $0.30/$0.85 |
| deepseek-v3.2-maas | DeepSeek – V3.2 | API | $0.56/$1.68, cache_write $0.28, cache_read $0.056, batch $0.28/$0.84 |
| qwen3-235b-a22b-instruct-2507-maas | Qwen – Qwen3-235B | API | $0.22/$0.88, batch $0.11/$0.44 |
| qwen3-coder-480b-a35b-instruct-maas | Qwen – Qwen3-Coder-480B | API | $0.22/$1.80, cache_write $0.11, cache_read $0.022, batch $0.11/$0.90 |
| qwen3-next-80b-a3b-instruct-maas | Qwen – Qwen3-Next-80B | API | $0.15/$1.20, batch $0.11/$0.90 |
| qwen3-next-80b-a3b-thinking-maas | Qwen – Qwen3-Next-80B Thinking | API | $0.15/$1.20, batch $0.11/$0.90 |
| kimi-k2-thinking-maas | MoonshotAI – Kimi K2 | API | $0.60/$2.50 |
| minimax-m2-maas | MiniMax – M2 | API | $0.30/$1.20 |
| glm-4.7-maas | ZAI.org – GLM-4.7 | API | $0.60/$2.20 |
| glm-5-maas | ZAI.org – GLM-5 | API | $1.00/$3.20 |
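The Notes column quotes token prices as input/output pairs plus optional cache and batch rates. As a rough illustration of how those figures combine into a per-request estimate, here is a minimal Python sketch. It assumes the `$x/$y` token prices are USD per 1M tokens (the table states `/1M` only for image_token and `/1K tokens` only for embeddings), that cached input tokens are billed at cache_read, that batch rates simply replace the standard input/output rates, and that the Veo rows mean US cents per generated second. `TokenPrice`, `PRICES`, `ALIASES`, `estimate_token_cost`, and `estimate_veo_cost` are hypothetical names, not the schema the Pricing Agent actually writes.

```python
from dataclasses import dataclass
from typing import Optional

PER_MILLION = 1_000_000  # assumed unit for the $x/$y token prices


@dataclass(frozen=True)
class TokenPrice:
    """Hypothetical price record; all rates are USD per 1M tokens (assumption)."""
    input_rate: float
    output_rate: float
    cache_read: Optional[float] = None
    batch_input: Optional[float] = None
    batch_output: Optional[float] = None


# A few rows copied from the mapping table.
PRICES = {
    "gemini-2.5-pro": TokenPrice(1.25, 10.0, cache_read=0.13,
                                 batch_input=0.625, batch_output=5.0),
    "claude-sonnet-4-6": TokenPrice(3.30, 16.50, cache_read=0.33,
                                    batch_input=1.65, batch_output=8.25),
    "deepseek-v3.2-maas": TokenPrice(0.56, 1.68, cache_read=0.056,
                                     batch_input=0.28, batch_output=0.84),
}

# Veo rows: US cents per generated second (assumed reading of "cents/sec").
VEO_CENTS_PER_SECOND = {
    "veo-2.0-generate-001": 50,
    "veo-3.0-generate-001": 40,
    "veo-3.0-fast-generate-001": 15,
    "veo-3.1-generate-001": 40,
    "veo-3.1-fast-generate-001": 35,
}

# Rows marked "Preview alias" / "Same pricing as ..." reuse another model's price.
ALIASES = {
    "gemini-2.5-computer-use-preview-10-2025": "gemini-2.5-pro",
    "veo-3.0-generate-preview": "veo-3.0-generate-001",
    "veo-3.1-fast-generate-preview": "veo-3.1-fast-generate-001",
}


def resolve(model: str) -> str:
    """Map an alias to the model ID that carries the actual price."""
    return ALIASES.get(model, model)


def estimate_token_cost(model: str, input_tokens: int, output_tokens: int,
                        cached_tokens: int = 0, batch: bool = False) -> float:
    """USD cost of one request; cached_tokens is a subset of input_tokens."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    p = PRICES[resolve(model)]
    in_rate = p.batch_input if batch and p.batch_input is not None else p.input_rate
    out_rate = p.batch_output if batch and p.batch_output is not None else p.output_rate
    # Rows without a cache_read price fall back to the normal input rate.
    cache_rate = p.cache_read if p.cache_read is not None else in_rate
    uncached = input_tokens - cached_tokens
    total = uncached * in_rate + cached_tokens * cache_rate + output_tokens * out_rate
    return total / PER_MILLION


def estimate_veo_cost(model: str, seconds: float, count: int = 1) -> float:
    """USD cost of `count` videos, each `seconds` long."""
    return VEO_CENTS_PER_SECOND[resolve(model)] * seconds * count / 100
```

Under these assumptions, a gemini-2.5-pro request with 10,000 input and 2,000 output tokens comes to about $0.0325, and one 8-second clip from veo-3.0-generate-preview (priced as veo-3.0-generate-001) to about $3.20. Resolving aliases before the lookup keeps a single price row per underlying model, which matches how the table treats preview IDs.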

Excluded models (not added to output)

| Model ID | Publisher | Reason |
|---|---|---|
| gemini-live-2.5-flash-native-audio | Google | *-live-* streaming; excluded by global rule |
| lyria-002, lyria-3-pro-preview, lyria-3-clip-preview | Google | lyria-* music generation; excluded by global rule |
| imagegeneration | Google | Legacy model excluded by google.md rule |
| virtual-try-on-001 | Google | Product-specific retail model excluded by google.md |
| pretrained-ocr | Google | OCR model; excluded by global rule |
| shieldgemma2 | Google | Safety/guard model; excluded |
| gemma, gemma2, gemma3, gemma3n, functiongemma, paligemma, codegemma, translategemma, t5gemma, embeddinggemma | Google | Self-deploy Gemma variants; excluded |
| chirp-2, chirp-3 | Google | Audio transcription/speech; excluded |
| translate-llm | Google | Non-generative translation; excluded |
| video-text-detection, video-speech-transcription | Google | Non-generative CV/NLP; excluded |
| text-translation | Google | Non-generative NLP; excluded |
| image-segmentation-001 | Google | Non-generative CV; excluded |
| weathernext, weather-next-v2 | Google | Non-generative forecasting; excluded |
| timesfm | Google | Time-series forecasting (self-deploy); excluded |
| bert-base, bert-base-uncased, t5-flan, t5-1.1 | Google | Non-generative NLP self-deploy; excluded |
| imageclassification-*, imageobjectdetection-*, imagesegmentation-* | Google | Non-generative CV; excluded |
| earth-ai-imagery-* | Google | Non-generative vision; excluded |
| path-foundation, derm-foundation, txgemma, hear, medgemma, medsiglip, medasr, cxr-foundation | Google | Medical/specialized self-deploy; non-generative for pricing purposes |
| mammut, jax-owl-vit-v2, dito, cloudnerf-pytorch-zipnerf, owlvit-base-patch32, vit-jax, pic2word, f-vlm-jax, keras-yolov8, tab-net, tfvision-*, automl-*, resnet50, bart-large-cnn | Google | Non-generative ML / self-deploy only; excluded |
| label-detector-pali-001, content-moderation, imagewatermarkdetector, imagetext, language-v1-* | Google | Non-generative or legacy; excluded |
| pt-test, occupancy-analytics, vehicle-detector, object-detector, ppe-detector, people-blur, product-recognizer, tag-recognizer | Google | Non-generative specialized models; excluded |
| face-detector, pretrained-form-parser, text-detector, imagebind | Google/Meta | Non-generative; excluded |
| llama-guard, prompt-guard | Meta | Guard/safety models; excluded by global rule |
| faster-r-cnn, retinanet, mask-r-cnn, segment-anything, sam3 | Meta | Non-generative CV; excluded |
| xlm-roberta-large, roberta-large, nllb | Meta | Non-generative NLP; excluded |
| mistral, mixtral | Mistral (mistral-ai) | Self-deploy only (has_deploy: true, no -maas); excluded |
| codestral-2501-self-deploy | Mistral | Self-deploy model (name contains self-deploy); excluded |
| ministral-3, mistral-large-3 | Mistral | Self-deploy (has_deploy: true, no -maas); excluded |
| mistral-ocr-2505 | Mistral | OCR model; excluded by global rule |
| clip-vit-base-patch32, openclip | OpenAI | Non-generative embedding; excluded |
| whisper-large | OpenAI | Audio transcription; excluded |
| gpt-oss | OpenAI | Self-deploy (has_deploy: true, no -maas); excluded |
| deepseek-r1, deepseek-v3, deepseek-v3-1, deepseek-v3-2 | DeepSeek | Self-deploy (has_deploy: true, no -maas); excluded |
| deepseek-ocr, deepseek-ocr-2, deepseek-ocr-maas | DeepSeek | OCR models; excluded by global rule |
| qwq, qwen3, qwen3-5, qwen2, qwen3-coder, qwen3-coder-next, qwen3-next, qwen3-vl | Qwen | Self-deploy (has_deploy: true, no -maas); excluded |
| qwen3-embedding | Qwen | Self-deploy embedding; excluded |
| qwen-image | Qwen | Excluded by explicit policy (qwen-image) |
| jamba-large-1.6 | AI21 | Self-deploy only (has_deploy: true, no -maas); excluded |
| kimi-k2-5, kimi-k2 | MoonshotAI | Self-deploy (has_deploy: true, no -maas); excluded |
| minimax-m2 | MiniMax | Self-deploy (has_deploy: true, no -maas); excluded |
| glm-4.7, glm-5, glm-4.5 | ZAI.org | Self-deploy (has_deploy: true, no -maas); excluded |
| glm-ocr | ZAI.org | OCR model; excluded by global rule |
| glm-image | ZAI.org | Excluded by explicit policy (glm-image) |
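Several entries in the Reason column reduce to mechanical rules on the model ID and the has_deploy flag. A minimal Python sketch of such a filter follows, assuming those rules can be written as glob patterns on the lowercased model ID plus the has_deploy/-maas check; `CatalogEntry`, `NAME_RULES`, `EXPLICIT_POLICY`, and `exclusion_reason` are hypothetical and are not the Pricing Agent's actual code. Category-based exclusions (non-generative CV/NLP, medical models, the google.md rules) would need catalog metadata and are not modeled. The sketch also applies the has_deploy rule uniformly, whereas this PR still adds the self-deploy Meta Llama entries at price 0, so the real rule set presumably has exceptions that this sketch does not capture.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical catalog record; has_deploy mirrors the flag named in the table."""
    model_id: str
    has_deploy: bool = False


# Name-based rules paraphrased from the Reason column (glob patterns are assumptions).
NAME_RULES = [
    ("*-live-*", "streaming (global rule)"),
    ("lyria-*", "music generation (global rule)"),
    ("*ocr*", "OCR model (global rule)"),
    ("*-guard*", "guard/safety model (global rule)"),
    ("*self-deploy*", "self-deploy model (name contains self-deploy)"),
]

# IDs excluded by explicit policy in the table.
EXPLICIT_POLICY = {"qwen-image", "glm-image"}


def exclusion_reason(entry: CatalogEntry) -> Optional[str]:
    """Return why an entry would be excluded, or None to keep it."""
    model_id = entry.model_id.lower()
    if model_id in EXPLICIT_POLICY:
        return "explicit policy"
    for pattern, reason in NAME_RULES:
        # fnmatchcase avoids platform-dependent case folding.
        if fnmatchcase(model_id, pattern):
            return reason
    # Self-deploy rule from the table: has_deploy is set and the ID lacks -maas.
    if entry.has_deploy and not model_id.endswith("-maas"):
        return "self-deploy only (has_deploy: true, no -maas)"
    return None


if __name__ == "__main__":
    catalog = [
        CatalogEntry("glm-5", has_deploy=True),
        CatalogEntry("glm-5-maas", has_deploy=True),
        CatalogEntry("glm-ocr"),
    ]
    kept = [e.model_id for e in catalog if exclusion_reason(e) is None]
    print(kept)  # prints ['glm-5-maas']
```

Checking the explicit-policy set and the name patterns before the has_deploy rule means an ID such as deepseek-ocr-maas is still reported as an OCR exclusion even though it carries the -maas suffix, which matches how the table classifies it.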

Generated by Pricing Agent on 2026-03-26
