# feat: ai rate limiting standard headers #13049
Base: `master`. Changes from all commits: 7a0b879, 03608d9, d88f360, 526eee2.
```diff
@@ -131,10 +131,12 @@ local function read_response(conf, ctx, res, response_filter)
         core.log.info("got token usage from ai service: ",
                       core.json.delay_encode(data.usage))
         ctx.llm_raw_usage = data.usage
+        local pt = data.usage.prompt_tokens or data.usage.input_tokens or 0
+        local ct = data.usage.completion_tokens or data.usage.output_tokens or 0
         ctx.ai_token_usage = {
-            prompt_tokens = data.usage.prompt_tokens or 0,
-            completion_tokens = data.usage.completion_tokens or 0,
-            total_tokens = data.usage.total_tokens or 0,
+            prompt_tokens = pt,
+            completion_tokens = ct,
+            total_tokens = data.usage.total_tokens or (pt + ct),
         }
         ctx.var.llm_prompt_tokens = ctx.ai_token_usage.prompt_tokens
         ctx.var.llm_completion_tokens = ctx.ai_token_usage.completion_tokens
```
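The fallback chain in the hunk above can be sketched outside Lua. The following is a hypothetical Python rendering of the same logic (the function name `normalize_usage` is illustrative, not plugin code): prefer OpenAI-style `prompt_tokens`/`completion_tokens`, fall back to Anthropic-style `input_tokens`/`output_tokens`, and derive `total_tokens` from the parts when the provider omits it.

```python
# Hypothetical sketch of the usage-field normalization the Lua diff
# implements; mirrors the `or` fallback chains line by line.
def normalize_usage(usage: dict) -> dict:
    # OpenAI-style names first, then Anthropic-style, then zero
    pt = usage.get("prompt_tokens") or usage.get("input_tokens") or 0
    ct = usage.get("completion_tokens") or usage.get("output_tokens") or 0
    return {
        "prompt_tokens": pt,
        "completion_tokens": ct,
        # some providers omit total_tokens; derive it from the parts
        "total_tokens": usage.get("total_tokens") or (pt + ct),
    }
```

This keeps the previous behavior for OpenAI-shaped responses while no longer reporting zero usage for providers that use the `input_tokens`/`output_tokens` naming.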
```diff
@@ -188,9 +190,13 @@ local function read_response(conf, ctx, res, response_filter)
     ctx.ai_token_usage = {}
     if type(res_body.usage) == "table" then
         ctx.llm_raw_usage = res_body.usage
-        ctx.ai_token_usage.prompt_tokens = res_body.usage.prompt_tokens or 0
-        ctx.ai_token_usage.completion_tokens = res_body.usage.completion_tokens or 0
-        ctx.ai_token_usage.total_tokens = res_body.usage.total_tokens or 0
+        ctx.ai_token_usage.prompt_tokens = res_body.usage.prompt_tokens
+                                           or res_body.usage.input_tokens or 0
+        ctx.ai_token_usage.completion_tokens = res_body.usage.completion_tokens
+                                               or res_body.usage.output_tokens or 0
+        ctx.ai_token_usage.total_tokens = res_body.usage.total_tokens
+                                          or (ctx.ai_token_usage.prompt_tokens
+                                              + ctx.ai_token_usage.completion_tokens)
     end
     ctx.var.llm_prompt_tokens = ctx.ai_token_usage.prompt_tokens or 0
     ctx.var.llm_completion_tokens = ctx.ai_token_usage.completion_tokens or 0
```
```diff
@@ -65,6 +65,10 @@ local schema = {
             default = "total_tokens",
             description = "The strategy to limit the tokens"
         },
+        -- 使用 OpenRouter/OpenAI 兼容的标准头名,IDE 插件(Cursor/Continue)可直接识别
+        -- true: X-RateLimit-Limit-Tokens / X-RateLimit-Remaining-Tokens / X-RateLimit-Reset-Tokens
+        -- false: X-AI-RateLimit-Limit-{instance} (原有行为)
+        standard_headers = {type = "boolean", default = false},
```

**Contributor** (on lines +68 to +71): Please use English comments throughout.

Suggested change:

```diff
-        -- 使用 OpenRouter/OpenAI 兼容的标准头名,IDE 插件(Cursor/Continue)可直接识别
-        -- true: X-RateLimit-Limit-Tokens / X-RateLimit-Remaining-Tokens / X-RateLimit-Reset-Tokens
-        -- false: X-AI-RateLimit-Limit-{instance} (原有行为)
-        standard_headers = {type = "boolean", default = false},
+        standard_headers = {
+            type = "boolean",
+            default = false,
+            description = "Use OpenRouter/OpenAI-compatible standard rate limit header names (true: X-RateLimit-Limit-Tokens / X-RateLimit-Remaining-Tokens / X-RateLimit-Reset-Tokens; false: keep original behavior with X-AI-RateLimit-Limit-{instance})"
+        },
```
**Copilot AI** (Mar 2, 2026): The newly added `standard_headers` option does not take effect when `conf.rules` is configured. `transform_limit_conf()` returns early for rules, and limit-count always sets `rule.header_prefix` (using the index if unset), which forces `X-{prefix}-RateLimit-*` header names and ignores `conf.limit_header`/`remaining_header`/`reset_header`. Consider either (1) rejecting `standard_headers = true` when `rules` is set (schema or custom validation) and documenting the limitation, or (2) reworking the rules path so standard header names can be emitted.
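Option (1) from the review comment could be implemented as a custom validation step. The following is a hypothetical Python sketch of the rule only; the real plugin validates configuration in Lua (its `check_schema` phase is not shown in this diff), and `check_conf` is an illustrative name:

```python
# Hypothetical sketch: reject the combination the review flags as
# unsupported, instead of silently ignoring standard_headers.
def check_conf(conf: dict):
    if conf.get("standard_headers") and conf.get("rules"):
        return False, "standard_headers cannot be used together with rules"
    return True, None
```

Failing fast here surfaces the limitation at configuration time rather than as silently wrong response headers.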
New file (`@@ -0,0 +1,86 @@`):

# ai-rate-limiting — `standard_headers` Parameter

## Overview

The `standard_headers` option makes `ai-rate-limiting` emit rate-limit response
headers that follow the [OpenRouter / OpenAI convention][openrouter-headers],
so IDE extensions such as **Cursor** and **Continue** can detect quota exhaustion
and apply automatic back-off without any custom configuration.

[openrouter-headers]: https://openrouter.ai/docs/api-reference/limits

## New Parameter

| Parameter | Type | Default | Description |
|---|---|---|---|
| `standard_headers` | boolean | `false` | When `true`, emit OpenAI/OpenRouter-compatible rate-limit headers instead of the legacy `X-AI-RateLimit-*` headers. |
The header suffix is derived from `limit_strategy`:

| `limit_strategy` | Header suffix |
|---|---|
| `total_tokens` (default) | `Tokens` |
| `prompt_tokens` | `PromptTokens` |
| `completion_tokens` | `CompletionTokens` |
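Assuming the suffix table above is exhaustive, the header-name selection can be sketched as follows. This is a hypothetical Python illustration, not plugin source; `header_names` and `SUFFIX` are names invented for the sketch, and the legacy `Remaining`/`Reset` per-instance names are an assumption extrapolated from the documented `X-AI-RateLimit-Limit-{instance}` pattern:

```python
# Hypothetical mapping from limit_strategy to the standard header suffix.
SUFFIX = {
    "total_tokens": "Tokens",
    "prompt_tokens": "PromptTokens",
    "completion_tokens": "CompletionTokens",
}

def header_names(standard_headers: bool, limit_strategy: str, instance: str) -> dict:
    if standard_headers:
        s = SUFFIX[limit_strategy]
        return {
            "limit": f"X-RateLimit-Limit-{s}",
            "remaining": f"X-RateLimit-Remaining-{s}",
            "reset": f"X-RateLimit-Reset-{s}",
        }
    # legacy behavior: per-instance header names (assumed pattern)
    return {
        "limit": f"X-AI-RateLimit-Limit-{instance}",
        "remaining": f"X-AI-RateLimit-Remaining-{instance}",
        "reset": f"X-AI-RateLimit-Reset-{instance}",
    }
```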
## Configuration Example

```yaml
routes:
  - id: 1
    uri: /v1/chat/completions
    plugins:
      ai-proxy-multi:
        instances:
          - name: my-llm
            provider: openai
            weight: 1
            auth:
              header:
                Authorization: "Bearer ${{OPENAI_API_KEY}}"
            options:
              model: gpt-4o-mini
      ai-rate-limiting:
        instances:
          - name: my-llm
            limit: 100000
            time_window: 60
        limit_strategy: total_tokens
        standard_headers: true   # <-- enable standard headers
        rejected_code: 429
```
## Response Headers

### Normal request (quota available)

```
HTTP/1.1 200 OK
X-RateLimit-Limit-Tokens: 100000
X-RateLimit-Remaining-Tokens: 99985
X-RateLimit-Reset-Tokens: 42
```

### Rate-limited request (quota exhausted)

```
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit-Tokens: 100000
X-RateLimit-Remaining-Tokens: 0
X-RateLimit-Reset-Tokens: 18
```

### With `limit_strategy: prompt_tokens`

```
HTTP/1.1 200 OK
X-RateLimit-Limit-PromptTokens: 50000
X-RateLimit-Remaining-PromptTokens: 49990
X-RateLimit-Reset-PromptTokens: 55
```
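A client that understands these headers can implement the automatic back-off mentioned in the overview. The following hypothetical Python sketch (not Cursor or Continue source code; `backoff_seconds` is an invented name) reads the standard headers from a response and decides how long to wait:

```python
# Hypothetical client-side back-off: wait until the window resets only
# when the token quota is exhausted, otherwise proceed immediately.
def backoff_seconds(headers: dict, suffix: str = "Tokens") -> int:
    remaining = int(headers.get(f"X-RateLimit-Remaining-{suffix}", 1))
    reset = int(headers.get(f"X-RateLimit-Reset-{suffix}", 0))
    return reset if remaining <= 0 else 0
```

With the example responses above, the 429 response yields an 18-second wait and the 200 response yields no wait.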
## Backward Compatibility

Setting `standard_headers: false` (or omitting it) preserves the original
`X-AI-RateLimit-Limit-{instance_name}` header format, so existing integrations
are unaffected.
**Reviewer comment:** These changes appear to be unrelated to this PR; please split them into separate PRs.