feat: ai rate limiting standard headers #13049
Conversation
When using openai-compatible provider with Anthropic-format endpoints (e.g. DeepSeek's /anthropic/v1/messages), the response returns input_tokens/output_tokens instead of prompt_tokens/completion_tokens. This patch adds fallback support for both field names in both streaming and non-streaming paths, so token usage statistics work correctly regardless of which format the upstream LLM returns. Fixes token stats being 0 when proxying to Anthropic-compatible endpoints.
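The fallback described above can be sketched as a small normalization step (illustrated here in Python for clarity; the actual patch is Lua in `openai-base.lua`):

```python
def normalize_usage(usage: dict) -> dict:
    """Accept both OpenAI-style and Anthropic-style usage fields."""
    # OpenAI returns prompt_tokens/completion_tokens; Anthropic-format
    # endpoints (e.g. DeepSeek's /anthropic/v1/messages) return
    # input_tokens/output_tokens instead.
    pt = usage.get("prompt_tokens") or usage.get("input_tokens") or 0
    ct = usage.get("completion_tokens") or usage.get("output_tokens") or 0
    return {
        "prompt_tokens": pt,
        "completion_tokens": ct,
        # derive total_tokens when the upstream omits it
        "total_tokens": usage.get("total_tokens") or (pt + ct),
    }

# Anthropic-format usage: token stats no longer collapse to 0
print(normalize_usage({"input_tokens": 12, "output_tokens": 30}))
# → {'prompt_tokens': 12, 'completion_tokens': 30, 'total_tokens': 42}
```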
…uter-compatible rate-limit headers
```lua
core.log.info("got token usage from ai service: ",
              core.json.delay_encode(data.usage))
ctx.llm_raw_usage = data.usage
local pt = data.usage.prompt_tokens or data.usage.input_tokens or 0
```
These changes appear to be unrelated to this PR; please split them into separate PRs.
```lua
    default = "total_tokens",
    description = "The strategy to limit the tokens"
},
-- use OpenRouter/OpenAI-compatible standard header names; IDE plugins (Cursor/Continue) can recognize them directly
```
Please use English comments throughout.
Pull request overview
Adds an opt-in `standard_headers` flag to `ai-rate-limiting` so it can emit OpenAI/OpenRouter-style `X-RateLimit-*` response headers (for better client/IDE backoff behavior), along with tests and supporting documentation notes.
Changes:
- Add `standard_headers` schema option and map `limit_strategy` to `X-RateLimit-*-{Tokens|PromptTokens|CompletionTokens}` header names.
- Add a new Test::Nginx suite validating standard header behavior, suffix mapping, and backward compatibility.
- Extend the OpenAI driver token usage parsing to support `input_tokens`/`output_tokens` with a `total_tokens` fallback, and add a documentation patch file describing `standard_headers`.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `apisix/plugins/ai-rate-limiting.lua` | Adds `standard_headers` option and header-name switching logic in `transform_limit_conf()`. |
| `t/plugin/ai-rate-limiting-standard-headers.t` | New end-to-end tests covering standard headers + legacy headers behavior. |
| `docs/en/latest/plugins/ai-rate-limiting-standard-headers-patch.md` | New documentation content describing `standard_headers` and expected response headers. |
| `apisix/plugins/ai-drivers/openai-base.lua` | Token usage parsing expanded to accept `input_tokens`/`output_tokens` and derive `total_tokens`. |
# ai-rate-limiting — `standard_headers` Parameter

## Overview

The `standard_headers` option makes `ai-rate-limiting` emit rate-limit response
headers that follow the [OpenRouter / OpenAI convention][openrouter-headers],
so IDE extensions such as **Cursor** and **Continue** can detect quota exhaustion
and apply automatic back-off without any custom configuration.

[openrouter-headers]: https://openrouter.ai/docs/api-reference/limits

## New Parameter

| Parameter | Type | Default | Description |
|---|---|---|---|
| `standard_headers` | boolean | `false` | When `true`, emit OpenAI/OpenRouter-compatible rate-limit headers instead of the legacy `X-AI-RateLimit-*` headers. |
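With `standard_headers: true` and the default `limit_strategy` of `total_tokens`, a response would carry headers along these lines (the values shown are illustrative, not from the PR):

```
X-RateLimit-Limit-Tokens: 300
X-RateLimit-Remaining-Tokens: 285
X-RateLimit-Reset-Tokens: 42
```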
This file documents `standard_headers`, but the main plugin documentation (`docs/en/latest/plugins/ai-rate-limiting.md`) is still the canonical reference and currently has no mention of `standard_headers` (and its header names) while it explicitly documents `X-AI-RateLimit-*`. Please update `ai-rate-limiting.md` directly (and consider removing this "patch" file or clearly integrating/linking it) so users can discover the option in the standard docs.
```diff
+local pt = data.usage.prompt_tokens or data.usage.input_tokens or 0
+local ct = data.usage.completion_tokens or data.usage.output_tokens or 0
 ctx.ai_token_usage = {
-    prompt_tokens = data.usage.prompt_tokens or 0,
-    completion_tokens = data.usage.completion_tokens or 0,
-    total_tokens = data.usage.total_tokens or 0,
+    prompt_tokens = pt,
+    completion_tokens = ct,
+    total_tokens = data.usage.total_tokens or (pt + ct),
```
This PR introduces support for `input_tokens`/`output_tokens` (and a derived `total_tokens`) in the OpenAI driver, but that behavior change isn't mentioned in the PR description. Please either update the PR description/scope to include it or move this change into a separate PR; also consider adding/adjusting tests that cover responses using `input_tokens`/`output_tokens` to prevent regressions.
```lua
-- First request: should succeed and consume the 1-token budget
local res1, err = httpc:request_uri("http://127.0.0.1:" .. ngx.var.server_port .. "/anything", {
    method = "POST",
    headers = {
        ["Content-Type"] = "application/json",
        ["apikey"] = "test-key-123",
    },
    body = [[{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hi"}]}]],
```
The comment says the first request will “consume the 1-token budget”, but the mocked upstream returns total_tokens: 15. This can confuse future readers when interpreting why the second request is rate-limited. Consider rewording the comment to reflect that the first request exceeds the configured limit and drives Remaining to 0 for subsequent requests.
```lua
-- standard_headers=true emits OpenRouter/OpenAI-compatible header names
-- that IDE plugins (Cursor/Continue) can recognize directly and back off on
if plugin_conf.standard_headers then
    local strategy = plugin_conf.limit_strategy or "total_tokens"
    local suffix = strategy == "total_tokens" and "Tokens"
                   or strategy == "prompt_tokens" and "PromptTokens"
                   or "CompletionTokens"
    limit_conf.limit_header = "X-RateLimit-Limit-" .. suffix
    limit_conf.remaining_header = "X-RateLimit-Remaining-" .. suffix
    limit_conf.reset_header = "X-RateLimit-Reset-" .. suffix
else
    limit_conf.limit_header = "X-AI-RateLimit-Limit-" .. name
    limit_conf.remaining_header = "X-AI-RateLimit-Remaining-" .. name
    limit_conf.reset_header = "X-AI-RateLimit-Reset-" .. name
end
```
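The branch above reduces to a small pure mapping from configuration to header names. A Python sketch of the same logic (the plugin itself is Lua; names and defaults taken from the diff):

```python
# suffix per limit_strategy, mirroring the Lua conditional chain
SUFFIX = {
    "total_tokens": "Tokens",
    "prompt_tokens": "PromptTokens",
    "completion_tokens": "CompletionTokens",
}

def header_names(standard_headers: bool, limit_strategy: str = "total_tokens",
                 instance_name: str = "") -> dict:
    """Standard OpenRouter/OpenAI names vs legacy X-AI-RateLimit-* names."""
    if standard_headers:
        prefix = "X-RateLimit"
        # unknown strategies fall through to CompletionTokens, as in the Lua code
        suffix = SUFFIX.get(limit_strategy, "CompletionTokens")
    else:
        prefix = "X-AI-RateLimit"
        suffix = instance_name
    return {
        "limit": f"{prefix}-Limit-{suffix}",
        "remaining": f"{prefix}-Remaining-{suffix}",
        "reset": f"{prefix}-Reset-{suffix}",
    }

print(header_names(True, "prompt_tokens")["remaining"])
# → X-RateLimit-Remaining-PromptTokens
```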
The newly added standard_headers option does not take effect when conf.rules is configured. transform_limit_conf() returns early for rules, and limit-count always sets rule.header_prefix (index if unset), which forces X-{prefix}-RateLimit-* header names and ignores conf.limit_header/remaining_header/reset_header. Consider either (1) rejecting standard_headers=true when rules is set (schema/custom validation) and documenting the limitation, or (2) reworking the rules path so standard header names can be emitted.
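Option (1) from the comment above amounts to one extra check during schema validation. A hedged Python sketch of that validation (the real plugin would implement this in Lua inside `check_schema`; the error message is an assumption):

```python
def check_conf(conf: dict):
    """Reject standard_headers=true when rules is configured, since the
    rules path in limit-count forces X-{prefix}-RateLimit-* header names
    and would silently ignore the standard names."""
    if conf.get("standard_headers") and conf.get("rules"):
        return False, "standard_headers is not supported together with rules"
    return True, None

ok, err = check_conf({"standard_headers": True, "rules": [{"limit": 100}]})
print(ok, err)
# → False standard_headers is not supported together with rules
```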
```lua
-- use OpenRouter/OpenAI-compatible standard header names; IDE plugins (Cursor/Continue) can recognize them directly
-- true: X-RateLimit-Limit-Tokens / X-RateLimit-Remaining-Tokens / X-RateLimit-Reset-Tokens
-- false: X-AI-RateLimit-Limit-{instance} (original behavior)
standard_headers = {type = "boolean", default = false},
```
The new inline comments for standard_headers are in Chinese, while the surrounding file and English docs use English. To keep the codebase consistent and accessible to all contributors, please translate these comments (or move the explanation into the schema description field in English).
Suggested change:

```diff
--- use OpenRouter/OpenAI-compatible standard header names; IDE plugins (Cursor/Continue) can recognize them directly
--- true: X-RateLimit-Limit-Tokens / X-RateLimit-Remaining-Tokens / X-RateLimit-Reset-Tokens
--- false: X-AI-RateLimit-Limit-{instance} (original behavior)
-standard_headers = {type = "boolean", default = false},
+standard_headers = {
+    type = "boolean",
+    default = false,
+    description = "Use OpenRouter/OpenAI-compatible standard rate limit header names (true: X-RateLimit-Limit-Tokens / X-RateLimit-Remaining-Tokens / X-RateLimit-Reset-Tokens; false: keep original behavior with X-AI-RateLimit-Limit-{instance})"
+},
```
PR: feat(ai-rate-limiting): add `standard_headers` option for OpenAI/OpenRouter-compatible rate-limit headers

Summary
Add a `standard_headers` boolean option to the `ai-rate-limiting` plugin.
When enabled, the plugin emits rate-limit response headers that follow the
OpenAI / OpenRouter convention, allowing IDE extensions (Cursor, Continue, etc.)
to detect quota exhaustion and apply automatic back-off without any custom
client-side configuration.
Issue / Motivation

The current `ai-rate-limiting` plugin outputs headers in the APISIX-specific
`X-AI-RateLimit-{Limit|Remaining|Reset}-{instance_name}` format, which is not
recognized by popular AI IDE extensions such as Cursor and Continue. These
tools look for the OpenAI/OpenRouter standard headers
(`X-RateLimit-Limit-*`, `X-RateLimit-Remaining-*`, `X-RateLimit-Reset-*`).

Without these headers, IDE extensions cannot detect that they are being
rate-limited and will keep retrying immediately, causing a poor developer
experience and wasting quota.
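The back-off those extensions apply boils down to reading the standard headers. A minimal client-side sketch (header names are the ones this PR emits; the retry policy and the reset-header semantics of "seconds until the window resets" are assumptions about typical clients, not part of the PR):

```python
def backoff_seconds(headers: dict) -> float:
    """Return how long a client should wait before retrying, based on
    OpenAI/OpenRouter-style rate-limit headers. 0 means quota remains.
    Defaults when a header is missing are assumed, not specified by the PR."""
    remaining = int(headers.get("X-RateLimit-Remaining-Tokens", 1))
    if remaining > 0:
        return 0.0
    # assume the reset header carries seconds until the window resets
    return float(headers.get("X-RateLimit-Reset-Tokens", 1.0))

print(backoff_seconds({"X-RateLimit-Remaining-Tokens": "0",
                       "X-RateLimit-Reset-Tokens": "30"}))
# → 30.0
```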
Changes

`apisix/plugins/ai-rate-limiting.lua`

- Added a `standard_headers` field to the JSON Schema (boolean, default `false`).
- In `transform_limit_conf()`: when `standard_headers` is `true`, the `limit_header`, `remaining_header`, and `reset_header` fields passed to `limit-count` are set to the standard names with a suffix derived from `limit_strategy`:

| `limit_strategy` | Header suffix |
|---|---|
| `total_tokens` | `Tokens` |
| `prompt_tokens` | `PromptTokens` |
| `completion_tokens` | `CompletionTokens` |

- When `standard_headers` is `false` (default), the original `X-AI-RateLimit-*-{instance_name}` headers are used, so the change is fully backward compatible.

New / updated files

- `apisix/plugins/ai-rate-limiting.lua`
- `t/plugin/ai-rate-limiting-standard-headers.t`
- `docs/en/latest/plugins/ai-rate-limiting.md`

Test Cases
The new test file `t/plugin/ai-rate-limiting-standard-headers.t` covers:

- `standard_headers: true` is accepted by `check_schema`.
- `standard_headers` defaults to `false`.
- `standard_headers: true` returns all three `X-RateLimit-*-Tokens` headers with numeric values.
- A rate-limited request returns `X-RateLimit-Remaining-Tokens: 0`.
- `prompt_tokens` suffix: `limit_strategy: prompt_tokens` produces `X-RateLimit-*-PromptTokens` headers.
- `completion_tokens` suffix: `limit_strategy: completion_tokens` produces `X-RateLimit-*-CompletionTokens` headers.
- `standard_headers: false` still produces the legacy `X-AI-RateLimit-*-{instance_name}` headers.

Running the tests locally
Documentation

See `docs/en/latest/plugins/ai-rate-limiting-standard-headers-patch.md` for the
parameter reference table, configuration example, and sample response headers.
Checklist

- Backward compatible (`standard_headers` defaults to `false`)
- `CHANGELOG` entry (to be added before merge)

Related