
feat: ai rate limiting standard headers #13049

Open
iakuf wants to merge 4 commits into apache:master from iakuf:feat/ai-rate-limiting-standard-headers

Conversation


@iakuf iakuf commented Feb 28, 2026

PR: feat(ai-rate-limiting): add standard_headers option for OpenAI/OpenRouter-compatible rate-limit headers

Summary

Add a standard_headers boolean option to the ai-rate-limiting plugin.
When enabled, the plugin emits rate-limit response headers that follow the
OpenAI / OpenRouter convention, allowing IDE extensions (Cursor, Continue, etc.)
to detect quota exhaustion and apply automatic back-off without any custom
client-side configuration.


Issue / Motivation

The current ai-rate-limiting plugin outputs headers in the format:

X-AI-RateLimit-Limit-{instance_name}
X-AI-RateLimit-Remaining-{instance_name}
X-AI-RateLimit-Reset-{instance_name}

This format is APISIX-specific and not recognized by popular AI IDE extensions
such as Cursor and Continue. These tools look for the OpenAI/OpenRouter
standard headers:

X-RateLimit-Limit-Tokens
X-RateLimit-Remaining-Tokens
X-RateLimit-Reset-Tokens

Without these headers, IDE extensions cannot detect that they are being
rate-limited and will keep retrying immediately, causing a poor developer
experience and wasting quota.
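For illustration only (not part of this PR), a minimal client-side back-off sketch against these headers; it assumes the reset header carries seconds until the window resets, which matches how APISIX's limit-count emits reset values:

```python
def backoff_seconds(headers: dict) -> float:
    """Return how long a client should wait before retrying, based on
    OpenAI/OpenRouter-style rate-limit headers (0.0 means no back-off)."""
    remaining = int(headers.get("X-RateLimit-Remaining-Tokens", 1))
    if remaining > 0:
        return 0.0
    # Assumed semantics: seconds remaining until the current window resets.
    return float(headers.get("X-RateLimit-Reset-Tokens", 1))

print(backoff_seconds({"X-RateLimit-Remaining-Tokens": "0",
                       "X-RateLimit-Reset-Tokens": "30"}))  # 30.0
```

Because the header names are fixed (no per-instance suffix), this logic needs no per-deployment configuration, which is exactly what IDE extensions rely on.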


Changes

apisix/plugins/ai-rate-limiting.lua

  • Added standard_headers field to the JSON Schema (boolean, default false).

  • In transform_limit_conf(): when standard_headers is true, the
    limit_header, remaining_header, and reset_header fields passed to
    limit-count are set to the standard names with a suffix derived from
    limit_strategy:

    | `limit_strategy` | Suffix |
    |---|---|
    | `total_tokens` | `Tokens` |
    | `prompt_tokens` | `PromptTokens` |
    | `completion_tokens` | `CompletionTokens` |
  • When standard_headers is false (default), the original
    X-AI-RateLimit-*-{instance_name} headers are used, keeping full backward
    compatibility.
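The header-name selection can be sketched as follows (a Python mirror for brevity; the actual change lives in the Lua `transform_limit_conf()`, and names here are illustrative):

```python
# Suffix chosen per limit_strategy when standard_headers is enabled.
SUFFIX = {
    "total_tokens": "Tokens",
    "prompt_tokens": "PromptTokens",
    "completion_tokens": "CompletionTokens",
}

def header_names(standard_headers: bool, limit_strategy: str,
                 instance_name: str) -> dict:
    """Pick the response header names handed down to limit-count."""
    if standard_headers:
        prefix, tail = "X-RateLimit-", SUFFIX[limit_strategy]
    else:
        # Legacy, APISIX-specific names keyed by instance name.
        prefix, tail = "X-AI-RateLimit-", instance_name
    return {
        "limit_header": f"{prefix}Limit-{tail}",
        "remaining_header": f"{prefix}Remaining-{tail}",
        "reset_header": f"{prefix}Reset-{tail}",
    }
```

For example, `header_names(True, "prompt_tokens", "deepseek")` yields `X-RateLimit-Limit-PromptTokens` and friends, while `standard_headers=False` falls back to `X-AI-RateLimit-Limit-deepseek`.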

New / updated files

| File | Description |
|---|---|
| apisix/plugins/ai-rate-limiting.lua | Core change |
| t/plugin/ai-rate-limiting-standard-headers.t | Test::Nginx test suite |
| docs/en/latest/plugins/ai-rate-limiting.md | Documentation update (see patch file) |

Test Cases

The new test file t/plugin/ai-rate-limiting-standard-headers.t covers:

  1. **Schema check**: `standard_headers: true` is accepted by `check_schema`.
  2. **Schema default**: `standard_headers` defaults to `false`.
  3. **Standard headers present**: a normal request with `standard_headers: true`
    returns all three `X-RateLimit-*-Tokens` headers with numeric values.
  4. **429 Remaining = 0**: when the quota is exhausted, the 429 response carries
    `X-RateLimit-Remaining-Tokens: 0`.
  5. **prompt_tokens suffix**: `limit_strategy: prompt_tokens` produces
    `X-RateLimit-*-PromptTokens` headers.
  6. **completion_tokens suffix**: `limit_strategy: completion_tokens` produces
    `X-RateLimit-*-CompletionTokens` headers.
  7. **Backward compatibility**: `standard_headers: false` still produces the
    legacy `X-AI-RateLimit-*-{instance_name}` headers.

Running the tests locally

# Copy sources to Linux filesystem (required for unix socket support)
rm -rf /tmp/apisix-test
cp -r /path/to/apisix /tmp/apisix-test

# Run the new test file
docker run --rm --user root \
  -v /tmp/apisix-test:/usr/local/apisix/apisix-src \
  apache/apisix:3.15.0-debian bash -c '
    apt-get update -qq && apt-get install -y --no-install-recommends cpanminus git make libwww-perl &&
    cpanm --notest Test::Nginx &&
    git clone --depth=1 https://github.com/openresty/test-nginx.git /test-nginx &&
    ln -sf /usr/local/apisix/deps /usr/local/apisix/apisix-src/deps &&
    cd /usr/local/apisix/apisix-src &&
    APISIX_HOME=/usr/local/apisix/apisix-src TEST_NGINX_BINARY=/usr/bin/openresty \
    prove -I/test-nginx/lib -I./ t/plugin/ai-rate-limiting-standard-headers.t
  '

Documentation

See docs/en/latest/plugins/ai-rate-limiting-standard-headers-patch.md for the
parameter reference table, configuration example, and sample response headers.


Checklist

  • New feature is backward compatible (standard_headers defaults to false)
  • JSON Schema updated with new field
  • Test::Nginx tests added (7 test cases)
  • Documentation written
  • CHANGELOG entry (to be added before merge)
  • CI passes

Related

iakuf added 4 commits February 26, 2026 22:10
When using openai-compatible provider with Anthropic-format endpoints
(e.g. DeepSeek's /anthropic/v1/messages), the response returns
input_tokens/output_tokens instead of prompt_tokens/completion_tokens.

This patch adds fallback support for both field names in both
streaming and non-streaming paths, so token usage statistics work
correctly regardless of which format the upstream LLM returns.

Fixes token stats being 0 when proxying to Anthropic-compatible endpoints.
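The fallback described in that commit can be sketched as (a Python mirror of the Lua change; field names come from the two API formats, everything else is illustrative):

```python
def parse_usage(usage: dict) -> dict:
    """Accept both OpenAI-style (prompt_tokens/completion_tokens) and
    Anthropic-style (input_tokens/output_tokens) usage objects, and
    derive total_tokens when the upstream omits it."""
    pt = usage.get("prompt_tokens") or usage.get("input_tokens") or 0
    ct = usage.get("completion_tokens") or usage.get("output_tokens") or 0
    return {
        "prompt_tokens": pt,
        "completion_tokens": ct,
        "total_tokens": usage.get("total_tokens") or (pt + ct),
    }
```

So an Anthropic-format response like `{"input_tokens": 3, "output_tokens": 5}` yields non-zero token stats instead of all zeros.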
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. enhancement New feature or request labels Feb 28, 2026
@Baoyuantop Baoyuantop changed the title Feat/ai rate limiting standard headers feat: ai rate limiting standard headers Mar 2, 2026
core.log.info("got token usage from ai service: ",
              core.json.delay_encode(data.usage))
ctx.llm_raw_usage = data.usage
local pt = data.usage.prompt_tokens or data.usage.input_tokens or 0
Contributor
These changes appear to be unrelated to this PR; please split them into separate PRs.

default = "total_tokens",
description = "The strategy to limit the tokens"
},
-- Use OpenRouter/OpenAI-compatible standard header names that IDE extensions (Cursor/Continue) can recognize directly
Contributor
Please use English comments throughout.

Copilot AI left a comment

Pull request overview

Adds an opt-in standard_headers flag to ai-rate-limiting so it can emit OpenAI/OpenRouter-style X-RateLimit-* response headers (for better client/IDE backoff behavior), along with tests and supporting documentation notes.

Changes:

  • Add standard_headers schema option and map limit_strategy to X-RateLimit-*-{Tokens|PromptTokens|CompletionTokens} header names.
  • Add a new Test::Nginx suite validating standard header behavior, suffix mapping, and backward compatibility.
  • Extend the OpenAI driver token usage parsing to support input_tokens/output_tokens with total_tokens fallback, and add a documentation patch file describing standard_headers.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| apisix/plugins/ai-rate-limiting.lua | Adds standard_headers option and header-name switching logic in transform_limit_conf(). |
| t/plugin/ai-rate-limiting-standard-headers.t | New end-to-end tests covering standard headers + legacy headers behavior. |
| docs/en/latest/plugins/ai-rate-limiting-standard-headers-patch.md | New documentation content describing standard_headers and expected response headers. |
| apisix/plugins/ai-drivers/openai-base.lua | Token usage parsing expanded to accept input_tokens/output_tokens and derive total_tokens. |


Comment on lines +1 to +17
# ai-rate-limiting — `standard_headers` Parameter

## Overview

The `standard_headers` option makes `ai-rate-limiting` emit rate-limit response
headers that follow the [OpenRouter / OpenAI convention][openrouter-headers],
so IDE extensions such as **Cursor** and **Continue** can detect quota exhaustion
and apply automatic back-off without any custom configuration.

[openrouter-headers]: https://openrouter.ai/docs/api-reference/limits

## New Parameter

| Parameter | Type | Default | Description |
|---|---|---|---|
| `standard_headers` | boolean | `false` | When `true`, emit OpenAI/OpenRouter-compatible rate-limit headers instead of the legacy `X-AI-RateLimit-*` headers. |


Copilot AI Mar 2, 2026


This file documents standard_headers, but the main plugin documentation (docs/en/latest/plugins/ai-rate-limiting.md) is still the canonical reference and currently has no mention of standard_headers (and its header names) while it explicitly documents X-AI-RateLimit-*. Please update ai-rate-limiting.md directly (and consider removing this “patch” file or clearly integrating/linking it) so users can discover the option in the standard docs.

Comment on lines +134 to +139
local pt = data.usage.prompt_tokens or data.usage.input_tokens or 0
local ct = data.usage.completion_tokens or data.usage.output_tokens or 0
ctx.ai_token_usage = {
prompt_tokens = data.usage.prompt_tokens or 0,
completion_tokens = data.usage.completion_tokens or 0,
total_tokens = data.usage.total_tokens or 0,
prompt_tokens = pt,
completion_tokens = ct,
total_tokens = data.usage.total_tokens or (pt + ct),

Copilot AI Mar 2, 2026


This PR introduces support for input_tokens/output_tokens (and a derived total_tokens) in the OpenAI driver, but that behavior change isn’t mentioned in the PR description. Please either update the PR description/scope to include it or move this change into a separate PR; also consider adding/adjusting tests that cover responses using input_tokens/output_tokens to prevent regressions.

Comment on lines +232 to +239
-- First request: should succeed and consume the 1-token budget
local res1, err = httpc:request_uri("http://127.0.0.1:" .. ngx.var.server_port .. "/anything", {
    method = "POST",
    headers = {
        ["Content-Type"] = "application/json",
        ["apikey"] = "test-key-123",
    },
    body = [[{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hi"}]}]],

Copilot AI Mar 2, 2026


The comment says the first request will “consume the 1-token budget”, but the mocked upstream returns total_tokens: 15. This can confuse future readers when interpreting why the second request is rate-limited. Consider rewording the comment to reflect that the first request exceeds the configured limit and drives Remaining to 0 for subsequent requests.

Comment on lines +185 to +199
-- standard_headers=true emits OpenRouter/OpenAI-compatible header names
-- that IDE extensions (Cursor/Continue) can recognize and use for back-off
if plugin_conf.standard_headers then
    local strategy = plugin_conf.limit_strategy or "total_tokens"
    local suffix = strategy == "total_tokens" and "Tokens"
        or strategy == "prompt_tokens" and "PromptTokens"
        or "CompletionTokens"
    limit_conf.limit_header = "X-RateLimit-Limit-" .. suffix
    limit_conf.remaining_header = "X-RateLimit-Remaining-" .. suffix
    limit_conf.reset_header = "X-RateLimit-Reset-" .. suffix
else
    limit_conf.limit_header = "X-AI-RateLimit-Limit-" .. name
    limit_conf.remaining_header = "X-AI-RateLimit-Remaining-" .. name
    limit_conf.reset_header = "X-AI-RateLimit-Reset-" .. name
end

Copilot AI Mar 2, 2026


The newly added standard_headers option does not take effect when conf.rules is configured. transform_limit_conf() returns early for rules, and limit-count always sets rule.header_prefix (index if unset), which forces X-{prefix}-RateLimit-* header names and ignores conf.limit_header/remaining_header/reset_header. Consider either (1) rejecting standard_headers=true when rules is set (schema/custom validation) and documenting the limitation, or (2) reworking the rules path so standard header names can be emitted.

Comment on lines +68 to +71
-- Use OpenRouter/OpenAI-compatible standard header names that IDE extensions
-- (Cursor/Continue) can recognize directly
-- true: X-RateLimit-Limit-Tokens / X-RateLimit-Remaining-Tokens / X-RateLimit-Reset-Tokens
-- false: X-AI-RateLimit-Limit-{instance} (original behavior)
standard_headers = {type = "boolean", default = false},

Copilot AI Mar 2, 2026


The new inline comments for standard_headers are in Chinese, while the surrounding file and English docs use English. To keep the codebase consistent and accessible to all contributors, please translate these comments (or move the explanation into the schema description field in English).

Suggested change
-- Use OpenRouter/OpenAI-compatible standard header names that IDE extensions (Cursor/Continue) can recognize directly
-- true: X-RateLimit-Limit-Tokens / X-RateLimit-Remaining-Tokens / X-RateLimit-Reset-Tokens
-- false: X-AI-RateLimit-Limit-{instance} (original behavior)
standard_headers = {type = "boolean", default = false},
standard_headers = {
    type = "boolean",
    default = false,
    description = "Use OpenRouter/OpenAI-compatible standard rate limit header names (true: X-RateLimit-Limit-Tokens / X-RateLimit-Remaining-Tokens / X-RateLimit-Reset-Tokens; false: keep original behavior with X-AI-RateLimit-Limit-{instance})"
},
