A small standalone rate limiting microservice for internal and edge-facing APIs.
It provides:
- Token Bucket limits (atomic, Redis Lua script) — good for requests-per-second / requests-per-minute with bursts.
- Optional Concurrency Leases — caps simultaneous in-flight work for expensive endpoints (uploads, media processing, paid providers).
This is designed to sit in front of services like an S3-compatible proxy (/proxy) or media/image analysis endpoints (/analyze), where uncontrolled traffic can burn money or melt CPU.
- Centralized rules and auditing for multiple microservices
- Keeps application code thin (call the limiter, return `429` if blocked)
- Redis-backed atomicity and consistent behavior across replicas
- Easy to put behind an API gateway / Ingress Controller and still keep per-route rules here
Each key has a “bucket” of tokens:
- `limit`: how many tokens are refilled per `period_seconds`
- `burst`: bucket capacity (how many tokens can be spent instantly)
- `cost`: how many tokens a single request consumes (default `1.0`)

If the bucket doesn't have enough tokens, the request is denied and the response includes `retry_after_ms`.
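The refill arithmetic can be sketched like this (an illustrative model of the decision, not the actual Lua script; the function name is made up):

```python
def bucket_decision(tokens, last_ts, now, limit, period_seconds, burst, cost=1.0):
    """Illustrative token-bucket math: refill since last_ts, then try to spend `cost`."""
    refill_rate = limit / period_seconds               # tokens added per second
    tokens = min(burst, tokens + (now - last_ts) * refill_rate)
    if tokens >= cost:
        return True, tokens - cost, 0                  # allowed, remaining, retry_after_ms
    # Not enough tokens: report how long until `cost` tokens will have refilled.
    retry_after_ms = int((cost - tokens) / refill_rate * 1000)
    return False, tokens, retry_after_ms

# 10 tokens/sec with burst 20: a full bucket absorbs a spike of 20 instant requests.
allowed, remaining, retry = bucket_decision(20.0, 0.0, 0.0, limit=10, period_seconds=1, burst=20)
```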
For heavy endpoints, request-rate limiting alone is not enough. Concurrency leases enforce limits of the form:
- “Only N jobs can run at the same time per key (and optionally per route).”
- A lease has a time-to-live (TTL) so crashes don’t leak slots forever.
The limiter needs a key that represents the caller identity:
- Preferred: API key / Service ID
- Fallback: Client IP (only if enabled; can be unfair behind Network Address Translation)
Resolution order:
1. `AllowRequest.key` in the JSON body (best, explicit)
2. `X-Api-Key` header
3. `X-Service-Id` header
4. `ip:<client-ip>` if `FALLBACK_TO_IP=true`
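The resolution order can be sketched as follows (the header and config names come from the list above; the function itself is illustrative):

```python
def resolve_key(body: dict, headers: dict, client_ip: str, fallback_to_ip: bool = True):
    """Pick the rate-limit key using the resolution order above; None if no identity."""
    if body.get("key"):
        return body["key"]                       # explicit key in the JSON body wins
    for header in ("X-Api-Key", "X-Service-Id"):
        if headers.get(header):
            return headers[header]
    if fallback_to_ip and client_ip:
        return f"ip:{client_ip}"                 # last resort; can be unfair behind NAT
    return None
```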
Policies are loaded from a JSON file (POLICY_PATH).
Example structure:
```json
{
  "default": {
    "limit": 120,
    "period_seconds": 60,
    "burst": 120,
    "scope": "key_route"
  },
  "rules": [
    {
      "name": "s3_proxy_write",
      "methods": ["PUT", "POST", "DELETE"],
      "path_prefix": "/proxy",
      "limit": 10,
      "period_seconds": 1,
      "burst": 20,
      "scope": "key_route"
    },
    {
      "name": "heavy_analyze",
      "methods": ["POST"],
      "path_prefix": "/analyze",
      "limit": 6,
      "period_seconds": 60,
      "burst": 6,
      "scope": "key_route",
      "concurrency": { "limit": 2, "ttl_seconds": 120 }
    }
  ],
  "bypass_keys": ["internal-admin"]
}
```

Scope values:
- `key` — limits per key, regardless of route
- `key_route` — limits per key + HTTP method + path
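Rule selection against a policy file like this can be sketched as follows (an illustrative matcher under the assumption that the first matching rule wins; the real implementation may differ):

```python
def match_rule(policy: dict, method: str, path: str) -> dict:
    """Return the first rule whose methods and path_prefix match, else the default."""
    for rule in policy.get("rules", []):
        if method in rule["methods"] and path.startswith(rule["path_prefix"]):
            return rule
    return policy["default"]

def bucket_key(prefix: str, key: str, rule: dict, method: str, path: str) -> str:
    """Build the Redis key; `key_route` scope also includes method + path."""
    if rule.get("scope") == "key_route":
        return f"{prefix}:{key}:{method}:{path}"
    return f"{prefix}:{key}"
```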
Simple health check.
`POST /v1/allow` — token bucket decision (and optional concurrency decision).
Request:

```json
{
  "key": "svc:video-worker",
  "method": "PUT",
  "path": "/proxy/bucket/object",
  "cost": 1.0,
  "tags": {}
}
```

Response:

```json
{
  "allowed": true,
  "policy": "s3_proxy_write",
  "limit": 10,
  "period_seconds": 1,
  "burst": 20,
  "remaining_tokens": 17.0,
  "retry_after_ms": null,
  "reset_after_ms": 300,
  "concurrency_allowed": null,
  "concurrency_limit": null,
  "lease_id": null,
  "lease_ttl_seconds": null
}
```

Important note about concurrency inside `/v1/allow`:
If a rule has concurrency, the current implementation applies token bucket first and concurrency second.
That can “spend” a token even if concurrency denies the request.
If you need strict correctness for concurrency, use the dedicated lease endpoints below.
Strict concurrency gate: acquire a lease before doing expensive work.
Request:

```json
{
  "key": "svc:analyzer",
  "method": "POST",
  "path": "/analyze",
  "ttl_seconds": 120
}
```

Response:

```json
{
  "allowed": true,
  "lease_id": "ab12cd34...",
  "lease_ttl_seconds": 120,
  "limit": 2,
  "retry_after_ms": null
}
```

Release a lease when work finishes.
Request:

```json
{
  "lease_id": "ab12cd34...",
  "key": "svc:analyzer",
  "method": "POST",
  "path": "/analyze"
}
```

Response:

```json
{ "released": true }
```

Optional shared-secret authentication:
- Set `AUTH_TOKEN` to a non-empty value
- Send header: `X-RL-Auth: <AUTH_TOKEN>`

If `AUTH_TOKEN` is empty, no auth is enforced.
Production advice
- Run this service only on internal networks whenever possible
- Prefer mutual Transport Layer Security (mTLS) or an internal gateway
- Do not expose Redis publicly
| Variable | Default | Description |
|---|---|---|
| `REDIS_URL` | `redis://localhost:6379/0` | Redis connection string |
| `AUTH_TOKEN` | empty | Shared secret for `X-RL-Auth` |
| `POLICY_PATH` | `policies.json` | Policy file path inside the container |
| `KEY_PREFIX` | `rl` | Prefix for Redis keys |
| `FALLBACK_TO_IP` | `true` | Use client IP as key if no explicit key is provided |
| `TRUST_X_FORWARDED_FOR` | `true` | Use `X-Forwarded-For` to determine client IP |
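Putting the table together, a minimal `.env` might look like this (values are illustrative; `change-me` is a placeholder secret):

```
REDIS_URL=redis://localhost:6379/0
AUTH_TOKEN=change-me
POLICY_PATH=policies.json
KEY_PREFIX=rl
FALLBACK_TO_IP=true
TRUST_X_FORWARDED_FOR=true
```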
- Copy examples:

```shell
cp .env.example .env
cp policies.example.json policies.json
```

- Start:

```shell
docker compose up --build
```

- Test:

```shell
curl -X POST http://localhost:8080/v1/allow \
  -H 'Content-Type: application/json' \
  -H 'X-RL-Auth: change-me' \
  -d '{"key":"svc:test","method":"GET","path":"/proxy/demo","cost":1}'
```

Your service calls `/v1/allow` before executing the handler.
If denied, return:
- HTTP status `429 Too Many Requests`
- `Retry-After` header based on `retry_after_ms` (rounded up to seconds)
Python example (async):

```python
import math

import httpx

async def check_rate_limit(key: str, method: str, path: str) -> None:
    async with httpx.AsyncClient(timeout=1.0) as client:
        r = await client.post(
            "http://rate-limiter:8080/v1/allow",
            headers={"X-RL-Auth": "change-me"},
            json={"key": key, "method": method, "path": path, "cost": 1.0},
        )
        r.raise_for_status()
        data = r.json()
        if not data["allowed"]:
            retry_ms = data.get("retry_after_ms") or 1000
            retry_s = max(1, math.ceil(retry_ms / 1000))
            raise RuntimeError(f"Rate limited. Retry after {retry_s}s")
```

Use this when you have expensive in-flight work (file encryption, large uploads, image/video analysis, paid APIs).
Flow:
1. `lease/acquire`
2. run job
3. `lease/release` in `finally`
- Token bucket state is stored as a Redis hash:
  - fields: `tokens`, `ts` (timestamp)
  - auto-expire is set to 10 minutes (configurable in code if needed)
- Concurrency leases are stored in a Redis sorted set:
  - member: `lease_id`
  - score: timestamp
  - old entries are purged based on TTL
- If Redis is down: requests will fail (you should decide whether your caller fails open or closed).
- For expensive endpoints, fail closed is safer.
- For internal “must stay up” endpoints, you might prefer fail open plus a hard concurrency cap in-process.
- Add access logs at your gateway
- Consider exporting metrics (Prometheus) if you want dashboards (not included in this minimal build)