OpenAI-compatible reverse proxy for Azure AI Foundry / Azure OpenAI with SSE streaming, configurable Caddy TLS, and deployment-selectable persistence.
English | 简体中文 | Docs Index
- OpenAI-compatible proxy for `chat/completions`, `responses`, `images/generations`, and `models`
- Client -> Proxy uses API key auth via `Authorization: Bearer` or `x-api-key`
- Proxy -> Azure AI Foundry / Azure OpenAI uses AAD tokens or `api-key`, based on `auth.mode`
- Static admin page for config editing, AAD verification, model usage stats, and recent log inspection
- Model-level route overrides via `models[].routes` and upstream route maps via `upstreams[].routes`
- Bicep template: infra/main.bicep
- ARM template for portal deployment: infra/azuredeploy.json
- Portal UI definition for managed app / custom portal packaging: infra/createUiDefinition.json
- Azure Managed Application package source: infra/azure_deployment_with_UI
- Example parameters: infra/parameters/dev.json, infra/parameters/prod.json
The Deploy to Azure button targets the ARM JSON template because the portal button flow does not deploy remote Bicep files directly.
The standard raw-template Deploy to Azure button does not automatically use createUiDefinition.json; that file is intended for portal packaging flows that support a custom create experience.
This repo now supports deployment-time persistence selection.
- Keeps the current ACI + Azure Files mount to `/app/data`
- Best fit when you need filesystem-style persistence for config, the Caddyfile, and Caddy state
- Still requires the storage account key for the ACI mount itself
- Keeps config persistence at the application layer through the Blob SDK
- Uses `DefaultAzureCredential` and managed identity to read and write the config blob
- Does not replace Azure Files mount semantics for `/app/data`
- If Blob access is not ready yet, startup falls back to the local cached config at `/app/data/config.json`
- While Blob access is degraded, config writes continue to the local file and are retried to Blob in the background
- After RBAC propagation completes, the app automatically switches back to Blob-backed persistence without requiring a restart
- Background Blob recovery checks run every `30000` ms by default and can be tuned with `BLOB_RECOVERY_INTERVAL_MS`
ACI native Azure Files mounting still depends on shared key authentication. Managed identity can be used for Blob SDK operations, but it does not convert Azure Files volume mounting into an AAD-only flow. If you must disable key-based auth and still need /app/data mount semantics, move to another platform such as ACA, AKS, or a VM-based deployment.
The proxy now uses a more conservative long-response baseline that is better suited for tool-calling and MCP-style workflows.
```json
"server": {
  "upstream": {
    "connectTimeoutMs": 10000,
    "requestTimeoutMs": 900000,
    "firstByteTimeoutMs": 300000,
    "idleTimeoutMs": 900000,
    "maxRetries": 0,
    "retryBaseMs": 800,
    "retryMaxMs": 8000,
    "pool": {
      "connections": 64,
      "keepAliveTimeoutMs": 60000,
      "keepAliveMaxTimeoutMs": 300000,
      "headersTimeoutMs": 300000,
      "bodyTimeoutMs": 0,
      "pipelining": 1
    }
  },
  "caddy": {
    "transport": {
      "dialTimeoutMs": 5000,
      "responseHeaderTimeoutMs": 45000,
      "keepAliveTimeoutMs": 120000
    }
  }
}
```

Guidance:
- Keep `server.caddy.transport.dialTimeoutMs` aligned with `server.upstream.connectTimeoutMs`
- Keep `server.caddy.transport.responseHeaderTimeoutMs` greater than or equal to `server.upstream.firstByteTimeoutMs`
- Keep `server.upstream.idleTimeoutMs` long enough for SSE streams that pause between events
- Tune `server.upstream.pool` first for latency-sensitive, low-concurrency deployments before changing retry budgets
- For MCP or tool-calling flows, prefer longer `firstByteTimeoutMs` and `idleTimeoutMs`, but keep `maxRetries` low to avoid replaying side-effecting tool calls
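The first two invariants can be machine-checked before a deploy. A minimal sketch with `jq`; the inline JSON is a hypothetical config, not this repo's defaults:

```shell
# Hypothetical config values chosen to satisfy the invariants above.
cfg='{"server":{"upstream":{"connectTimeoutMs":10000,"firstByteTimeoutMs":300000},"caddy":{"transport":{"dialTimeoutMs":10000,"responseHeaderTimeoutMs":300000}}}}'

# jq -e exits non-zero when the boolean is false, so this works in CI too.
echo "$cfg" | jq -e '
  (.server.caddy.transport.dialTimeoutMs == .server.upstream.connectTimeoutMs) and
  (.server.caddy.transport.responseHeaderTimeoutMs >= .server.upstream.firstByteTimeoutMs)
' >/dev/null && echo "timeout invariants hold"
```

Point the same filter at your real `config/config.json` to catch drift between the Caddy and upstream timeout settings.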
- Copy the sample config: `cp config/sample_config.json config/config.json`
- Edit `config/config.json`:
  - Replace `upstreams[].baseUrl` with your Foundry or Azure OpenAI endpoint
  - Set `models[].targetModel` to the deployment identifier
  - Choose upstream auth: `auth.mode = "servicePrincipal"` with `scope`, plus service principal fields or managed identity, or `auth.mode = "apiKey"` with `auth.apiKey`
  - Replace the default API key and admin credentials
- Install dependencies and start: `npm install`, then `npm run start`
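Before starting, the edited config can be sanity-checked with `jq`. The skeleton below is illustrative only, showing the fields mentioned above with placeholder values; the real layout is in `config/sample_config.json` and may nest differently:

```shell
# Write a minimal illustrative config and verify it parses with the expected fields.
cat > /tmp/config-check.json <<'EOF'
{
  "upstreams": [{ "id": "foundry", "baseUrl": "https://<resource>.openai.azure.com" }],
  "models": [{ "id": "my-model", "upstream": "foundry", "targetModel": "my-deployment" }],
  "auth": { "mode": "apiKey", "apiKey": "<upstream-api-key>" }
}
EOF

# jq both validates the JSON syntax and asserts the required fields are present.
jq -e '(.upstreams | length > 0) and (.models | all(has("targetModel")))' \
  /tmp/config-check.json >/dev/null && echo "config looks sane"
```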
- `CONFIG_PATH`: local cached config path, default `./config/config.json`
- `BODY_LIMIT`: request body limit in bytes, default `52428800`
- `CADDY_BIN`: optional Caddy binary path override
- `ADMIN_LOG_BUFFER_SIZE`: in-memory admin log ring buffer size, default `1000`
Config file values under `server.upstream.pool` are primary. These environment variables can still override them when needed:
- `UPSTREAM_MAX_CONNECTIONS`
- `UPSTREAM_KEEPALIVE_TIMEOUT_MS`
- `UPSTREAM_KEEPALIVE_MAX_TIMEOUT_MS`
- `UPSTREAM_HEADERS_TIMEOUT_MS`
- `UPSTREAM_BODY_TIMEOUT_MS`
- `UPSTREAM_PIPELINING`
- `PERSISTENCE_MODE=azureFile|blob`
- `AZURE_STORAGE_ACCOUNT_URL=https://<storage>.blob.core.windows.net`
- `CONFIG_BLOB_CONTAINER=<container-name>`
- `CONFIG_BLOB_NAME=config/config.json`
- `BLOB_RECOVERY_INTERVAL_MS=30000` to control how often the app retries Blob access after falling back to the local cache
In blob mode, the app reads from Blob first and falls back to the local cached config if the blob is not present yet.
Open `/admin` to manage config.
The admin page now exposes:
- Top-level status cards for proxy health, AAD verification status, config state, and runtime state
- Config dirty-state badges, basic structure reminders, and a local diff preview before save
- Caddy dial timeout
- Caddy response header timeout
- Caddy keepalive timeout
- Runtime persistence summary so you can see whether the deployment is using `azureFile` or `blob`
- Recent logs with level filters (`warn`, `error`, optional `info`), keyword search, request-id filtering, and copy-summary actions
The log panel uses a compact always-visible toolbar plus collapsible advanced filters instead of a sticky filter bar.
Controlled by `server.adminAuth`. When enabled, it protects `/admin` and `/admin/api/*` with HTTP Basic auth.
- Stats are in-memory only; restart resets counters
- `usage` is collected from non-stream JSON responses and streaming SSE usage events
- Cached token fields from upstream are counted when present
- The proxy preserves `stream_options` for streaming `chat/completions` and `responses` requests, and strips it for other routes where Foundry v1 may reject it
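For example, a streaming `chat/completions` body that requests usage reporting keeps its `stream_options` intact through the proxy (`include_usage` is the standard OpenAI streaming-usage flag; the model name is a placeholder):

```json
{
  "model": "gpt-5-mini",
  "stream": true,
  "stream_options": { "include_usage": true },
  "messages": [{ "role": "user", "content": "ping" }]
}
```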
- Admin logs are stored in an in-memory ring buffer; restart clears them
- Default retention is the most recent `1000` entries and can be tuned with `ADMIN_LOG_BUFFER_SIZE`
- Log records are sanitized for common sensitive keys, and large strings are truncated before entering the admin buffer
- The admin page is intended for recent troubleshooting, not long-term audit retention
Build:
```shell
docker build -t aoai-proxy:latest .
```
Run with Azure Files-style local persistence:
```shell
docker run --rm -p 3000:3000 -p 443:443 -v $(pwd)/data:/app/data aoai-proxy:latest
```
Run with Blob-backed config persistence:
```shell
docker run --rm -p 3000:3000 -p 443:443 \
  -e PERSISTENCE_MODE=blob \
  -e AZURE_STORAGE_ACCOUNT_URL=https://<storage>.blob.core.windows.net \
  -e CONFIG_BLOB_CONTAINER=aoai-proxy-config \
  -e CONFIG_BLOB_NAME=config/config.json \
  aoai-proxy:latest
```

The container still uses `DefaultAzureCredential`, so provide service principal credentials for local development or a managed identity in Azure.
- Default mode
- Uses a client secret when `tenantId`, `clientId`, and `clientSecret` are provided
- Otherwise falls back to `DefaultAzureCredential`, including managed identity when available
- Requires `auth.scope`
- Sends requests to Azure AI Foundry / Azure OpenAI with the `api-key` header
- Requires `auth.apiKey`
- Does not acquire AAD tokens or use `auth.scope`
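As a sketch, the service principal variant of the `auth` block might look like this. The scope shown is the standard Cognitive Services AAD scope; the other values are placeholders, and the exact nesting may differ from `config/sample_config.json`:

```json
{
  "auth": {
    "mode": "servicePrincipal",
    "scope": "https://cognitiveservices.azure.com/.default",
    "tenantId": "<tenant-id>",
    "clientId": "<client-id>",
    "clientSecret": "<client-secret>"
  }
}
```

In `apiKey` mode this reduces to `{ "auth": { "mode": "apiKey", "apiKey": "<upstream-api-key>" } }`, and `scope` is not used.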
```shell
az deployment group create \
  --resource-group <rg> \
  --template-file infra/main.bicep \
  --parameters @infra/parameters/dev.json
```

```shell
az deployment group create \
  --resource-group <rg> \
  --template-file infra/azuredeploy.json \
  --parameters @infra/parameters/prod.json
```

The templates provision:
- A container group with system-assigned managed identity
- A new storage account
- An Azure Files share when `persistenceMode=azureFile`
- A Blob container when `persistenceMode=blob`
- An RBAC assignment for `Storage Blob Data Contributor` on the blob container, granted to both the container identity and the current deployment principal when blob mode is enabled
- An RBAC assignment for `Cognitive Services OpenAI User` on the target Azure OpenAI resource
The target Azure OpenAI / Foundry resource can live in a different resource group within the same subscription. Set `cognitiveServicesAccountResourceGroup` when it differs from the deployment resource group.
The `storageAccountName` parameter is the name of a new storage account to create. The current templates do not support selecting or reusing an existing storage account.
Use infra/azure_deployment_with_UI when you want the Azure Portal to use createUiDefinition.json and show the richer resource-selection UI.
Package the files so that `mainTemplate.json` and `createUiDefinition.json` are at the root of the zip:

```shell
cd infra/azure_deployment_with_UI
zip -j app.zip mainTemplate.json createUiDefinition.json
```

Publish a service catalog definition with Azure CLI using either the local files or an uploaded package URI. Example with local files:
```shell
az managedapp definition create \
  --resource-group <definition-rg> \
  --name aoai-proxy-managedapp \
  --location <location> \
  --display-name "AOAI Foundry Proxy" \
  --description "AOAI Foundry Proxy with custom UI for ACI deployment" \
  --lock-level ReadOnly \
  --authorizations <principalId>:<roleDefinitionId> \
  --create-ui-definition @infra/azure_deployment_with_UI/createUiDefinition.json \
  --main-template @infra/azure_deployment_with_UI/mainTemplate.json
```

Retrieve the definition ID:
```shell
az managedapp definition show \
  --resource-group <definition-rg> \
  --name aoai-proxy-managedapp \
  --query id -o tsv
```

Deploy a service catalog instance:
```shell
az managedapp create \
  --resource-group <application-rg> \
  --name aoai-proxy-instance \
  --location <location> \
  --kind ServiceCatalog \
  --managed-rg-id /subscriptions/<subscription-id>/resourceGroups/<managed-rg-name> \
  --managedapp-definition-id <definition-id>
```

The portal UI in this package supports selecting an existing Foundry or Azure OpenAI resource via a resource picker and passes the selected resource group to the deployment template.
- Azure Files walkthrough: docs/aci_persist_vol.en.md
- Chinese version: docs/aci_persist_vol.md
Use the admin page to configure domain, email, upstream, and transport timeouts. Saving config regenerates the Caddyfile and attempts a hot reload.
When Caddy is already enabled and the container restarts, the app now treats the early boot period as `starting` rather than an error, and retries the local caddy reload probe in the background until the Caddy process is ready.
If active health checks are enabled and `/healthz` is API-key protected, add a health header in Caddy or disable `health_uri` to avoid false 401/503 failures.
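As a sketch, a Caddyfile `reverse_proxy` block that sends the key with active health probes might look like this (`health_uri` and `health_headers` are standard Caddy subdirectives; the upstream address and key are placeholders):

```Caddyfile
reverse_proxy 127.0.0.1:3000 {
    health_uri /healthz
    health_headers {
        Authorization "Bearer <proxy-api-key>"
    }
}
```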
- Data plane path is `/openai/v1/*`
- `api-version` is optional; the default behavior is v1
- The request `model` must be the deployment identifier
For gpt-5 and newer models, plus o* reasoning models, the proxy now applies a small set of request normalizations before forwarding to Foundry:
- `max_tokens` is upgraded to `max_completion_tokens` for `chat/completions`
- `top_logprobs` implies `logprobs: true` when the client omits it
- `reasoning_effort` and `reasoning.effort` accept `low`, `medium`, and `high`; `xhigh` is downgraded to `high`
- `service_tier`, `verbosity`, and `top_k` are stripped for modern models because they are common sources of `unknown_parameter` errors against Foundry
- `web_search_preview` tools are rejected early with a `400` because Azure Foundry does not currently support web search tools
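The first rule can be illustrated with a small `jq` sketch (illustrative only, not the proxy's actual implementation):

```shell
# Upgrade max_tokens to max_completion_tokens, mirroring the first normalization rule.
normalize() {
  jq 'if has("max_tokens")
      then .max_completion_tokens = .max_tokens | del(.max_tokens)
      else . end'
}

# Bodies without max_tokens pass through unchanged.
echo '{"model":"gpt-5-mini","max_tokens":256}' | normalize
```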
The proxy also keeps `stream_options` for streaming `chat/completions` and `responses` requests, and strips it only for routes where Foundry v1 may reject it.
Use `models[].routes` when the client-facing route and the backend-supported route differ.
```json
{
  "models": [
    {
      "id": "my-model",
      "upstream": "foundry",
      "targetModel": "my-deployment",
      "routes": {
        "chat/completions": "responses"
      }
    }
  ]
}
```

List models:
```shell
curl -sS http://127.0.0.1:3000/v1/models -H 'authorization: Bearer CHANGEME' | jq .
```
Chat request:
```shell
curl -sS http://127.0.0.1:3000/v1/chat/completions \
  -H 'content-type: application/json' \
  -H 'authorization: Bearer CHANGEME' \
  -d '{"model":"gpt-5-mini","messages":[{"role":"user","content":"ping"}]}' | jq .
```