pczhao1210/AOAI-Proxy


AOAI Foundry Proxy

OpenAI-compatible reverse proxy for Azure AI Foundry / Azure OpenAI with SSE streaming, configurable Caddy TLS, and deployment-selectable persistence.

English | 简体中文 | Docs Index

Deploy to Azure

Overview

  • OpenAI-compatible proxy for chat/completions, responses, images/generations, and models
  • Client -> Proxy uses API key auth via Authorization: Bearer or x-api-key
  • Proxy -> Azure AI Foundry / Azure OpenAI uses AAD tokens or api-key, based on auth.mode
  • Static admin page for config editing, AAD verification, model usage stats, and recent log inspection
  • Model-level route overrides via models[].routes and upstream route maps via upstreams[].routes
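The client-facing auth accepts either header. As a minimal sketch of that dual-header scheme (a hypothetical helper, not the proxy's actual code; header names are assumed lowercased by the framework):

```python
from typing import Optional

def extract_api_key(headers: dict) -> Optional[str]:
    """Return the client API key from Authorization: Bearer or x-api-key."""
    auth = headers.get("authorization", "")
    if auth.lower().startswith("bearer "):
        return auth[7:].strip()
    return headers.get("x-api-key")
```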

Deployment Assets

The Deploy to Azure button targets the ARM JSON template because the portal's raw-template button flow cannot deploy remote Bicep files directly. That flow also does not automatically use createUiDefinition.json; that file is intended for portal packaging flows that support a custom create experience.

Persistence Modes

This repo now supports deployment-time persistence selection.

azureFile

  • Keeps the current ACI + Azure Files mount to /app/data
  • Best fit when you need filesystem-style persistence for config, Caddyfile, and Caddy state
  • Still requires storage account key for the ACI mount itself

blob

  • Keeps config persistence at the application layer through Blob SDK
  • Uses DefaultAzureCredential and managed identity to read and write the config blob
  • Does not replace Azure Files mount semantics for /app/data
  • If Blob access is not ready yet, startup falls back to the local cached config at /app/data/config.json
  • While Blob access is degraded, config writes continue to the local file and are retried to Blob in the background
  • After RBAC propagation completes, the app automatically switches back to Blob-backed persistence without requiring a restart
  • Background Blob recovery checks run every 30000 ms by default and can be tuned with BLOB_RECOVERY_INTERVAL_MS
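The fallback-and-recovery behavior described above can be sketched as follows (an illustrative model, not the repo's implementation; class and method names are hypothetical):

```python
BLOB_RECOVERY_INTERVAL_MS = 30_000  # default; tunable via BLOB_RECOVERY_INTERVAL_MS

class ConfigStore:
    """Illustrative blob-mode fallback: local cache first, Blob when healthy."""

    def __init__(self, blob_available: bool):
        self.mode = "blob" if blob_available else "local"
        self.blob_available = blob_available
        self.pending_writes = []
        self.local = None
        self.blob = None

    def write(self, config: dict):
        self.local = config                     # writes always land locally
        if self.blob_available:
            self.blob = config                  # mirrored to Blob when healthy
        else:
            self.pending_writes.append(config)  # retried in the background

    def recovery_tick(self, blob_now_available: bool):
        # Called every BLOB_RECOVERY_INTERVAL_MS while degraded
        if blob_now_available and self.mode == "local":
            for cfg in self.pending_writes:
                self.blob = cfg                 # replay queued writes
            self.pending_writes.clear()
            self.blob_available = True
            self.mode = "blob"                  # switch back without a restart
```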

Deployment Constraint

ACI native Azure Files mounting still depends on shared key authentication. Managed identity can be used for Blob SDK operations, but it does not convert Azure Files volume mounting into an AAD-only flow. If you must disable key-based auth and still need /app/data mount semantics, move to another platform such as ACA, AKS, or a VM-based deployment.

Timeout Model

The proxy now uses a more conservative long-response baseline that is better suited for tool-calling and MCP-style workflows.

"server": {
  "upstream": {
    "connectTimeoutMs": 10000,
    "requestTimeoutMs": 900000,
    "firstByteTimeoutMs": 300000,
    "idleTimeoutMs": 900000,
    "maxRetries": 0,
    "retryBaseMs": 800,
    "retryMaxMs": 8000,
    "pool": {
      "connections": 64,
      "keepAliveTimeoutMs": 60000,
      "keepAliveMaxTimeoutMs": 300000,
      "headersTimeoutMs": 300000,
      "bodyTimeoutMs": 0,
      "pipelining": 1
    }
  },
  "caddy": {
    "transport": {
      "dialTimeoutMs": 5000,
      "responseHeaderTimeoutMs": 45000,
      "keepAliveTimeoutMs": 120000
    }
  }
}

Guidance:

  • Keep server.caddy.transport.dialTimeoutMs aligned with server.upstream.connectTimeoutMs
  • Keep server.caddy.transport.responseHeaderTimeoutMs greater than or equal to server.upstream.firstByteTimeoutMs
  • Keep server.upstream.idleTimeoutMs long enough for SSE streams that pause between events
  • Tune server.upstream.pool first for latency-sensitive, low-concurrency deployments before changing retry budgets
  • For MCP or tool-calling flows, prefer longer firstByteTimeoutMs and idleTimeoutMs, but keep maxRetries low to avoid replaying side-effecting tool calls
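The alignment rules above can be expressed as a small config lint (a hypothetical check, not shipped with the repo):

```python
def check_timeout_alignment(cfg: dict) -> list:
    """Warn when the Caddy transport and upstream timeouts diverge
    from the guidance above."""
    up = cfg["server"]["upstream"]
    cd = cfg["server"]["caddy"]["transport"]
    warnings = []
    if cd["dialTimeoutMs"] != up["connectTimeoutMs"]:
        warnings.append("dialTimeoutMs not aligned with connectTimeoutMs")
    if cd["responseHeaderTimeoutMs"] < up["firstByteTimeoutMs"]:
        warnings.append("responseHeaderTimeoutMs below firstByteTimeoutMs")
    return warnings
```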

Local Run

  1. Copy the sample config:
    • cp config/sample_config.json config/config.json
  2. Edit config/config.json:
    • Replace upstreams[].baseUrl with your Foundry or Azure OpenAI endpoint
    • Set models[].targetModel to the deployment identifier
    • Choose upstream auth:
      • auth.mode = "servicePrincipal" with scope, plus service principal fields or managed identity
      • auth.mode = "apiKey" with auth.apiKey
    • Replace the default API key and admin credentials
  3. Install dependencies and start:
    • npm install
    • npm run start

Environment Variables

General

  • CONFIG_PATH: local cached config path, default ./config/config.json
  • BODY_LIMIT: request body limit in bytes, default 52428800
  • CADDY_BIN: optional Caddy binary path override
  • ADMIN_LOG_BUFFER_SIZE: in-memory admin log ring buffer size, default 1000

Optional Upstream Pool Overrides

Config file values under server.upstream.pool are primary. These environment variables can still override them when needed:

  • UPSTREAM_MAX_CONNECTIONS
  • UPSTREAM_KEEPALIVE_TIMEOUT_MS
  • UPSTREAM_KEEPALIVE_MAX_TIMEOUT_MS
  • UPSTREAM_HEADERS_TIMEOUT_MS
  • UPSTREAM_BODY_TIMEOUT_MS
  • UPSTREAM_PIPELINING
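The "config is primary, env overrides when set" precedence can be sketched like this (hypothetical helper; the variable names match the list above):

```python
import os

# Partial mapping of pool keys to the override variables listed above
ENV_OVERRIDES = {
    "connections": "UPSTREAM_MAX_CONNECTIONS",
    "keepAliveTimeoutMs": "UPSTREAM_KEEPALIVE_TIMEOUT_MS",
    "pipelining": "UPSTREAM_PIPELINING",
}

def effective_pool(config_pool: dict, env=os.environ) -> dict:
    """Start from server.upstream.pool, then apply env overrides when set."""
    pool = dict(config_pool)
    for key, var in ENV_OVERRIDES.items():
        if var in env:
            pool[key] = int(env[var])
    return pool
```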

Persistence Selection

  • PERSISTENCE_MODE=azureFile|blob
  • AZURE_STORAGE_ACCOUNT_URL=https://<storage>.blob.core.windows.net
  • CONFIG_BLOB_CONTAINER=<container-name>
  • CONFIG_BLOB_NAME=config/config.json
  • BLOB_RECOVERY_INTERVAL_MS=30000 to control how often the app retries Blob access after falling back to the local cache

In blob mode, the app reads from Blob first and falls back to the local cached config if the blob is not present yet.

Admin Page

Open /admin to manage config.

The admin page now exposes:

  • Top-level status cards for proxy health, AAD verification status, config state, and runtime state
  • Config dirty-state badges, basic structure reminders, and a local diff preview before save
  • Caddy transport timeouts (dial, response header, keepalive)
  • Runtime persistence summary so you can see whether the deployment is using azureFile or blob
  • Recent logs with level filters (warn, error, optional info), keyword search, request-id filtering, and copy-summary actions

The log panel uses a compact always-visible toolbar plus collapsible advanced filters instead of a sticky filter bar.

Admin Login

Controlled by server.adminAuth. When enabled, it protects /admin and /admin/api/* with HTTP Basic auth.
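For reference, a minimal HTTP Basic check looks like the following (illustrative only; the proxy's own validation lives behind server.adminAuth):

```python
import base64

def check_basic_auth(header: str, user: str, password: str) -> bool:
    """Validate an 'Authorization: Basic <base64(user:pass)>' header."""
    if not header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(header[6:]).decode("utf-8")
    except Exception:
        return False
    return decoded == f"{user}:{password}"
```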

Stats Notes

  • Stats are in-memory only; restart resets counters
  • usage is collected from non-stream JSON responses and streaming SSE usage events
  • Cached token fields from upstream are counted when present
  • The proxy preserves stream_options for streaming chat/completions and responses requests, and strips it for other routes where Foundry v1 may reject it
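Collecting usage from streaming SSE events can be sketched as below (a simplified model of the aggregation, not the proxy's source; it assumes OpenAI-style `data:` lines with an optional usage object):

```python
import json

def collect_usage(sse_lines) -> dict:
    """Sum token usage across SSE data events that carry a usage object."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0}
    for line in sse_lines:
        if not line.startswith("data: ") or line == "data: [DONE]":
            continue
        event = json.loads(line[6:])
        usage = event.get("usage")
        if usage:  # most stream chunks carry "usage": null
            for key in totals:
                totals[key] += usage.get(key, 0)
    return totals
```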

Log Notes

  • Admin logs are stored in an in-memory ring buffer; restart clears them
  • Default retention is the most recent 1000 entries and can be tuned with ADMIN_LOG_BUFFER_SIZE
  • Log records are sanitized for common sensitive keys and large strings are truncated before entering the admin buffer
  • The admin page is intended for recent troubleshooting, not long-term audit retention
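A ring buffer with sanitization of the kind described above can be sketched as follows (hypothetical key list and truncation limit; the real sanitizer may differ):

```python
from collections import deque

SENSITIVE_KEYS = {"authorization", "api-key", "x-api-key"}  # illustrative
MAX_STRING = 200  # illustrative truncation limit

def sanitize(record: dict) -> dict:
    """Mask common sensitive keys and truncate large strings."""
    out = {}
    for k, v in record.items():
        if k.lower() in SENSITIVE_KEYS:
            out[k] = "***"
        elif isinstance(v, str) and len(v) > MAX_STRING:
            out[k] = v[:MAX_STRING] + "...[truncated]"
        else:
            out[k] = v
    return out

class AdminLogBuffer:
    """In-memory ring buffer; a restart clears it."""
    def __init__(self, size: int = 1000):  # ADMIN_LOG_BUFFER_SIZE default
        self.entries = deque(maxlen=size)
    def add(self, record: dict):
        self.entries.append(sanitize(record))
```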

Docker

Build:

  • docker build -t aoai-proxy:latest .

Run with Azure Files-style local persistence:

  • docker run --rm -p 3000:3000 -p 443:443 -v $(pwd)/data:/app/data aoai-proxy:latest

Run with Blob-backed config persistence:

docker run --rm -p 3000:3000 -p 443:443 \
  -e PERSISTENCE_MODE=blob \
  -e AZURE_STORAGE_ACCOUNT_URL=https://<storage>.blob.core.windows.net \
  -e CONFIG_BLOB_CONTAINER=aoai-proxy-config \
  -e CONFIG_BLOB_NAME=config/config.json \
  aoai-proxy:latest

The container still uses DefaultAzureCredential, so provide service principal credentials for local development or a managed identity in Azure.

Upstream Auth Modes

servicePrincipal

  • Default mode
  • Uses a client secret when tenantId, clientId, and clientSecret are provided
  • Otherwise falls back to DefaultAzureCredential, including managed identity when available
  • Requires auth.scope

apiKey

  • Sends requests to Azure AI Foundry / Azure OpenAI with the api-key header
  • Requires auth.apiKey
  • Does not acquire AAD tokens or use auth.scope
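The difference between the two modes comes down to which upstream header is sent. A minimal sketch (hypothetical helper; the token acquisition callback stands in for the AAD credential):

```python
def upstream_headers(auth: dict, token_for_scope=None) -> dict:
    """Select the upstream auth header based on auth.mode."""
    if auth["mode"] == "apiKey":
        return {"api-key": auth["apiKey"]}
    # servicePrincipal: acquire an AAD token for auth.scope
    token = token_for_scope(auth["scope"])
    return {"Authorization": f"Bearer {token}"}
```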

Azure Deployment

Deploy with Bicep

az deployment group create \
  --resource-group <rg> \
  --template-file infra/main.bicep \
  --parameters @infra/parameters/dev.json

Deploy with ARM JSON

az deployment group create \
  --resource-group <rg> \
  --template-file infra/azuredeploy.json \
  --parameters @infra/parameters/prod.json

The templates provision:

  • A container group with system-assigned managed identity
  • A new storage account
  • Azure Files share when persistenceMode=azureFile
  • Blob container when persistenceMode=blob
  • RBAC assignment for Storage Blob Data Contributor on the blob container for both the container identity and the current deployment principal when blob mode is enabled
  • RBAC assignment for Cognitive Services OpenAI User on the target Azure OpenAI resource

The target Azure OpenAI / Foundry resource can live in a different resource group within the same subscription. Set cognitiveServicesAccountResourceGroup when it differs from the deployment resource group. The storageAccountName parameter is the name of a new storage account to create. The current templates do not support selecting or reusing an existing storage account.

Deploy as Azure Managed Application With Custom UI

Use infra/azure_deployment_with_UI when you want the Azure Portal to use createUiDefinition.json and show the richer resource-selection UI.

Package the files so that mainTemplate.json and createUiDefinition.json are at the root of the zip:

cd infra/azure_deployment_with_UI
zip -j app.zip mainTemplate.json createUiDefinition.json

Publish a service catalog definition with Azure CLI using either the local files or an uploaded package URI. Example with local files:

az managedapp definition create \
  --resource-group <definition-rg> \
  --name aoai-proxy-managedapp \
  --location <location> \
  --display-name "AOAI Foundry Proxy" \
  --description "AOAI Foundry Proxy with custom UI for ACI deployment" \
  --lock-level ReadOnly \
  --authorizations <principalId>:<roleDefinitionId> \
  --create-ui-definition @infra/azure_deployment_with_UI/createUiDefinition.json \
  --main-template @infra/azure_deployment_with_UI/mainTemplate.json

Retrieve the definition ID:

az managedapp definition show \
  --resource-group <definition-rg> \
  --name aoai-proxy-managedapp \
  --query id -o tsv

Deploy a service catalog instance:

az managedapp create \
  --resource-group <application-rg> \
  --name aoai-proxy-instance \
  --location <location> \
  --kind ServiceCatalog \
  --managed-rg-id /subscriptions/<subscription-id>/resourceGroups/<managed-rg-name> \
  --managedapp-definition-id <definition-id>

The portal UI in this package supports selecting an existing Foundry or Azure OpenAI resource via a resource picker and passes the selected resource group to the deployment template.

ACI Persistence and RBAC

Caddy TLS

Use the admin page to configure domain, email, upstream, and transport timeouts. Saving config regenerates the Caddyfile and attempts a hot reload.

When Caddy is already enabled and the container restarts, the app treats the early boot period as a starting state rather than an error, and retries the local caddy reload probe in the background until the Caddy process is ready.

If active health checks are enabled and /healthz is API-key protected, add a health header in Caddy or disable health_uri to avoid false 401/503 failures.

Foundry v1 Notes

  • Data plane path is /openai/v1/*
  • api-version is optional; default behavior is v1
  • Request model must be the deployment identifier

Modern Model Compatibility

For gpt-5 and newer models, plus o* reasoning models, the proxy now applies a small set of request normalizations before forwarding to Foundry:

  • max_tokens is upgraded to max_completion_tokens for chat/completions
  • top_logprobs implies logprobs: true when the client omits it
  • reasoning_effort and reasoning.effort accept low, medium, and high; xhigh is downgraded to high
  • service_tier, verbosity, and top_k are stripped for modern models because they are common sources of unknown_parameter errors against Foundry
  • web_search_preview tools are rejected early with a 400 because Azure Foundry does not currently support web search tools

The proxy also keeps stream_options for streaming chat/completions and responses requests, and strips it only for routes where Foundry v1 may reject it.
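The normalizations listed above can be sketched as a single transform (a simplified model using the field names from this README, not the proxy's source):

```python
def normalize_request(body: dict) -> dict:
    """Apply the modern-model request normalizations described above."""
    req = dict(body)
    # max_tokens is upgraded to max_completion_tokens
    if "max_tokens" in req:
        req["max_completion_tokens"] = req.pop("max_tokens")
    # top_logprobs implies logprobs: true when omitted
    if "top_logprobs" in req and "logprobs" not in req:
        req["logprobs"] = True
    # xhigh is downgraded to high
    if req.get("reasoning_effort") == "xhigh":
        req["reasoning_effort"] = "high"
    # common sources of unknown_parameter errors are stripped
    for key in ("service_tier", "verbosity", "top_k"):
        req.pop(key, None)
    return req
```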

Model Route Overrides

Use models[].routes when the client-facing route and backend-supported route differ.

{
  "models": [
    {
      "id": "my-model",
      "upstream": "foundry",
      "targetModel": "my-deployment",
      "routes": {
        "chat/completions": "responses"
      }
    }
  ]
}
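Resolution is a straightforward lookup with a pass-through default, which can be sketched as (hypothetical helper name):

```python
def resolve_route(model_cfg: dict, client_route: str) -> str:
    """Map a client-facing route to the backend route via models[].routes,
    falling back to the same route when no override exists."""
    return model_cfg.get("routes", {}).get(client_route, client_route)
```

With the config above, a client call to chat/completions is forwarded to the responses route, while unlisted routes pass through unchanged.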

curl Examples

List models:

  • curl -sS http://127.0.0.1:3000/v1/models -H 'authorization: Bearer CHANGEME' | jq .

Chat request:

  • curl -sS http://127.0.0.1:3000/v1/chat/completions -H 'content-type: application/json' -H 'authorization: Bearer CHANGEME' -d '{"model":"gpt-5-mini","messages":[{"role":"user","content":"ping"}]}' | jq .

About

A reverse proxy using Managed Identity / Service Principal for Azure AI Foundry
