pczhao1210/AOAI-Proxy


AOAI Foundry Proxy

OpenAI-compatible reverse proxy for Azure AI Foundry / Azure OpenAI with SSE streaming, configurable Caddy TLS, and deployment-selectable persistence.

English | 简体中文 | Docs Index

Deploy to Azure

Overview

  • OpenAI-compatible proxy for chat/completions, responses, images/generations, and models
  • Client -> Proxy uses API key auth via Authorization: Bearer or x-api-key
  • Proxy -> Azure AI Foundry / Azure OpenAI uses AAD tokens or api-key, based on auth.mode
  • Static admin page for config editing, AAD verification, model usage stats, and recent log inspection
  • Model-level route overrides via models[].routes and upstream route maps via upstreams[].routes
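The client-facing auth accepts either header. As a minimal sketch of that dual-header scheme (a hypothetical helper, not the proxy's actual code; header names are assumed lowercased by the framework):

```python
from typing import Optional

def extract_api_key(headers: dict) -> Optional[str]:
    """Return the client API key from Authorization: Bearer or x-api-key."""
    auth = headers.get("authorization", "")
    if auth.lower().startswith("bearer "):
        return auth[7:].strip()
    return headers.get("x-api-key")
```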

Deployment Assets

The Deploy to Azure button targets the ARM JSON template because the portal's raw-template button flow cannot deploy remote Bicep files directly. That flow also does not automatically use createUiDefinition.json; that file is intended for portal packaging flows that support a custom create experience.

Persistence Modes

This repo now supports deployment-time persistence selection.

azureFile

  • Keeps the current ACI + Azure Files mount to /app/data
  • Best fit when you need filesystem-style persistence for config, Caddyfile, and Caddy state
  • Still requires storage account key for the ACI mount itself

blob

  • Keeps config persistence at the application layer through Blob SDK
  • Uses DefaultAzureCredential and managed identity to read and write the config blob
  • Does not replace Azure Files mount semantics for /app/data
  • If Blob access is not ready yet, startup falls back to the local cached config at /app/data/config.json
  • While Blob access is degraded, config writes continue to the local file and are retried to Blob in the background
  • After RBAC propagation completes, the app automatically switches back to Blob-backed persistence without requiring a restart
  • Background Blob recovery checks run every 30000 ms by default and can be tuned with BLOB_RECOVERY_INTERVAL_MS
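The fallback-and-recovery behavior described above can be sketched as follows (an illustrative model, not the repo's implementation; class and method names are hypothetical):

```python
BLOB_RECOVERY_INTERVAL_MS = 30_000  # default; tunable via BLOB_RECOVERY_INTERVAL_MS

class ConfigStore:
    """Illustrative blob-mode fallback: local cache first, Blob when healthy."""

    def __init__(self, blob_available: bool):
        self.mode = "blob" if blob_available else "local"
        self.blob_available = blob_available
        self.pending_writes = []
        self.local = None
        self.blob = None

    def write(self, config: dict):
        self.local = config                     # writes always land locally
        if self.blob_available:
            self.blob = config                  # mirrored to Blob when healthy
        else:
            self.pending_writes.append(config)  # retried in the background

    def recovery_tick(self, blob_now_available: bool):
        # Called every BLOB_RECOVERY_INTERVAL_MS while degraded
        if blob_now_available and self.mode == "local":
            for cfg in self.pending_writes:
                self.blob = cfg                 # replay queued writes
            self.pending_writes.clear()
            self.blob_available = True
            self.mode = "blob"                  # switch back without a restart
```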

Deployment Constraint

ACI native Azure Files mounting still depends on shared key authentication. Managed identity can be used for Blob SDK operations, but it does not convert Azure Files volume mounting into an AAD-only flow. If you must disable key-based auth and still need /app/data mount semantics, move to another platform such as ACA, AKS, or a VM-based deployment.

Timeout Model

The proxy now uses a more conservative long-response baseline that is better suited for tool-calling and MCP-style workflows.

"server": {
  "upstream": {
    "connectTimeoutMs": 10000,
    "requestTimeoutMs": 900000,
    "firstByteTimeoutMs": 300000,
    "idleTimeoutMs": 900000,
    "maxRetries": 0,
    "retryBaseMs": 800,
    "retryMaxMs": 8000,
    "pool": {
      "connections": 64,
      "keepAliveTimeoutMs": 60000,
      "keepAliveMaxTimeoutMs": 300000,
      "headersTimeoutMs": 300000,
      "bodyTimeoutMs": 0,
      "pipelining": 1
    }
  },
  "caddy": {
    "transport": {
      "dialTimeoutMs": 5000,
      "responseHeaderTimeoutMs": 45000,
      "keepAliveTimeoutMs": 120000
    }
  }
}

Guidance:

  • Keep server.caddy.transport.dialTimeoutMs aligned with server.upstream.connectTimeoutMs
  • Keep server.caddy.transport.responseHeaderTimeoutMs greater than or equal to server.upstream.firstByteTimeoutMs
  • Keep server.upstream.idleTimeoutMs long enough for SSE streams that pause between events
  • Tune server.upstream.pool first for latency-sensitive, low-concurrency deployments before changing retry budgets
  • For MCP or tool-calling flows, prefer longer firstByteTimeoutMs and idleTimeoutMs, but keep maxRetries low to avoid replaying side-effecting tool calls
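The alignment rules above can be expressed as a small config lint (a hypothetical check, not shipped with the repo):

```python
def check_timeout_alignment(cfg: dict) -> list:
    """Warn when the Caddy transport and upstream timeouts diverge
    from the guidance above."""
    up = cfg["server"]["upstream"]
    cd = cfg["server"]["caddy"]["transport"]
    warnings = []
    if cd["dialTimeoutMs"] != up["connectTimeoutMs"]:
        warnings.append("dialTimeoutMs not aligned with connectTimeoutMs")
    if cd["responseHeaderTimeoutMs"] < up["firstByteTimeoutMs"]:
        warnings.append("responseHeaderTimeoutMs below firstByteTimeoutMs")
    return warnings
```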

Local Run

  1. Copy the sample config:
    • cp config/sample_config.json config/config.json
  2. Edit config/config.json:
    • Replace upstreams[].baseUrl with your Foundry or Azure OpenAI endpoint
    • Set models[].targetModel to the deployment identifier
    • Choose upstream auth:
      • auth.mode = "servicePrincipal" with scope, plus service principal fields or managed identity
      • auth.mode = "apiKey" with auth.apiKey
    • Replace the default API key and admin credentials
  3. Install dependencies and start:
    • npm install
    • npm run start

Environment Variables

General

  • CONFIG_PATH: local cached config path, default ./config/config.json
  • BODY_LIMIT: request body limit in bytes, default 52428800
  • CADDY_BIN: optional Caddy binary path override
  • ADMIN_LOG_BUFFER_SIZE: in-memory admin log ring buffer size, default 1000

Optional Upstream Pool Overrides

Config file values under server.upstream.pool are primary. These environment variables can still override them when needed:

  • UPSTREAM_MAX_CONNECTIONS
  • UPSTREAM_KEEPALIVE_TIMEOUT_MS
  • UPSTREAM_KEEPALIVE_MAX_TIMEOUT_MS
  • UPSTREAM_HEADERS_TIMEOUT_MS
  • UPSTREAM_BODY_TIMEOUT_MS
  • UPSTREAM_PIPELINING
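The "config is primary, env overrides when set" precedence can be sketched like this (hypothetical helper; the variable names match the list above):

```python
import os

# Partial mapping of pool keys to the override variables listed above
ENV_OVERRIDES = {
    "connections": "UPSTREAM_MAX_CONNECTIONS",
    "keepAliveTimeoutMs": "UPSTREAM_KEEPALIVE_TIMEOUT_MS",
    "pipelining": "UPSTREAM_PIPELINING",
}

def effective_pool(config_pool: dict, env=os.environ) -> dict:
    """Start from server.upstream.pool, then apply env overrides when set."""
    pool = dict(config_pool)
    for key, var in ENV_OVERRIDES.items():
        if var in env:
            pool[key] = int(env[var])
    return pool
```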

Persistence Selection

  • PERSISTENCE_MODE=azureFile|blob
  • AZURE_STORAGE_ACCOUNT_URL=https://<storage>.blob.core.windows.net
  • CONFIG_BLOB_CONTAINER=<container-name>
  • CONFIG_BLOB_NAME=config/config.json
  • BLOB_RECOVERY_INTERVAL_MS=30000 to control how often the app retries Blob access after falling back to the local cache

In blob mode, the app reads from Blob first and falls back to the local cached config if the blob is not present yet.

Admin Page

Open /admin to manage config.

The admin page now exposes:

  • Top-level status cards for proxy health, AAD verification status, config state, and runtime state
  • Config dirty-state badges, basic structure reminders, and a local diff preview before save
  • Caddy transport timeouts (dial, response header, keepalive)
  • Runtime persistence summary so you can see whether the deployment is using azureFile or blob
  • Recent logs with level filters (warn, error, optional info), keyword search, request-id filtering, and copy-summary actions

The log panel uses a compact always-visible toolbar plus collapsible advanced filters instead of a sticky filter bar.

Admin Login

Controlled by server.adminAuth. When enabled, it protects /admin and /admin/api/* with HTTP Basic auth.
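For reference, a minimal HTTP Basic check looks like the following (illustrative only; the proxy's own validation lives behind server.adminAuth):

```python
import base64

def check_basic_auth(header: str, user: str, password: str) -> bool:
    """Validate an 'Authorization: Basic <base64(user:pass)>' header."""
    if not header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(header[6:]).decode("utf-8")
    except Exception:
        return False
    return decoded == f"{user}:{password}"
```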

Stats Notes

  • Stats are in-memory only; restart resets counters
  • usage is collected from non-stream JSON responses and streaming SSE usage events
  • Cached token fields from upstream are counted when present
  • The proxy preserves stream_options for streaming chat/completions and responses requests, and strips it for other routes where Foundry v1 may reject it
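Collecting usage from streaming SSE events can be sketched as below (a simplified model of the aggregation, not the proxy's source; it assumes OpenAI-style `data:` lines with an optional usage object):

```python
import json

def collect_usage(sse_lines) -> dict:
    """Sum token usage across SSE data events that carry a usage object."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0}
    for line in sse_lines:
        if not line.startswith("data: ") or line == "data: [DONE]":
            continue
        event = json.loads(line[6:])
        usage = event.get("usage")
        if usage:  # most stream chunks carry "usage": null
            for key in totals:
                totals[key] += usage.get(key, 0)
    return totals
```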

Log Notes

  • Admin logs are stored in an in-memory ring buffer; restart clears them
  • Default retention is the most recent 1000 entries and can be tuned with ADMIN_LOG_BUFFER_SIZE
  • Log records are sanitized for common sensitive keys and large strings are truncated before entering the admin buffer
  • The admin page is intended for recent troubleshooting, not long-term audit retention
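A ring buffer with sanitization of the kind described above can be sketched as follows (hypothetical key list and truncation limit; the real sanitizer may differ):

```python
from collections import deque

SENSITIVE_KEYS = {"authorization", "api-key", "x-api-key"}  # illustrative
MAX_STRING = 200  # illustrative truncation limit

def sanitize(record: dict) -> dict:
    """Mask common sensitive keys and truncate large strings."""
    out = {}
    for k, v in record.items():
        if k.lower() in SENSITIVE_KEYS:
            out[k] = "***"
        elif isinstance(v, str) and len(v) > MAX_STRING:
            out[k] = v[:MAX_STRING] + "...[truncated]"
        else:
            out[k] = v
    return out

class AdminLogBuffer:
    """In-memory ring buffer; a restart clears it."""
    def __init__(self, size: int = 1000):  # ADMIN_LOG_BUFFER_SIZE default
        self.entries = deque(maxlen=size)
    def add(self, record: dict):
        self.entries.append(sanitize(record))
```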

Docker

Build:

  • docker build -t aoai-proxy:latest .

Run with Azure Files-style local persistence:

  • docker run --rm -p 3000:3000 -p 443:443 -v $(pwd)/data:/app/data aoai-proxy:latest

Run with Blob-backed config persistence:

docker run --rm -p 3000:3000 -p 443:443 \
  -e PERSISTENCE_MODE=blob \
  -e AZURE_STORAGE_ACCOUNT_URL=https://<storage>.blob.core.windows.net \
  -e CONFIG_BLOB_CONTAINER=aoai-proxy-config \
  -e CONFIG_BLOB_NAME=config/config.json \
  aoai-proxy:latest

The container still uses DefaultAzureCredential, so provide service principal credentials for local development or a managed identity in Azure.

Upstream Auth Modes

servicePrincipal

  • Default mode
  • Uses a client secret when tenantId, clientId, and clientSecret are provided
  • Otherwise falls back to DefaultAzureCredential, including managed identity when available
  • Requires auth.scope

apiKey

  • Sends requests to Azure AI Foundry / Azure OpenAI with the api-key header
  • Requires auth.apiKey
  • Does not acquire AAD tokens or use auth.scope
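The difference between the two modes comes down to which upstream header is sent. A minimal sketch (hypothetical helper; the token acquisition callback stands in for the AAD credential):

```python
def upstream_headers(auth: dict, token_for_scope=None) -> dict:
    """Select the upstream auth header based on auth.mode."""
    if auth["mode"] == "apiKey":
        return {"api-key": auth["apiKey"]}
    # servicePrincipal: acquire an AAD token for auth.scope
    token = token_for_scope(auth["scope"])
    return {"Authorization": f"Bearer {token}"}
```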

Azure Deployment

Deploy with Bicep

az deployment group create \
  --resource-group <rg> \
  --template-file infra/main.bicep \
  --parameters @infra/parameters/dev.json

Deploy with ARM JSON

az deployment group create \
  --resource-group <rg> \
  --template-file infra/azuredeploy.json \
  --parameters @infra/parameters/prod.json

The templates provision:

  • A container group with system-assigned managed identity
  • A new storage account
  • Azure Files share when persistenceMode=azureFile
  • Blob container when persistenceMode=blob
  • RBAC assignment for Storage Blob Data Contributor on the blob container for both the container identity and the current deployment principal when blob mode is enabled
  • RBAC assignment for Cognitive Services OpenAI User on the target Azure OpenAI resource

The target Azure OpenAI / Foundry resource can live in a different resource group within the same subscription. Set cognitiveServicesAccountResourceGroup when it differs from the deployment resource group. The storageAccountName parameter is the name of a new storage account to create. The current templates do not support selecting or reusing an existing storage account.

Deploy as Azure Managed Application With Custom UI

Use infra/azure_deployment_with_UI when you want the Azure Portal to use createUiDefinition.json and show the richer resource-selection UI.

Package the files so that mainTemplate.json and createUiDefinition.json are at the root of the zip:

cd infra/azure_deployment_with_UI
zip -j app.zip mainTemplate.json createUiDefinition.json

Publish a service catalog definition with Azure CLI using either the local files or an uploaded package URI. Example with local files:

az managedapp definition create \
  --resource-group <definition-rg> \
  --name aoai-proxy-managedapp \
  --location <location> \
  --display-name "AOAI Foundry Proxy" \
  --description "AOAI Foundry Proxy with custom UI for ACI deployment" \
  --lock-level ReadOnly \
  --authorizations <principalId>:<roleDefinitionId> \
  --create-ui-definition @infra/azure_deployment_with_UI/createUiDefinition.json \
  --main-template @infra/azure_deployment_with_UI/mainTemplate.json

Retrieve the definition ID:

az managedapp definition show \
  --resource-group <definition-rg> \
  --name aoai-proxy-managedapp \
  --query id -o tsv

Deploy a service catalog instance:

az managedapp create \
  --resource-group <application-rg> \
  --name aoai-proxy-instance \
  --location <location> \
  --kind ServiceCatalog \
  --managed-rg-id /subscriptions/<subscription-id>/resourceGroups/<managed-rg-name> \
  --managedapp-definition-id <definition-id>

The portal UI in this package supports selecting an existing Foundry or Azure OpenAI resource via a resource picker and passes the selected resource group to the deployment template.

ACI Persistence and RBAC

Caddy TLS

Use the admin page to configure domain, email, upstream, and transport timeouts. Saving config regenerates the Caddyfile and attempts a hot reload.

When Caddy is already enabled and the container restarts, the app treats the early boot period as a starting state rather than an error, and retries the local caddy reload probe in the background until the Caddy process is ready.

If active health checks are enabled and /healthz is API-key protected, add a health header in Caddy or disable health_uri to avoid false 401/503 failures.

Foundry v1 Notes

  • Data plane path is /openai/v1/*
  • api-version is optional; default behavior is v1
  • Request model must be the deployment identifier

Modern Model Compatibility

For gpt-5 and newer models, plus o* reasoning models, the proxy now applies a small set of request normalizations before forwarding to Foundry:

  • max_tokens is upgraded to max_completion_tokens for chat/completions
  • top_logprobs implies logprobs: true when the client omits it
  • reasoning_effort and reasoning.effort accept low, medium, and high; xhigh is downgraded to high
  • service_tier, verbosity, and top_k are stripped for modern models because they are common sources of unknown_parameter errors against Foundry
  • web_search_preview tools are rejected early with a 400 because Azure Foundry does not currently support web search tools

The proxy also keeps stream_options for streaming chat/completions and responses requests, and strips it only for routes where Foundry v1 may reject it.
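The normalizations listed above can be sketched as a single transform (a simplified model using the field names from this README, not the proxy's source):

```python
def normalize_request(body: dict) -> dict:
    """Apply the modern-model request normalizations described above."""
    req = dict(body)
    # max_tokens is upgraded to max_completion_tokens
    if "max_tokens" in req:
        req["max_completion_tokens"] = req.pop("max_tokens")
    # top_logprobs implies logprobs: true when omitted
    if "top_logprobs" in req and "logprobs" not in req:
        req["logprobs"] = True
    # xhigh is downgraded to high
    if req.get("reasoning_effort") == "xhigh":
        req["reasoning_effort"] = "high"
    # common sources of unknown_parameter errors are stripped
    for key in ("service_tier", "verbosity", "top_k"):
        req.pop(key, None)
    return req
```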

Model Route Overrides

Use models[].routes when the client-facing route and backend-supported route differ.

{
  "models": [
    {
      "id": "my-model",
      "upstream": "foundry",
      "targetModel": "my-deployment",
      "routes": {
        "chat/completions": "responses"
      }
    }
  ]
}
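Resolution is a straightforward lookup with a pass-through default, which can be sketched as (hypothetical helper name):

```python
def resolve_route(model_cfg: dict, client_route: str) -> str:
    """Map a client-facing route to the backend route via models[].routes,
    falling back to the same route when no override exists."""
    return model_cfg.get("routes", {}).get(client_route, client_route)
```

With the config above, a client call to chat/completions is forwarded to the responses route, while unlisted routes pass through unchanged.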

curl Examples

List models:

  • curl -sS http://127.0.0.1:3000/v1/models -H 'authorization: Bearer CHANGEME' | jq .

Chat request:

  • curl -sS http://127.0.0.1:3000/v1/chat/completions -H 'content-type: application/json' -H 'authorization: Bearer CHANGEME' -d '{"model":"gpt-5-mini","messages":[{"role":"user","content":"ping"}]}' | jq .

About

A reverse proxy using Managed Identity / Service Principal for Azure AI Foundry
