An HTTP proxy that sits between end users and LLM servers and bills users per token.
- Python >= 3.11
- Python packages listed as dependencies in `pyproject.toml`
```sh
python3 -m venv venv
source venv/bin/activate
sqlite3 db.sqlite < llmproxy/schema.sql
```
```sh
python3 -m llmproxy
```

See `llmproxy/config.toml` for an example configuration file. The program will update the list of configured backends from the config file on SIGHUP:

```sh
pkill -f -HUP 'python3? .*llmproxy'
# or in Docker
docker kill -s=SIGHUP CONTAINER
```

To run the tests:

```sh
python3 -m unittest
```

To build the Docker image, run the following command:

```sh
docker build -t ghcr.io/comtegra/llmproxy:master .
```

The following instructions assume that you already have a Docker Compose repository and file.
- Create an SQLite database according to `llmproxy/schema.sql`:

  ```sh
  sqlite3 db.sqlite < llmproxy/schema.sql
  ```

- Copy `llmproxy/config.toml` to your repository. A good relative path would be `secrets/llm-billing-proxy-config.toml`.
- Add appropriate entries to your compose file's `services` and `secrets` sections. There's a template `compose.yml` in this repository.
- Bring the new service up and verify it started correctly (e.g. `docker compose up llm-billing-proxy`).
- Create a MongoDB user with the following privileges (see section Database below for JS snippets):

  ```js
  {resource: {db: "cgc", collection: "api_keys"}, actions: ["find"]}
  {resource: {db: "billing", collection: "events_oneoff"}, actions: ["insert"]}
  ```

- Copy `llmproxy/config.toml` to your repository. A good relative path would be `secrets/llm-billing-proxy-config.toml`.
- In the config, edit `uri` in section `db` so that it includes credentials for the Mongo user (e.g. `mongodb://myuser:mypass@host:27017/?authSource=cgc`).
- Add appropriate entries to your compose file's `services` and `secrets` sections. There's a template `compose.yml` in this repository.
- Bring the new service up and verify it started correctly (e.g. `docker compose up llm-billing-proxy`).
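Based only on the key names mentioned in the steps above (`uri` in section `db`), the relevant configuration fragment might look like this; the exact layout and any other keys are assumptions, so consult `llmproxy/config.toml` for the authoritative example:

```toml
[db]
# Include the Mongo user's credentials in the connection URI.
uri = "mongodb://myuser:mypass@host:27017/?authSource=cgc"
```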
Sample data can be added to an SQLite database using `sample.sql`; ditto for MongoDB with `sample.js`. After loading sample data you may run a query like this:

```sh
curl -v -H 'Content-Type: application/json' -H 'Authorization: Bearer token2' \
    -d '{"messages": [{"role": "system", "content": "You are an assistant."}, {"role": "user", "content": "Write a limerick about python exceptions"}], "model": "llama31-70b", "stream": true}' \
    'http://localhost:8080/v1/chat/completions'
```

If using SQLite, see `llmproxy/schema.sql`. Otherwise read on.
This program uses MongoDB for authentication and completion logging. The schema is compatible with Comtegra GPU Core. The MongoDB user needs the following privileges:

```js
db.getSiblingDB("cgc").createRole({
  role: "apiKeysReader",
  privileges: [
    {
      resource: {db: "cgc", collection: "api_keys"},
      actions: ["find"],
    },
  ],
  roles: [],
});
db.getSiblingDB("billing").createRole({
  role: "completionBillingWriter",
  privileges: [
    {
      resource: {db: "billing", collection: "events_oneoff"},
      actions: ["insert"],
    },
  ],
  roles: [],
});
db.getSiblingDB("cgc").createUser({
  user: "llm-billing-proxy",
  pwd: "mypass",
  roles: [
    { role: "apiKeysReader", db: "cgc" },
    { role: "completionBillingWriter", db: "billing" },
  ],
});
```

Users authenticate via bearer tokens. Tokens are stored in the database as SHA-256 hashes. They can be generated by the following command:

```sh
python3 -c 'import hashlib, secrets; print("Token:", t:=secrets.token_urlsafe(64)); print("Hash:", hashlib.sha256(t.encode()).hexdigest())'
```

If using SQLite, see `llmproxy/schema.sql`. Otherwise read on.
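The token scheme above (store only the SHA-256 hash, compare it on each request) can be sketched like this; `make_token` and `verify` are hypothetical helper names, not the proxy's actual API:

```python
import hashlib
import secrets

def make_token() -> tuple[str, str]:
    """Generate a bearer token and the SHA-256 hash to store in the database."""
    token = secrets.token_urlsafe(64)
    return token, hashlib.sha256(token.encode()).hexdigest()

def verify(presented_token: str, stored_hash: str) -> bool:
    """Check a presented bearer token against the stored SHA-256 hash."""
    digest = hashlib.sha256(presented_token.encode()).hexdigest()
    # Constant-time comparison to avoid leaking hash prefixes via timing.
    return secrets.compare_digest(digest, stored_hash)
```

Storing only the hash means a database leak does not reveal usable tokens.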
When a user attempts to authenticate, the following query is performed against the `cgc.api_keys` collection:

```js
{
  "access_level": "LLM",
  "secret": TOKEN-HASH,
  "$or": [
    {"date_expiry": {"$gt": new Date()}},
    {"date_expiry": null},
  ],
}
```

`TOKEN-HASH` is replaced with the SHA-256 hash of the bearer token. The documents are required to have an additional field: `user_id`.
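In Python the same filter can be built as a plain dict, in the shape accepted by e.g. `pymongo`'s `find_one`; the helper name is hypothetical:

```python
import hashlib
from datetime import datetime, timezone

def api_key_query(bearer_token: str) -> dict:
    """Build the cgc.api_keys filter for a presented bearer token."""
    token_hash = hashlib.sha256(bearer_token.encode()).hexdigest()
    return {
        "access_level": "LLM",
        "secret": token_hash,
        # Accept keys that either expire in the future or never expire.
        "$or": [
            {"date_expiry": {"$gt": datetime.now(timezone.utc)}},
            {"date_expiry": None},
        ],
    }
```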
If using SQLite see llmproxy/schema.sql. Otherwise read on.
When a user performs a completion, two events (prompt and completion token counts) are inserted into the collection `billing.events_oneoff`. These documents have the following fields:

- `date_created` -- date and time when the request finished processing
- `user_id` -- `user_id` of the API key
- `api_key_id` -- id of the API key
- `product` -- a string in the following format: `MODEL/DEVICE/TYPE`, where
  - `MODEL` is the name of the backend
  - `DEVICE` is the name of the GPU where the model runs
  - `TYPE` is `prompt` or `completion`
- `quantity` -- token count
- `request_id` -- request ID to correlate prompt/completion counts
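As an illustration of the field layout above, the pair of documents for one request might be assembled like this. The helper name and sample values are hypothetical; only the field names and the `MODEL/DEVICE/TYPE` product format come from the description above.

```python
from datetime import datetime, timezone

def billing_events(user_id, api_key_id, request_id,
                   model, device, prompt_tokens, completion_tokens):
    """Build the two billing.events_oneoff documents for one completion."""
    now = datetime.now(timezone.utc)
    events = []
    for kind, count in (("prompt", prompt_tokens),
                        ("completion", completion_tokens)):
        events.append({
            "date_created": now,
            "user_id": user_id,
            "api_key_id": api_key_id,
            # Product string format: MODEL/DEVICE/TYPE
            "product": f"{model}/{device}/{kind}",
            "quantity": count,
            # Shared request ID correlates the prompt/completion counts.
            "request_id": request_id,
        })
    return events
```

Keeping the two counts as separate events lets prompt and completion tokens be priced independently per product string.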