oplogica/jomex


Jomex

Pre-Decision AI Risk Intelligence

Intervene Before Impact.


Jomex is an open-source framework that detects unsafe AI outputs before they reach end users — without requiring ground truth data.

It works by querying multiple LLMs, measuring their disagreement and internal instability, and making real-time Block / Escalate / Flag / Pass decisions with regulatory-aligned risk weights and tamper-evident audit trails.

Why Jomex?

Existing tools such as UQLM and MUSE measure uncertainty after generation but do not act on it. Jomex adds a blocking decision before the output reaches the user:

| Feature | UQLM | MUSE | Jomex |
|---|---|---|---|
| Multi-model disagreement | Partial | Yes | Yes |
| Single-model instability | No | No | Yes (IIS) |
| Reasoning divergence | No | No | Yes |
| Pre-decision blocking | No | No | Yes |
| Regulatory-aligned weights | No | No | Yes (EU AI Act) |
| Audit trail (ProofSlip) | No | No | Yes |
| Multi-turn risk (MTAR) | No | No | Yes |

Quick Start

pip install jomex

import asyncio
from jomex import JomexEngine
from jomex.adapters import OpenAIAdapter, AnthropicAdapter, OllamaAdapter

engine = JomexEngine(
    models=[
        OpenAIAdapter("gpt-4o-mini"),
        AnthropicAdapter("claude-sonnet-4-20250514"),
        OllamaAdapter("llama3.1:8b"),  # local model for architectural diversity
    ],
)

async def check():
    result = await engine.evaluate(
        "Is ibuprofen safe during pregnancy?",
        domain="medical",
    )
    print(f"Decision: {result.decision}")   # BLOCK / ESCALATE / FLAG / PASS
    print(f"Risk: {result.risk_score:.3f}")
    print(f"D_ext: {result.d_ext:.3f}")
    print(f"IIS: {result.iis:.3f}")
    print(f"R_struct: {result.r_struct:.3f}")
    print(f"W_reg: {result.w_reg}")

asyncio.run(check())

Using Risk Profiles (Recommended)

Instead of tuning weights manually, use pre-configured profiles for your domain:

# Safe defaults for medical applications — stricter thresholds, more paraphrases
engine = JomexEngine.with_profile(
    models=[...],
    profile_name="medical",
)

| Profile | Domain | t_block | Framework | Notes |
|---|---|---|---|---|
| medical | Healthcare, clinical | 0.60 | EU AI Act + FDA | 5 IIS paraphrases, conservative |
| legal | Legal reasoning | 0.65 | EU AI Act | Higher reasoning weight (β=0.3) |
| financial | Finance, credit | 0.70 | Basel III | Higher disagreement weight (α=0.5) |
| education | Educational content | 0.75 | EU AI Act | Moderate thresholds |
| customer_service | Support chatbots | 0.75 | EU AI Act | IIS disabled for speed |
| default | General purpose | 0.70 | EU AI Act | Balanced defaults |

How It Works

Query → [Model A] ──┐
        [Model B] ──┼→ D_ext (disagreement)
        [Model C] ──┘  IIS (instability)
                       R_struct (reasoning divergence)
                       × W_reg (regulatory weight)
                       ─────────────────────
                       = Risk(x)
                           │
            ┌─────────┬────┴────┬─────────┐
            ▼         ▼         ▼         ▼
          PASS      FLAG    ESCALATE    BLOCK
               + ProofSlip audit record

Risk Equation

Risk(x) = W_reg × (α · D_ext + β · R_struct + γ · IIS)

| Component | What it measures |
|---|---|
| D_ext | How much do different models disagree? (cosine distance) |
| IIS | How unstable is a single model across paraphrases? (Tr(Cov), the trace of the covariance) |
| R_struct | How different are the reasoning paths? (embedding distance) |
| W_reg | How critical is this domain? (EU AI Act / FDA / Basel III) |
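
As a rough illustration of how these components could combine, the sketch below computes a mean pairwise cosine distance (standing in for D_ext), a trace of the covariance (standing in for IIS), and the weighted sum from the risk equation. The helper names and the toy inputs are assumptions made for illustration and are not Jomex's internal API; the real scorers work on embeddings of model responses and may normalize their outputs differently. The default weights are the ones shown under Configuration.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two non-zero, equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def mean_pairwise_distance(vectors):
    """Average cosine distance over all pairs (a stand-in for D_ext)."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


def trace_of_covariance(vectors):
    """Sum of per-dimension sample variances, i.e. Tr(Cov) (a stand-in for IIS).

    Needs at least two vectors.
    """
    n = len(vectors)
    total = 0.0
    for d in range(len(vectors[0])):
        column = [v[d] for v in vectors]
        mean = sum(column) / n
        total += sum((x - mean) ** 2 for x in column) / (n - 1)
    return total


def risk(d_ext, r_struct, iis, w_reg, alpha=0.5, beta=0.2, gamma=0.3):
    """Risk(x) = W_reg * (alpha * D_ext + beta * R_struct + gamma * IIS)."""
    return w_reg * (alpha * d_ext + beta * r_struct + gamma * iis)
```

With `w_reg=0`, as used for creative domains, the function returns 0 regardless of the other components, which is how a zero regulatory weight switches the assessment off.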

Decisions

| Risk Level | Action | Description |
|---|---|---|
| < t_flag | PASS | Output delivered normally |
| t_flag to t_escalate | FLAG | Output delivered with risk warning |
| t_escalate to t_block | ESCALATE | Routed to human reviewer |
| > t_block | BLOCK | Output prevented entirely |

Every decision generates a ProofSlip — a SHA256 chain-linked audit record.
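
The threshold table can be read as a simple cascade. The sketch below is one way to write it, using the default thresholds from the Configuration section. The `Decision` enum here is a local stand-in, and the handling of scores that fall exactly on `t_flag` or `t_escalate` is an assumption, since the table only pins down the outer bounds.

```python
from enum import Enum


class Decision(Enum):
    """Local stand-in for the four outcomes; not imported from jomex."""
    PASS = "PASS"
    FLAG = "FLAG"
    ESCALATE = "ESCALATE"
    BLOCK = "BLOCK"


def decide(risk, t_flag=0.3, t_escalate=0.5, t_block=0.7):
    """Map a risk score to a decision by checking the strictest tier first."""
    if risk > t_block:
        return Decision.BLOCK
    if risk >= t_escalate:  # assumed: a score equal to t_escalate escalates
        return Decision.ESCALATE
    if risk >= t_flag:      # assumed: a score equal to t_flag is flagged
        return Decision.FLAG
    return Decision.PASS
```

Checking the highest threshold first keeps each branch to a single comparison and makes the ordering requirement `t_flag < t_escalate < t_block` explicit.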

Multi-Turn Risk (MTAR)

Jomex tracks cumulative risk across conversation turns using CUSUM (Cumulative Sum Control Chart):

import asyncio

async def check_session():
    # Assumes `engine` is the JomexEngine created in Quick Start
    result = await engine.evaluate(
        "What drug interactions should I worry about?",
        domain="medical",
        session_id="patient-session-42",
    )

    # MTAR state is embedded directly in the result
    print(f"Cumulative risk: {result.mtar_cumulative:.3f}")
    print(f"Turn count: {result.mtar_turn_count}")
    print(f"Alert triggered: {result.mtar_alert}")

    # Also available via the engine for full history
    mtar = engine.get_mtar_state("patient-session-42")
    print(f"Risk history: {mtar.history}")

asyncio.run(check_session())

MTAR automatically overrides individual PASS/FLAG decisions to ESCALATE when cumulative risk crosses the alert threshold — even if the current turn looks safe on its own.
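
To make the CUSUM idea concrete, here is a minimal one-sided CUSUM tracker. It is a sketch under stated assumptions: the class name, the `reference` level, and the `alert_threshold` are hypothetical values chosen for the example, and Jomex's actual MTAR parameters and update rule may differ.

```python
from dataclasses import dataclass, field


@dataclass
class CusumTracker:
    reference: float = 0.3        # hypothetical per-turn risk considered normal
    alert_threshold: float = 0.5  # hypothetical cumulative level that raises an alert
    cumulative: float = 0.0
    history: list = field(default_factory=list)

    def update(self, risk: float) -> bool:
        """Add one turn's risk; return True if the cumulative sum crosses the threshold."""
        # One-sided CUSUM: accumulate only the excess over the reference, floor at zero.
        self.cumulative = max(0.0, self.cumulative + (risk - self.reference))
        self.history.append(risk)
        return self.cumulative > self.alert_threshold


def apply_override(decision: str, alert: bool) -> str:
    """Raise PASS or FLAG to ESCALATE when the session-level alert is active."""
    if alert and decision in ("PASS", "FLAG"):
        return "ESCALATE"
    return decision


# Four turns that each sit below an escalation threshold of 0.5 on their own
tracker = CusumTracker()
alerts = [tracker.update(r) for r in [0.45, 0.45, 0.45, 0.45]]
```

In this run no single turn would escalate by itself, but the accumulated excess over the reference triggers the alert on the fourth turn, which is the situation the override is meant to catch.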

Proof-of-Alignment Replay (PAR)

PAR lets you replay historical decisions under new configurations — answering "what if we changed our risk policy?" without exposing users to risk.

import asyncio
from jomex import ReplayEngine, ReplayConfig

replay = ReplayEngine()

async def run_replay():
    # What if we applied stricter thresholds?
    # Assumes `engine` is the JomexEngine that produced the slips
    summary = await replay.run(
        slips=engine.proof_gen.get_slips(),
        config=ReplayConfig(
            name="stricter_medical_policy",
            t_block=0.5,       # Was 0.7
            t_escalate=0.35,   # Was 0.5
        ),
    )

    print(f"Decisions changed: {summary.decisions_changed}/{summary.total_slips}")
    print(f"Change rate: {summary.change_rate:.1%}")
    print(f"Avg risk drift: {summary.avg_risk_drift:+.4f}")
    print(f"Migration: {summary.migration}")
    # e.g., {"PASS": {"FLAG": 12, "ESCALATE": 3}, "FLAG": {"BLOCK": 5}}

asyncio.run(run_replay())

PAR never modifies the original ProofSlip chain — it operates on a read-only copy.
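
The core of a replay can be sketched as re-applying thresholds to recorded risk scores and counting which decisions move. The version below is a simplified illustration, not the `ReplayEngine` implementation: it assumes each record is a `(risk_score, original_decision)` pair, reuses the stored score instead of recomputing it under new weights, and uses hypothetical function names.

```python
from collections import defaultdict


def decide(risk, t_flag, t_escalate, t_block):
    """Threshold cascade; behavior exactly on t_flag / t_escalate is assumed."""
    if risk > t_block:
        return "BLOCK"
    if risk >= t_escalate:
        return "ESCALATE"
    if risk >= t_flag:
        return "FLAG"
    return "PASS"


def replay_thresholds(records, t_flag, t_escalate, t_block):
    """Re-decide each (risk, original_decision) record without mutating the input."""
    migration = defaultdict(lambda: defaultdict(int))
    changed = 0
    for risk, original in records:
        new = decide(risk, t_flag, t_escalate, t_block)
        if new != original:
            changed += 1
            migration[original][new] += 1
    total = len(records)
    return {
        "total": total,
        "changed": changed,
        "change_rate": changed / total if total else 0.0,
        "migration": {src: dict(dst) for src, dst in migration.items()},
    }


# Three recorded decisions replayed under stricter escalate/block thresholds
records = [(0.2, "PASS"), (0.4, "FLAG"), (0.6, "ESCALATE")]
summary = replay_thresholds(records, t_flag=0.3, t_escalate=0.35, t_block=0.5)
```

Because the function only reads `records` and builds a fresh summary, the recorded decisions stay untouched, mirroring the read-only guarantee described above.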

Regulatory Frameworks

Built-in support for:

  • EU AI Act (default) — Unacceptable / High / Limited / Minimal
  • FDA — Class III / II / I / Exempt
  • Basel III — Systemic / Material / Standard / Low

Creative domains (writing, poetry, brainstorming) automatically get W_reg = 0, disabling risk assessment where it doesn't apply.

REST API Server

Deploy Jomex as an HTTP gateway in front of any LLM:

pip install fastapi uvicorn
uvicorn jomex.server:app --host 0.0.0.0 --port 8000

| Endpoint | Method | Description |
|---|---|---|
| /evaluate | POST | Evaluate a query for risk |
| /audit/verify | GET | Verify ProofSlip chain integrity |
| /audit/slips | GET | List ProofSlips (filterable by decision) |
| /replay | POST | Run PAR scenario analysis |
| /profiles | GET | List available risk profiles |
| /health | GET | Health check |

# Custom server with real models
from jomex.server import create_app
from jomex.adapters import OpenAIAdapter, AnthropicAdapter

app = create_app(
    models=[OpenAIAdapter("gpt-4o-mini"), AnthropicAdapter("claude-sonnet-4-20250514")],
    profile="medical",
    proof_storage="audit/slips.jsonl",
)
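
A client for the gateway might look like the sketch below, which uses only the standard library. The JSON field names `query` and `domain` are assumptions modeled on the `engine.evaluate` arguments; the request schema is not documented here, so check the server's actual schema before relying on them. The helper names are hypothetical.

```python
import json
import urllib.request


def build_evaluate_request(base_url, query, domain):
    """Build a POST request for /evaluate (payload field names are assumed)."""
    payload = json.dumps({"query": query, "domain": domain}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/evaluate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def evaluate(base_url, query, domain, timeout=30.0):
    """Send the request to a running Jomex server and return the parsed JSON body."""
    request = build_evaluate_request(base_url, query, domain)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not require a running server
request = build_evaluate_request(
    "http://localhost:8000", "Is ibuprofen safe during pregnancy?", "medical"
)
```

Calling `evaluate("http://localhost:8000", ...)` needs the server from the `uvicorn` command above to be running.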

Multilingual Support

By default, Jomex uses English embeddings. For multilingual deployments:

from jomex import set_default_embedding_model

# Switch to multilingual embeddings (Arabic, Turkish, etc.)
set_default_embedding_model("paraphrase-multilingual-MiniLM-L12-v2")

Configuration

engine = JomexEngine(
    models=[...],

    # Risk component weights (must sum to ~1.0)
    alpha=0.5,        # Weight for D_ext (disagreement)
    beta=0.2,         # Weight for R_struct (reasoning divergence)
    gamma=0.3,        # Weight for IIS (instability)

    # Decision thresholds
    t_flag=0.3,       # Above this: FLAG
    t_escalate=0.5,   # Above this: ESCALATE
    t_block=0.7,      # Above this: BLOCK

    # Optional toggles
    iis_enabled=True,          # Disable IIS to reduce latency
    r_struct_enabled=True,     # Disable R_struct for simpler assessment
    iis_model_index=0,         # Which model to test for internal stability
    iis_paraphrases=3,         # Number of paraphrases for IIS measurement
)
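
Since the weights are expected to sum to roughly 1.0 and the thresholds only make sense in increasing order, a small pre-flight check can catch misconfiguration early. The function below is a hypothetical helper, not part of Jomex, and the tolerance of 0.05 is an assumed reading of "~1.0".

```python
import math


def validate_config(alpha, beta, gamma, t_flag, t_escalate, t_block, tol=0.05):
    """Return a list of human-readable problems; an empty list means the config looks sane."""
    problems = []
    if min(alpha, beta, gamma) < 0:
        problems.append("weights must be non-negative")
    if not math.isclose(alpha + beta + gamma, 1.0, abs_tol=tol):
        problems.append(f"weights sum to {alpha + beta + gamma:.3f}, expected ~1.0")
    if not (0 <= t_flag < t_escalate < t_block):
        problems.append("thresholds must satisfy 0 <= t_flag < t_escalate < t_block")
    return problems
```

Running it on the defaults shown above (`0.5, 0.2, 0.3` and `0.3, 0.5, 0.7`) returns an empty list.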

Audit Trail

Every evaluation generates a ProofSlip — a tamper-evident, SHA256 chain-linked audit record. Each slip references the hash of the previous slip, creating a verifiable sequential chain.

# Verify chain integrity
assert engine.proof_gen.verify_chain()

# Get all blocked decisions
blocked = engine.proof_gen.get_slips(decision_filter="BLOCK")

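# Persist to file (JSON Lines, append-only)
# Import path assumed from the Architecture listing (jomex/proof_slip.py)
from jomex.proof_slip import ProofSlipGenerator

engine = JomexEngine(
    models=[...],
    proof_generator=ProofSlipGenerator(storage_path="audit/jomex_slips.jsonl"),
)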

The ProofSlip generator is concurrency-safe — multiple async evaluations can run simultaneously without chain corruption.
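
The chain-linking idea can be shown in a few lines with `hashlib`. This is a minimal sketch of a generic SHA256 hash chain, not the ProofSlip format: the record layout, the all-zero genesis value, and the function names are assumptions for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for the first slip's previous hash


def slip_hash(payload, prev_hash):
    """Hash the payload together with the previous slip's hash."""
    body = json.dumps({"payload": payload, "prev_hash": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_slip(chain, payload):
    """Append a new slip that references the hash of the current last slip."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    slip = {"payload": payload, "prev_hash": prev_hash,
            "hash": slip_hash(payload, prev_hash)}
    chain.append(slip)
    return slip


def verify_chain(chain):
    """Recompute every hash and check each link; False on any mismatch."""
    prev_hash = GENESIS
    for slip in chain:
        if slip["prev_hash"] != prev_hash:
            return False
        if slip["hash"] != slip_hash(slip["payload"], slip["prev_hash"]):
            return False
        prev_hash = slip["hash"]
    return True


chain = []
append_slip(chain, {"decision": "BLOCK", "risk": 0.82})
append_slip(chain, {"decision": "PASS", "risk": 0.12})
```

Editing any earlier payload changes its recomputed hash, so verification fails at that slip; this is what makes the record tamper-evident rather than tamper-proof.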

Architecture

jomex/
├── engine.py          # JomexEngine orchestrator (async, parallel, with_profile)
├── scorers.py         # D_ext, IIS, R_struct (multilingual embedding support)
├── regulatory.py      # W_reg mapper (EU AI Act, FDA, Basel III)
├── proof_slip.py      # ProofSlip generator (async, lock-protected hash chain)
├── profiles.py        # Pre-configured risk profiles per domain
├── replay.py          # PAR — Proof-of-Alignment Replay engine
├── server.py          # FastAPI REST gateway
├── adapters.py        # LLM provider adapters
└── models.py          # Decision, RiskResult, MTARState

Supported LLM Providers

| Provider | Adapter | Install |
|---|---|---|
| OpenAI | OpenAIAdapter("gpt-4o-mini") | pip install jomex[openai] |
| Anthropic | AnthropicAdapter("claude-sonnet-4-20250514") | pip install jomex[anthropic] |
| Google | GoogleAdapter("gemini-2.0-flash") | pip install jomex[google] |
| Ollama (local) | OllamaAdapter("llama3.1:8b") | Requires Ollama |
| Testing | MockAdapter("test", ["response"]) | Built-in |

Testing

# 140 tests across 18 sections, 0 failures
# Includes: unit, property, integration, concurrency (50 parallel),
# monotonicity invariants, tamper detection, and PAR replay validation
python tests/test_all.py

Disclaimer

Jomex is a risk assessment tool, not a decision-maker. It does NOT provide medical diagnoses, legal opinions, or financial advice. A "PASS" decision does NOT guarantee output safety. Human oversight is required for all high-risk deployments. See LICENSE for full liability disclaimers.

License

Apache 2.0 — Oplogica Inc.
