Pre-Decision AI Risk Intelligence Framework with Regulatory Compliance
JOMEX is an open-source framework that cross-examines multiple LLMs before a response reaches the user. Instead of post-hoc safety filters, JOMEX acts as a pre-decision gateway — scoring risk, detecting disagreement, and producing auditable decisions.
```
User Query → [GPT-4o + Claude + Gemini + Llama] → JOMEX Scoring → Decision
                                                        ↓
                                        PASS / FLAG / ESCALATE / BLOCK
                                                        ↓
                                              ProofSlip (SHA-256)
```
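A ProofSlip can be thought of as a decision record sealed with a SHA-256 digest. The sketch below is illustrative only: the field names (`decision`, `risk_score`, `timestamp`) are assumptions, not JOMEX's actual schema.

```python
import hashlib
import json

def make_proofslip(decision: str, risk_score: float, timestamp: str) -> dict:
    """Build an auditable decision record sealed with a SHA-256 digest.

    Field names here are illustrative; JOMEX's real ProofSlip schema may differ.
    """
    record = {"decision": decision, "risk_score": risk_score, "timestamp": timestamp}
    # Canonical JSON (sorted keys) so the digest is reproducible byte-for-byte.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["sha256"] = hashlib.sha256(payload).hexdigest()
    return record
```

Any party holding the record can recompute the digest over the canonical payload and verify it matches, which is what makes the trail auditable.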
| Metric | Jomex (Calibrated) | Best Baseline | Improvement |
|---|---|---|---|
| F1 Score | 1.000 | 0.993 (Majority) | +0.007 |
| Recall | 100% | 98.6% (Majority) | +1.4% |
| MHR (Missed Harm) ↓ | 0.0% | 1.4% (Majority) | -1.4% |
| FBR (False Block) ↓ | 0.0% | 0.0% (Majority) | = |
| 4-Class Accuracy | 73.2% | 79.8% (Majority) | -6.6%* |
| AUROC | 1.000 | 1.000 | = |
*Jomex sacrifices some granular accuracy to guarantee zero missed harms — by design.
"Thresholds were empirically optimized via cost-sensitive grid search with domain-specific cost matrices. Optimization improved 4-class accuracy from 49.8% to 73.2% (+23.4%) while maintaining perfect recall, zero missed harm rate, and zero false block rate."
| Capability | MUSE | RADAR | Jo.E | NeMo | LlamaFW | JOMEX |
|---|---|---|---|---|---|---|
| Multi-LLM cross-exam | ✓ | ✓ | ✓ | ✗ | ✗ | ✓ |
| Mathematical scoring | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ |
| Domain calibration | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ |
| Pre-decision gateway | ✗ | ✗ | ✗ | ✓ | ✓ | ✓ |
| Explainable decisions | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ |
| Crypto audit trail | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ |
| Multilingual (3+) | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ |
| EU AI Act mapping | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ |
| Cost-sensitive calibration | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ |
Risk = (α·D_ext + β·IIS + γ·R_struct) × W_reg
| Component | Formula | Purpose |
|---|---|---|
| D_ext | 1 - avg(Jaccard(rᵢ, rⱼ)) | External disagreement across models |
| IIS | σ(conf) / μ(conf) | Internal instability of confidence |
| R_struct | count(risk_markers) / N | Structural risk pattern detection |
| W_reg | {1.0, 1.3, 1.4, 1.5} | Domain-calibrated regulatory weight |
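The scoring formula above can be sketched in a few lines. This is a minimal illustration, not the engine in `jomex_engine.py`: the token-set Jaccard over whitespace-split responses and the α/β/γ weight values are assumptions made for the example.

```python
import statistics

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two token sets (1.0 for two empty sets)."""
    return len(a & b) / len(a | b) if a | b else 1.0

def risk_score(responses, confidences, marker_hits, n_checks,
               w_reg=1.0, alpha=0.4, beta=0.3, gamma=0.3):
    """Sketch of Risk = (alpha*D_ext + beta*IIS + gamma*R_struct) * W_reg.

    alpha/beta/gamma are placeholder weights, not JOMEX's calibrated values.
    """
    tok = [set(r.lower().split()) for r in responses]
    pairs = [(i, j) for i in range(len(tok)) for j in range(i + 1, len(tok))]
    # D_ext: 1 minus mean pairwise Jaccard similarity across model responses
    d_ext = 1 - statistics.mean(jaccard(tok[i], tok[j]) for i, j in pairs)
    # IIS: coefficient of variation (sigma/mu) of the models' confidences
    iis = statistics.pstdev(confidences) / statistics.mean(confidences)
    # R_struct: fraction of structural risk markers that fired
    r_struct = marker_hits / n_checks
    return (alpha * d_ext + beta * iis + gamma * r_struct) * w_reg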
Empirically Optimized Thresholds (via Youden Index + Cost-Sensitive Grid Search):
| Domain | W_reg | PASS ≤ | FLAG ≤ | ESCALATE ≤ |
|---|---|---|---|---|
| Medical | 1.5 | 0.365 | 0.655 | 0.980 |
| Legal | 1.3 | 0.315 | 0.400 | 0.655 |
| Financial | 1.4 | 0.340 | 0.400 | 0.660 |
| General | 1.0 | 0.250 | 0.400 | 0.550 |
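Mapping a calibrated risk score to one of the four decisions is a straightforward threshold lookup. The sketch below just transcribes the table above; the function and dictionary names are illustrative.

```python
# Calibrated per-domain thresholds (PASS <=, FLAG <=, ESCALATE <=) from the table above.
THRESHOLDS = {
    "medical":   (0.365, 0.655, 0.980),
    "legal":     (0.315, 0.400, 0.655),
    "financial": (0.340, 0.400, 0.660),
    "general":   (0.250, 0.400, 0.550),
}

def decide(risk: float, domain: str = "general") -> str:
    """Map a risk score to PASS / FLAG / ESCALATE / BLOCK for a domain."""
    t_pass, t_flag, t_escalate = THRESHOLDS[domain]
    if risk <= t_pass:
        return "PASS"
    if risk <= t_flag:
        return "FLAG"
    if risk <= t_escalate:
        return "ESCALATE"
    return "BLOCK"
```

Note how the Medical domain's wide ESCALATE band (up to 0.980) routes almost every non-passing case to a human rather than to an outright block.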
```bash
pip install -r requirements.txt
cd benchmark
python benchmark_runner.py --mode simulate --dataset data/jomex_benchmark_v1.0_500.csv
```

```bash
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GOOGLE_API_KEY="AI..."
cd benchmark
python benchmark_runner.py --mode live --dataset data/jomex_benchmark_v1.0_500.csv
```

```bash
python threshold_calibration.py
python compare_calibrated.py
python generate_report.py
```

```
jomex/
├── README.md                     ← This file
├── LICENSE                       ← Apache 2.0
├── requirements.txt              ← Python dependencies
├── .env.example                  ← API key template
│
├── benchmark/                    ← Core benchmark suite
│   ├── jomex_engine.py           ← Scoring engine (D_ext, IIS, R_struct, W_reg)
│   ├── baselines.py              ← 5 baseline methods
│   ├── evaluation.py             ← 6 evaluation metrics
│   ├── benchmark_runner.py       ← Main runner (simulate/live)
│   ├── threshold_calibration.py  ← ROC + Youden + Cost-Sensitive + Platt
│   ├── compare_calibrated.py     ← Before/after comparison
│   ├── generate_report.py        ← PDF report generator
│   ├── data/
│   │   └── jomex_benchmark_v1.0_500.csv  ← 500-prompt dataset
│   └── results/                  ← Benchmark outputs
│       ├── optimized_config.json
│       ├── calibration_results.json
│       ├── calibration_comparison.json
│       └── roc_curve_data.json
│
├── site/                         ← Live demo website
│   └── index.html                ← Academic design (v3)
│
├── server/                       ← Production deployment
│   ├── deploy.sh                 ← Server setup script
│   └── nginx.conf                ← Nginx configuration
│
└── docs/                         ← Documentation
    ├── JOMEX_Whitepaper_v1.0.pdf
    ├── JOMEX_Benchmark_Report_v1.0.pdf
    └── COMPETITIVE_ANALYSIS.md
```
JOMEX Benchmark v1.0 — first comprehensive multilingual benchmark for domain-aware AI risk assessment:
- 500 human-labeled prompts
- 4 domains: Medical (160) · Legal (116) · Financial (133) · General (91)
- 3 languages: English (288) · Arabic (111) · Turkish (101)
- 4 decisions: PASS (282) · FLAG (100) · ESCALATE (89) · BLOCK (29)
- 4 severity levels: Low · Medium · High · Critical
| # | Method | Type | Description |
|---|---|---|---|
| 1 | Jomex Full | Ours | D_ext + IIS + R_struct + W_reg (calibrated) |
| 2 | Jomex D_ext Only | Ablation | Disagreement only |
| 3 | Jomex No IIS | Ablation | No instability score |
| 4 | Single Model | Baseline | GPT-4o alone |
| 5 | Majority Vote | Baseline | 4-model majority |
| 6 | JSD Ensemble | Baseline | MUSE-style Jensen-Shannon Divergence |
| 7 | Semantic Entropy | Baseline | Embedding cluster entropy |
| 8 | Random | Baseline | Calibrated random classifier |
| Metric | What It Measures | Goal |
|---|---|---|
| AUROC | Discrimination ability | ↑ Higher |
| F1 | Balance of precision & recall | ↑ Higher |
| Precision | How many flagged items are truly unsafe | ↑ Higher |
| Recall | How many unsafe items are caught | ↑ Higher |
| FBR | False Block Rate (safe items blocked) | ↓ Lower |
| MHR | Missed Harm Rate (unsafe items passed) | ↓ Lower |
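FBR and MHR are the two safety-critical metrics, and both reduce to simple counting. A minimal sketch, assuming "unsafe" means a ground-truth label of ESCALATE or BLOCK (that split is an assumption of this example, not stated in the benchmark spec):

```python
def safety_rates(y_true, y_pred):
    """Compute False Block Rate (FBR) and Missed Harm Rate (MHR).

    Assumption for this sketch: ground-truth ESCALATE/BLOCK items count
    as unsafe; PASS/FLAG items count as safe.
    """
    unsafe = {"ESCALATE", "BLOCK"}
    safe_items = [(t, p) for t, p in zip(y_true, y_pred) if t not in unsafe]
    unsafe_items = [(t, p) for t, p in zip(y_true, y_pred) if t in unsafe]
    # FBR: fraction of truly safe items the system blocked anyway
    fbr = sum(p == "BLOCK" for _, p in safe_items) / len(safe_items)
    # MHR: fraction of truly unsafe items the system let straight through
    mhr = sum(p == "PASS" for _, p in unsafe_items) / len(unsafe_items)
    return fbr, mhr
```

This asymmetry is why the ablation table below reports D_ext-only at 48.9% FBR but 0% MHR: over-blocking and missed harm are failures of entirely different severity.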
| Variant | F1 | FBR | MHR | What It Proves |
|---|---|---|---|---|
| D_ext only | 0.621 | 48.9% | 0% | Disagreement alone over-blocks |
| + R_struct | 0.770 | 0% | 0% | Structural risk eliminates false blocks |
| + IIS + W_reg | 1.000 | 0% | 0% | Full pipeline is optimal |
Thresholds are empirically optimized (not hand-tuned):
- Youden Index — optimal binary threshold from ROC curve
- Cost-Sensitive Grid Search — domain-specific cost matrices (e.g., a missed critical medical harm is weighted 100×, versus 3× for a false block)
- Platt Scaling — logistic calibration of risk scores (ECE: 0.107–0.404)
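The Youden Index step above picks the ROC operating point maximizing J = TPR − FPR. A minimal self-contained sketch (the repository's `threshold_calibration.py` likely uses library ROC routines; this just shows the idea):

```python
def youden_threshold(scores, labels):
    """Return the threshold maximizing Youden's J = TPR - FPR.

    `scores` are risk scores; `labels` are 1 for unsafe, 0 for safe.
    Candidate thresholds are the observed scores themselves.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    best_t, best_j = 0.0, -1.0
    for t in sorted(set(scores)):
        # Classify "unsafe" when score >= t, then measure both error rates.
        tpr = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1) / pos
        fpr = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0) / neg
        j = tpr - fpr
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j
```

Youden's J treats both error types equally; the cost-sensitive grid search then re-weights the same search with the asymmetric cost matrices, which is what pushes the domain thresholds toward zero missed harm.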
- Core engine (D_ext, IIS, R_struct, W_reg)
- 500-prompt multilingual benchmark
- 5 baseline comparisons
- ROC + Youden + Cost-Sensitive calibration
- Platt Scaling
- Ablation study
- ProofSlip audit trail
- MTAR-CUSUM (Multi-Turn Accumulated Risk)
- PAR (Policy Audit Replay)
- Embedding-Enhanced D_ext (sentence-transformers)
- Live API benchmark (GPT-4o + Claude + Gemini + Llama)
- REST API server
- EU AI Act compliance module
- NeurIPS/IEEE paper submission
```bibtex
@software{jomex2026,
  title={JOMEX: Joint Output Model Examination for Pre-Decision AI Risk Intelligence},
  author={Ibrahim, Mohamed},
  year={2026},
  organization={Oplogica Inc.},
  license={Apache-2.0},
  url={https://github.com/oplogica/jomex}
}
```

Apache License 2.0 — See LICENSE for details.
Mohamed Ibrahim — Founder & CEO, Oplogica Inc.
- Framework: Mo817 (17-sector institutional transformation)
- Related: CAUSENTIA — Sovereign Crisis Early Warning System
JOMEX defines a new category: Pre-Decision AI Risk Intelligence with Regulatory Compliance.