
JOMEX — Joint Output Model Examination

Pre-Decision AI Risk Intelligence Framework with Regulatory Compliance

License: Apache 2.0 · Benchmark: 500 prompts · Languages: EN/AR/TR

What is JOMEX?

JOMEX is an open-source framework that cross-examines multiple LLMs before a response reaches the user. Instead of post-hoc safety filters, JOMEX acts as a pre-decision gateway — scoring risk, detecting disagreement, and producing auditable decisions.

```
User Query → [GPT-4o + Claude + Gemini + Llama] → JOMEX Scoring → Decision
                                                      ↓
                                          PASS / FLAG / ESCALATE / BLOCK
                                                      ↓
                                              ProofSlip (SHA-256)
```
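As an illustration of the audit-trail step, here is a minimal Python sketch of what a ProofSlip could look like: a SHA-256 digest over the query, model responses, risk score, and decision. The field names and schema are hypothetical, not JOMEX's actual format.

```python
import hashlib
import json
import time

def proof_slip(query: str, responses: list[str], risk: float, decision: str) -> dict:
    """Build an auditable record and seal it with a SHA-256 digest.

    The digest covers the query, per-model responses, risk score, and
    decision, so any later tampering with those fields is detectable.
    """
    record = {
        "query": query,
        "responses": responses,
        "risk": round(risk, 4),
        "decision": decision,
        "ts": int(time.time()),
    }
    # Canonical JSON (sorted keys) so the digest is reproducible.
    payload = json.dumps(record, sort_keys=True).encode()
    record["sha256"] = hashlib.sha256(payload).hexdigest()
    return record
```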

Key Results

| Metric | JOMEX (Calibrated) | Best Baseline | Improvement |
|---|---|---|---|
| F1 Score | 1.000 | 0.993 (Majority) | +0.007 |
| Recall | 100% | 98.6% (Majority) | +1.4% |
| MHR (Missed Harm) ↓ | 0.0% | 1.4% (Majority) | -1.4% |
| FBR (False Block) ↓ | 0.0% | 0.0% (Majority) | = |
| 4-Class Accuracy | 73.2% | 79.8% (Majority) | -6.6%* |
| AUROC | 1.000 | 1.000 | = |

*JOMEX sacrifices some granular accuracy to guarantee zero missed harms — by design.

> "Thresholds were empirically optimized via cost-sensitive grid search with domain-specific cost matrices. Optimization improved 4-class accuracy from 49.8% to 73.2% (+23.4%) while maintaining perfect recall, zero missed harm rate, and zero false block rate."

Why JOMEX is Different

Unlike prior guardrail and ensemble systems (MUSE, RADAR, Jo.E, NeMo, LlamaFW), JOMEX combines all of the following in one framework (see docs/COMPETITIVE_ANALYSIS.md for the full per-framework comparison):

  • Multi-LLM cross-examination
  • Mathematical scoring
  • Domain calibration
  • Pre-decision gateway
  • Explainable decisions
  • Cryptographic audit trail
  • Multilingual support (3+ languages)
  • EU AI Act mapping
  • Cost-sensitive calibration

Mathematical Framework

Risk = (α·D_ext + β·IIS + γ·R_struct) × W_reg

| Component | Formula | Purpose |
|---|---|---|
| D_ext | 1 - avg(Jaccard(rᵢ, rⱼ)) | External disagreement across models |
| IIS | σ(conf) / μ(conf) | Internal instability of confidence |
| R_struct | count(risk_markers) / N | Structural risk pattern detection |
| W_reg | {1.0, 1.3, 1.4, 1.5} | Domain-calibrated regulatory weight |
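The three components can be sketched in plain Python. The Jaccard-over-token-sets disagreement and coefficient-of-variation instability follow the formulas above; the weights `alpha`, `beta`, `gamma` and the marker list are illustrative assumptions, not values from the JOMEX codebase.

```python
from itertools import combinations
from statistics import mean, pstdev

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two token sets (1.0 if both empty)."""
    return len(a & b) / len(a | b) if a | b else 1.0

def d_ext(responses: list[str]) -> float:
    """External disagreement: 1 - mean pairwise Jaccard over token sets."""
    sets = [set(r.lower().split()) for r in responses]
    return 1.0 - mean(jaccard(a, b) for a, b in combinations(sets, 2))

def iis(confidences: list[float]) -> float:
    """Internal instability: coefficient of variation of model confidences."""
    return pstdev(confidences) / mean(confidences)

def r_struct(text: str, risk_markers: list[str]) -> float:
    """Fraction of risk markers present in the pooled response text."""
    text = text.lower()
    return sum(m in text for m in risk_markers) / len(risk_markers)

def risk_score(responses, confidences, markers, w_reg,
               alpha=0.4, beta=0.3, gamma=0.3):
    """Weighted blend of the three components, scaled by the domain weight."""
    base = (alpha * d_ext(responses)
            + beta * iis(confidences)
            + gamma * r_struct(" ".join(responses), markers))
    return base * w_reg
```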

Empirically Optimized Thresholds (via Youden Index + Cost-Sensitive Grid Search):

| Domain | W_reg | PASS ≤ | FLAG ≤ | ESCALATE ≤ |
|---|---|---|---|---|
| Medical | 1.5 | 0.365 | 0.655 | 0.980 |
| Legal | 1.3 | 0.315 | 0.400 | 0.655 |
| Financial | 1.4 | 0.340 | 0.400 | 0.660 |
| General | 1.0 | 0.250 | 0.400 | 0.550 |
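A minimal sketch of how the calibrated thresholds map a risk score to a decision, using the table's published values; the dictionary layout and function name are hypothetical, not jomex_engine.py's actual API.

```python
# (PASS ≤, FLAG ≤, ESCALATE ≤) cutoffs per domain, from the table above.
THRESHOLDS = {
    "medical":   (0.365, 0.655, 0.980),
    "legal":     (0.315, 0.400, 0.655),
    "financial": (0.340, 0.400, 0.660),
    "general":   (0.250, 0.400, 0.550),
}

def decide(risk: float, domain: str) -> str:
    """Map a risk score to PASS / FLAG / ESCALATE / BLOCK for a domain."""
    t_pass, t_flag, t_escalate = THRESHOLDS[domain]
    if risk <= t_pass:
        return "PASS"
    if risk <= t_flag:
        return "FLAG"
    if risk <= t_escalate:
        return "ESCALATE"
    return "BLOCK"
```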

Quick Start

1. Install Dependencies

```shell
pip install -r requirements.txt
```

2. Run Benchmark (Simulation Mode — No API Keys Needed)

```shell
cd benchmark
python benchmark_runner.py --mode simulate --dataset data/jomex_benchmark_v1.0_500.csv
```

3. Run with Real APIs (Live Mode)

```shell
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GOOGLE_API_KEY="AI..."

cd benchmark
python benchmark_runner.py --mode live --dataset data/jomex_benchmark_v1.0_500.csv
```

4. Calibrate Thresholds

```shell
python threshold_calibration.py
python compare_calibrated.py
```

5. Generate Report

```shell
python generate_report.py
```

Project Structure

```
jomex/
├── README.md                  ← This file
├── LICENSE                    ← Apache 2.0
├── requirements.txt           ← Python dependencies
├── .env.example               ← API key template
│
├── benchmark/                 ← Core benchmark suite
│   ├── jomex_engine.py        ← Scoring engine (D_ext, IIS, R_struct, W_reg)
│   ├── baselines.py           ← 5 baseline methods
│   ├── evaluation.py          ← 6 evaluation metrics
│   ├── benchmark_runner.py    ← Main runner (simulate/live)
│   ├── threshold_calibration.py ← ROC + Youden + Cost-Sensitive + Platt
│   ├── compare_calibrated.py  ← Before/after comparison
│   ├── generate_report.py     ← PDF report generator
│   ├── data/
│   │   └── jomex_benchmark_v1.0_500.csv  ← 500-prompt dataset
│   └── results/               ← Benchmark outputs
│       ├── optimized_config.json
│       ├── calibration_results.json
│       ├── calibration_comparison.json
│       └── roc_curve_data.json
│
├── site/                      ← Live demo website
│   └── index.html             ← Academic design (v3)
│
├── server/                    ← Production deployment
│   ├── deploy.sh              ← Server setup script
│   └── nginx.conf             ← Nginx configuration
│
└── docs/                      ← Documentation
    ├── JOMEX_Whitepaper_v1.0.pdf
    ├── JOMEX_Benchmark_Report_v1.0.pdf
    └── COMPETITIVE_ANALYSIS.md
```

Benchmark Dataset

JOMEX Benchmark v1.0 — the first comprehensive multilingual benchmark for domain-aware AI risk assessment:

  • 500 human-labeled prompts
  • 4 domains: Medical (160) · Legal (116) · Financial (133) · General (91)
  • 3 languages: English (288) · Arabic (111) · Turkish (101)
  • 4 decisions: PASS (282) · FLAG (100) · ESCALATE (89) · BLOCK (29)
  • 4 severity levels: Low · Medium · High · Critical

Evaluation Methods

| # | Method | Type | Description |
|---|---|---|---|
| 1 | JOMEX Full | Ours | D_ext + IIS + R_struct + W_reg (calibrated) |
| 2 | JOMEX D_ext Only | Ablation | Disagreement only |
| 3 | JOMEX No IIS | Ablation | No instability score |
| 4 | Single Model | Baseline | GPT-4o alone |
| 5 | Majority Vote | Baseline | 4-model majority |
| 6 | JSD Ensemble | Baseline | MUSE-style Jensen-Shannon divergence |
| 7 | Semantic Entropy | Baseline | Embedding cluster entropy |
| 8 | Random | Baseline | Calibrated random classifier |
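For concreteness, here is a minimal sketch of the Jensen-Shannon divergence underlying the MUSE-style baseline (method 6), computed over token-frequency distributions; the whitespace tokenization is an illustrative assumption, not the baseline's actual preprocessing.

```python
from collections import Counter
from math import log2

def token_dist(text: str) -> dict:
    """Normalized token-frequency distribution of a response."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def js_divergence(p: dict, q: dict) -> float:
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    0.0 for identical distributions, 1.0 for fully disjoint supports.
    """
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a, b):
        # KL(a || b), skipping zero-probability terms of a.
        return sum(a.get(k, 0.0) * log2(a.get(k, 0.0) / b[k])
                   for k in keys if a.get(k, 0.0) > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```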

Metrics

| Metric | What It Measures | Goal |
|---|---|---|
| AUROC | Discrimination ability | ↑ Higher |
| F1 | Balance of precision & recall | ↑ Higher |
| Precision | How many flagged items are truly unsafe | ↑ Higher |
| Recall | How many unsafe items are caught | ↑ Higher |
| FBR | False Block Rate (safe items blocked) | ↓ Lower |
| MHR | Missed Harm Rate (unsafe items passed) | ↓ Lower |
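FBR and MHR reduce to simple counts over binary labels. A sketch, assuming 1 = unsafe for labels and 1 = blocked/flagged for predictions (an encoding chosen here for illustration):

```python
def fbr_mhr(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Compute False Block Rate and Missed Harm Rate.

    FBR: fraction of safe items (y_true == 0) that were blocked/flagged.
    MHR: fraction of unsafe items (y_true == 1) that were passed.
    """
    safe_preds = [p for t, p in zip(y_true, y_pred) if t == 0]
    unsafe_preds = [p for t, p in zip(y_true, y_pred) if t == 1]
    fbr = sum(safe_preds) / len(safe_preds) if safe_preds else 0.0
    mhr = sum(1 - p for p in unsafe_preds) / len(unsafe_preds) if unsafe_preds else 0.0
    return fbr, mhr
```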

Ablation Study

| Variant | F1 | FBR | MHR | What It Proves |
|---|---|---|---|---|
| D_ext only | 0.621 | 48.9% | 0% | Disagreement alone over-blocks |
| + R_struct | 0.770 | 0% | 0% | Structural risk eliminates false blocks |
| + IIS + W_reg | 1.000 | 0% | 0% | Full pipeline is optimal |

Calibration

Thresholds are empirically optimized (not hand-tuned):

  • Youden Index — optimal binary threshold from ROC curve
  • Cost-Sensitive Grid Search — domain-specific cost matrices (e.g., missing a critical medical harm costs 100×, while a false block costs only 3×)
  • Platt Scaling — logistic calibration of risk scores (ECE: 0.107–0.404)
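The Youden step can be illustrated in a few lines: scan candidate cutoffs and keep the one maximizing TPR − FPR (the Youden J statistic). This is a sketch under binary labels (1 = unsafe), not the project's threshold_calibration.py implementation:

```python
def youden_threshold(scores: list[float], labels: list[int]) -> float:
    """Return the cutoff maximizing Youden's J = TPR - FPR.

    Candidate cutoffs are the distinct observed scores; a score is
    predicted positive (unsafe) when it is >= the cutoff.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    best_t, best_j = 0.0, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        tpr = tp / pos if pos else 0.0
        fpr = fp / neg if neg else 0.0
        if tpr - fpr > best_j:
            best_j, best_t = tpr - fpr, t
    return best_t
```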

Roadmap

  • Core engine (D_ext, IIS, R_struct, W_reg)
  • 500-prompt multilingual benchmark
  • 5 baseline comparisons
  • ROC + Youden + Cost-Sensitive calibration
  • Platt Scaling
  • Ablation study
  • ProofSlip audit trail
  • MTAR-CUSUM (Multi-Turn Accumulated Risk)
  • PAR (Policy Audit Replay)
  • Embedding-Enhanced D_ext (sentence-transformers)
  • Live API benchmark (GPT-4o + Claude + Gemini + Llama)
  • REST API server
  • EU AI Act compliance module
  • NeurIPS/IEEE paper submission

Citation

```bibtex
@software{jomex2026,
  title={JOMEX: Joint Output Model Examination for Pre-Decision AI Risk Intelligence},
  author={Ibrahim, Mohamed},
  year={2026},
  organization={Oplogica Inc.},
  license={Apache-2.0},
  url={https://github.com/oplogica/jomex}
}
```

License

Apache License 2.0 — See LICENSE for details.

Author

Mohamed Ibrahim — Founder & CEO, Oplogica Inc.

  • Framework: Mo817 (17-sector institutional transformation)
  • Related: CAUSENTIA — Sovereign Crisis Early Warning System

JOMEX defines a new category: Pre-Decision AI Risk Intelligence with Regulatory Compliance.
