
🚀 AI Judge Preference Demo

What if AI models were judging your performance review or resume? This system reveals the hidden biases and preferences of AI judges by running competitive tournaments between different writing styles and optimization strategies.

⚠️ Research Tool: This explores how AI models evaluate professional content; it is not career advice. It shows which specific wording, metrics, and presentation styles make AI judges rank one version higher than another.

  • Data you provide is processed by OpenAI models but not stored by this demo
  • The system uses competitive ELO tournaments to identify optimization strategies that AI judges favor
  • This explores AI bias patterns rather than providing definitive career advice
  • The demo takes 5-15 minutes to run depending on settings and uses a variety of strategies and models including GPT-5
  • View the code on GitHub: https://github.com/sshh12/perf-review

Features

  • Multi-Strategy Optimization: 6 different AI agents with specialized optimization strategies
  • ELO Tournament System: Competitive ranking to find the best optimized version
  • Truth Agent: Prevents fabrications by verifying against ground truth data
  • Streamlit Web Interface: Easy-to-use web app for running optimizations
  • Multi-Judge System: Optional consensus-based judging with multiple models
  • Configurable Strategies: Fully customizable agent strategies and focus areas
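The optional multi-judge consensus listed above can be sketched as a simple majority vote across judge models. This is an illustrative sketch, not the repo's actual API; the function name and vote format are assumptions.

```python
from collections import Counter

def consensus_winner(judge_votes):
    """Return the candidate a majority of judges preferred, else None.

    `judge_votes` is a list of candidate IDs, one vote per judge
    (e.g. one vote per model in the multi-judge setup).
    """
    tally = Counter(judge_votes)
    winner, count = tally.most_common(1)[0]
    # Require a strict majority; otherwise report no consensus.
    return winner if count > len(judge_votes) / 2 else None

# Three judges, two prefer version "A": consensus is "A".
print(consensus_winner(["A", "A", "B"]))  # → A
```

A strict-majority rule means a three-way split among three judges yields no consensus, which is typically handled by a tiebreak match or an extra judge.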

Usage

Prerequisites

  • Python 3.8+
  • OpenAI API key (set as OPENAI_API_KEY environment variable)

Installation & Setup

# Clone the repository
git clone https://github.com/sshh12/perf-review
cd perf-review

# Install dependencies
pip install -r requirements.txt

# Set up environment (includes OpenAI API key)
source env.sh

Running the Web Interface

# Start the Streamlit app
source env.sh && streamlit run streamlit_app.py

Then open your browser to the provided URL (typically http://localhost:8501).

The web interface provides two modes:

  • 📝 Performance Review Mode: Optimize performance reviews against career ladders
  • 💼 Resume/Recruiting Mode: Optimize resumes against job requirements

Running Command Line Version

# Set up environment and run
source env.sh && PYTHONPATH=. python main.py --rounds 2 --output results.json

# Create sample configuration
source env.sh && PYTHONPATH=. python main.py --create-sample-config

# Format code (development)
black .

How It Works

  1. Input: Provide your original content, evaluation criteria, and optional background data
  2. Optimization: AI agents create optimized versions using different strategies and approaches
  3. Tournament: Optimized versions compete in ELO-ranked matches judged by multiple AI models
  4. Results: Get a ranked leaderboard showing which optimization approaches AI judges prefer
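The tournament step above can be sketched with the standard Elo update rule, where the winner of each judged match gains rating at the loser's expense. This is a sketch of the mechanism, not the repo's implementation; the K-factor and starting rating of 1000 are assumptions.

```python
def elo_update(r_winner, r_loser, k=32):
    """Standard Elo update: winner gains exactly what the loser forfeits."""
    # Winner's expected score given the current rating gap.
    expected_win = 1 / (1 + 10 ** ((r_loser - r_winner) / 400))
    delta = k * (1 - expected_win)
    return r_winner + delta, r_loser - delta

# Two optimized versions start even; after one match the winner leads.
ratings = {"concise": 1000.0, "metric-heavy": 1000.0}
w, l = elo_update(ratings["metric-heavy"], ratings["concise"])
ratings["metric-heavy"], ratings["concise"] = w, l
print(ratings)  # → {'concise': 984.0, 'metric-heavy': 1016.0}
```

Because expected scores depend on the rating gap, upsets move ratings more than expected wins, so the leaderboard converges on the versions AI judges consistently prefer across many pairings.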
