
Behavioral Fingerprinting of Large Language Models

A reproducible framework to build multi-dimensional "behavioral fingerprints" of LLMs using a diagnostic prompt suite and an automated evaluator. The pipeline collects model responses, scores them against detailed rubrics via a separate evaluator model, and generates visual summaries.

Highlights

  • Diagnostic Prompt Suite across reasoning, world model, bias, personality, and robustness
  • Automated evaluation using a strong LLM as an impartial judge (JSON outputs)
  • Visualizations: radar profiles and category comparison charts
  • Narrative reports summarizing each model's qualitative fingerprint
  • Fully file-based artifacts checked into the repo (results/, evaluations/, charts/, reports/)

Repository structure

  • src/ — scripts to run the end-to-end pipeline
    • run_experiment.py — parse prompts and collect model responses into results/
    • run_evaluation.py — construct meta-prompts with rubrics and score into evaluations/
    • visualize_results.py — aggregate scores, generate charts in charts/, and write per-model reports in reports/
    • requirements.txt — Python dependencies
  • AI-comm-records/ — LaTeX records of the prompt suite and evaluation protocol, plus cached prompts.json
    • prompt_suite.tex, evaluation_protocol.tex, idea.tex, prompts.json
  • results/ — raw model responses (per model directory, per prompt .txt)
  • evaluations/ — evaluator JSON outputs mirroring results/ prompt IDs
  • charts/ — generated figures (radar profiles and category comparison charts)
  • reports/ — generated narrative reports (one per model)

Installation

  1. Python 3.10+
  2. Create a virtual environment and install dependencies:
python -m venv .venv && source .venv/bin/activate
pip install -r src/requirements.txt
  3. Configure environment for OpenRouter (used for both target models and evaluator):
  • Create a .env file at the repo root with:
OPENROUTER_API_KEY=your_key_here

Note: If no key is present, scripts run in simulation mode and still write placeholder outputs so the pipeline can be exercised end-to-end.
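
For orientation, here is a minimal sketch of how the key lookup and simulation fallback can work, assuming python-dotenv (the actual logic lives in the src/ scripts):

import os
from dotenv import load_dotenv

load_dotenv()  # reads OPENROUTER_API_KEY from the .env file at the repo root
API_KEY = os.getenv("OPENROUTER_API_KEY")
SIMULATION_MODE = API_KEY is None  # no key: emit placeholder outputs instead of calling the API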

Usage

1) Collect model responses

Edit TARGET_MODELS in src/run_experiment.py to include the OpenRouter identifiers you wish to evaluate (an example list is shown below), then run:

python src/run_experiment.py
  • Prompts are read from AI-comm-records/prompts.json (cached) or parsed from AI-comm-records/prompt_suite.tex on first run.
  • Outputs are written per model into results/<provider>/<model>/<prompt_id>.txt (or results/<model_id>/<prompt_id>.txt if you prefer a flat layout). The current repo uses the nested form, e.g., results/openai/gpt-5/.
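
As an illustration, TARGET_MODELS is a plain Python list of OpenRouter identifiers. The entries below are examples drawn from the artifacts in this repo, not the script's defaults:

# In src/run_experiment.py -- example values only
TARGET_MODELS = [
    "openai/gpt-5",
    "google/gemini-2.5-pro",
    "anthropic/claude-opus-4.1",
]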

2) Score responses with evaluator

Set the TARGET_MODELS list in src/run_evaluation.py to match the result folders you want scored. Optionally set EVALUATOR_MODEL.

python src/run_evaluation.py
  • Produces JSON files in evaluations/<provider>/<model>/<prompt_id>.json (see the hypothetical example below).
  • Robustness pairs (e.g., 4.1.1A/B) are evaluated jointly and saved under the base prompt ID (4.1.1.json, 4.1.2.json).
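
The output schema follows the rubrics in AI-comm-records/evaluation_protocol.tex. As a purely hypothetical illustration (the field names here are assumptions, not the documented schema), a file such as evaluations/openai/gpt-5/1.1.1.json might look like:

{
  "prompt_id": "1.1.1",
  "score": 4,
  "justification": "..."
}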

3) Aggregate, visualize, and report

python src/visualize_results.py
  • Aggregates numeric scores, normalizes by category maxima (sketched after this list), and emits:
    • Radar charts per model in charts/ (e.g., gpt-5_radar.png)
    • Comparison bar charts per category in charts/large/ or charts/mid/
    • Narrative reports per model in reports/ (e.g., gpt-5_report.txt)
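
A minimal sketch of the normalization step, assuming scores are grouped per category (the dictionary shape and function name are illustrative, not visualize_results.py's actual API):

def normalize_by_category(category_scores):
    # category_scores maps a category name to its raw scores,
    # e.g. {"Reasoning": [4, 5, 3], "Robustness": [2, 4]}
    normalized = {}
    for category, values in category_scores.items():
        max_score = max(values)
        normalized[category] = [v / max_score if max_score else 0.0 for v in values]
    return normalized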

Example artifacts

  • Radar: charts/gpt-5_radar.png, charts/gemini-2.5-pro_radar.png
  • Comparisons: charts/large/Robustness_comparison.png, charts/mid/Causal_Chain_comparison.png
  • Reports: reports/gpt-5_report.txt, reports/claude-opus-4.1_report.txt

Prompt suite and evaluation protocol

  • Prompts defined in AI-comm-records/prompt_suite.tex (cached JSON in AI-comm-records/prompts.json; an assumed entry shape is sketched after this list).
  • Rubrics and procedures in AI-comm-records/evaluation_protocol.tex.
  • Research narrative and scoping in AI-comm-records/idea.tex, discussion_points.tex, and literature_review.tex.
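
For reference, each cached entry in prompts.json plausibly pairs a prompt ID with its category and text. This shape is inferred from how the pipeline names its outputs, not a documented schema:

{
  "id": "2.1.1",
  "category": "World Model",
  "prompt": "..."
}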

Notes and tips

  • Model identifiers: scripts assume OpenRouter-style IDs (e.g., openai/gpt-5). Adjust paths or names consistently if you change the layout.
  • Simulation mode: without an API key, the system writes placeholder responses/evaluations so you can test downstream steps.
  • Personality classification prompts (3.3.x) yield non-numeric scores (e.g., E/I/S/N). Visualization code treats these separately and excludes them from numeric averages.
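
A minimal sketch of that separation, assuming raw scores arrive as a mixed list (the isinstance filter is an assumed approach, not the script's exact code):

raw_scores = [4, 5, "E", 3, "INTJ"]  # example mix of rubric scores and personality labels
numeric = [s for s in raw_scores if isinstance(s, (int, float))]
average = sum(numeric) / len(numeric) if numeric else 0.0  # -> 4.0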

Cite

If you find this work useful, please consider citing:

@article{pei2025behavioral,
  title={Behavioral Fingerprinting of Large Language Models},
  author={Pei, Zehua and Zhen, Hui-Ling and Zhang, Ying and Yang, Zhiyuan and Li, Xing and Yu, Xianzhi and Yuan, Mingxuan and Yu, Bei},
  journal={arXiv preprint arXiv:2509.04504},
  year={2025}
}
