Open-source AI safety benchmark testing how models handle tricky gray-zone requests. CLI for running benchmarks + web dashboard for exploring results.
open-source benchmark dashboard openai ai-safety cli-tool ai-research llm-evaluation safety-evaluation harmfulness-detection safe-completion
Updated Aug 20, 2025 - Python