SecureFlow Benchmark is a modular LLM safety evaluation framework built for red-teaming language models. It uses plugin-based attack probes, prompt transformation buffs, and automated safety detectors to systematically find vulnerabilities in LLM deployments.
graph LR
A[Attack Probes] --> B[Prompt Buffs]
B --> C[LLM Generator]
C --> D[Response]
D --> E[Safety Detectors]
E --> F[Evaluator]
F --> G[Safety Report]
- Plugin-based architecture with 4 extension points: probes, generators, detectors, and buffs
- Attack probes for jailbreaking (DAN, TAP), encoding attacks, and prompt injection
- LLM backends for OpenAI, HuggingFace, and REST APIs
- Safety detectors for content analysis and package hallucination detection
- Configurable evaluation harness with structured attempt data model
- Report generation and analysis visualization
pip install secureflow-benchmarkimport secureflow_benchmark
from secureflow_benchmark.probes import dan
from secureflow_benchmark.generators import openai as openai_gen
from secureflow_benchmark.evaluators import base as base_eval
# Set up a generator
generator = openai_gen.OpenAIGenerator(name="gpt-4o")
# Run DAN jailbreak probes
probe = dan.Dan_11_0()
attempts = probe.probe(generator)
# Evaluate results
evaluator = base_eval.Evaluator()
results = evaluator.evaluate(attempts)
print(f"Pass rate: {results.pass_rate:.2%}")secureflow_benchmark/
├── __init__.py # Package init and version
├── _config.py # Configuration management
├── _plugins.py # Plugin discovery system
├── configurable.py # Base configurable class
├── attempt.py # Attempt data model
├── cli.py # Command-line interface
├── command.py # Run orchestration commands
├── report.py # Report generation
├── payloads.py # Payload management
├── probes/ # Attack probes
│ ├── base.py # Base probe class
│ ├── dan.py # DAN jailbreak probes
│ ├── tap.py # TAP jailbreak probes
│ ├── encoding.py # Encoding attack probes
│ ├── latentinjection.py # Latent injection probes
│ ├── promptinject.py # Prompt injection probes
│ ├── continuation.py # Continuation probes
│ ├── grandma.py # Grandma exploit probes
│ └── lmrc.py # LMRC benchmark probes
├── generators/ # LLM backends
│ ├── base.py # Base generator class
│ ├── openai.py # OpenAI API backend
│ ├── huggingface.py # HuggingFace backend
│ └── rest.py # Generic REST API backend
├── detectors/ # Safety detectors
│ ├── base.py # Base detector class
│ ├── unsafe_content.py # Unsafe content detection
│ ├── packagehallucination.py # Package hallucination
│ ├── mitigation.py # Mitigation detection
│ └── always.py # Always-pass/fail detectors
├── buffs/ # Prompt transformations
│ ├── base.py # Base buff class
│ ├── encoding.py # Encoding transformations
│ ├── lowercase.py # Lowercase transformation
│ ├── paraphrase.py # Paraphrase transformation
│ └── low_resource_languages.py # Language transforms
├── harnesses/ # Evaluation harnesses
├── evaluators/ # Result evaluators
└── analyze/ # Report analysis tools
Building this framework deepened my understanding of LLM red-teaming methodology — specifically the taxonomy of jailbreak attacks (DAN prompts, encoding-based bypasses, latent injection), how automated safety benchmarking pipelines work, and the design patterns needed to make such a system extensible across different model backends and attack strategies.
Built upon garak by NVIDIA (Apache 2.0 License). SecureFlow Benchmark is a focused subset and fork of garak, adapted for benchmarking and educational use.
Apache 2.0 — See LICENSE for details.