SecureFlow Benchmark — LLM Safety Evaluation & Red-Teaming Framework

SecureFlow Benchmark is a modular LLM safety evaluation framework built for red-teaming language models. It uses plugin-based attack probes, prompt transformation buffs, and automated safety detectors to systematically find vulnerabilities in LLM deployments.

Architecture

graph LR
    A[Attack Probes] --> B[Prompt Buffs]
    B --> C[LLM Generator]
    C --> D[Response]
    D --> E[Safety Detectors]
    E --> F[Evaluator]
    F --> G[Safety Report]

Features

Plugin-based architecture with 4 extension points: probes, generators, detectors, and buffs
Attack probes for jailbreaking (DAN, TAP), encoding attacks, and prompt injection
LLM backends for OpenAI, HuggingFace, and REST APIs
Safety detectors for content analysis and package hallucination detection
Configurable evaluation harness with structured attempt data model
Report generation and analysis visualization

Quick Start

pip install secureflow-benchmark

Usage

import secureflow_benchmark
from secureflow_benchmark.probes import dan
from secureflow_benchmark.generators import openai as openai_gen
from secureflow_benchmark.evaluators import base as base_eval

# Set up a generator
generator = openai_gen.OpenAIGenerator(name="gpt-4o")

# Run DAN jailbreak probes
probe = dan.Dan_11_0()
attempts = probe.probe(generator)

# Evaluate results
evaluator = base_eval.Evaluator()
results = evaluator.evaluate(attempts)
print(f"Pass rate: {results.pass_rate:.2%}")

Project Structure

secureflow_benchmark/
├── __init__.py           # Package init and version
├── _config.py            # Configuration management
├── _plugins.py           # Plugin discovery system
├── configurable.py       # Base configurable class
├── attempt.py            # Attempt data model
├── cli.py                # Command-line interface
├── command.py            # Run orchestration commands
├── report.py             # Report generation
├── payloads.py           # Payload management
├── probes/               # Attack probes
│   ├── base.py           # Base probe class
│   ├── dan.py            # DAN jailbreak probes
│   ├── tap.py            # TAP jailbreak probes
│   ├── encoding.py       # Encoding attack probes
│   ├── latentinjection.py # Latent injection probes
│   ├── promptinject.py   # Prompt injection probes
│   ├── continuation.py   # Continuation probes
│   ├── grandma.py        # Grandma exploit probes
│   └── lmrc.py           # LMRC benchmark probes
├── generators/           # LLM backends
│   ├── base.py           # Base generator class
│   ├── openai.py         # OpenAI API backend
│   ├── huggingface.py    # HuggingFace backend
│   └── rest.py           # Generic REST API backend
├── detectors/            # Safety detectors
│   ├── base.py           # Base detector class
│   ├── unsafe_content.py # Unsafe content detection
│   ├── packagehallucination.py # Package hallucination
│   ├── mitigation.py     # Mitigation detection
│   └── always.py         # Always-pass/fail detectors
├── buffs/                # Prompt transformations
│   ├── base.py           # Base buff class
│   ├── encoding.py       # Encoding transformations
│   ├── lowercase.py      # Lowercase transformation
│   ├── paraphrase.py     # Paraphrase transformation
│   └── low_resource_languages.py # Language transforms
├── harnesses/            # Evaluation harnesses
├── evaluators/           # Result evaluators
└── analyze/              # Report analysis tools

What I Learned

Building this framework deepened my understanding of LLM red-teaming methodology — specifically the taxonomy of jailbreak attacks (DAN prompts, encoding-based bypasses, latent injection), how automated safety benchmarking pipelines work, and the design patterns needed to make such a system extensible across different model backends and attack strategies.

Credit

Built upon garak by NVIDIA (Apache 2.0 License). SecureFlow Benchmark is a focused subset and fork of garak, adapted for benchmarking and educational use.

License

Apache 2.0 — See LICENSE for details.

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
.github/workflows		.github/workflows
secureflow_benchmark		secureflow_benchmark
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SecureFlow Benchmark — LLM Safety Evaluation & Red-Teaming Framework

Architecture

Features

Quick Start

Usage

Project Structure

What I Learned

Credit

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

SecureFlow Benchmark — LLM Safety Evaluation & Red-Teaming Framework

Architecture

Features

Quick Start

Usage

Project Structure

What I Learned

Credit

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages