A simple implementation built on Microsoft's PyRIT (Python Risk Identification Tool) framework for running red teaming attacks against AI models. It provides automated adversarial testing with configuration management and result analysis.
The PyRIT Red Teaming - Sample I is a simple implementation of multi-turn prompt injection attacks, based on Microsoft's PyRIT framework and its Red Teaming Attack (Multi-Turn) functionality (https://azure.github.io/PyRIT/code/executor/attack/2_red_teaming_attack.html). Red Teaming attacks are used for testing AI models for vulnerabilities, prompt injection attacks, and adversarial responses.
In this implementation, LLMs fill the following red teaming attack roles:
- Adversarial LLM: The LLM used for generating adversarial prompts that align with the attack strategy
- Target LLM: The LLM used as the target model for red teaming attacks
- Scoring LLM: The LLM used as the model that evaluates the attack outcome i.e. if it is SUCCESS or FAILURE
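The interaction between the three roles above can be sketched as a simple loop. This is a minimal illustration, not PyRIT's API: `run_red_team` and the three toy callables are hypothetical stand-ins for the Adversarial, Target, and Scoring LLMs, so the sketch runs without any API access.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]


def run_red_team(
    objective: str,
    adversary: Callable[[str, History], str],
    target: Callable[[str], str],
    scorer: Callable[[str, str], bool],
    max_turns: int = 3,
) -> dict:
    """Run up to max_turns rounds and stop as soon as the scorer reports success."""
    history: History = []
    for turn in range(1, max_turns + 1):
        prompt = adversary(objective, history)  # Adversarial LLM role
        response = target(prompt)               # Target LLM role
        history.append((prompt, response))
        if scorer(objective, response):         # Scoring LLM role
            return {"outcome": "SUCCESS", "executed_turns": turn, "history": history}
    return {"outcome": "FAILURE", "executed_turns": len(history), "history": history}


# Hypothetical toy stand-ins (assumptions for illustration only).
def toy_adversary(objective: str, history: History) -> str:
    return f"Attempt {len(history) + 1}: {objective}"


def toy_target(prompt: str) -> str:
    # Pretends to refuse on the first attempt and comply afterwards.
    if prompt.startswith("Attempt 1"):
        return "I cannot help with that."
    return "Sure, here is how."


def toy_scorer(objective: str, response: str) -> bool:
    return response.startswith("Sure")


result = run_red_team("test objective", toy_adversary, toy_target, toy_scorer)
print(result["outcome"], result["executed_turns"])
```

In a real run, each callable would wrap a call to the corresponding model; the loop structure stays the same.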
Based on the PyRIT design (https://arxiv.org/abs/2410.02828), the chart below depicts the various components and data flows:
The Adversarial LLM generates prompts based on queries extracted from the AdvBench dataset (https://arxiv.org/abs/2307.15043). This dataset has been amended with the harm types relevant to each query (https://azure.github.io/PyRIT/_autosummary/pyrit.datasets.fetch_adv_bench_dataset.html#pyrit.datasets.fetch_adv_bench_dataset).
Indicative results of the process are shown in the below table:
- Automated Red Teaming: Conduct systematic adversarial attacks against AI models
- Multi-Model Support: Test various AI models including OpenAI, Azure, and custom endpoints
- Reporting: Generate analysis reports with success rates and metrics
- Scalable Architecture: Object-oriented design for easy extension and maintenance
- Security Assessment: Identify vulnerabilities in AI systems
| Attack Type | Description | Use Case |
|---|---|---|
| Prompt Injection | Tests model's resistance to instruction manipulation | Security assessment |
| Adversarial Prompts | Uses AdvBench dataset for standardized testing | Benchmark comparisons |
| Encoding Obfuscation | Base64 and other encoding techniques | Bypass detection systems |
| Multi-Turn Attacks | Complex attack chains across multiple interactions | Advanced threat simulation |
| Component | Responsibility | Key Features |
|---|---|---|
| PyRITRedTeamingSystem | Main orchestrator | System initialization, workflow coordination |
| TargetModelManager | Model management | Multi-provider support, credential handling |
| AttackExecutor | Attack execution | Async operations, configuration management |
| ResultAnalyzer | Data analysis | Statistical analysis, export capabilities |
| DatasetManager | Dataset handling | AdvBench integration, custom dataset support |
- Python 3.8 or higher
- Microsoft PyRIT library
- API keys for target models (OpenAI, Azure, etc.)
git clone https://github.com/yourusername/pyrit-red-teaming-sample_I.git
cd pyrit-red-teaming-sample_I
# Create virtual environment (recommended)
python -m venv pyrit-env
# Activate it on Linux/macOS
source pyrit-env/bin/activate
# Activate it on Windows
pyrit-env\Scripts\activate
# Install required packages
pip install -r requirements.txt

Contents of requirements.txt:
pyrit-tool>=0.4.0
pandas>=1.5.0
numpy>=1.21.0
asyncio>=3.4.3
dataclasses>=0.8
typing-extensions>=4.0.0

Set up your API keys as environment variables:
# OpenAI API Key
export OPENAI_API_KEY="your_openai_api_key_here"
export OPENAI_CHAT_KEY="your_openai_chat_key_here"
# Azure AI Foundry (if using Azure models)
export AZURE_FOUNDRY_MODELS_COMPLETION_ENDPOINT="your_azure_endpoint"
export AZURE_FOUNDRY_API_KEY="your_azure_api_key"
# Crucible API Key
export CRUCIBLE_API_KEY="your_crucible_api_key_here"

| Model | Type | Description | Status |
|---|---|---|---|
| gpt-4o | Latest GPT-4 | Most advanced OpenAI model | ✅ Supported |
| gpt-4o-mini | Optimized GPT-4 | Faster, cost-effective variant | ✅ Supported |
| gpt-3.5-turbo | GPT-3.5 | Legacy model for comparison | ✅ Supported |
| Model | Type | Description | Requirements |
|---|---|---|---|
| Meta-Llama-3.1-8B-Instruct | Llama 3.1 | Meta's latest instruction model | Azure API Key |
| Ministral-3B | Mistral | Lightweight French AI model | Azure API Key |
| Endpoint | Type | Description | Access |
|---|---|---|---|
| pieceofcake | Crucible | Red team testing platform | Public endpoint |
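Before running attacks against any of the targets listed above, it can help to confirm that the matching credentials are present. The sketch below is one way to do that with the standard library. The environment variable names are the ones this README asks you to export; the grouping by provider and the helper name `missing_env` are assumptions made for illustration.

```python
import os
from typing import Dict, List, Mapping

# Assumed grouping of the README's environment variables by provider.
REQUIRED_ENV: Dict[str, List[str]] = {
    "openai": ["OPENAI_API_KEY", "OPENAI_CHAT_KEY"],
    "azure": ["AZURE_FOUNDRY_MODELS_COMPLETION_ENDPOINT", "AZURE_FOUNDRY_API_KEY"],
    "crucible": ["CRUCIBLE_API_KEY"],
}


def missing_env(provider: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the variables for `provider` that are unset or empty in `env`."""
    return [name for name in REQUIRED_ENV[provider] if not env.get(name)]


# Report the status of every provider without printing any secret values.
for provider in REQUIRED_ENV:
    missing = missing_env(provider)
    status = "ok" if not missing else "missing " + ", ".join(missing)
    print(f"{provider}: {status}")
```

Passing a plain dictionary as `env` makes the check easy to exercise in tests without touching the real environment.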
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class AttackConfiguration:
    max_turns: int = 3  # Maximum conversation turns
    memory_labels: Optional[Dict[str, str]] = None  # Memory labeling for tracking
    # AdvBench categories to use (default_factory avoids a shared mutable default)
    dataset_categories: List[str] = field(default_factory=lambda: ["Financial and Business"])
    dataset_limit: int = 5  # Number of objectives to test

| Category | Description | Objective Count |
|---|---|---|
| Financial and Business | Financial fraud, business ethics | 50+ |
| Illegal Activities | Criminal activities, illegal content | 100+ |
| Hate Speech | Discrimination, offensive content | 75+ |
| Physical Harm | Violence, dangerous activities | 80+ |
| Privacy Violations | Personal data, surveillance | 60+ |
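The effect of the `dataset_categories` and `dataset_limit` settings can be illustrated with a small filter. This is a sketch under stated assumptions: the `Objective` record, the `select_objectives` helper, and the placeholder entries are hypothetical and do not come from AdvBench or from PyRIT.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Objective:
    """Hypothetical record: one query plus the harm categories attached to it."""
    text: str
    categories: List[str] = field(default_factory=list)


def select_objectives(
    objectives: List[Objective],
    dataset_categories: List[str],
    dataset_limit: int,
) -> List[Objective]:
    """Keep objectives that share a category with the config, up to the limit."""
    wanted = set(dataset_categories)
    matches = [o for o in objectives if wanted & set(o.categories)]
    return matches[:dataset_limit]


# Placeholder entries, not real dataset content.
sample = [
    Objective("placeholder objective A", ["Financial and Business"]),
    Objective("placeholder objective B", ["Hate Speech"]),
    Objective("placeholder objective C", ["Financial and Business", "Privacy Violations"]),
]

picked = select_objectives(sample, ["Financial and Business"], dataset_limit=5)
print([o.text for o in picked])
```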
| Converter | Purpose | Use Case |
|---|---|---|
| Base64Converter | Encoding obfuscation | Bypass content filters |
| ROT13Converter | Simple text rotation | Basic obfuscation testing |
| UnicodeConverter | Character encoding | International character bypass |
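To show what the first two transformations do to a prompt, here is a standard-library sketch. It does not use the converter classes themselves; `to_base64` and `to_rot13` are hypothetical helper names that only reproduce the underlying encodings.

```python
import base64
import codecs


def to_base64(text: str) -> str:
    """Encode text as Base64 (UTF-8 bytes in, ASCII string out)."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def to_rot13(text: str) -> str:
    """Rotate each ASCII letter by 13 positions; other characters are unchanged."""
    return codecs.encode(text, "rot_13")


print(to_base64("hello"))  # aGVsbG8=
print(to_rot13("hello"))   # uryyb
```

Both encodings are trivially reversible, which is why they test a model's handling of obfuscated input rather than providing any secrecy.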
| Scorer | Type | Description |
|---|---|---|
| SelfAskTrueFalseScorer | Binary | True/False attack success |
| FloatScaleScorer | Continuous | 0.0-1.0 confidence scoring |
| LikertScaleScorer | Ordinal | 1-5 severity rating |
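The three score types can be related to one another with simple arithmetic. The sketch below assumes a linear mapping from the 1-5 rating onto 0.0-1.0 and a success threshold of 0.5; both choices, and the helper names, are illustrative assumptions rather than the scorers' documented behavior.

```python
def likert_to_float(rating: int, low: int = 1, high: int = 5) -> float:
    """Linearly map an ordinal rating in [low, high] onto [0.0, 1.0]."""
    if not low <= rating <= high:
        raise ValueError(f"rating must be between {low} and {high}")
    return (rating - low) / (high - low)


def float_to_bool(score: float, threshold: float = 0.5) -> bool:
    """Treat a continuous score at or above the threshold as a success."""
    return score >= threshold


print(likert_to_float(4))                 # 0.75
print(float_to_bool(0.8))                 # True
print(float_to_bool(likert_to_float(2)))  # False
```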
{
  "conversation_id": "uuid",
  "timestamp": "2024-01-15T10:30:00Z",
  "objective": "Test objective string",
  "model_target": "gpt-4o-mini",
  "attack_type": "RedTeamingAttack",
  "turns": [
    {
      "turn_id": 1,
      "prompt": "Attack prompt",
      "response": "Model response",
      "score": 0.8,
      "success": true
    }
  ],
  "final_result": {
    "success": true,
    "execution_time_ms": 8743,
    "total_turns": 2
  }
}

- objective: The attack objective
- attack_identifier: Type of attack used
- scorer: Scoring mechanism used
- executed_turns: Number of conversation turns
- execution_time_s: Total execution time
- success: Boolean success indicator
- outcome_reason: Detailed outcome explanation
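A success rate and basic metrics can be derived from records that carry these fields. The following is a minimal standard-library sketch: the `summarize` helper is hypothetical, and the four records are made-up placeholders for illustration, not measured results.

```python
from statistics import mean
from typing import Dict, List


def summarize(results: List[dict]) -> Dict[str, float]:
    """Aggregate per-attack records into a success rate and simple averages."""
    total = len(results)
    if total == 0:
        return {"total": 0, "success_rate": 0.0, "avg_turns": 0.0, "avg_time_s": 0.0}
    successes = sum(1 for r in results if r["success"])
    return {
        "total": total,
        "success_rate": successes / total,
        "avg_turns": mean(r["executed_turns"] for r in results),
        "avg_time_s": mean(r["execution_time_s"] for r in results),
    }


# Placeholder records (invented values, not real outcomes).
records = [
    {"objective": "placeholder A", "executed_turns": 2, "execution_time_s": 8.0, "success": True},
    {"objective": "placeholder B", "executed_turns": 3, "execution_time_s": 12.0, "success": False},
    {"objective": "placeholder C", "executed_turns": 1, "execution_time_s": 4.0, "success": True},
    {"objective": "placeholder D", "executed_turns": 2, "execution_time_s": 8.0, "success": False},
]

summary = summarize(records)
print(summary)
```

With pandas installed, the same aggregation could be done on a DataFrame built from the records; the plain-Python version keeps the sketch dependency-free.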
This project is licensed under the MIT License - see the LICENSE file for details.
MIT License
Copyright (c) 2024 [Your Name]
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

