Automated MITRE ATT&CK Command Generation Pipeline

Overview

This project implements an automated, multi-stage pipeline for generating offensive security commands aligned with the MITRE ATT&CK Framework. The system uses large language models (LLMs), retrieval-augmented generation (RAG), and deterministic validation to produce diverse, technique-specific command-line examples suitable for red-team simulation, security research, and dataset creation.

Rather than relying on a single prompt to generate commands, the pipeline decomposes the task into multiple scoped stages, each responsible for a single concern:

Scenario generation grounded in MITRE ATT&CK technique context
Command generation constrained by OS and technique requirements
Deterministic validation to enforce correctness, alignment, and uniqueness
Dataset construction and export in Parquet format

Splitting the work into scoped stages reduces repetition, technique confusion, and generic command output when generating at scale.

Hugging Face Dataset

The final output of this pipeline is a curated dataset of validated, technique-aligned commands published on Hugging Face.

Dataset:
https://huggingface.co/datasets/nsgood/mitre-attack-command-generation

Each record in the dataset contains the following fields:

Command – The generated command line
TechniqueID – MITRE ATT&CK technique or sub-technique ID
TechniqueName – Human-readable name of the technique/sub-technique
Tactic – The associated MITRE ATT&CK tactic category
OS – Target operating system (windows, linux, macos)
Scenario – The attacker scenario description used to drive generation
Notes – Brief explanation of how the command implements the technique

The dataset is distributed in Parquet format for efficient analysis and downstream machine learning workflows.
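As a rough illustration of the record layout, the fields above can be modelled as a small Python dataclass. The class name `CommandRecord`, the OS check, and the example values are assumptions made for this sketch; they are not taken from the pipeline's source or from the published dataset.

```python
from dataclasses import dataclass, asdict

# OS values listed in the field description above.
ALLOWED_OS = {"windows", "linux", "macos"}


@dataclass(frozen=True)
class CommandRecord:
    """Hypothetical in-memory view of one dataset row."""
    Command: str
    TechniqueID: str
    TechniqueName: str
    Tactic: str
    OS: str
    Scenario: str
    Notes: str

    def __post_init__(self) -> None:
        # Reject OS values outside the documented set.
        if self.OS not in ALLOWED_OS:
            raise ValueError(f"unsupported OS: {self.OS!r}")


# Illustrative values only; this row is not copied from the dataset.
example = CommandRecord(
    Command="whoami /all",
    TechniqueID="T1033",
    TechniqueName="System Owner/User Discovery",
    Tactic="Discovery",
    OS="windows",
    Scenario="An operator checks which account the current session runs as.",
    Notes="Shows the current user, group memberships, and privileges.",
)

row = asdict(example)  # plain dict, ready for a DataFrame or JSON export
```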

Pipeline Architecture

The pipeline is intentionally modular. Each stage operates independently and communicates via well-defined data structures, allowing targeted improvements without destabilizing the rest of the system.

1. MITRE Context Retrieval (RAG)

Relevant MITRE ATT&CK context is retrieved per technique and injected into prompts to anchor generation to the correct adversarial behavior.

Key component:

MITRE ATT&CK retrieval-augmented context injection

This approach reduces semantic drift and prevents the generation of unrelated or overly generic commands.
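The README does not say which retriever or vector store the pipeline uses. The sketch below stands in a toy keyword-overlap ranking for the real retrieval step, only to show the general shape of per-technique retrieval followed by prompt injection. `Chunk`, `retrieve`, `build_prompt`, and the corpus text are all hypothetical.

```python
import re
from typing import NamedTuple


class Chunk(NamedTuple):
    technique_id: str
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase alphanumeric tokens, used for a crude overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(corpus: list[Chunk], technique_id: str, query: str, k: int = 2) -> list[Chunk]:
    """Keep only chunks for the technique, then rank them by token overlap."""
    query_tokens = tokenize(query)
    candidates = [c for c in corpus if c.technique_id == technique_id]
    ranked = sorted(
        candidates,
        key=lambda c: len(query_tokens & tokenize(c.text)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(technique_id: str, task: str, chunks: list[Chunk]) -> str:
    """Inject the retrieved context ahead of the task instruction."""
    context = "\n".join(f"- {c.text}" for c in chunks)
    return f"Technique: {technique_id}\nContext:\n{context}\n\nTask: {task}"


# Placeholder corpus: paraphrased one-line summaries, not real ATT&CK text.
corpus = [
    Chunk("T1033", "Adversaries may identify the primary user or currently logged in user of a system."),
    Chunk("T1033", "User information may be collected from process ownership or environment variables."),
    Chunk("T1082", "Adversaries may gather detailed information about the operating system and hardware."),
]

top = retrieve(corpus, "T1033", "currently logged in user", k=1)
prompt = build_prompt("T1033", "Describe one attacker scenario.", top)
```

Filtering by technique ID before ranking means context from a neighbouring technique cannot leak into the prompt, which is one simple way to limit technique confusion.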

2. Scenario Generation

For each technique, the system generates multiple attacker scenarios that describe:

The attacker’s objective
The execution method
The operating system context

Scenarios are structured rather than free-form prose, ensuring they can be reliably translated into executable commands.
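One way to represent a structured scenario is a small typed object parsed from JSON. The field names below (`technique_id`, `objective`, `execution_method`, `os`) are assumptions that mirror the three items listed above; the actual schema used by the pipeline is not shown in this README.

```python
import json
from dataclasses import dataclass

ALLOWED_OS = {"windows", "linux", "macos"}
REQUIRED_KEYS = {"technique_id", "objective", "execution_method", "os"}


@dataclass(frozen=True)
class Scenario:
    """Hypothetical structured scenario; field names are illustrative."""
    technique_id: str
    objective: str
    execution_method: str
    os: str


def parse_scenario(raw: str) -> Scenario:
    """Parse model output into a Scenario, raising ValueError if malformed."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("scenario must be a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    os_name = str(data["os"]).strip().lower()
    if os_name not in ALLOWED_OS:
        raise ValueError(f"unsupported OS: {os_name!r}")
    return Scenario(
        technique_id=str(data["technique_id"]).strip(),
        objective=str(data["objective"]).strip(),
        execution_method=str(data["execution_method"]).strip(),
        os=os_name,
    )


scenario = parse_scenario(json.dumps({
    "technique_id": "T1033",
    "objective": "Confirm which account the session runs as",
    "execution_method": "Built-in shell utility",
    "os": "Linux",
}))
```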

3. Command Generation

Each scenario is converted into a single, concrete command that:

Implements the specified MITRE ATT&CK technique
Runs on the declared operating system
Uses appropriate native tooling
Avoids irrelevant utilities or external red-team frameworks
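The step from scenario to command might look roughly like the sketch below. The model client is not described in this README, so the LLM is passed in as a plain callable and replaced here by a stub. The prompt wording, the reply keys `command` and `notes`, and the function names are all assumptions.

```python
import json
from typing import Callable

# Hypothetical prompt template; the pipeline's real prompts are not shown here.
PROMPT_TEMPLATE = (
    "Technique: {technique_id}\n"
    "Target OS: {os_name}\n"
    "Scenario: {scenario}\n"
    "Return one JSON object with keys 'command' and 'notes'. "
    "Use only tooling native to the target OS."
)


def generate_command(
    technique_id: str,
    os_name: str,
    scenario: str,
    llm: Callable[[str], str],
) -> dict:
    """Build the prompt, call the model once, and return one candidate record."""
    prompt = PROMPT_TEMPLATE.format(
        technique_id=technique_id, os_name=os_name, scenario=scenario
    )
    reply = json.loads(llm(prompt))
    return {
        "Command": reply["command"].strip(),
        "Notes": reply["notes"].strip(),
        "TechniqueID": technique_id,
        "OS": os_name,
        "Scenario": scenario,
    }


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model call; always returns the same reply."""
    return json.dumps({"command": " whoami ", "notes": "Prints the name of the current user."})


candidate = generate_command(
    "T1033", "linux", "Confirm which account the session runs as", stub_llm
)
```

Passing the model in as a callable keeps this stage testable without a GPU and lets the later validation stage decide whether the candidate is kept.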

4. Validation and Normalization

Generated commands are passed through a deterministic validation stage that enforces:

Valid JSON structure
OS compatibility
Technique-specific behavior (not generic discovery or administrative commands)
De-duplication across batches
Minor syntax normalization where safe

Invalid or redundant outputs are discarded before dataset inclusion.
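A minimal sketch of such a validator is shown below, assuming each candidate arrives as a JSON string with `Command`, `TechniqueID`, and `OS` keys. It covers JSON structure, the OS value, the ATT&CK ID format, de-duplication, and whitespace trimming. The technique-specific behavior check is not modelled here, and the `Validator` class is hypothetical rather than the pipeline's own code.

```python
import json
import re
from typing import Optional

ALLOWED_OS = {"windows", "linux", "macos"}
REQUIRED_KEYS = ("Command", "TechniqueID", "OS")
# ATT&CK IDs look like T1059 or, for sub-techniques, T1059.001.
TECHNIQUE_ID = re.compile(r"T\d{4}(\.\d{3})?")


class Validator:
    """Deterministic checks; remembers accepted commands across batches."""

    def __init__(self) -> None:
        self._seen: set[tuple[str, str]] = set()

    def validate(self, raw: str) -> Optional[dict]:
        """Return a cleaned record, or None if the candidate is rejected."""
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            return None
        if not isinstance(record, dict):
            return None
        for key in REQUIRED_KEYS:
            value = record.get(key)
            if not isinstance(value, str) or not value.strip():
                return None

        os_name = record["OS"].strip().lower()
        technique_id = record["TechniqueID"].strip()
        if os_name not in ALLOWED_OS or not TECHNIQUE_ID.fullmatch(technique_id):
            return None

        # Safe normalization: trim only the surrounding whitespace.
        command = record["Command"].strip()
        # De-dup key collapses internal whitespace but leaves the stored command intact.
        dedup_key = (os_name, " ".join(command.split()))
        if dedup_key in self._seen:
            return None
        self._seen.add(dedup_key)

        return {**record, "Command": command, "OS": os_name, "TechniqueID": technique_id}


validator = Validator()
first = validator.validate('{"Command": " whoami ", "TechniqueID": "T1033", "OS": "Linux"}')
repeat = validator.validate('{"Command": "whoami", "TechniqueID": "T1033", "OS": "linux"}')
```

Here `first` is accepted and cleaned, while `repeat` is rejected as a duplicate because the same validator instance has already seen that command for that OS.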

5. Dataset Assembly

Validated commands are incrementally written to disk and optionally pushed to Hugging Face as a versioned dataset.
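One possible shape for incremental assembly is to append each validated batch to a JSON Lines file and convert that file to Parquet at the end. This is a sketch under that assumption; the README does not describe the pipeline's actual on-disk format between batches. The function names are hypothetical, and `export_parquet` needs pandas plus a Parquet engine such as pyarrow installed.

```python
import json
from pathlib import Path


def append_batch(path: Path, records: list[dict]) -> int:
    """Append one batch of validated records as JSON Lines; return the count written."""
    with path.open("a", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(records)


def load_records(path: Path) -> list[dict]:
    """Read every record written so far, skipping blank lines."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def export_parquet(jsonl_path: Path, parquet_path: Path) -> None:
    """Convert the accumulated records to a single Parquet file."""
    import pandas as pd  # imported lazily so the helpers above stay stdlib-only

    frame = pd.DataFrame(load_records(jsonl_path))
    frame.to_parquet(parquet_path, index=False)
```

Appending per batch means a failed run loses at most the batch in flight, and the Parquet export can be repeated at any point from the same intermediate file.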

Execution Environment

This pipeline is designed to run in a GPU-enabled environment and has been tested on Modal for scalable, batched execution.

Modal Notebook:

https://modal.com/notebooks/nsgood/main/nb-x3yPFldVLQVjX6jue3SH7X

The notebook demonstrates:

Cloning and running the pipeline in a GPU-enabled Modal environment
End-to-end pipeline execution with batched generation
Parquet dataset export
Pushing the exported dataset to Hugging Face

Intended Use

This project is intended to support practical security research and experimentation, including:

Exploring how specific MITRE ATT&CK techniques translate into concrete command-line behavior
Generating technique-aligned command examples that can be used in red-team and defensive research workflows
Producing structured datasets for security-focused machine learning and analysis

The pipeline is designed for defensive research and analysis and is not intended for unauthorized or malicious use.

Acknowledgments

MITRE ATT&CK® Framework
Hugging Face Datasets
Modal
