stanleygvi/MITRE_CMD_GEN

Automated MITRE ATT&CK Command Generation Pipeline

Overview

This project implements an automated, multi-stage pipeline for generating offensive security commands aligned with the MITRE ATT&CK Framework. The system uses large language models (LLMs), retrieval-augmented generation (RAG), and deterministic validation to produce diverse, technique-specific command-line examples suitable for red-team simulation, security research, and dataset creation.

Rather than relying on a single prompt to generate commands, the pipeline decomposes the task into multiple scoped stages, each responsible for a single concern:

  1. Scenario generation grounded in MITRE ATT&CK technique context
  2. Command generation constrained by OS and technique requirements
  3. Deterministic validation to enforce correctness, alignment, and uniqueness
  4. Dataset construction and export in Parquet format

Compared with single-prompt generation, this design reduces repetition, technique confusion, and generic command output when operating at scale.


Hugging Face Dataset

The final output of this pipeline is a curated dataset of validated, technique-aligned commands published on Hugging Face.

Dataset:
https://huggingface.co/datasets/nsgood/mitre-attack-command-generation

Each record in the dataset contains the following fields:

  • Command – The generated command line
  • TechniqueID – MITRE ATT&CK technique or sub-technique ID
  • TechniqueName – Human-readable name of the technique/sub-technique
  • Tactic – The associated MITRE ATT&CK tactic category
  • OS – Target operating system (windows, linux, macos)
  • Scenario – The attacker scenario description used to drive generation
  • Notes – Brief explanation of how the command implements the technique

The dataset is distributed in Parquet format for efficient analysis and downstream machine learning workflows.
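As a rough illustration of the record layout, the fields listed above could be modeled as a typed structure. This is a minimal sketch, not the project's actual code: the class name `CommandRecord`, the helper `to_row`, and the sample values are hypothetical, and only the field names and the three OS values come from the description above.

```python
from dataclasses import asdict, dataclass

# OS values as listed in the dataset description.
ALLOWED_OS = {"windows", "linux", "macos"}


# Hypothetical container; only the field names mirror the dataset fields.
@dataclass(frozen=True)
class CommandRecord:
    Command: str
    TechniqueID: str
    TechniqueName: str
    Tactic: str
    OS: str
    Scenario: str
    Notes: str


def to_row(record: CommandRecord) -> dict:
    """Convert a record to a plain dict, rejecting unknown OS values."""
    if record.OS not in ALLOWED_OS:
        raise ValueError(f"unexpected OS value: {record.OS!r}")
    return asdict(record)


# Illustrative values only; this row is not taken from the published dataset.
example = CommandRecord(
    Command="whoami /groups",
    TechniqueID="T1033",
    TechniqueName="System Owner/User Discovery",
    Tactic="Discovery",
    OS="windows",
    Scenario="Illustrative scenario text.",
    Notes="Illustrative note.",
)
row = to_row(example)
```

Keeping the row as a flat dict of strings means a list of such rows maps directly onto a columnar table, which suits Parquet export.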


Pipeline Architecture

The pipeline is intentionally modular. Each stage operates independently and communicates via well-defined data structures, allowing targeted improvements without destabilizing the rest of the system.

1. MITRE Context Retrieval (RAG)

Relevant MITRE ATT&CK context is retrieved per technique and injected into prompts to anchor generation to the correct adversarial behavior.

Key component:

  • MITRE ATT&CK retrieval-augmented context injection

This approach reduces semantic drift and prevents the generation of unrelated or overly generic commands.
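The idea of context injection can be sketched in a few lines. The example below is an assumption-heavy stand-in: it replaces real retrieval with a dictionary lookup keyed by technique ID, and the names `CONTEXT_STORE`, `retrieve_context`, and `build_prompt`, the prompt wording, and the paraphrased technique description are all hypothetical rather than taken from the repository.

```python
# Hypothetical in-memory store with a paraphrased, illustrative description.
# A real pipeline would likely query an index built from ATT&CK data instead.
CONTEXT_STORE = {
    "T1033": (
        "System Owner/User Discovery: adversaries may try to identify the "
        "primary user or the currently logged-in user of a system."
    ),
}


def retrieve_context(technique_id: str, store: dict) -> str:
    """Look up context by ID, falling back to the parent technique
    when a sub-technique ID such as 'T1234.001' has no entry."""
    if technique_id in store:
        return store[technique_id]
    parent_id = technique_id.split(".", 1)[0]
    return store.get(parent_id, "")


def build_prompt(technique_id: str, os_name: str, store: dict = CONTEXT_STORE) -> str:
    """Inject the retrieved context into a prompt (wording is illustrative)."""
    context = retrieve_context(technique_id, store)
    if not context:
        raise KeyError(f"no ATT&CK context found for {technique_id}")
    return (
        f"Technique: {technique_id}\n"
        f"Target OS: {os_name}\n"
        f"ATT&CK context:\n{context}\n\n"
        "Describe attacker scenarios that match this technique only."
    )


prompt = build_prompt("T1033", "linux")
```

Failing loudly when no context is found, rather than prompting without it, is one way to keep ungrounded generations out of later stages.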

2. Scenario Generation

For each technique, the system generates multiple attacker scenarios that describe:

  • The attacker’s objective
  • The execution method
  • The operating system context

Scenarios are structured rather than free-form prose, ensuring they can be reliably translated into executable commands.
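One possible shape for such a structured scenario is shown below. It assumes the model returns each scenario as a JSON object; the key names `objective`, `execution_method`, and `os`, along with the `Scenario` class and `parse_scenario` helper, are hypothetical and chosen only to mirror the three items listed above.

```python
import json
from dataclasses import dataclass

# Hypothetical key names mirroring the three scenario elements.
REQUIRED_KEYS = ("objective", "execution_method", "os")


@dataclass(frozen=True)
class Scenario:
    objective: str
    execution_method: str
    os: str


def parse_scenario(raw: str) -> Scenario:
    """Parse one JSON-encoded scenario and reject incomplete ones."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("scenario must be a JSON object")
    # A key counts as missing when absent or blank after trimming.
    missing = [k for k in REQUIRED_KEYS if not str(data.get(k, "")).strip()]
    if missing:
        raise ValueError(f"scenario is missing fields: {missing}")
    return Scenario(
        objective=str(data["objective"]).strip(),
        execution_method=str(data["execution_method"]).strip(),
        os=str(data["os"]).strip().lower(),
    )


scenario = parse_scenario(
    '{"objective": "Identify the logged-in user", '
    '"execution_method": "Built-in shell utility", "os": "Linux"}'
)
```

With fixed fields, the command-generation stage can read the OS and execution method directly instead of extracting them from prose.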

3. Command Generation

Each scenario is converted into a single, concrete command that:

  • Implements the specified MITRE ATT&CK technique
  • Runs on the declared operating system
  • Uses appropriate native tooling
  • Avoids irrelevant utilities or external red-team frameworks

4. Validation and Normalization

Generated commands are passed through a deterministic validation stage that enforces:

  • Valid JSON structure
  • OS compatibility
  • Technique-specific behavior (not generic discovery or administrative commands)
  • De-duplication across batches
  • Minor syntax normalization where safe

Invalid or redundant outputs are discarded before dataset inclusion.
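A simplified version of these checks might look like the following. This sketch covers JSON parsing, a coarse OS check, whitespace normalization, and de-duplication across batches; it does not attempt the technique-specific behavior check. The `FOREIGN_MARKERS` table is a deliberately tiny, hypothetical heuristic, and the function names are assumptions rather than the repository's API.

```python
import json
import re

ALLOWED_OS = {"windows", "linux", "macos"}

# Hypothetical markers suggesting a command targets a different OS.
# A real compatibility check would need to be far more thorough.
FOREIGN_MARKERS = {
    "windows": ("/etc/", "launchctl"),
    "linux": ("cmd.exe", "reg.exe"),
    "macos": ("cmd.exe", "reg.exe"),
}


def normalize(command: str) -> str:
    """Collapse whitespace runs and trim the ends. This ignores quoting,
    so it is only safe when quoted arguments do not rely on exact spacing."""
    return re.sub(r"\s+", " ", command).strip()


def validate_batch(raw_outputs, seen):
    """Return accepted records; `seen` carries de-duplication state across batches."""
    accepted = []
    for raw in raw_outputs:
        try:
            item = json.loads(raw)
        except json.JSONDecodeError:
            continue  # not valid JSON
        if not isinstance(item, dict):
            continue
        command, os_name = item.get("Command"), item.get("OS")
        if not isinstance(command, str) or not isinstance(os_name, str):
            continue
        if os_name not in ALLOWED_OS:
            continue
        command = normalize(command)
        if not command:
            continue
        lowered = command.lower()
        if any(marker in lowered for marker in FOREIGN_MARKERS[os_name]):
            continue  # looks like it belongs to another OS
        key = (os_name, command)
        if key in seen:
            continue  # duplicate within or across batches
        seen.add(key)
        accepted.append({**item, "Command": command})
    return accepted
```

Passing the same `seen` set to successive calls is what extends de-duplication beyond a single batch.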

5. Dataset Assembly

Validated commands are incrementally written to disk and optionally pushed to Hugging Face as a versioned dataset.
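Incremental writing could be approached by flushing validated records to numbered Parquet shards, as in the sketch below. The shard naming scheme, the batch size, and the function names are assumptions; the default writer relies on pandas with a Parquet engine such as pyarrow installed, and the Hugging Face upload step is intentionally left out.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, Iterator, List


def batched(records: Iterable[Dict], size: int) -> Iterator[List[Dict]]:
    """Yield lists of at most `size` records without loading everything at once."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[Dict] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def write_parquet(rows: List[Dict], path: Path) -> None:
    """Default writer; imported lazily so the module loads without pandas."""
    import pandas as pd  # requires pandas plus a Parquet engine (e.g. pyarrow)

    pd.DataFrame(rows).to_parquet(path, index=False)


def write_shards(
    records: Iterable[Dict],
    out_dir: str,
    batch_size: int = 500,  # hypothetical default
    writer: Callable[[List[Dict], Path], None] = write_parquet,
) -> List[Path]:
    """Write records to part-00000.parquet, part-00001.parquet, and so on."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    for index, rows in enumerate(batched(records, batch_size)):
        path = out / f"part-{index:05d}.parquet"
        writer(rows, path)
        paths.append(path)
    return paths
```

Writing shards as batches complete means an interrupted run loses at most the batch in progress, and the `writer` parameter allows the file format to be swapped or stubbed out in tests.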


Execution Environment

This pipeline is designed to run in a GPU-enabled environment and has been tested on Modal for scalable, batched execution.

Modal Notebook:

https://modal.com/notebooks/nsgood/main/nb-x3yPFldVLQVjX6jue3SH7X

The notebook demonstrates:

  • Cloning and running the pipeline in a GPU-enabled Modal environment
  • End-to-end pipeline execution with batched generation
  • Parquet dataset export
  • Pushing the exported dataset to Hugging Face

Intended Use

This project is intended to support practical security research and experimentation, including:

  • Exploring how specific MITRE ATT&CK techniques translate into concrete command-line behavior
  • Generating technique-aligned command examples that can be used in red-team and defensive research workflows
  • Producing structured datasets for security-focused machine learning and analysis

The pipeline is designed for defensive research and analysis and is not intended for unauthorized or malicious use.


Acknowledgments

  • MITRE ATT&CK® Framework
  • Hugging Face Datasets
  • Modal
