Self-Supervised Learning of Graph Representations for Network Intrusion Detection

This repository provides the official code for our paper, accepted at NeurIPS 2025.

Overview

GraphIDS is a self-supervised intrusion detection system that learns graph representations of normal network traffic patterns. The model combines:

E-GraphSAGE: An inductive GNN that embeds each flow with its local topological context
Transformer Autoencoder with Attention Masking: Reconstructs flow embeddings while learning global co-occurrence patterns

Flows with high reconstruction errors are flagged as potential intrusions. By jointly training both components end-to-end, the model achieves state-of-the-art performance on NetFlow benchmarks (up to 99.98% PR-AUC and 99.61% macro F1-score).
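The flagging step can be illustrated with a minimal sketch. It is not the repository's implementation: the function names, the toy vectors, and the fixed threshold are all assumptions made for illustration, and the actual model works on learned embeddings and may select its threshold differently.

```python
from statistics import mean


def reconstruction_errors(embeddings, reconstructions):
    """Mean squared error between each flow embedding and its reconstruction."""
    return [
        mean((x - y) ** 2 for x, y in zip(emb, rec))
        for emb, rec in zip(embeddings, reconstructions)
    ]


def flag_intrusions(errors, threshold):
    """Mark a flow as a potential intrusion when its error exceeds the threshold."""
    return [err > threshold for err in errors]


# Toy data (hypothetical): three flows with 2-dimensional embeddings.
embeddings = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
reconstructions = [[0.0, 1.0], [1.0, 0.5], [0.0, 2.0]]

errors = reconstruction_errors(embeddings, reconstructions)
flags = flag_intrusions(errors, threshold=1.0)  # threshold chosen arbitrarily
print(errors, flags)
```

Only the third flow is reconstructed poorly, so it is the only one flagged.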

Note: This implementation uses PyTorch Geometric (PyG) for improved maintainability and broader compatibility. For exact reproduction of the paper results and pretrained models, see the DGL branch.

Requirements

Installation

This project uses uv for project management.

Sync the project dependencies:

uv sync

This will create a virtual environment and install all dependencies specified in pyproject.toml.

Note: The project is configured for CUDA 12.8. If you need a different CUDA version, modify the [tool.uv.index] URL in pyproject.toml.

Datasets

The datasets can be downloaded from this website: https://staff.itee.uq.edu.au/marius/NIDS_datasets/

After downloading each dataset zip file, unzip it with the following command:

unzip -d <dataset_name> -j <filename>.zip

For example, for the NF-UNSW-NB15-v3 dataset:

unzip -d NF-UNSW-NB15-v3 -j f7546561558c07c5_NFV3DATA-A11964_A11964.zip

NOTE: The dataset authors recently renamed the NF-CSE-CIC-IDS2018-v2 and NF-CSE-CIC-IDS2018-v3 files to NF-CICIDS2018-v2 and NF-CICIDS2018-v3.
To stay consistent with the naming used in the literature, the code expects both the dataset directory and the dataset CSV file to carry one of the four dataset names: NF-UNSW-NB15-v2, NF-UNSW-NB15-v3, NF-CSE-CIC-IDS2018-v2, NF-CSE-CIC-IDS2018-v3. If your download uses the new names, rename the directory and the CSV file accordingly.
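If you prefer to script the renaming, the following sketch shows one way to do it. The helper restore_expected_names is hypothetical and not part of this repository. It also assumes that the renamed CSV file sits directly inside the dataset directory and is called <new_name>.csv, so check the unzipped contents before relying on it.

```python
from pathlib import Path

# New upstream names mapped to the names the code expects (taken from the note above).
RENAMED = {
    "NF-CICIDS2018-v2": "NF-CSE-CIC-IDS2018-v2",
    "NF-CICIDS2018-v3": "NF-CSE-CIC-IDS2018-v3",
}


def restore_expected_names(data_dir):
    """Rename dataset directories and CSV files; return the paths that were renamed."""
    data_dir = Path(data_dir)
    renamed = []
    for new_name, expected in RENAMED.items():
        dataset_dir = data_dir / new_name
        target_dir = data_dir / expected
        # Rename the directory only if the expected one does not exist yet.
        if dataset_dir.is_dir() and not target_dir.exists():
            dataset_dir.rename(target_dir)
            renamed.append(target_dir)
        # Assumption: the CSV inside the directory carries the new upstream name.
        csv_file = target_dir / f"{new_name}.csv"
        target_csv = target_dir / f"{expected}.csv"
        if csv_file.is_file() and not target_csv.exists():
            csv_file.rename(target_csv)
            renamed.append(target_csv)
    return renamed


if __name__ == "__main__":
    for path in restore_expected_names("data/"):
        print(f"renamed to {path}")
```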

Quick Start

Once you have installed the dependencies and downloaded a dataset, you can train GraphIDS with:

uv run main.py --data_dir data/ --config configs/NF-UNSW-NB15-v3.yaml

This will train the model on the NF-UNSW-NB15-v3 dataset and automatically evaluate it after training.

Training

Experiment Tracking

We use Weights & Biases (W&B) for experiment tracking. W&B runs in offline mode by default, so no login is required and all logs are stored locally. To enable online mode, pass the --wandb flag.

Running Training

To train GraphIDS, run this command:

uv run main.py --data_dir <data_dir> --config configs/<dataset_name>.yaml

<data_dir> should point to the directory containing all the datasets. The code expects the directory structure found in the zip files (i.e., each CSV file should be located at <data_dir>/<dataset_name>/<dataset_name>.csv). For example, for the following directory structure:

data/
└── NF-UNSW-NB15-v3
    ├── FurtherInformation.txt
    ├── NF-UNSW-NB15-v3.csv
    ├── NetFlow_v3_Features.csv
    ├── bag-info.txt
    ├── bagit.txt
    ├── manifest-sha1.txt
    └── tagmanifest-sha1.txt
configs/
└── NF-UNSW-NB15-v3.yaml

You should run:

uv run main.py --data_dir data/ --config configs/NF-UNSW-NB15-v3.yaml
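Before launching a run, it can help to check that each CSV file is where the code expects it. The sketch below only encodes the <data_dir>/<dataset_name>/<dataset_name>.csv convention described above. The helper names are hypothetical and not part of this repository.

```python
from pathlib import Path

DATASETS = [
    "NF-UNSW-NB15-v2",
    "NF-UNSW-NB15-v3",
    "NF-CSE-CIC-IDS2018-v2",
    "NF-CSE-CIC-IDS2018-v3",
]


def expected_csv_path(data_dir, dataset_name):
    """Path where the CSV should be: <data_dir>/<dataset_name>/<dataset_name>.csv."""
    return Path(data_dir) / dataset_name / f"{dataset_name}.csv"


def missing_datasets(data_dir, dataset_names=DATASETS):
    """Return the dataset names whose CSV file is not at the expected location."""
    return [
        name
        for name in dataset_names
        if not expected_csv_path(data_dir, name).is_file()
    ]


if __name__ == "__main__":
    for name in missing_datasets("data/"):
        print(f"missing: {expected_csv_path('data/', name)}")
```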

To change the training parameters, either edit the configuration file in the configs/ directory or pass the parameters as command-line arguments. To list all available arguments, run:

uv run main.py --help

Evaluation

The training command above also evaluates the model once training finishes. To evaluate a saved checkpoint without retraining, run:

uv run main.py --data_dir <data_dir> --config configs/<dataset_name>.yaml --checkpoint checkpoints/GraphIDS_<dataset_name>_<seed>.ckpt --test
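To evaluate several checkpoints in a row, the command can be assembled programmatically. The sketch below only builds the argument lists from the flags and the checkpoint naming pattern shown above. The helper name and the seed values are hypothetical, so substitute the seeds you actually trained with.

```python
import shlex


def evaluation_command(data_dir, dataset_name, seed):
    """Build the evaluation command for one saved checkpoint."""
    checkpoint = f"checkpoints/GraphIDS_{dataset_name}_{seed}.ckpt"
    return [
        "uv", "run", "main.py",
        "--data_dir", data_dir,
        "--config", f"configs/{dataset_name}.yaml",
        "--checkpoint", checkpoint,
        "--test",
    ]


# Hypothetical seeds, for illustration only.
commands = [evaluation_command("data/", "NF-UNSW-NB15-v3", seed) for seed in (0, 1, 2)]
for cmd in commands:
    print(shlex.join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) to execute the evaluation.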

Development

If you plan to modify the code, we recommend installing the development dependencies:

uv sync --extra dev

This installs ruff for linting and formatting, and pre-commit hooks.

To set up the git hooks:

uv run pre-commit install

To run the linter and formatter manually:

uv run ruff check .
uv run ruff format .

Results

GraphIDS achieves the following performance on each dataset:

NF-UNSW-NB15-v3

Model name	Macro F1-score	Macro PR-AUC
GraphIDS	99.61%	99.98%

NF-CSE-CIC-IDS2018-v3

Model name	Macro F1-score	Macro PR-AUC
GraphIDS	94.47%	88.19%

NF-UNSW-NB15-v2

Model name	Macro F1-score	Macro PR-AUC
GraphIDS	92.64%	81.16%

NF-CSE-CIC-IDS2018-v2

Model name	Macro F1-score	Macro PR-AUC
GraphIDS	94.31%	92.01%

The results are averaged over multiple seeds.
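For readers unfamiliar with the metric, the macro F1-score is the unweighted mean of the per-class F1-scores, so the benign and attack classes count equally regardless of how many flows each contains. The sketch below illustrates the definition on toy labels. It is not the evaluation code of this repository, and the labels are made up.

```python
def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        scores.append(f1_score(tp, fp, fn))
    return sum(scores) / len(scores)


# Toy labels (hypothetical): 0 = benign, 1 = attack.
y_true = [0, 0, 0, 1]
y_pred = [0, 0, 1, 1]
score = macro_f1(y_true, y_pred)
print(f"{score:.4f}")
```

Here the benign class has F1 = 0.8 and the attack class has F1 = 2/3, giving a macro F1 of 11/15.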

Citation

If you find this work useful in your research, please consider citing our paper:

@misc{guerra2025graphrepresentations,
      title={Self-Supervised Learning of Graph Representations for Network Intrusion Detection},
      author={Lorenzo Guerra and Thomas Chapuis and Guillaume Duc and Pavlo Mozharovskyi and Van-Tam Nguyen},
      year={2025},
      eprint={2509.16625},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2509.16625},
}

License

All original components of this repository are licensed under the Apache License 2.0. Third-party components are used in compliance with their respective licenses.
