This repository provides the official code for our paper, accepted at NeurIPS 2025.
GraphIDS is a self-supervised intrusion detection system that learns graph representations of normal network traffic patterns. The model combines:
- E-GraphSAGE: An inductive GNN that embeds each flow with its local topological context
- Transformer Autoencoder with Attention Masking: Reconstructs flow embeddings while learning global co-occurrence patterns
Flows with high reconstruction errors are flagged as potential intrusions. By jointly training both components end-to-end, the model achieves state-of-the-art performance on NetFlow benchmarks (up to 99.98% PR-AUC and 99.61% macro F1-score).
Note: This implementation uses PyTorch Geometric (PyG) for improved maintainability and broader compatibility. For exact reproduction of the paper results and pretrained models, see the DGL branch.
This project uses uv for project management.
Sync the project dependencies:
```bash
uv sync
```

This will create a virtual environment and install all dependencies specified in pyproject.toml.
Note: The project is configured for CUDA 12.8. If you need a different CUDA version, modify the [tool.uv.index] URL in pyproject.toml.
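For instance, if you target CUDA 12.1 instead, a quick way to switch is sketched below. This assumes the index URL in pyproject.toml ends in /whl/cu128, which is an assumption you should verify against the actual file before running it:

```bash
# Sketch only: rewrite the PyTorch wheel index from CUDA 12.8 to CUDA 12.1,
# assuming the URL contains /whl/cu128 (check pyproject.toml first; on macOS use `sed -i ''`)
sed -i 's|/whl/cu128|/whl/cu121|' pyproject.toml
uv sync
```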
The datasets can be downloaded from this website: https://staff.itee.uq.edu.au/marius/NIDS_datasets/
After downloading each dataset zip file, unzip it with the following command:
```bash
unzip -d <dataset_name> -j <filename>.zip
```

For example, for the NF-UNSW-NB15-v3 dataset:
```bash
unzip -d NF-UNSW-NB15-v3 -j f7546561558c07c5_NFV3DATA-A11964_A11964.zip
```

NOTE: The authors recently renamed the files for the NF-CSE-CIC-IDS2018-v2 and NF-CSE-CIC-IDS2018-v3 datasets to NF-CICIDS2018-v2 and NF-CICIDS2018-v3.
To keep the naming convention consistent with the literature, the code expects the dataset directory and the dataset CSV file to be named after one of the four considered datasets: NF-UNSW-NB15-v2, NF-UNSW-NB15-v3, NF-CSE-CIC-IDS2018-v2, or NF-CSE-CIC-IDS2018-v3.
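For example, if you downloaded the renamed NF-CICIDS2018-v3 archive, something like the following restores the expected names. This is illustrative only: the CSV name inside the archive is an assumption, so check it after unzipping.

```bash
# Illustrative sketch: extract the renamed archive into the expected directory name,
# then rename the CSV (assuming it is shipped as NF-CICIDS2018-v3.csv; verify first)
unzip -d NF-CSE-CIC-IDS2018-v3 -j <filename>.zip
mv NF-CSE-CIC-IDS2018-v3/NF-CICIDS2018-v3.csv NF-CSE-CIC-IDS2018-v3/NF-CSE-CIC-IDS2018-v3.csv
```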
Once you have installed the dependencies and downloaded a dataset, you can train GraphIDS with:
```bash
uv run main.py --data_dir data/ --config configs/NF-UNSW-NB15-v3.yaml
```

This will train the model on the NF-UNSW-NB15-v3 dataset and automatically evaluate it after training.
We use Weights & Biases for experiment tracking. W&B is set to offline mode by default—no login is required, and all logs are stored locally. To enable online mode, pass the --wandb flag.
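For example, to run the quick-start command above with online W&B logging (this simply combines the flags already documented here):

```bash
# Train on NF-UNSW-NB15-v3 with Weights & Biases online logging enabled
uv run main.py --data_dir data/ --config configs/NF-UNSW-NB15-v3.yaml --wandb
```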
To train GraphIDS, run this command:
```bash
uv run main.py --data_dir <data_dir> --config configs/<dataset_name>.yaml
```

<data_dir> should point to the directory containing all the datasets. The code expects the directory structure found in the zip files (i.e., each CSV file should be located at <data_dir>/<dataset_name>/<dataset_name>.csv). For example, for the following directory structure:
```
data/
└── NF-UNSW-NB15-v3
    ├── FurtherInformation.txt
    ├── NF-UNSW-NB15-v3.csv
    ├── NetFlow_v3_Features.csv
    ├── bag-info.txt
    ├── bagit.txt
    ├── manifest-sha1.txt
    └── tagmanifest-sha1.txt
configs/
└── NF-UNSW-NB15-v3.yaml
```
You should run:
```bash
uv run main.py --data_dir data/ --config configs/NF-UNSW-NB15-v3.yaml
```

To specify different training parameters, you can either modify the configuration file in the configs/ directory, or provide all parameters using command-line arguments. The full list of possible arguments can be accessed by running the command:
```bash
uv run main.py --help
```

Running the training command above also evaluates the model after training. To evaluate a model from a saved checkpoint only, run the following command:
```bash
uv run main.py --data_dir <data_dir> --config configs/<dataset_name>.yaml --checkpoint checkpoints/GraphIDS_<dataset_name>_<seed>.ckpt --test
```
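For instance, to evaluate a checkpoint trained on NF-UNSW-NB15-v3 (the seed value below is illustrative; use the one in the actual checkpoint file name):

```bash
# Hypothetical example: the seed (1) is a placeholder for whatever seed produced the checkpoint
uv run main.py --data_dir data/ --config configs/NF-UNSW-NB15-v3.yaml \
  --checkpoint checkpoints/GraphIDS_NF-UNSW-NB15-v3_1.ckpt --test
```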
If you plan to modify the code, we recommend installing the development dependencies:

```bash
uv sync --extra dev
```

This installs ruff for linting and formatting, and pre-commit hooks.
To set up the git hooks:
```bash
uv run pre-commit install
```

To run the linter and formatter manually:
```bash
uv run ruff check .
uv run ruff format .
```

Our model achieves the following performance on the considered datasets:
| Model name | Macro F1-score | Macro PR-AUC |
|---|---|---|
| GraphIDS | 99.61% | 99.98% |
| Model name | Macro F1-score | Macro PR-AUC |
|---|---|---|
| GraphIDS | 94.47% | 88.19% |
| Model name | Macro F1-score | Macro PR-AUC |
|---|---|---|
| GraphIDS | 92.64% | 81.16% |
| Model name | Macro F1-score | Macro PR-AUC |
|---|---|---|
| GraphIDS | 94.31% | 92.01% |
The results are averaged over multiple seeds.
If you find this work useful in your research, please consider citing our paper:
```bibtex
@misc{guerra2025graphrepresentations,
  title={Self-Supervised Learning of Graph Representations for Network Intrusion Detection},
  author={Lorenzo Guerra and Thomas Chapuis and Guillaume Duc and Pavlo Mozharovskyi and Van-Tam Nguyen},
  year={2025},
  eprint={2509.16625},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2509.16625},
}
```

All original components of this repository are licensed under the Apache License 2.0. Third-party components are used in compliance with their respective licenses.
