ParaGraph: Accelerating Graph Indexing through GPU-CPU Parallel Processing for Efficient Cross-modal ANNS

This repository contains the code for the ParaGraph paper, published at the SIGMOD workshop DaMoN 2025.

The master branch contains the codebase for the ParaGraph paper.

Getting Started & Reproducing Experiments

This section guides you through setting up the project and reproducing the experiments presented in our paper.

File Format

All *bin data files adhere to the following structure (consistent with the Big-ANN competition format):

Number of vectors: uint32 (4 bytes)
Dimension of vectors: uint32 (4 bytes)
Vector data: Sequentially listed vector components.
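
The layout above can be sketched in Python with numpy. This is a minimal sketch, not code from this repository: the function names write_bin and read_bin are hypothetical, and the component type (float32) and native little-endian byte order are assumptions, since the text only specifies the two uint32 header fields.

```python
import numpy as np


def write_bin(path, vectors):
    """Write a 2-D array as: uint32 num, uint32 dim, then row-major components."""
    vectors = np.ascontiguousarray(vectors)
    num, dim = vectors.shape
    with open(path, "wb") as f:
        # 8-byte header: number of vectors, then dimension.
        np.array([num, dim], dtype=np.uint32).tofile(f)
        # Vector components, one vector after another.
        vectors.tofile(f)


def read_bin(path, dtype=np.float32):
    """Read a file in the layout above. The dtype is an assumption (float32)."""
    with open(path, "rb") as f:
        num, dim = (int(x) for x in np.fromfile(f, dtype=np.uint32, count=2))
        data = np.fromfile(f, dtype=dtype, count=num * dim)
    return data.reshape(num, dim)
```

Writing a float32 array with write_bin and loading it with read_bin should return an array with the same shape and values.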

Data Acquisition and Preparation

You can obtain the required datasets from the RoarGraph repository.
We use Python for the necessary data transformations; the repository root includes the notebooks transform_data_to_gpu.ipynb and transform_gt_to_128.ipynb.
For the index construction process, we exclusively use the base vector set (base) and the corresponding ground truth data (gt).

The base vector data (base_data) is structured as a num x dim matrix, where:

num: Signifies the total number of vectors.
dim: Denotes the dimensionality of each vector.

Note: These two parameters, num and dim, must be pre-defined within the source code.
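
Because num and dim have to be set by hand, it can help to read them from the file header before editing the source. The sketch below is illustrative only: read_header and check_size are hypothetical helpers, and the little-endian byte order and 4-byte component size are assumptions.

```python
import os
import struct


def read_header(path):
    """Return (num, dim) from the first 8 bytes of a bin file."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) != 8:
        raise ValueError("file too short to contain a header")
    # "<II": two little-endian unsigned 32-bit integers (byte order is assumed).
    return struct.unpack("<II", header)


def check_size(path, bytes_per_component=4):
    """True if the file size equals 8 + num * dim * bytes_per_component."""
    num, dim = read_header(path)
    return os.path.getsize(path) == 8 + num * dim * bytes_per_component
```

If check_size returns False, the component size or the header interpretation assumed here probably does not match the file.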

To keep GPU memory usage manageable, the ground truth (gt) data is modified: the number of ground truth neighbors stored for each query vector (often denoted gt_num or top_k) is fixed at 128.
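
One way to bring the ground truth down to 128 neighbors per query is to keep the first 128 columns of the neighbor-id matrix. This is a sketch under assumptions, not the repository's notebook: truncate_gt is a hypothetical name, and it assumes the ids of each query are already sorted from nearest to farthest and that at least 128 neighbors are available.

```python
import numpy as np

GT_NUM = 128  # neighbors kept per query, as stated in the text


def truncate_gt(gt_ids, k=GT_NUM):
    """Keep the first k neighbor ids of every query (rows = queries)."""
    gt_ids = np.asarray(gt_ids)
    if gt_ids.ndim != 2:
        raise ValueError("expected a 2-D array of shape (num_queries, num_neighbors)")
    if gt_ids.shape[1] < k:
        raise ValueError(f"need at least {k} neighbors per query, got {gt_ids.shape[1]}")
    # Copy so the result is contiguous and independent of the input array.
    return np.ascontiguousarray(gt_ids[:, :k])
```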

0. Prerequisites

Software & System Requirements:

CMake v3.24 or newer
g++ v9.4 or newer
CPU with AVX-512 support
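
On Linux, AVX-512 support can be checked by looking for the avx512f flag in /proc/cpuinfo. The helpers below are a sketch with hypothetical names; they only test for the AVX-512 Foundation flag, and whether the project needs further AVX-512 extensions is not stated here.

```python
def has_avx512(cpuinfo_text):
    """Return True if any 'flags' line lists avx512f (AVX-512 Foundation)."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            flags = line.split(":", 1)[1].split()
            if "avx512f" in flags:
                return True
    return False


def host_has_avx512(path="/proc/cpuinfo"):
    """Check the current Linux host; returns False if the file cannot be read."""
    try:
        with open(path) as f:
            return has_avx512(f.read())
    except OSError:
        return False
```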

Python Environment:

Python v3.8 or newer
Required Python packages:
- numpy

GPU Requirements:

NVIDIA GPU
CUDA Toolkit
cuDNN

System Dependencies:

Install the following system libraries:

sudo apt update
sudo apt install -y libaio-dev libgoogle-perftools-dev clang-format libboost-all-dev libmkl-full-dev

You can refer to additional resources to configure the GPU environment.
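
Before building, a quick check that the required command-line tools are on PATH can save time. This is a sketch: the tool list is an assumption derived from the prerequisites above (nvcc ships with the CUDA Toolkit, nvidia-smi with the NVIDIA driver), and missing_tools is a hypothetical helper. It does not verify versions.

```python
import shutil

# Assumed executable names for the prerequisites listed above.
REQUIRED_TOOLS = ["cmake", "g++", "nvcc", "nvidia-smi"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools found.")
```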

Clone the Repository:

git clone https://github.com/9p6p/ParaGraph.git
cd ParaGraph

1. Compile and build

Follow these steps to compile the project:

mkdir -p build
cd build
cmake .. && make -j

2. Build Index

To build the ParaGraph index, run the provided script:

bash run_paragraph.sh

License

This project is licensed under the MIT License.

Contact

For questions or inquiries, feel free to reach out to me at dev@alayadb.ai

Citation

If you use ParaGraph for your research, please cite us:

@inproceedings{yang2025paragraph,
   author = {Yang, Yuxiang and Chen, Shiwen and Deng, Yangshen and Tang, Bo},
   title = {ParaGraph: Accelerating Graph Indexing through GPU-CPU Parallel Processing for Efficient Cross-modal ANNS},
   year = {2025},
   isbn = {9798400719400},
   publisher = {Association for Computing Machinery},
   address = {New York, NY, USA},
   url = {https://doi.org/10.1145/3736227.3736237},
   doi = {10.1145/3736227.3736237},
   booktitle = {Proceedings of the 21st International Workshop on Data Management on New Hardware},
   articleno = {7},
   numpages = {10},
   keywords = {Vector Database, GPU Acceleration, Index Construction},
   location = {},
   series = {DaMoN '25}
}
