ParaGraph: Accelerating Graph Indexing through GPU-CPU Parallel Processing for Efficient Cross-modal ANNS
This repository contains the code for the ParaGraph paper, published at DaMoN 2025 (a SIGMOD workshop).
The master branch contains the codebase for the ParaGraph paper.
This section guides you through setting up the project and reproducing the experiments presented in our paper.
All ~bin data files adhere to the following structure (consistent with the Big-ANN competition format):
- Number of vectors: uint32 (4 bytes)
- Dimension of vectors: uint32 (4 bytes)
- Vector data: Sequentially listed vector components.
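As an illustration of this layout, the following is a minimal sketch of a reader and writer built on numpy. The function names `write_bin` and `read_bin` are hypothetical and are not part of this repository; the sketch also assumes float32 components by default and the machine's native byte order (little-endian on x86).

```python
import numpy as np

def write_bin(path, vectors):
    """Write a 2-D array as: uint32 num, uint32 dim, then row-major components."""
    vectors = np.ascontiguousarray(vectors)
    num, dim = vectors.shape
    with open(path, "wb") as f:
        np.array([num, dim], dtype=np.uint32).tofile(f)  # 8-byte header
        vectors.tofile(f)                                # vector data, one vector after another

def read_bin(path, dtype=np.float32):
    """Read the 8-byte header, then the num * dim components that follow it."""
    with open(path, "rb") as f:
        num, dim = np.fromfile(f, dtype=np.uint32, count=2)
        # Convert to Python ints so the product cannot overflow uint32.
        data = np.fromfile(f, dtype=dtype, count=int(num) * int(dim))
    return data.reshape(int(num), int(dim))
```

Pass a different `dtype` (for example `np.uint8`) if the file stores components of another type.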
- You can obtain the required datasets from the RoarGraph repository.
- We utilize Python scripts for the necessary data transformations.
- For the index construction process, we exclusively use the base vector set (base) and the corresponding ground truth data (gt).
The base vector data (base_data) is structured as a num x dim matrix, where:
- num: Signifies the total number of vectors.
- dim: Denotes the dimensionality of each vector.
Note: These two parameters, num and dim, must be pre-defined within the source code.
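Since num and dim have to be set by hand, it can help to read them from the file header before editing the source. The sketch below is one way to do that; `peek_header` is a hypothetical helper, and it assumes a little-endian header and 4-byte (float32) components unless told otherwise.

```python
import os
import struct

def peek_header(path, bytes_per_component=4):
    """Return (num, dim) from the 8-byte header and check the file size matches."""
    with open(path, "rb") as f:
        # Two little-endian uint32 values: number of vectors, then dimension.
        num, dim = struct.unpack("<II", f.read(8))
    expected = 8 + num * dim * bytes_per_component
    actual = os.path.getsize(path)
    if actual != expected:
        raise ValueError(f"{path}: expected {expected} bytes, found {actual}")
    return num, dim
```

A size mismatch usually means the component type differs from the assumed one or the file is truncated.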
To ensure efficient GPU memory management, the handling of the ground truth (gt) data is modified. Specifically, the number of ground truth neighbors recorded for each query vector (often denoted as gt_num or top_k) is limited or adjusted to 128.
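The limiting step can be sketched as follows. This is not the repository's own transformation script: `limit_gt` is a hypothetical name, and the sketch assumes the ground truth has already been loaded as a 2-D array of neighbor IDs with one row per query, sorted nearest-first, and with at least 128 columns. The case of fewer than 128 neighbors is not handled here and simply raises an error.

```python
import numpy as np

GT_NUM = 128  # neighbors kept per query, as stated above

def limit_gt(gt_ids, gt_num=GT_NUM):
    """Keep the first gt_num neighbor IDs of each query (rows assumed nearest-first)."""
    gt_ids = np.asarray(gt_ids)
    if gt_ids.ndim != 2:
        raise ValueError("expected a 2-D array of shape (num_queries, k)")
    if gt_ids.shape[1] < gt_num:
        raise ValueError(f"only {gt_ids.shape[1]} neighbors per query, need {gt_num}")
    # Copy into a contiguous block so it can be written out or transferred as-is.
    return np.ascontiguousarray(gt_ids[:, :gt_num])
```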
- CMake v3.24 or newer
- g++ v9.4 or newer
- CPU with AVX-512 support
- Python v3.8 or newer
- Required Python packages: numpy
- NVIDIA GPU
- CUDA Toolkit
- cuDNN
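The AVX-512 requirement can be checked on Linux before building. The snippet below is a small sketch, not part of this repository: `has_avx512` is a hypothetical helper that looks for the `avx512f` (AVX-512 Foundation) flag in text formatted like `/proc/cpuinfo`. Whether the build needs further AVX-512 subsets beyond the foundation flag is not checked here.

```python
def has_avx512(cpuinfo_text):
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists avx512f."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "avx512f" in value.split():
            return True
    return False

# Usage on a Linux machine:
#   with open("/proc/cpuinfo") as f:
#       print(has_avx512(f.read()))
```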
Install the following system libraries:
sudo apt update
sudo apt install -y libaio-dev libgoogle-perftools-dev clang-format libboost-all-dev libmkl-full-dev
You can refer to additional resources to configure the GPU environment.
git clone https://github.com/9p6p/ParaGraph.git
cd ParaGraph
Follow these steps to compile the project:
mkdir -p build
cd build
cmake .. && make -j
To build the ParaGraph index, run the provided script:
bash run_paragraph.sh
This project is licensed under the MIT License.
For questions or inquiries, feel free to reach out to me at dev@alayadb.ai.
If you use ParaGraph for your research, please cite us:
@inproceedings{yang2025paragraph,
author = {Yang, Yuxiang and Chen, Shiwen and Deng, Yangshen and Tang, Bo},
title = {ParaGraph: Accelerating Graph Indexing through GPU-CPU Parallel Processing for Efficient Cross-modal ANNS},
year = {2025},
isbn = {9798400719400},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3736227.3736237},
doi = {10.1145/3736227.3736237},
booktitle = {Proceedings of the 21st International Workshop on Data Management on New Hardware},
articleno = {7},
numpages = {10},
keywords = {Vector Database, GPU Acceleration, Index Construction},
location = {},
series = {DaMoN '25}
}