Skip to content

Conversation

@michaelbautin
Copy link
Collaborator

@michaelbautin michaelbautin commented Sep 26, 2025

Original patch was created by @PingXie (reviewed by @yairgott). These changes originated in the valkey-search project.

Original commit message below:

This change improves the memory allocation strategy of the HNSW index data structures by adopting chunked arrays. Previously, resizing the HNSW index involved expensive realloc calls, which were particularly costly in two ways:

Memory Consumption: During resizing, realloc temporarily requires double the memory to accommodate the old data block until the new block is ready and the old block is freed. This posed a substantial issue when dealing with gigabytes of data, leading to excessive memory usage.
Performance Overhead: The realloc operation entails copying data from the old block to the new block, invoking an expensive memcpy operation. This process becomes increasingly burdensome as the data size grows, resulting in significant performance degradation for large-scale indices.
By transitioning to chunked arrays for memory allocation, we circumvent the need for realloc calls during resizing. This approach not only reduces the memory footprint by eliminating the temporary doubling of memory but also enhances overall performance by avoiding the costly memcpy operations. The chunked array strategy allocates new memory in smaller, manageable chunks as the data grows, ensuring more efficient use of memory and improving the scalability of HNSW index operations, especially critical for indices spanning multiple gigabytes.

@michaelbautin michaelbautin changed the title Chunked memory allocation Optimize HNSW Memory Allocation with Chunked Arrays Sep 29, 2025
Copy link

@PingXie PingXie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall. Thanks for upstreaming this change, Michael!

@yairgott
Copy link

yairgott commented Oct 2, 2025

LGTM!

In case not already done, it would be useful to run a sanity test with an ASAN build.

@michaelbautin
Copy link
Collaborator Author

@yairgott : ASAN tests are already being run in GitHub Actions.

@michaelbautin
Copy link
Collaborator Author

michaelbautin commented Oct 29, 2025

Here are some performance test results.

Used ann-benchmarks at commit f402b2cc17b980d7cd45241ab5a7a4cc0f965e55 with the following patch applied: https://gist.githubusercontent.com/michaelbautin/fe38704f9678532ee42d0788b0fc9ccb/raw

Used the following script to alternate between unmodified and modified versions of hnswlib: https://gist.githubusercontent.com/michaelbautin/b7b6264f899e835d2da06d641c1e05f8/raw

The modified version of hnswlib (~/code/hnswlib) was at commit https://github.com/michaelbautin/hnswlib/commits/c32cc21ee5ef0797c9888bcc6519eba6304bcb4f and the unmodified/baseline version (~/code/hnswlib-baseline) was at commit https://github.com/nmslib/hnswlib/commits/2fba7fba961fffcbfb6578d3d2979419edf1e0e7.

Tested on a 96-vCPU GCE VM with an INTEL(R) XEON(R) PLATINUM 8581C CPU @ 2.30GHz CPU.

The index build times did not change significantly and were in the range of 421-424 seconds for both versions of hnswlib on the deep-image-96-angular ann-benchmarks dataset.

According to a two-sided t-test computed using the T.TEST(<range1>,<range2>,2,3) function in Google Spreadsheets, there is no statistically significant difference between index build times (20 runs, with/without changes).

image

Tests using speedtest.py as requested by @yurymalkov :

Baseline (2fba7fb):

$ python3 tests/python/speedtest.py -d 4 -t 1
0.025458335876464844 seconds, recall= 1.0
0.024799585342407227 seconds, recall= 1.0
0.0272064208984375 seconds, recall= 1.0
0.025821447372436523, 0.025458335876464844, 0.0010155792492704126, 0.6601953506469727, 1.0, None
$ python3 tests/python/speedtest.py -d 4 -t 64
0.3272378444671631 seconds, recall= 1.0
0.3278830051422119 seconds, recall= 1.0
0.3281400203704834 seconds, recall= 1.0
0.32775362332661945, 0.3278830051422119, 0.0003795041332998049, 0.6465816497802734, 1.0, None

With the changes:

$ python3  tests/python/speedtest.py -d 4 -t 1
0.03078746795654297 seconds, recall= 1.0
0.029574155807495117 seconds, recall= 1.0
0.02940821647644043 seconds, recall= 1.0
0.029923280080159504, 0.029574155807495117, 0.0006148167638485685, 0.6504831314086914, 1.0, None
$ python3 tests/python/speedtest.py -d 4 -t 64
0.3246622085571289 seconds, recall= 1.0
0.32256174087524414 seconds, recall= 1.0
0.32404065132141113 seconds, recall= 1.0
0.32375486691792804, 0.32404065132141113, 0.0008810015757141497, 0.6430518627166748, 1.0, None

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants