
Conversation

@mulugetam (Contributor) commented Oct 20, 2025

This PR improves the performance of partitioning by leveraging AVX-512 VBMI2 instructions. The optimization requires building FAISS with -DFAISS_OPT_LEVEL=avx512_spr. Benchmarks (e.g., benchs/bench_partition.py) show significant speedups over the avx2 and avx512 opt levels, both of which fall back to AVX2 code for this operation.
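For background on why a separate avx512_spr level helps: the partitioning kernels here work on 16-bit values (note maxval=65536 in the benchmark), and baseline AVX-512 only provides compress instructions for 32/64-bit lanes, while VBMI2 (available on Sapphire Rapids and newer) adds vpcompressw for 16-bit lanes. The following is a minimal sketch of that primitive, illustrative only and not this PR's actual code; the helper name compress_below_u16 is made up.

```cpp
#include <immintrin.h>
#include <cstddef>
#include <cstdint>

// Illustrative sketch, not the PR's code. Requires AVX-512BW + AVX-512 VBMI2
// (e.g. compile with -mavx512bw -mavx512vbmi2, or -march=sapphirerapids).
// Compacts all uint16_t values strictly below `thresh` into `dst` and
// returns how many values were written.
size_t compress_below_u16(
        const uint16_t* src,
        size_t n,
        uint16_t thresh,
        uint16_t* dst) {
    const __m512i vthresh = _mm512_set1_epi16((short)thresh);
    size_t nout = 0;
    size_t i = 0;
    for (; i + 32 <= n; i += 32) {
        __m512i v = _mm512_loadu_si512((const void*)(src + i));
        // one mask bit per 16-bit lane, set where src[i + j] < thresh (unsigned compare)
        __mmask32 lt = _mm512_cmplt_epu16_mask(v, vthresh);
        // VBMI2 vpcompressw: store only the selected lanes, contiguously
        _mm512_mask_compressstoreu_epi16(dst + nout, lt, v);
        nout += (size_t)_mm_popcnt_u32((uint32_t)lt);
    }
    for (; i < n; ++i) { // scalar tail
        if (src[i] < thresh) {
            dst[nout++] = src[i];
        }
    }
    return nout;
}
```

Without VBMI2, 16-bit compaction has to be emulated with shuffle-table tricks, which is why the plain avx512 opt level ends up on the AVX2 path for this step.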

@alexanderguzhva (Contributor)

@mulugetam any benchmark numbers by chance?

@alibeklfc (Contributor)

Thank you for your contribution. We are currently working on a major update to SIMD support in Faiss, so we may prefer to postpone integrating this PR for now.

@mulugetam force-pushed the partitioning_avx512 branch from 56d99eb to 4439b7f on October 22, 2025 at 01:51.
@mulugetam (Contributor, Author)

@alexanderguzhva Below are the results I got from running benchs/bench_partition.py on an AWS c7i.4xlarge instance.

-DFAISS_OPT_LEVEL=avx512:
n=200 qin=(100, 100) maxval=65536 id_type=int64  	times 3.602 µs (± 1.5110 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=200 qin=(100, 150) maxval=65536 id_type=int64  	times 2.971 µs (± 0.6289 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=2000 qin=(1000, 1000) maxval=65536 id_type=int64  	times 5.658 µs (± 0.7498 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=2000 qin=(1000, 1500) maxval=65536 id_type=int64  	times 4.878 µs (± 0.7968 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=20000 qin=(10000, 10000) maxval=65536 id_type=int64  	times 38.313 µs (± 2.5256 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=20000 qin=(10000, 15000) maxval=65536 id_type=int64  	times 38.671 µs (± 3.9962 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=200 qin=(100, 100) maxval=65536 id_type=int32  	times 3.112 µs (± 0.5962 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=200 qin=(100, 150) maxval=65536 id_type=int32  	times 3.004 µs (± 0.5767 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=2000 qin=(1000, 1000) maxval=65536 id_type=int32  	times 5.701 µs (± 0.7734 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=2000 qin=(1000, 1500) maxval=65536 id_type=int32  	times 4.783 µs (± 0.7614 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=20000 qin=(10000, 10000) maxval=65536 id_type=int32  	times 39.210 µs (± 2.9367 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=20000 qin=(10000, 15000) maxval=65536 id_type=int32  	times 42.442 µs (± 2.8150 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy

-DFAISS_OPT_LEVEL=avx512_spr:
n=200 qin=(100, 100) maxval=65536 id_type=int64  	times 3.041 µs (± 1.5132 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=200 qin=(100, 150) maxval=65536 id_type=int64  	times 2.812 µs (± 0.5979 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=2000 qin=(1000, 1000) maxval=65536 id_type=int64  	times 3.640 µs (± 0.6283 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=2000 qin=(1000, 1500) maxval=65536 id_type=int64  	times 3.242 µs (± 0.6140 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=20000 qin=(10000, 10000) maxval=65536 id_type=int64  	times 13.298 µs (± 1.3659 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=20000 qin=(10000, 15000) maxval=65536 id_type=int64  	times 9.290 µs (± 1.1589 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=200 qin=(100, 100) maxval=65536 id_type=int32  	times 2.923 µs (± 0.6588 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=200 qin=(100, 150) maxval=65536 id_type=int32  	times 2.807 µs (± 0.6188 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=2000 qin=(1000, 1000) maxval=65536 id_type=int32  	times 3.500 µs (± 0.5938 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=2000 qin=(1000, 1500) maxval=65536 id_type=int32  	times 3.092 µs (± 0.5829 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=20000 qin=(10000, 10000) maxval=65536 id_type=int32  	times 12.013 µs (± 1.1529 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=20000 qin=(10000, 15000) maxval=65536 id_type=int32  	times 7.122 µs (± 0.9574 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
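
Roughly, at n=20000 these numbers work out to about 3-6x speedups for avx512_spr over avx512 (e.g., 39.210 µs vs 12.013 µs and 42.442 µs vs 7.122 µs for int32), while the n=200 and n=2000 cases show smaller gains.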

@mulugetam (Contributor, Author)

Closed in favor of #4637, which has a cleaner commit history.
