
Conversation

@mulugetam (Contributor) commented Oct 20, 2025

This PR improves the performance of partitioning by leveraging AVX-512 VBMI2 instructions. The optimization requires building FAISS with -DFAISS_OPT_LEVEL=avx512_spr. Benchmarks (e.g., benchs/bench_partition.py) show significant speedups over the avx2 and avx512 opt levels, both of which fall back to AVX2 code for this operation.
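For background on why a separate avx512_spr level helps: the partitioning kernels here work on 16-bit values (note maxval=65536 in the benchmark), and baseline AVX-512 only provides compress instructions for 32/64-bit lanes, while VBMI2 (available on Sapphire Rapids and newer) adds vpcompressw for 16-bit lanes. The following is a minimal sketch of that primitive, illustrative only and not this PR's actual code; the helper name compress_below_u16 is made up.

```cpp
#include <immintrin.h>
#include <cstddef>
#include <cstdint>

// Illustrative sketch, not the PR's code. Requires AVX-512BW + AVX-512 VBMI2
// (e.g. compile with -mavx512bw -mavx512vbmi2, or -march=sapphirerapids).
// Compacts all uint16_t values strictly below `thresh` into `dst` and
// returns how many values were written.
size_t compress_below_u16(
        const uint16_t* src,
        size_t n,
        uint16_t thresh,
        uint16_t* dst) {
    const __m512i vthresh = _mm512_set1_epi16((short)thresh);
    size_t nout = 0;
    size_t i = 0;
    for (; i + 32 <= n; i += 32) {
        __m512i v = _mm512_loadu_si512((const void*)(src + i));
        // one mask bit per 16-bit lane, set where src[i + j] < thresh (unsigned compare)
        __mmask32 lt = _mm512_cmplt_epu16_mask(v, vthresh);
        // VBMI2 vpcompressw: store only the selected lanes, contiguously
        _mm512_mask_compressstoreu_epi16(dst + nout, lt, v);
        nout += (size_t)_mm_popcnt_u32((uint32_t)lt);
    }
    for (; i < n; ++i) { // scalar tail
        if (src[i] < thresh) {
            dst[nout++] = src[i];
        }
    }
    return nout;
}
```

Without VBMI2, 16-bit compaction has to be emulated with shuffle-table tricks, which is why the plain avx512 opt level ends up on the AVX2 path for this step.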

@alexanderguzhva (Contributor)

@mulugetam any benchmark numbers by chance?

@alibeklfc (Contributor)

Thank you for your contribution. We are currently working on a major update to SIMD support in Faiss, so we may prefer to postpone integrating this PR for now.

@mulugetam force-pushed the partitioning_avx512 branch from 56d99eb to 4439b7f on October 22, 2025 at 01:51.
@mulugetam (Contributor, Author)

@alexanderguzhva Below are the results I got from running benchs/bench_partition.py on an AWS c7i.4xlarge instance.

-DFAISS_OPT_LEVEL=avx512:
n=200 qin=(100, 100) maxval=65536 id_type=int64  	times 3.602 µs (± 1.5110 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=200 qin=(100, 150) maxval=65536 id_type=int64  	times 2.971 µs (± 0.6289 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=2000 qin=(1000, 1000) maxval=65536 id_type=int64  	times 5.658 µs (± 0.7498 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=2000 qin=(1000, 1500) maxval=65536 id_type=int64  	times 4.878 µs (± 0.7968 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=20000 qin=(10000, 10000) maxval=65536 id_type=int64  	times 38.313 µs (± 2.5256 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=20000 qin=(10000, 15000) maxval=65536 id_type=int64  	times 38.671 µs (± 3.9962 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=200 qin=(100, 100) maxval=65536 id_type=int32  	times 3.112 µs (± 0.5962 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=200 qin=(100, 150) maxval=65536 id_type=int32  	times 3.004 µs (± 0.5767 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=2000 qin=(1000, 1000) maxval=65536 id_type=int32  	times 5.701 µs (± 0.7734 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=2000 qin=(1000, 1500) maxval=65536 id_type=int32  	times 4.783 µs (± 0.7614 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=20000 qin=(10000, 10000) maxval=65536 id_type=int32  	times 39.210 µs (± 2.9367 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=20000 qin=(10000, 15000) maxval=65536 id_type=int32  	times 42.442 µs (± 2.8150 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy

-DFAISS_OPT_LEVEL=avx512_spr:
n=200 qin=(100, 100) maxval=65536 id_type=int64  	times 3.041 µs (± 1.5132 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=200 qin=(100, 150) maxval=65536 id_type=int64  	times 2.812 µs (± 0.5979 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=2000 qin=(1000, 1000) maxval=65536 id_type=int64  	times 3.640 µs (± 0.6283 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=2000 qin=(1000, 1500) maxval=65536 id_type=int64  	times 3.242 µs (± 0.6140 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=20000 qin=(10000, 10000) maxval=65536 id_type=int64  	times 13.298 µs (± 1.3659 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=20000 qin=(10000, 15000) maxval=65536 id_type=int64  	times 9.290 µs (± 1.1589 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=200 qin=(100, 100) maxval=65536 id_type=int32  	times 2.923 µs (± 0.6588 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=200 qin=(100, 150) maxval=65536 id_type=int32  	times 2.807 µs (± 0.6188 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=2000 qin=(1000, 1000) maxval=65536 id_type=int32  	times 3.500 µs (± 0.5938 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=2000 qin=(1000, 1500) maxval=65536 id_type=int32  	times 3.092 µs (± 0.5829 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=20000 qin=(10000, 10000) maxval=65536 id_type=int32  	times 12.013 µs (± 1.1529 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=20000 qin=(10000, 15000) maxval=65536 id_type=int32  	times 7.122 µs (± 0.9574 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
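
Roughly, at n=20000 these numbers work out to about 3-6x speedups for avx512_spr over avx512 (e.g., 39.210 µs vs 12.013 µs and 42.442 µs vs 7.122 µs for int32), while the n=200 and n=2000 cases show smaller gains.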

@mulugetam (Contributor, Author)

Closed in favor of #4637, which has a cleaner commit history.
