
Conversation

@Pulkitg64 (Contributor) commented Jan 5, 2026

Description

This draft PR explores storing float vectors using 2 bytes (half-float/FP16) instead of 4 bytes (FP32), reducing vector disk usage by approximately 50%. The approach involves storing vectors on disk in half-float format while converting them back to full-float precision for dot-product computations during search and index merge operations. However, this conversion step introduces additional overhead during vector reads, resulting in slower indexing and search performance.
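For context, here is a minimal sketch of the round trip described above, using the `Float.floatToFloat16` / `Float.float16ToFloat` conversions available since JDK 20. This is illustrative only, not the PR's actual code; the class and helper names are made up.

```java
// Illustrative sketch only: a float[] vector is persisted as one short (2 bytes)
// per dimension, then widened back to float[] before dot-product scoring.
import java.nio.ByteBuffer;

final class HalfFloatCodec {

  /** Encode a float32 vector as IEEE 754 binary16 bits: 2 bytes per dimension. */
  static byte[] encode(float[] vector) {
    ByteBuffer out = ByteBuffer.allocate(vector.length * Short.BYTES);
    for (float v : vector) {
      out.putShort(Float.floatToFloat16(v)); // lossy narrowing to FP16
    }
    return out.array();
  }

  /** Decode back to float32 for scoring; this is the extra read-time cost. */
  static float[] decode(byte[] bytes, int dimension) {
    ByteBuffer in = ByteBuffer.wrap(bytes);
    float[] vector = new float[dimension];
    for (int i = 0; i < dimension; i++) {
      vector[i] = Float.float16ToFloat(in.getShort());
    }
    return vector;
  }
}
```

The decode loop runs on every vector read during search and merge, which is where the overhead described above comes from.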

This is an early draft to gather community feedback on the viability and direction of this implementation.

TODO: Support for MemorySegmentVectorScorer with half-float vectors is not yet implemented.

Benchmark Results:

With no quantization, we are seeing around a 100% increase in latency. For 8-bit quantization, we are not seeing a latency regression, but for 4-bit we are seeing about an 18% latency regression. We are also seeing a 20-25% drop in indexing rate across all quantization levels.

| Encoding | recall | latency (ms) | quantized | index time (s) | index docs/s | index size (MB) | vec disk (MB) | vec RAM (MB) |
|---|---|---|---|---|---|---|---|---|
| float16 | 0.991 | 11.392 | no | 34.8 | 2873.81 | 206.22 | 390.625 | 390.625 |
| float16 | 0.981 | 4.337 | 8 bits | 41.55 | 2406.97 | 305.4 | 294.495 | 99.182 |
| float16 | 0.926 | 6.069 | 4 bits | 42.07 | 2376.93 | 256.58 | 245.667 | 50.354 |
| float32 | 0.991 | 4.942 | no | 28.93 | 3456.38 | 401.53 | 390.625 | 390.625 |
| float32 | 0.981 | 4.367 | 8 bits | 32.04 | 3121.49 | 500.71 | 489.807 | 99.182 |
| float32 | 0.926 | 5.343 | 4 bits | 32.12 | 3113.33 | 451.91 | 440.979 | 50.354 |

@benwtrent (Member)

@Pulkitg64 the latency is the main concern IMO. We must copy the vectors onto heap (we know this is expensive), transform the bytes to float32 (which is an additional cost), then do the float32 panama vector actions (which are super fast). I would expect this to also impact quantization query time for anything that must rescore (though, likely less of an impact as that would be fewer vectors to decode).
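To make those three costs concrete, here is a hedged sketch of the per-vector hot path being described (assumed shapes only, not Lucene's actual scorer code): the FP16 bits have already been copied onto heap as a `short[]`, are widened to `float[]`, and are then fed to a Panama float32 dot product.

```java
// Sketch of the query-time pipeline described above; names are illustrative.
// Requires --add-modules jdk.incubator.vector.
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

final class Fp16ScoringSketch {
  private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

  /** Cost 2: widen FP16 bits (already copied onto heap as short[]) to float32. */
  static float[] widen(short[] fp16Bits) {
    float[] out = new float[fp16Bits.length];
    for (int i = 0; i < fp16Bits.length; i++) {
      out[i] = Float.float16ToFloat(fp16Bits[i]);
    }
    return out;
  }

  /** Cost 3: the fast part, a Panama float32 dot product. */
  static float dotProduct(float[] a, float[] b) {
    FloatVector acc = FloatVector.zero(SPECIES);
    int i = 0;
    int bound = SPECIES.loopBound(a.length);
    for (; i < bound; i += SPECIES.length()) {
      FloatVector va = FloatVector.fromArray(SPECIES, a, i);
      FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
      acc = va.fma(vb, acc); // lane-wise a * b + acc
    }
    float sum = acc.reduceLanes(VectorOperators.ADD);
    for (; i < a.length; i++) { // scalar tail
      sum += a[i] * b[i];
    }
    return sum;
  }
}
```

The widening loop in the middle is the step the float32 path does not have.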

I wonder if all the cost is spent just decoding the vector? What does a flame graph tell you?

Also, could you indicate your JVM, etc.?

See this interesting jep update on the ever incubating vector API:

https://openjdk.org/jeps/508

> Addition, subtraction, division, multiplication, square root, and fused multiply/add operations on Float16 values are now auto-vectorized on supporting x64 CPUs.
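As a rough illustration of the kind of loop that statement refers to, here is a speculative sketch against the incubating `jdk.incubator.vector.Float16` class. The method names follow the current incubator API and may change, and whether C2 actually vectorizes a particular loop (especially a reduction like this) depends on the JIT and CPU.

```java
// Speculative sketch against the incubating jdk.incubator.vector.Float16 API;
// exact method names may differ. Requires --add-modules jdk.incubator.vector.
import jdk.incubator.vector.Float16;

final class Float16DotSketch {
  /** Dot product kept entirely in FP16 arithmetic (no widening to float32). */
  static float dotProduct(short[] aBits, short[] bBits) {
    Float16 acc = Float16.valueOf(0.0f);
    for (int i = 0; i < aBits.length; i++) {
      Float16 a = Float16.shortBitsToFloat16(aBits[i]);
      Float16 b = Float16.shortBitsToFloat16(bBits[i]);
      acc = Float16.fma(a, b, acc); // a * b + acc, rounded once in binary16
    }
    return acc.floatValue();
  }
}
```

Note that accumulating in binary16 rather than float32 also changes rounding behavior, which is its own trade-off.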

@benwtrent (Member)

@Pulkitg64 also, thank you for doing an initial pass and benchmarking, it's important data :D

I wonder if we want a true element type vs. a new format?

The element type has indeed expanded its various uses, but for many of them, Float16 isn't that much different from float (e.g. you still likely query & index with float[], still use FloatVectorValues, etc.). The only difference is the on-disk representation (which... seems like a format thing).

This is just an idea. I am not 100% sold either way. Looking for discussion.

@rmuir (Member) commented Jan 5, 2026

You need https://bugs.openjdk.org/browse/JDK-8370691 for this one to be performant.

@rmuir (Member) commented Jan 5, 2026

Just look at the numbers on the PR. They benchmark the cosine and the dot product. Maybe try it out with the branch from that OpenJDK PR.

Code in o.a.l.internal.vectorization will be needed that takes advantage of the new Float16Vector or whatever the name ends up being. I would try to keep it looking as close to the existing 32-bit float code as possible.

@Pulkitg64 (Contributor, Author)

Thanks @benwtrent, @rmuir for such quick responses.

Let me try to gather some more data to confirm if the conversion is driving the regression.

> Just look at the numbers on the PR. They benchmark the cosine and the dot product. Maybe try it out with the branch from that OpenJDK PR.
>
> Code in o.a.l.internal.vectorization will be needed that takes advantage of the new Float16Vector or whatever the name ends up being. I would try to keep it looking as close to the existing 32-bit float code as possible.

Trying now

@Pulkitg64 (Contributor, Author) commented Jan 7, 2026

Here is the profiler output difference between the float16 and float32 benchmark runs with no quantization. Based on the comparison below, it is clear that the additional latency in the float16 run comes from reading the float16 vectors.

[Screenshot: profiler output comparison of the float16 and float32 runs, 2026-01-07]

> Also, could you indicate your JVM, etc.?

I am running these tests on an x86 machine with JDK 25:

java --version
openjdk 25.0.1 2025-10-21 LTS
OpenJDK Runtime Environment Corretto-25.0.1.9.1 (build 25.0.1+9-LTS)
OpenJDK 64-Bit Server VM Corretto-25.0.1.9.1 (build 25.0.1+9-LTS, mixed mode, sharing)

@rmuir (Member) commented Jan 7, 2026

Stop converting. Use the native fp16 type (and vector type), otherwise the code will be slow.
