@jasperpotts
Member

Added a streaming XXH3 64-bit hash implementation of WritableSequentialData. Added a JMH benchmark for model object hashCode(). Rewrote code generation to generate hashCode64() and hashCode() methods using xxHash.

Signed-off-by: Jasper Potts <1466205+jasperpotts@users.noreply.github.com>
@jasperpotts jasperpotts requested review from a team as code owners August 14, 2025 01:33
@github-actions

github-actions bot commented Aug 14, 2025

JUnit Test Report

   72 files  +1     72 suites  +1   2m 23s ⏱️ -46s
1 289 tests +3  1 285 ✅ +3   4 💤 ±0  0 ❌ ±0 
7 145 runs  +4  7 125 ✅ +4  20 💤 ±0  0 ❌ ±0 

Results for commit a5d1b7b. ± Comparison against base commit d5d325e.

♻️ This comment has been updated with latest results.

@github-actions

github-actions bot commented Aug 14, 2025

Integration Test Report

    406 files  + 2      406 suites  +2   18m 7s ⏱️ +11s
114 885 tests +50  114 639 ✅  - 196  0 💤 ±0  246 ❌ +246 
115 126 runs  +50  114 880 ✅  - 196  0 💤 ±0  246 ❌ +246 

For more details on these failures, see this check.

Results for commit a5d1b7b. ± Comparison against base commit d5d325e.

♻️ This comment has been updated with latest results.

Signed-off-by: Jasper Potts <1466205+jasperpotts@users.noreply.github.com>
Comment on lines +38 to +48
@Override
public int hashCode() {
    return (int) hashCode64();
}

/**
 * Extended 64-bit hashCode method that makes the hash code better distributed and follows protobuf rules
 * for default values. This is important for backward compatibility. It also lazily computes and caches the
 * hash code for future calls, and is designed to be thread safe.
 */
public long hashCode64() {
Contributor

+1. I like this approach.

I'm still unsure whether casting the long with (int) is the best way to derive the legacy hash code, but overall the approach in this PR looks a lot better than the one in the Bytes PR.
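
For reference, a minimal sketch of the two usual ways to derive a 32-bit legacy hash from a 64-bit one: the plain `(int)` cast used in the snippet above, which keeps only the low 32 bits, and an XOR-fold of the two halves, which is what `Long.hashCode(long)` computes. The class and method names here are hypothetical and not part of the PR. If the 64-bit hash is already well mixed, its low 32 bits should be well distributed too, so the difference mainly matters when the high bits carry information the low bits lack.

```java
final class HashFoldSketch {

    /** Truncation: keeps only the low 32 bits, like the (int) cast in the PR snippet. */
    static int truncate(long hash64) {
        return (int) hash64;
    }

    /** XOR-fold: mixes the high 32 bits into the low 32 bits (same result as Long.hashCode). */
    static int xorFold(long hash64) {
        return (int) (hash64 ^ (hash64 >>> 32));
    }

    public static void main(String[] args) {
        // Two values that differ only in their high 32 bits.
        long a = 0x000000010000002AL;
        long b = 0x000000020000002AL;
        System.out.println(truncate(a) == truncate(b)); // prints true: high bits are discarded
        System.out.println(xorFold(a) == xorFold(b));   // prints false: high bits still contribute
    }
}
```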

"""
if ($unknownFields != null) {
    for (int i = 0; i < $unknownFields.size(); i++) {
        hashingStream.writeInt($unknownFields.get(i).hashCode());
Contributor

Minor: I'm wondering if we could just write the unknown field bytes directly into the hashingStream and let it compute a proper 64-bit hash over all the unknown field data?
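
A rough sketch of why this suggestion could matter. When each unknown field contributes only a 32-bit `hashCode()`, any two fields whose 32-bit hashes collide become indistinguishable to the 64-bit hasher. Everything below is an assumption for illustration: `Fnv1a64` is a tiny FNV-1a hasher standing in for the PR's XXH3 stream, fields are modelled as plain `byte[]`, and `Arrays.hashCode` stands in for whatever 32-bit `hashCode()` the unknown field type actually has.

```java
import java.util.Arrays;
import java.util.List;

final class UnknownFieldBytesHashSketch {

    /** Tiny FNV-1a 64-bit streaming hasher; a stand-in only, NOT the PR's XXH3 stream. */
    static final class Fnv1a64 {
        private long state = 0xcbf29ce484222325L; // FNV-1a 64-bit offset basis

        void writeByte(int b) {
            state ^= (b & 0xff);
            state *= 0x100000001b3L; // FNV 64-bit prime
        }

        void writeInt(int v) {
            for (int shift = 0; shift < 32; shift += 8) {
                writeByte(v >>> shift);
            }
        }

        void writeBytes(byte[] bytes) {
            for (byte b : bytes) {
                writeByte(b);
            }
        }

        long hash() {
            return state;
        }
    }

    /** Mirrors the quoted loop: each field contributes only a 32-bit hash code. */
    static long hashViaIntHashCodes(List<byte[]> fields) {
        Fnv1a64 h = new Fnv1a64();
        for (byte[] field : fields) {
            h.writeInt(Arrays.hashCode(field));
        }
        return h.hash();
    }

    /** Alternative: length-prefix each field and stream its raw bytes into the hasher. */
    static long hashViaRawBytes(List<byte[]> fields) {
        Fnv1a64 h = new Fnv1a64();
        for (byte[] field : fields) {
            h.writeInt(field.length);
            h.writeBytes(field);
        }
        return h.hash();
    }

    public static void main(String[] args) {
        // Arrays.hashCode returns 992 for both arrays, so the first approach cannot tell them apart.
        List<byte[]> x = List.of(new byte[] {0, 31});
        List<byte[]> y = List.of(new byte[] {1, 0});
        System.out.println(hashViaIntHashCodes(x) == hashViaIntHashCodes(y)); // prints true
        System.out.println(hashViaRawBytes(x) == hashViaRawBytes(y));         // prints false
    }
}
```

The length prefix in `hashViaRawBytes` is a design choice of this sketch: it keeps field boundaries unambiguous when several fields are streamed back to back.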

Signed-off-by: Jasper Potts <1466205+jasperpotts@users.noreply.github.com>
…sh quality testing

Signed-off-by: Jasper Potts <1466205+jasperpotts@users.noreply.github.com>
Comment on lines +76 to +78
@Override
public long hashCode64() {
    return bytes.hashCode64();
Contributor

I like this. But you may also want to consider feeding the wire type and the field number into the hasher. Otherwise the hash code may not be as well dispersed as we'd like.
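
To illustrate the concern: if only the payload bytes are hashed, as the quoted snippet appears to do, two unknown fields with identical bytes but different field numbers produce the same hash. The sketch below mixes in the protobuf tag, which packs the field number and the 3-bit wire type as `(fieldNumber << 3) | wireType`. The class name, the method names, and the splitmix64-style mixer are assumptions for illustration, not the PR's XXH3 code.

```java
final class UnknownFieldTagHashSketch {

    /** splitmix64 finalizer, used here as a stand-in mixer; NOT the PR's XXH3 code. */
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    /** Protobuf tag: field number in the upper bits, wire type in the low 3 bits. */
    static int tag(int fieldNumber, int wireType) {
        return (fieldNumber << 3) | wireType;
    }

    /** Payload-only hashing: the field number and wire type are ignored. */
    static long hashPayloadOnly(long payloadHash) {
        return payloadHash;
    }

    /** Combines the payload hash with the mixed tag so the field identity contributes too. */
    static long hashWithTag(int fieldNumber, int wireType, long payloadHash) {
        return mix(payloadHash ^ mix(tag(fieldNumber, wireType)));
    }

    public static void main(String[] args) {
        long payload = 0x123456789abcdef0L; // hypothetical 64-bit hash of the field bytes
        // Same bytes stored in field 1 and in field 2, both length-delimited (wire type 2).
        System.out.println(hashPayloadOnly(payload) == hashPayloadOnly(payload));     // prints true
        System.out.println(hashWithTag(1, 2, payload) == hashWithTag(2, 2, payload)); // prints false
    }
}
```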

/**
 * Interface for objects that can be hashed to a 64-bit long value.
 */
public interface SixtyFourBitHashable {
Contributor

Minor: I have no strong opinion on this name, and the current one does look logical, but HashCode64 might be another option, since it matches the name of the method introduced here.

* A test to evaluate the quality of non-cryptographic hash functions by checking how many unique hashes can be
* generated from 4.5 billion StateKey inputs.
*/
public final class PbjObjHashQualityStateKeyTest {
Contributor

Nice! Thank you! And what are the results? It'd be interesting to compare the old one and the new one, if possible.
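
A small-scale sketch of the kind of comparison being asked for: hash every input once, count the distinct hash values, and compare across hash functions. This is not the PR's test and does not reproduce its results. The inputs (pairs of small ints), the 31-multiplier baseline, and the splitmix64-style "strong" hash are all assumptions chosen so the example runs in memory; the real test reportedly uses 4.5 billion StateKey inputs.

```java
import java.util.HashSet;
import java.util.Set;

final class HashQualitySketch {

    /** A hash function over a pair of ints (hypothetical input shape). */
    interface PairHash {
        long hash(int a, int b);
    }

    /** Weak baseline: the classic 31-multiplier combination of two ints. */
    static long weakHash(int a, int b) {
        return 31L * a + b;
    }

    /** Stand-in strong hash: splitmix64 finalizer over the packed pair; NOT XXH3. */
    static long mixedHash(int a, int b) {
        long z = ((long) a << 32) | (b & 0xffffffffL);
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    /** Hashes all n * n pairs (a, b) with 0 <= a, b < n and counts distinct hash values. */
    static int countUnique(int n, PairHash h) {
        Set<Long> seen = new HashSet<>();
        for (int a = 0; a < n; a++) {
            for (int b = 0; b < n; b++) {
                seen.add(h.hash(a, b));
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        int n = 300;
        System.out.println("inputs:       " + (n * n));
        System.out.println("weak unique:  " + countUnique(n, HashQualitySketch::weakHash));
        System.out.println("mixed unique: " + countUnique(n, HashQualitySketch::mixedHash));
    }
}
```

The fewer unique values a function yields for the same inputs, the more collisions it produces; running the old and new `hashCode()` through the same harness would give the side-by-side comparison.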

@@ -0,0 +1,230 @@
#!/usr/bin/env python3
Contributor

Minor: can we haz Java? :)

… hashing Strings

Signed-off-by: Jasper Potts <1466205+jasperpotts@users.noreply.github.com>
Signed-off-by: Jasper Potts <1466205+jasperpotts@users.noreply.github.com>
Base automatically changed from add-xxh3-hashing to main August 19, 2025 00:00