Skip to content

bluuewhale/hash-smith

Repository files navigation

HashSmith: Fast & memory efficient hash tables for Java

License: MIT Maven Central javadoc Status: experimental

⚠️ This project is experimental and not ready for production use.

HashSmith logo

Overview

  • HashSmith provides multiple high-performance hash table implementations optimized for speed and memory efficiency on modern JVMs.
  • Focus areas: SWAR-probing (SwissMap), SIMD-probing (SwissSimdMap), predictable probe lengths (Robin Hood), and minimal per-entry overhead.
  • Built for JDK 21+; SwissSimdMap uses the incubating Vector API for SIMD acceleration.
  • More memory-efficient than the built-in JDK HashMap; performance depends on workload.

Implementations

  • SwissMap: SwissTable-inspired design using SWAR control-byte probing (no Vector API) with tombstone reuse. Default map.
  • SwissSimdMap: SIMD (Vector API incubator) variant of SwissMap with vectorized control-byte probing. See docs/SwissSimdMap.md for details.
  • ConcurrentSwissMap: sharded, thread-safe wrapper around SwissMap using per-shard StampedLock (null keys not supported).
  • SwissSet: SwissTable-style hash set with SIMD control-byte probing, tombstone reuse, and null-element support

Why SWAR by default?

Vector API is still incubating, and profiling on my setup showed the SIMD path taking longer than expected, so the default SwissMap favors a SWAR probe. Numbers can differ significantly by hardware/JVM version; please run your own benchmarks if you plan to use SwissSimdMap.

Blog / Write-up

  • If you want a guided tour with design notes and benchmarks, see this write-up.

Quick Start

import io.github.bluuewhale.hashsmith.SwissMap;      // SWAR
import io.github.bluuewhale.hashsmith.SwissSimdMap;  // Vector API
import io.github.bluuewhale.hashsmith.ConcurrentSwissMap;
import io.github.bluuewhale.hashsmith.RobinHoodMap;
import io.github.bluuewhale.hashsmith.SwissSet;

public class Demo {
    public static void main(String[] args) {
        // SwissMap (SWAR)
        var swiss = new SwissMap<String, Integer>();
        swiss.put("a", 1);
        swiss.put("b", 2);
        System.out.println(swiss.get("a")); // 1

        // SwissSimdMap (Vector API incubator)
        var swissSimd = new SwissSimdMap<String, Integer>();
        swissSimd.put("a", 1);
        swissSimd.put("b", 2);
        System.out.println(swissSimd.get("a")); // 1

        // ConcurrentSwissMap (sharded, thread-safe)
        var concurrentSwiss = new ConcurrentSwissMap<String, Integer>();
        concurrentSwiss.put("a", 1);
        concurrentSwiss.put("b", 2);
        System.out.println(concurrentSwiss.get("a")); // 1

        // SwissSet
        var swissSet = new SwissSet<String>();
        swissSet.add("k");
        swissSet.add(null); // nulls allowed
        System.out.println(swissSet.contains("k")); // true
    }
}

Install

  • Gradle (Kotlin DSL):
dependencies {
    implementation("io.github.bluuewhale:hashsmith:0.1.7")
}
  • Gradle (Groovy):
dependencies {
    implementation 'io.github.bluuewhale:hashsmith:0.1.7'
}
  • Maven:
<dependency>
  <groupId>io.github.bluuewhale</groupId>
  <artifactId>hashsmith</artifactId>
  <version>0.1.7</version>
</dependency>

Requirements

  • JDK 21+ (SwissSimdMap needs jdk.incubator.vector)
  • Gradle (wrapper provided)
  • The JVM flag --add-modules jdk.incubator.vector is already configured for build, test, and JMH tasks that exercise SwissSimdMap.

Build & Test

./gradlew build        # full build
./gradlew test         # JUnit 5 tests

Memory Footprint

  • Compares retained heap for both maps (HashMap vs SwissSimdMap vs SwissMap vs fastutil Object2ObjectOpenHashMap vs Eclipse Collections UnifiedMap) and sets (HashSet vs SwissSet vs fastutil ObjectOpenHashSet vs Eclipse Collections UnifiedSet).
  • Set benchmarks use UUID String keys (HashSet, SwissSet, ObjectOpenHashSet, UnifiedSet). Primitive-specialized collections (e.g., fastutil primitive sets) are excluded because their memory profile is driven by primitive storage, whereas these tests target general reference workloads.

Results

  • Maps: SwissMap/SwissSimdMap use open addressing to cut space; default load factor 0.875, up to 53.3% retained-heap reduction in payload-light cases vs HashMap.
  • Sets: SwissSet (SwissHashSet) mirrors the SwissTable layout with SIMD control-byte probing and reuses tombstones to stay denser than HashSet across tested payloads, showing up to ~62% retained-heap reduction in lighter payload cases.
Map Set
HashMap Memory Footprint HashSet Memory Footprint

Benchmark (JMH, CPU ns/op)

  • All benchmarks were run on Windows 11 (x64) with Eclipse Temurin JDK 21.0.9, on an AMD Ryzen 5 5600 (6C/12T).
  • At high load factors SwissMap (SWAR) stay competitive against other open-addressing tables and close to JDK HashMap performance; depending on hardware/JVM, one may edge out the other.
put hit put miss
CPU: put hit CPU: put miss
get hit get miss
CPU: get hit CPU: get miss

Contributing

  1. Open an issue for bugs/ideas
  2. Work on a feature branch and open a PR
  3. Keep tests/JMH green before submitting

License

  • This project is licensed under the MIT License. See LICENSE for details.