A very performant, fairly lightweight HashSet implementation in C.
Note
This HashSet implementation solely relies on a "Open Adressing" implementation. A "Array of Linked Lists" implementation and "Treeifying" a node after a certain threshold yielded a negative performance impact from the extensive testing needed. Therefore, the current implementation uses "Open Adressing". This HashSet implementation does all memory management and cleanup internally.
This document provides an overview and detailed description of the functions available in this HashSet implementation.
This HashSet implementation provides a flexible hash set with functions to create, manipulate, and destroy hash sets. It supports both default and custom hash functions.
typedef struct HashSet *HashSet;
typedef size_t (*hash)(const void *key, size_t key_size);HashSet hs_create(size_t hs_capacity, size_t key_size, hash hash_func);
int hs_destroy(HashSet hs);
bool hs_contains(HashSet hs, const void *key);
int hs_add(HashSet hs, const void *key);
size_t hs_size(HashSet hs);
int hs_remove(HashSet hs, const void *key);HashSet hs_create(size_t hs_capacity, size_t key_size, hash hash_func)Description: Creates a new HashSet with a hash function.
Tip
Recommended way to create a HashSet is to use a custom hash function to avoid unwanted collisions.
-
Parameters:
hs_capacity: The initial capacity of theHashSet.key_size: The size of the key.hash_func: The hash function be be used, orNULLfor generic hashing.
-
Returns: A new
HashSet.
int hs_destroy(HashSet hs);Description: Destroys the HashSet.
-
Parameters:
hs: TheHashSetto be destroyed.
-
Returns: A success code.
bool hs_contains(HashSet hs, const void *key);Description: Tests if the HashSet contains the specified key.
-
Parameters:
hs: TheHashSet.key: The key to be tested.
-
Returns: True or Falsehood.
int hs_add(HashSet hs, const void *key);Description: Adds a new key to the HashSet.
-
Parameters:
hs: TheHashSet.key: The key to be added.
-
Returns: A sucess code.
size_t hs_size(HashSet hs);Description: Returns the size of the HashSet.
-
Parameters:
hs: TheHashSet.
-
Returns: The size.
int hs_remove(HashSet hs, const void *key);Description: Removes the key from the HashSet.
-
Parameters:
hs: TheHashSet.key: The key to be removed.
-
Returns: A success code.
0: Success.- Non-zero values indicate errors.
Warning
Micro benchmarks can be misleading, and performance can vary from system to system.
Note
All implementation used a struct/class containg a randomly generated string as the key. Same hashing function was used in each benchmark (DJB2). Each micro benchmark is was run 100 times and the average time for each operation was taken.
Note
Used gcc -std=c2x -Ofast hashset.h hashset.c main.c for compilation.
| Operation | Average Time (100,000 elements) |
|---|---|
| Insertion | 0.001872 seconds |
| Lookup | 0.001362 seconds |
| Deletion | 0.001498 seconds |
Note
Used g++ -Ofast main.cpp.
| Operation | Average Time (100,000 elements) |
|---|---|
| Insertion | 0.00908522 seconds |
| Lookup | 0.000315724 seconds |
| Deletion | 0.005854 seconds |
Note
Used OpenJDK 22.
| Operation | Average Time (100,000 elements) |
|---|---|
| Insertion | 0.006076 seconds |
| Lookup | 0.001066 seconds |
| Deletion | 0.001115 seconds |
Note
Used CPython 3.11.9.
| Operation | Average Time (100,000 elements) |
|---|---|
| Insertion | 0.007460 seconds |
| Lookup | 0.003630 seconds |
| Deletion | 0.003446 seconds |
Here is a basic example demonstrating how to use the HashSet functions:
#include <stdio.h>
#include "HashSet.h"
int main()
{
// Create a HashSet
HashSet hs = hs_create(10, sizeof(int), NULL);
// Add a key-value pair
int key = 1;
hs_add(hs, &key);
// Check if the HashSet contains the key
printf("%s\n", hs_contains(hs, &key) ? "true" : "false");
// Destroy the HashSet
hs_destroy(hs);
return 0;
}