High-performance Bloom Filter library for Elixir, powered by Nx
Bloomy provides probabilistic data structures for efficient set membership testing with minimal memory usage. Built on Nx tensors for blazing-fast performance with optional EXLA/GPU acceleration.
- 🚀 High Performance - Nx tensor operations with EXLA/GPU acceleration
- 🎯 Multiple Types - Standard, Counting (with deletion), Scalable (auto-grows), Learned (ML-enhanced)
- 💾 Persistence - Serialization with optional compression
- 🔀 Merge Operations - Union/intersection for distributed systems
- 📊 Rich Statistics - Comprehensive monitoring and metrics
- 🎓 Well Documented - Extensive docs and examples
Add bloomy to your dependencies in mix.exs:
def deps do
[
{:bloomy, "~> 0.1.0"},
{:nx, "~> 0.10.0"},
{:exla, "~> 0.10.0"} # Optional but recommended for performance
]
end# Create a bloom filter for 10,000 items with 1% false positive rate
filter = Bloomy.new(10_000, false_positive_rate: 0.01)
# Add items
filter = filter
|> Bloomy.add("user@example.com")
|> Bloomy.add("alice@example.com")
|> Bloomy.add("bob@example.com")
# Check membership
Bloomy.member?(filter, "user@example.com") # => true
Bloomy.member?(filter, "other@example.com") # => false
# Get statistics
Bloomy.info(filter)
# => %{
# type: :standard,
# capacity: 10000,
# items_count: 3,
# fill_ratio: 0.0021,
# false_positive_rate: 0.01,
# ...
# }Classic space-efficient probabilistic set membership test.
filter = Bloomy.new(10_000)
filter = Bloomy.add(filter, "apple")
Bloomy.member?(filter, "apple") # => trueSupports deletion operations using counters instead of bits.
filter = Bloomy.new(10_000, type: :counting)
filter = Bloomy.add(filter, "item")
Bloomy.member?(filter, "item") # => true
filter = Bloomy.remove(filter, "item")
Bloomy.member?(filter, "item") # => falseAutomatically grows to maintain target false positive rate.
filter = Bloomy.new(1_000, type: :scalable)
# Add millions of items - it will automatically scale!
filter = Enum.reduce(1..1_000_000, filter, fn i, f ->
Bloomy.add(f, "item_#{i}")
end)
info = Bloomy.info(filter)
info.slices_count # => Multiple slices created automaticallyML-enhanced filter with lower false positive rates.
filter = Bloomy.new(10_000, type: :learned)
# Train with positive and negative examples
training_data = %{
positive: ["valid_user_1", "valid_user_2", "valid_user_3"],
negative: ["spam_1", "spam_2", "spam_3"]
}
filter = Bloomy.train(filter, training_data)
filter = Bloomy.add(filter, "valid_user_1")
Bloomy.member?(filter, "valid_user_1") # => true (uses ML model + backup filter)EXLA provides significant performance improvements (2-20x faster) by:
- JIT compilation of numerical operations
- CPU/GPU acceleration
- Optimized tensor operations
In your config/config.exs:
# Use EXLA with CPU
config :nx, default_backend: EXLA.Backend
# Or use EXLA with GPU (if CUDA is available)
config :nx, default_backend: {EXLA.Backend, client: :cuda}
# Or use EXLA with specific device
config :nx, default_backend: {EXLA.Backend, client: :cuda, device_id: 0}Specify backend when creating filters:
# Use EXLA backend for this filter
filter = Bloomy.new(100_000, backend: EXLA.Backend)
# Use BinaryBackend (default, no compilation)
filter = Bloomy.new(100_000, backend: Nx.BinaryBackend)# Set backend at runtime
Nx.default_backend(EXLA.Backend)
# All filters created after this will use EXLA
filter = Bloomy.new(100_000)To use GPU acceleration (requires CUDA):
# In config/config.exs
config :nx, default_backend: {EXLA.Backend, client: :cuda}
# Or at runtime
Nx.default_backend({EXLA.Backend, client: :cuda})
# Create filter - will automatically use GPU
filter = Bloomy.new(1_000_000)Run the backend comparison benchmark:
mix run benchmarks/backend_comparison.exsTypical Performance Gains:
- Small filters (<10K): ~1.5x faster
- Medium filters (10K-100K): 2-5x faster
- Large filters (>100K): 5-10x faster
- Batch operations: 10-20x faster
- GPU (large filters): 20-50x faster
# Add multiple items efficiently
filter = Bloomy.add_all(filter, ["apple", "banana", "orange"])
# Batch membership testing
results = Bloomy.batch_member?(filter, ["apple", "grape", "banana"])
# => %{"apple" => true, "grape" => false, "banana" => true}
# Create from list
filter = Bloomy.from_list(["a", "b", "c", "d"])# Union (merge) filters
f1 = Bloomy.new(10_000) |> Bloomy.add("apple")
f2 = Bloomy.new(10_000) |> Bloomy.add("banana")
merged = Bloomy.union(f1, f2)
Bloomy.member?(merged, "apple") # => true
Bloomy.member?(merged, "banana") # => true
# Union multiple filters
filters = [filter1, filter2, filter3]
merged = Bloomy.union_all(filters)
# Intersection
intersected = Bloomy.intersect_all([filter1, filter2])# Calculate Jaccard similarity
similarity = Bloomy.jaccard_similarity(filter1, filter2)
# => 0.75 (75% similar)
# Calculate overlap coefficient
overlap = Bloomy.Operations.overlap_coefficient(filter1, filter2)# Serialize to binary
binary = Bloomy.to_binary(filter)
# With compression (recommended for large filters)
binary = Bloomy.to_binary(filter, compress: true)
# Deserialize
{:ok, loaded_filter} = Bloomy.from_binary(binary)# Save to file
:ok = Bloomy.save(filter, "my_filter.bloom")
# Save with compression
:ok = Bloomy.save(filter, "my_filter.bloom", compress: true)
# Load from file
{:ok, filter} = Bloomy.load("my_filter.bloom")Run the included benchmarks to see performance on your system:
# Basic operations (add, member?, batch operations)
mix run benchmarks/basic_operations.exs
# Compare filter types (Standard, Counting, Scalable)
mix run benchmarks/filter_comparison.exs
# Merge operations (union, intersection, similarity)
mix run benchmarks/operations.exs
# Serialization and persistence
mix run benchmarks/serialization.exs
# Backend comparison (BinaryBackend vs EXLA)
mix run benchmarks/backend_comparison.exsBloomy.new(capacity, opts)Common Options:
:type- Filter type::standard,:counting,:scalable,:learned(default::standard):false_positive_rate- Target false positive rate (default:0.01or 1%):backend- Nx backend:Nx.BinaryBackend,EXLA.Backend(default:Nx.default_backend())
Counting Filter Options:
:counter_width- Bits per counter:8,16,32(default:8)
Scalable Filter Options:
:growth_factor- Capacity multiplier for new slices (default:2):error_tightening_ratio- Error rate multiplier (default:0.8)
Learned Filter Options:
:confidence_threshold- Model confidence threshold (default:0.7)
# Standard with custom false positive rate
filter = Bloomy.new(10_000, false_positive_rate: 0.001)
# Counting with 16-bit counters and EXLA
filter = Bloomy.new(10_000,
type: :counting,
counter_width: 16,
backend: EXLA.Backend
)
# Scalable with custom growth
filter = Bloomy.new(1_000,
type: :scalable,
growth_factor: 3,
error_tightening_ratio: 0.5
)# Track active sessions
defmodule SessionTracker do
use GenServer
def start_link(_) do
GenServer.start_link(__MODULE__, nil, name: __MODULE__)
end
def init(_) do
# 1M sessions, 0.1% false positive rate, use EXLA for speed
filter = Bloomy.new(1_000_000,
false_positive_rate: 0.001,
backend: EXLA.Backend
)
{:ok, filter}
end
def track_session(session_id) do
GenServer.cast(__MODULE__, {:track, session_id})
end
def active?(session_id) do
GenServer.call(__MODULE__, {:check, session_id})
end
def handle_cast({:track, session_id}, filter) do
{:noreply, Bloomy.add(filter, session_id)}
end
def handle_call({:check, session_id}, _from, filter) do
{:reply, Bloomy.member?(filter, session_id), filter}
end
end# Track which items have been processed across nodes
defmodule DistributedCache do
def merge_caches(node_filters) do
# Collect filters from all nodes
filters = Enum.map(node_filters, & &1)
# Merge into single filter
merged = Bloomy.union_all(filters)
# Distribute back to all nodes
distribute_filter(merged)
end
def sync_filter(local_filter) do
# Save filter to shared storage
Bloomy.save(local_filter, "/shared/cache.bloom", compress: true)
end
def load_filter do
case Bloomy.load("/shared/cache.bloom") do
{:ok, filter} -> filter
{:error, _} -> Bloomy.new(1_000_000)
end
end
enddefmodule Deduplicator do
def process_stream(stream) do
initial_filter = Bloomy.new(10_000_000, type: :scalable)
Stream.transform(stream, initial_filter, fn item, filter ->
if Bloomy.member?(filter, item.id) do
# Skip duplicate
{[], filter}
else
# New item, add to filter and pass through
{[item], Bloomy.add(filter, item.id)}
end
end)
end
end-
Use EXLA Backend - Significant performance improvement (2-20x faster)
config :nx, default_backend: EXLA.Backend
-
Choose Appropriate Capacity - Overestimate slightly for better performance
# If expecting ~10K items, use 15K capacity filter = Bloomy.new(15_000)
-
Batch Operations - Use
add_allandbatch_member?for multiple items# Fast: single batch operation filter = Bloomy.add_all(filter, items) # Slower: many individual operations filter = Enum.reduce(items, filter, &Bloomy.add(&2, &1))
-
Use Scalable for Unknown Sizes - Automatically adapts to data volume
filter = Bloomy.new(1_000, type: :scalable)
-
Serialize with Compression - Reduces storage/network overhead
Bloomy.save(filter, "cache.bloom", compress: true)
A Bloom filter is a space-efficient probabilistic data structure that tests whether an element is a member of a set:
- False positives are possible (item might be in set)
- False negatives are impossible (if not found, definitely not in set)
- Uses multiple hash functions to set/check bits in an array
- Optimal bit array size:
m = -(n * ln(p)) / (ln(2)^2) - Optimal hash functions:
k = (m / n) * ln(2) - False positive rate:
p = (1 - e^(-kn/m))^k
Where:
n= number of itemsp= false positive ratem= bit array sizek= number of hash functions
- Hash Functions: MurmurHash3 + FNV-1a with double hashing technique
- Storage: Nx tensors (
:u8for bits,:u8/:u16/:u32for counters) - Acceleration: EXLA JIT compilation for vectorized operations
- ML: Simple neural network for learned filters
new/2- Create new bloom filteradd/2- Add item to filteradd_all/2- Add multiple itemsmember?/2- Check if item might be in setremove/2- Remove item (counting filters only)clear/1- Reset filter to empty stateinfo/1- Get statistics and metadata
union/2,union_all/1- Merge filtersintersect_all/1- Intersect filtersbatch_member?/2- Test multiple itemsfrom_list/2- Create filter from listjaccard_similarity/2- Calculate similarity
save/3- Save to fileload/2- Load from fileto_binary/2- Serialize to binaryfrom_binary/2- Deserialize from binary
train/3- Train model on examples
Run the test suite:
mix testRun the integration tests:
mix run test_bloomy.exsMIT License
- Built with Nx - Numerical computing library
- Uses EXLA - Accelerated Linear Algebra
- ML features powered by Scholar
- Inspired by classical bloom filter research and "The Case for Learned Index Structures"
- Bloom Filter on Wikipedia
- Nx Documentation
- EXLA Documentation
- Space/Time Trade-offs in Hash Coding with Allowable Errors - Original paper
Made with ❤️ and Elixir