Install using:
pip install py-sharedmemory
Python's standard multiprocessing.Queue relies on _winapi.CreateFile for inter-process communication (IPC), introducing significant I/O overhead. This can become a performance bottleneck in demanding applications like Reinforcement Learning, scientific computing, or distributed systems that transfer large amounts of data (e.g. NumPy arrays or tensors) between processes (actors, replay buffers, trainers, etc.).
py-sharedmemory provides an alternative that utilizes multiprocessing.shared_memory (and therefore _winapi.CreateFileMapping) for the main data and sends only lightweight metadata through the queues. This eliminates most inter-process I/O, reducing system load and latency. If you're hitting performance limits with standard queues, py-sharedmemory may help.
import multiprocessing as mp
from memory import create_shared_memory_pair, SharedMemorySender, SharedMemoryReceiver
def producer_sm(sender:SharedMemorySender):
    your_data = "your data"
    sender.put(your_data) # blocks until space is available
    sender.put(your_data, timeout=3) # raises queue.Full exception after 3s
    sender.put(your_data, block=False) # raises queue.Full exception if no space available
    sender.put_nowait(your_data) # ^ equivalent to above
    # ...
    # wait for all data to be received before closing the sender
    # to properly close all shared memory objects
    sender.wait_for_all_ack()
def consumer_sm(receiver:SharedMemoryReceiver):
    data = receiver.get() # blocks
    data = receiver.get(timeout=3) # raises queue.Empty exception after 3s
    data = receiver.get(block=False) # raises queue.Empty exception if no data available
    data = receiver.get_nowait() # ^ equivalent to above
    # ...
if __name__ == '__main__':
    sender, receiver = create_shared_memory_pair(capacity=5)
    mp.Process(target=producer_sm, args=(sender,)).start()
    mp.Process(target=consumer_sm, args=(receiver,)).start()There is a certain overhead to allocating shared memory which is especially noticable for smaller objects. Use the following heuristic depending on the size of the data you are handling:
| 10B | 100B | 1KB | 10KB | 100KB | 1MB | 10MB | 100MB | 1GB | 10GB | |
|---|---|---|---|---|---|---|---|---|---|---|
| mp.Queue() | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | 
| py-sharedmemory | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | 
I benchmarked data transfer performance using both the standard multiprocessing.Queue and my py-sharedmemory implementation:
Starting around 1MB per message, py-sharedmemory matches or slightly trails the standard queue in speed. However, the key advantage is that it avoids generating I/O, which becomes critical at larger data sizes. Notably, the standard implementation fails with 10GB messages, while py-sharedmemory handles them reliably.
Here’s the I/O load on my Windows system using the standard queue:
And here’s the I/O load using py-sharedmemory:
In practice, py-sharedmemory delivers smoother and more stable performance, with consistent put/get times and no slowdowns, especially under high data throughput.


