
D-MemFS

An in-process virtual filesystem with hard quota enforcement for Python.


Languages: English | Japanese


Proven Quality

| Metric | Details |
| --- | --- |
| 🧪 Robustness | 369 tests with 97% code coverage |
| 🔒 Verified Safety | 98, 100×4 — top scores across all security categories (Socket.dev) |
| 🌟 Community | Discussed on r/Python with highly positive reception |

Why D-MemFS?

MemoryFileSystem gives you a fully isolated filesystem-like workspace inside a Python process.

  • Hard quota (MFSQuotaExceededError) to reject oversized writes before OOM
  • Memory Guard to detect physical RAM exhaustion before it causes OOM kills
  • Full filesystem semantics: Hierarchical directories and multi-file operations (import_tree, copy_tree, move)
  • File-level RW locking + global structure lock for thread-safe operations
  • Free-threaded Python compatible (PYTHON_GIL=0) — stress-tested under 50-thread contention
  • Async wrapper (AsyncMemoryFileSystem) powered by asyncio.to_thread
  • Zero runtime dependencies (standard library only)
  • No admin/root privileges required — works on locked-down CI runners, containers, and shared machines where OS-level RAM disks are not an option
  • 369 tests, 97% coverage across 3 OS (Linux / Windows / macOS) × 3 Python versions (3.11–3.13, including free-threaded 3.13t)

This is useful when io.BytesIO is too primitive (single buffer), and OS-level RAM disks/tmpfs are impractical (permissions, container policy, Windows driver friction). Ideal for CI pipeline acceleration — eliminate disk I/O from test suites and data processing without any infrastructure changes.

Note on Architectural Boundary: This is strictly an in-process tool. External subprocesses (CLI tools) cannot access these files via standard OS paths. If your pipeline relies heavily on passing files to external binaries, an OS-level RAM disk (tmpfs) is the correct tool. D-MemFS shines when accelerating Python-native test suites or internal data pipelines.


Archive Extraction In-Memory

Extract large ZIP or TAR archives entirely in-memory to process their contents on the fly. Prevent disk wear (TBW) and eliminate the risk of leaving garbage files behind.
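The pattern can be sketched with the standard library alone. Here an `io.BytesIO` stands in for archive bytes that would normally be staged inside D-MemFS (the stand-in is purely illustrative):

```python
import io
import zipfile

# Build a small ZIP entirely in memory; in practice the archive bytes
# might come from a file opened with mfs.open(..., "rb").
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("docs/readme.txt", "hello")
    zf.writestr("data/blob.bin", b"\x00" * 1024)

# Extract every entry without touching the disk.
buf.seek(0)
extracted = {}
with zipfile.ZipFile(buf) as zf:
    for info in zf.infolist():
        extracted[info.filename] = zf.read(info)

print(sorted(extracted))  # ['data/blob.bin', 'docs/readme.txt']
```

No temporary directory is created at any point, so there is nothing to clean up if processing is interrupted.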

CI/CD Pipelines & Test Debugging

Speed up your pipeline by running heavy file I/O tests entirely in memory. If a test fails, export the complete virtual filesystem state to a physical directory (export_tree) for easy post-mortem debugging.

High-Speed SQLite Test Fixtures

Eliminate disk I/O bottlenecks in your database test suites. Generate a master SQLite database state once, store it in D-MemFS, and load it instantly for each individual test. Ensure perfect test isolation with zero disk wear and zero cleanup.

Multi-threaded Data Staging (ETL)

Use D-MemFS as a volatile, high-speed staging area for ETL pipelines. It features built-in, thread-safe file locking, ensuring safe concurrent data processing.

Safe Large File Processing (Serverless/Sandboxed)

Process massive files chunk-by-chunk using our Memory Guard. Safely raise an exception before the host OS hits an Out-Of-Memory (OOM) crash, which is crucial for environments without OS-level RAM disks.


Installation

pip install D-MemFS

Requirements: Python 3.11+


Quick Start

from dmemfs import MemoryFileSystem, MFSQuotaExceededError

mfs = MemoryFileSystem(max_quota=64 * 1024 * 1024)

mfs.mkdir("/data")
with mfs.open("/data/hello.bin", "wb") as f:
    f.write(b"hello")

with mfs.open("/data/hello.bin", "rb") as f:
    print(f.read())  # b"hello"

print(mfs.listdir("/data"))
print(mfs.is_file("/data/hello.bin"))  # True

try:
    with mfs.open("/huge.bin", "wb") as f:
        f.write(bytes(512 * 1024 * 1024))
except MFSQuotaExceededError as e:
    print(e)

API Highlights

MemoryFileSystem

  • open(path, mode, *, preallocate=0, lock_timeout=None)
  • mkdir, remove, rmtree, rename, move, copy, copy_tree
  • listdir, exists, is_dir, is_file, walk, glob
  • stat, stats, get_size
  • export_as_bytesio, export_tree, iter_export_tree, import_tree

Constructor parameters:

  • max_quota (default 256 MiB): byte quota for file data
  • max_nodes (default None): optional cap on total node count (files + directories). Raises MFSNodeLimitExceededError when exceeded.
  • default_storage (default "auto"): storage backend for new files — "auto" / "sequential" / "random_access"
  • promotion_hard_limit (default None): byte threshold above which Sequential→RandomAccess auto-promotion is suppressed (None uses the built-in 512 MiB limit)
  • chunk_overhead_override (default None): override the per-chunk overhead estimate used for quota accounting
  • default_lock_timeout (default 30.0): default timeout in seconds for file-lock acquisition during open(). Use None to wait indefinitely.
  • memory_guard (default "none"): physical memory protection mode — "none" / "init" / "per_write"
  • memory_guard_action (default "warn"): action when the guard triggers — "warn" (ResourceWarning) / "raise" (MemoryError)
  • memory_guard_interval (default 1.0): minimum seconds between OS memory queries ("per_write" only)

Note: The BytesIO returned by export_as_bytesio() is outside quota management. Exporting large files may consume significant process memory beyond the configured quota limit.

Note — Quota and free-threaded Python: The per-chunk overhead estimate used for quota accounting is calibrated at import time via sys.getsizeof(). Free-threaded Python (3.13t, PYTHON_GIL=0) has larger object headers than the standard build, so CHUNK_OVERHEAD_ESTIMATE is higher (~117 bytes vs ~93 bytes on CPython 3.13). This means the same max_quota yields slightly less effective storage capacity on free-threaded builds, especially for workloads with many small files or small appends. This is not a bug — it reflects real memory consumption. To ensure consistent behaviour across builds, use chunk_overhead_override to pin the value, or inspect stats()["overhead_per_chunk_estimate"] at runtime.
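To make the accounting concrete, here is a back-of-the-envelope sketch using the overhead figures quoted above. The one-chunk-per-file assumption and the 4 KiB workload are illustrative, not measured:

```python
# Hypothetical illustration of how per-chunk overhead is charged
# against max_quota. The overhead values echo the note above; the
# chunking assumption (one chunk per small file) is illustrative.
overhead_standard = 93        # ~CPython 3.13 standard build
overhead_free_threaded = 117  # ~3.13t free-threaded build

n_files = 10_000
payload_per_file = 4 * 1024   # 4 KiB each, assumed one chunk per file

def accounted(n, payload, overhead):
    """Bytes charged against the quota for n single-chunk files."""
    return n * (payload + overhead)

print(accounted(n_files, payload_per_file, overhead_standard))       # 41890000
print(accounted(n_files, payload_per_file, overhead_free_threaded))  # 42130000
# The free-threaded build charges 240,000 bytes more against the same
# quota for this workload, i.e. slightly less effective capacity.
```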

Supported binary modes: rb, wb, ab, r+b, xb

Memory Guard

MFS enforces a logical quota, but that quota can still be configured larger than the currently available physical RAM. memory_guard provides an optional safety net.

from dmemfs import MemoryFileSystem

# Warn if max_quota exceeds available RAM
mfs = MemoryFileSystem(max_quota=8 * 1024**3, memory_guard="init")

# Raise MemoryError before writes when RAM is insufficient
mfs = MemoryFileSystem(
    max_quota=8 * 1024**3,
    memory_guard="per_write",
    memory_guard_action="raise",
)
| Mode | Initialization | Each Write | Overhead |
| --- | --- | --- | --- |
| "none" | | | Zero |
| "init" | Check once | | Negligible |
| "per_write" | Check once | Cached check | About 1 OS call/sec |

When memory_guard_action="warn", the guard emits ResourceWarning and allows the operation to continue. When memory_guard_action="raise", the guard rejects the operation with MemoryError before the actual allocation path.
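The cached check in "per_write" mode boils down to rate-limiting an expensive OS query. A stdlib-only sketch of that caching pattern (an illustration of the idea, not D-MemFS's actual implementation):

```python
import time

class RateLimitedCheck:
    """Run an expensive probe at most once per `interval` seconds,
    returning the cached result in between."""

    def __init__(self, probe, interval=1.0):
        self.probe = probe
        self.interval = interval
        self._last_t = -float("inf")  # force a probe on first call
        self._last_result = None

    def __call__(self):
        now = time.monotonic()
        if now - self._last_t >= self.interval:
            self._last_result = self.probe()
            self._last_t = now
        return self._last_result

calls = 0
def fake_probe():
    # Stand-in for an OS memory query (e.g. reading available RAM).
    global calls
    calls += 1
    return "ok"

check = RateLimitedCheck(fake_probe, interval=60.0)
for _ in range(1000):
    check()
print(calls)  # 1 -- the probe ran once despite 1000 checks
```

This is why the per-write guard costs roughly one OS call per second regardless of write frequency.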

AsyncMemoryFileSystem accepts the same constructor parameters and forwards them to the synchronous implementation.

MemoryFileHandle

  • io.RawIOBase-compatible binary handle
  • read, write, seek, tell, truncate, flush, close
  • readinto
  • file-like capability checks: readable, writable, seekable

flush() is intentionally a no-op (compatibility API for file-like integrations).

stat() return (MFSStatResult)

size, created_at, modified_at, generation, is_dir

  • Supports both files and directories
  • For directories: size=0, generation=0, is_dir=True

Text Mode

D-MemFS natively operates in binary mode. For text I/O, use MFSTextHandle:

from dmemfs import MemoryFileSystem, MFSTextHandle

mfs = MemoryFileSystem()
mfs.mkdir("/data")

# Write text
with mfs.open("/data/hello.bin", "wb") as f:
    th = MFSTextHandle(f, encoding="utf-8")
    th.write("こんにちは世界\n")
    th.write("Hello, World!\n")

# Read text line by line
with mfs.open("/data/hello.bin", "rb") as f:
    th = MFSTextHandle(f, encoding="utf-8")
    for line in th:
        print(line, end="")

MFSTextHandle is a thin, bufferless wrapper. It encodes on write() and decodes on read() / readline(). read(size) counts characters, not bytes, so multibyte text can be read safely without splitting code points. Unlike io.TextIOWrapper, it introduces no buffering issues when used with MemoryFileHandle.
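The character-safe behaviour can be illustrated with the standard library's incremental decoders, which buffer partial byte sequences until a complete code point arrives (a concept sketch, not MFSTextHandle's code):

```python
import codecs

# UTF-8 bytes where a naive byte-sized read would split code points:
# 5 characters, 3 bytes each.
data = "こんにちは".encode("utf-8")

dec = codecs.getincrementaldecoder("utf-8")()
pieces = []
# Feed one byte at a time: the decoder holds incomplete sequences
# internally and only emits whole characters.
for i in range(len(data)):
    pieces.append(dec.decode(data[i:i + 1]))
text = "".join(pieces)
print(text)  # こんにちは
```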


Async Usage

import asyncio

from dmemfs import AsyncMemoryFileSystem

async def run() -> None:
    mfs = AsyncMemoryFileSystem(max_quota=64 * 1024 * 1024)
    await mfs.mkdir("/a")
    async with await mfs.open("/a/f.bin", "wb") as f:
        await f.write(b"data")
    async with await mfs.open("/a/f.bin", "rb") as f:
        print(await f.read())

asyncio.run(run())
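The forwarding pattern behind the async wrapper can be sketched with a toy pair of classes. `SyncStore` and `AsyncStore` are hypothetical stand-ins, not part of the library; only the use of asyncio.to_thread mirrors what the README states:

```python
import asyncio

class SyncStore:
    """Toy stand-in for a synchronous, blocking implementation."""
    def __init__(self):
        self._data = {}
    def write(self, path, payload):
        self._data[path] = payload
    def read(self, path):
        return self._data[path]

class AsyncStore:
    """Forward each blocking call to a worker thread via
    asyncio.to_thread, keeping the event loop responsive."""
    def __init__(self):
        self._sync = SyncStore()
    async def write(self, path, payload):
        await asyncio.to_thread(self._sync.write, path, payload)
    async def read(self, path):
        return await asyncio.to_thread(self._sync.read, path)

async def demo():
    store = AsyncStore()
    await store.write("/a.bin", b"data")
    return await store.read("/a.bin")

result = asyncio.run(demo())
print(result)  # b'data'
```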

Concurrency and Locking Notes

  • Path/tree operations are guarded by _global_lock.
  • File access is guarded by per-file ReadWriteLock.
  • lock_timeout behavior:
    • None: block indefinitely
    • 0.0: try-lock (fail immediately with BlockingIOError)
    • > 0: timeout in seconds, then BlockingIOError
  • Current ReadWriteLock is non-fair: under sustained read load, writers can starve.
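The three lock_timeout modes map directly onto the semantics of threading.Lock.acquire. A stdlib sketch of the documented behaviour (how D-MemFS implements this internally is not shown here):

```python
import threading

lock = threading.Lock()
lock.acquire()  # simulate a lock held by another handle

def acquire_with_timeout(lk, timeout):
    """Mirror the documented lock_timeout semantics."""
    if timeout is None:
        ok = lk.acquire()                  # block indefinitely
    elif timeout == 0.0:
        ok = lk.acquire(blocking=False)    # try-lock, fail immediately
    else:
        ok = lk.acquire(timeout=timeout)   # wait up to `timeout` seconds
    if not ok:
        raise BlockingIOError("lock acquisition timed out")
    return True

try:
    acquire_with_timeout(lock, 0.0)        # held elsewhere -> fails at once
except BlockingIOError as e:
    print("try-lock:", e)

try:
    acquire_with_timeout(lock, 0.05)       # waits ~50 ms, then fails
except BlockingIOError as e:
    print("timeout:", e)
```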

Operational guidance:

  • Keep lock hold duration short
  • Set an explicit lock_timeout in latency-sensitive code paths
  • walk() and glob() provide weak consistency: each directory level is snapshotted under _global_lock, but the overall traversal is NOT atomic. Concurrent structural changes may produce inconsistent results.

Benchmarks

Minimal benchmark tooling is included:

  • D-MemFS vs io.BytesIO vs PyFilesystem2 (MemoryFS) vs tempfile(RAMDisk) / tempfile(SSD)
  • Cases: many-small-files, stream write/read, random access, large stream, deep tree
  • Optional report output to benchmarks/results/

Note: As of setuptools 82 (February 2026), pyfilesystem2 fails to import due to a known upstream issue (#597). Benchmark results including PyFilesystem2 were measured with setuptools ≤ 81 and are valid as historical comparison data.

Run:

# With explicit RAM disk and SSD directories for tempfile comparison:
uvx --with-requirements requirements.txt --with-editable . python benchmarks/compare_backends.py --ramdisk-dir R:\Temp --ssd-dir C:\TempX --save-md auto --save-json auto

See BENCHMARK.md for details.



Testing and Coverage

Test execution and dev flow are documented in TESTING.md.

Typical local run:

uv pip compile requirements.in -o requirements.txt
uvx --with-requirements requirements.txt --with-editable . pytest tests/ -v --timeout=30 --cov=dmemfs --cov-report=xml --cov-report=term-missing

CI (.github/workflows/test.yml) runs tests with coverage XML generation.


API Docs Generation

API docs can be generated as Markdown (viewable on GitHub) using pydoc-markdown:

uvx --with pydoc-markdown --with-editable . pydoc-markdown '{
  loaders: [{type: python, search_path: [.]}],
  processors: [{type: filter, expression: "default()"}],
  renderer: {type: markdown, filename: docs/api_md/index.md}
}'

Or as HTML using pdoc (local browsing only):

uvx --with-requirements requirements.txt pdoc dmemfs -o docs/api

Compatibility and Non-Goals

  • Core open() is binary-only (rb, wb, ab, r+b, xb). Text I/O is available via the MFSTextHandle wrapper.
  • No symlink/hardlink support — intentionally omitted to eliminate path traversal loops and structural complexity (same rationale as pathlib.PurePath).
  • No direct pathlib.Path / os.PathLike API — MFS paths are virtual and must not be confused with host filesystem paths. Accepting os.PathLike would allow third-party libraries or a plain open() call to silently treat an MFS virtual path as a real OS path, potentially issuing unintended syscalls against the host filesystem. All paths must be plain str with POSIX-style absolute notation (e.g. "/data/file.txt").
  • No kernel filesystem integration (intentionally in-process only)

Auto-promotion behavior:

  • By default (default_storage="auto"), new files start as SequentialMemoryFile and auto-promote to RandomAccessMemoryFile when random writes are detected.
  • Promotion is one-way (no downgrade back to sequential).
  • Use default_storage="sequential" or "random_access" to fix the backend at construction; use promotion_hard_limit to suppress auto-promotion above a byte threshold.
  • Storage promotion temporarily doubles memory usage for the promoted file. The quota system accounts for this, but process-level memory may spike briefly.

Security note: In-memory data may be written to physical disk via OS swap or core dumps. MFS does not provide memory-locking (e.g., mlock) or secure erasure. Do not rely on MFS alone for sensitive data isolation.


Exception Reference

| Exception | Typical cause |
| --- | --- |
| MFSQuotaExceededError | write/import/copy would exceed quota |
| MFSNodeLimitExceededError | node count would exceed max_nodes (subclass of MFSQuotaExceededError) |
| FileNotFoundError | path missing |
| FileExistsError | creation target already exists |
| IsADirectoryError | file operation on a directory |
| NotADirectoryError | directory operation on a file |
| BlockingIOError | lock timeout or open-file conflict |
| io.UnsupportedOperation | mode mismatch / unsupported operation |
| ValueError | invalid mode/path/seek/truncate arguments |

Testing with pytest

D-MemFS ships a pytest plugin that provides an mfs fixture:

# conftest.py — register the plugin explicitly
pytest_plugins = ["dmemfs._pytest_plugin"]

Note: The plugin is not auto-discovered. Users must declare it in conftest.py to opt in.

# test_example.py
def test_write_read(mfs):
    mfs.mkdir("/tmp")
    with mfs.open("/tmp/hello.txt", "wb") as f:
        f.write(b"hello")
    with mfs.open("/tmp/hello.txt", "rb") as f:
        assert f.read() == b"hello"

Development Notes

Design documents are written in Japanese and serve as internal design references.


Performance Summary

Key results from the included benchmark (300 small files × 4 KiB, 16 MiB stream, 512 MiB large stream):

| Case | D-MemFS (ms) | BytesIO (ms) | tempfile(RAMDisk) (ms) | tempfile(SSD) (ms) |
| --- | --- | --- | --- | --- |
| small_files_rw | 51 | 6 | 207 | 267 |
| stream_write_read | 81 | 62 | 20 | 21 |
| random_access_rw | 34 | 82 | 37 | 35 |
| large_stream_write_read | 529 | 2,258 | 514 | 541 |
| many_files_random_read | 1,280 | 212 | 6,310 | 8,601 |
| deep_tree_read | 224 | 3 | 346 | 361 |

D-MemFS incurs a small overhead on tiny-file workloads but delivers significantly better performance on large streams and random-access patterns compared with BytesIO. See BENCHMARK.md and benchmark_current_result.md for full data.

Note: tempfile(RAMDisk) results were measured with the temp directory on a RAM disk; tempfile(SSD) results use a physical SSD. Use --ramdisk-dir and --ssd-dir options to reproduce both variants in a single run.


License

MIT License