A progressive guide to content-addressed storage backed by Git. Every section
builds on the same running example -- storing, managing, and restoring a photo
called vacation.jpg under the slug photos/vacation -- so you can follow
along from first principles to full mastery.
- What is git-cas?
- Quick Start
- Core Concepts
- Storing Files
- Restoring Files
- Encryption
- The CLI
- Lifecycle Management
- Observability
- Compression
- Passphrase Encryption (KDF)
- Multi-Recipient Encryption & Key Rotation
- Merkle Manifests
- Vault
- Architecture
- Codec System
- Error Handling
- FAQ / Troubleshooting
Git is, at its core, a content-addressed object database. Every object --
blob, tree, commit, tag -- is stored by the SHA-1 hash of its content. When
two files share the same bytes, Git stores them once. git-cas takes this
property seriously: it turns Git's object database into a general-purpose
content-addressed storage (CAS) system for arbitrary binary files.
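Content addressing is concrete enough to compute by hand. A small sketch (not part of git-cas) that reproduces Git's blob OID formula -- the SHA-1 of a `blob <size>\0` header followed by the content -- and shows that identical bytes always map to the same object:

```javascript
import { createHash } from 'node:crypto';

// Compute the OID Git assigns a blob: sha1("blob <size>\0" + content).
// Identical bytes always yield the same OID -- that is content addressing.
function gitBlobOid(content) {
  const body = Buffer.from(content);
  const header = Buffer.from(`blob ${body.length}\0`);
  return createHash('sha1').update(Buffer.concat([header, body])).digest('hex');
}

console.log(gitBlobOid('hello\n')); // same OID `git hash-object --stdin` prints for these bytes
```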
The problem git-cas solves is straightforward. You have large binary assets
-- images, model weights, data packs, build artifacts, encrypted secret
bundles -- and you want to store them in a way that is deterministic,
deduplicated, integrity-verified, and committable. Git LFS solves this by
moving blobs to an external server, but that introduces a separate
infrastructure dependency and breaks the self-contained nature of a Git
repository. git-cas keeps everything inside Git's own object database.
The approach works as follows. A file is split into fixed-size chunks, each
chunk is written as a Git blob via git hash-object -w, and a manifest
(a small JSON or CBOR document listing every chunk's hash, size, and blob OID)
is written alongside them into a Git tree via git mktree. That tree OID can
then be committed, tagged, or referenced like any other Git object. Restoring
the file means reading the tree, parsing the manifest, fetching each blob,
verifying SHA-256 digests, and concatenating the bytes back together. Optional
AES-256-GCM encryption can be applied before chunking, so ciphertext is what
lands in the object database -- plaintext never touches the ODB.
- Node.js >= 22.0.0 (Bun and Deno are also supported)
- A Git repository (bare or working tree)
npm install @git-stunts/git-cas @git-stunts/plumbing
import GitPlumbing from '@git-stunts/plumbing';
import ContentAddressableStore from '@git-stunts/git-cas';
// Point at a Git repository
const git = new GitPlumbing({ cwd: './my-repo' });
const cas = new ContentAddressableStore({ plumbing: git });
// Store vacation.jpg under the slug "photos/vacation"
const manifest = await cas.storeFile({
filePath: './vacation.jpg',
slug: 'photos/vacation',
});
console.log(manifest.slug); // "photos/vacation"
console.log(manifest.filename); // "vacation.jpg"
console.log(manifest.size); // total bytes stored
console.log(manifest.chunks.length); // number of chunks
// Create a Git tree from the manifest
const treeOid = await cas.createTree({ manifest });
console.log(treeOid); // e.g. "a1b2c3d4..."
// Restore the file later
await cas.restoreFile({ manifest, outputPath: './restored.jpg' });
That is the full round-trip: store, tree, restore. The rest of this guide unpacks what happens at each step.
A slug is a logical identifier for your asset. It is a freeform, non-empty
string -- typically a path-like name such as photos/vacation or
models/v3-weights. The slug is stored inside the manifest and is how you
refer to the asset in your application logic. It does not affect where
data lives in Git's object database.
Large files are split into fixed-size pieces called chunks. Each chunk is stored as a Git blob. A chunk has four properties:
| Field | Type | Description |
|---|---|---|
| index | number | Zero-based position in the file |
| size | number | Byte length of this chunk |
| digest | string | SHA-256 hex digest of the chunk's raw bytes |
| blob | string | Git OID (the SHA-1 hash Git uses to store it) |
Because Git is itself content-addressed, if two chunks happen to contain identical bytes, Git stores them only once. This gives you deduplication for free.
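The dedup effect is easy to demonstrate. A sketch (illustrative only, not the library's code) that splits a buffer into fixed-size chunks and counts unique SHA-256 digests:

```javascript
import { createHash } from 'node:crypto';

// Split a buffer into fixed-size chunks and collect each chunk's digest.
// Repeated content collapses to a single stored blob.
function chunkDigests(buf, chunkSize) {
  const digests = [];
  for (let off = 0; off < buf.length; off += chunkSize) {
    const chunk = buf.subarray(off, off + chunkSize);
    digests.push(createHash('sha256').update(chunk).digest('hex'));
  }
  return digests;
}

const block = Buffer.alloc(1024, 'a');              // 1 KiB of identical bytes
const file = Buffer.concat([block, block, block]);  // three identical chunks
const digests = chunkDigests(file, 1024);
console.log(digests.length);        // 3 chunk references...
console.log(new Set(digests).size); // ...but only 1 unique blob to store
```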
A manifest is the index that ties everything together. After storing
vacation.jpg, the manifest looks like this:
{
"slug": "photos/vacation",
"filename": "vacation.jpg",
"size": 524288,
"chunks": [
{
"index": 0,
"size": 262144,
"digest": "e3b0c44298fc1c149afbf4c8996fb924...",
"blob": "a1b2c3d4e5f6..."
},
{
"index": 1,
"size": 262144,
"digest": "d7a8fbb307d7809469ca9abcb0082e4f...",
"blob": "f6e5d4c3b2a1..."
}
]
}
Manifests are immutable value objects validated by a Zod schema at
construction time. If you try to create a Manifest with missing or
malformed fields, an error is thrown immediately.
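The fail-fast construction pattern looks roughly like this. A hand-rolled sketch for illustration -- the real library uses a Zod schema, and this stand-in only checks field presence:

```javascript
// Minimal sketch of fail-fast, frozen value-object construction.
// (git-cas validates with Zod; this only illustrates the behavior.)
class Manifest {
  constructor(data) {
    for (const field of ['slug', 'filename', 'size', 'chunks']) {
      if (!(field in data)) throw new Error(`Manifest missing field: ${field}`);
    }
    Object.assign(this, data);
    Object.freeze(this); // immutable after construction
  }
}

const m = new Manifest({ slug: 'photos/vacation', filename: 'vacation.jpg', size: 0, chunks: [] });
console.log(Object.isFrozen(m)); // true
```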
When encryption is used, the manifest gains an additional encryption field:
{
"slug": "photos/vacation",
"filename": "vacation.jpg",
"size": 524288,
"chunks": [ ... ],
"encryption": {
"algorithm": "aes-256-gcm",
"nonce": "base64-encoded-nonce",
"tag": "base64-encoded-auth-tag",
"encrypted": true
}
}
When you call createTree({ manifest }), git-cas serializes the manifest
using the configured codec (JSON by default), writes it as a blob, then
builds a Git tree that looks like this:
100644 blob <oid> manifest.json
100644 blob <oid> e3b0c44298fc1c149afbf4c8996fb924...
100644 blob <oid> d7a8fbb307d7809469ca9abcb0082e4f...
The tree contains one entry for the manifest file (named manifest.json or
manifest.cbor depending on the codec) and one entry per chunk, named by
its SHA-256 digest. This tree OID is a standard Git object -- you can commit
it, tag it, push it, or embed it in a larger tree.
The codec controls how the manifest is serialized before being written to
Git. Two codecs ship with git-cas:
- JsonCodec -- human-readable, produces manifest.json. Default.
- CborCodec -- compact binary format, produces manifest.cbor. Smaller manifests.
Both implement the same CodecPort interface: encode(data), decode(buffer),
and get extension().
When you call cas.storeFile(), the following happens:
- The file at filePath is opened as a readable stream.
- The stream is consumed in chunks of chunkSize bytes (default: 256 KiB).
- Each chunk is SHA-256 hashed and written to Git as a blob via git hash-object -w --stdin.
- A manifest is assembled from the chunk metadata.
- The manifest is returned as a frozen Manifest value object.
The default chunk size is 256 KiB (262,144 bytes). You can change it at construction time. The minimum is 1,024 bytes.
const cas = new ContentAddressableStore({
plumbing: git,
chunkSize: 1024 * 1024, // 1 MiB chunks
});
Larger chunks mean fewer Git objects but coarser deduplication. Smaller chunks improve deduplication but increase object count and manifest size. For most use cases, the default is a good balance.
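The tradeoff is easy to quantify. A back-of-envelope sketch of how many chunk blobs a 1 GiB file produces at different chunk sizes:

```javascript
// Objects created for a file of a given size at a given chunk size.
const GiB = 1024 ** 3;

function chunkCount(fileSize, chunkSize) {
  return Math.ceil(fileSize / chunkSize);
}

for (const chunkSize of [64 * 1024, 256 * 1024, 1024 * 1024]) {
  console.log(`${chunkSize / 1024} KiB chunks -> ${chunkCount(GiB, chunkSize)} blobs`);
}
// 64 KiB -> 16384 blobs, 256 KiB -> 4096 blobs, 1 MiB -> 1024 blobs
```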
import GitPlumbing from '@git-stunts/plumbing';
import ContentAddressableStore from '@git-stunts/git-cas';
const git = new GitPlumbing({ cwd: './assets-repo' });
const cas = new ContentAddressableStore({ plumbing: git });
const manifest = await cas.storeFile({
filePath: './vacation.jpg',
slug: 'photos/vacation',
});
// Inspect the result
console.log(`Stored ${manifest.filename} (${manifest.size} bytes)`);
console.log(`Split into ${manifest.chunks.length} chunks`);
for (const chunk of manifest.chunks) {
console.log(` chunk[${chunk.index}]: ${chunk.size} bytes, blob ${chunk.blob}`);
}
For a 500 KiB (512,000-byte) file with the default 256 KiB chunk size, you would see two chunks: the first at 262,144 bytes and the second at the remaining 249,856 bytes.
If you already have data in memory or coming from a non-file source, use
store() directly instead of storeFile():
async function* generateData() {
yield Buffer.from('first batch of bytes...');
yield Buffer.from('second batch of bytes...');
}
const manifest = await cas.store({
source: generateData(),
slug: 'photos/vacation',
filename: 'vacation.jpg',
});
Once you have the manifest, persist it as a Git tree:
const treeOid = await cas.createTree({ manifest });
console.log(`Tree OID: ${treeOid}`);
// You can now commit this tree:
// git commit-tree <treeOid> -m "Store vacation.jpg"
Given a manifest, restoreFile() reads every chunk from Git, verifies each
chunk's SHA-256 digest, concatenates the buffers, and writes the result to
the specified output path.
await cas.restoreFile({
manifest,
outputPath: './restored-vacation.jpg',
});
// restored-vacation.jpg is now byte-identical to the original
If you need the bytes in memory rather than on disk, use restore():
const { buffer, bytesWritten } = await cas.restore({ manifest });
console.log(`Restored ${bytesWritten} bytes into memory`);
During restore, each chunk is re-hashed with SHA-256 and compared against the
digest recorded in the manifest. If any chunk has been corrupted or tampered
with, an INTEGRITY_ERROR is thrown immediately:
CasError: Chunk 0 integrity check failed
code: 'INTEGRITY_ERROR'
meta: { chunkIndex: 0, expected: '...', actual: '...' }
You can also verify integrity without restoring:
const isValid = await cas.verifyIntegrity(manifest);
if (isValid) {
console.log('All chunks intact');
} else {
console.log('Corruption detected');
}
In many workflows you do not have the manifest object in memory -- you have a Git tree OID that was committed earlier. To restore, you need to read the tree, extract the manifest, and then restore from it:
const service = await cas.getService();
// Read the tree entries
const entries = await service.persistence.readTree(treeOid);
// Find the manifest entry (named manifest.json or manifest.cbor)
const manifestEntry = entries.find(e => e.name.startsWith('manifest.'));
const manifestBlob = await service.persistence.readBlob(manifestEntry.oid);
// Decode the manifest using the configured codec
import Manifest from '@git-stunts/git-cas/src/domain/value-objects/Manifest.js';
const manifest = new Manifest(service.codec.decode(manifestBlob));
// Restore the file
await cas.restoreFile({ manifest, outputPath: './restored-vacation.jpg' });
The CLI (Section 7) handles this entire flow with a single command.
git-cas supports optional AES-256-GCM encryption. When enabled, the file
content is encrypted via a streaming cipher before chunking, so only
ciphertext is stored in Git's object database. Plaintext never touches the
ODB.
An encryption key must be exactly 32 bytes (256 bits). Generate one with OpenSSL:
openssl rand -out vacation.key 32
Or in Node.js:
import { randomBytes } from 'node:crypto';
import { writeFileSync } from 'node:fs';
const key = randomBytes(32);
writeFileSync('./vacation.key', key);
Pass the encryptionKey option when storing:
import { readFileSync } from 'node:fs';
const encryptionKey = readFileSync('./vacation.key');
const manifest = await cas.storeFile({
filePath: './vacation.jpg',
slug: 'photos/vacation',
encryptionKey,
});
console.log(manifest.encryption);
// {
// algorithm: 'aes-256-gcm',
// nonce: 'dGhpcyBpcyBhIG5vbmNl',
// tag: 'YXV0aGVudGljYXRpb24gdGFn',
// encrypted: true
// }
The manifest now carries an encryption field containing the algorithm,
a base64-encoded nonce, a base64-encoded authentication tag, and a flag
indicating the content is encrypted. The nonce and tag are generated fresh
for every store operation.
To restore encrypted content, provide the same key:
await cas.restoreFile({
manifest,
encryptionKey,
outputPath: './decrypted-vacation.jpg',
});
// decrypted-vacation.jpg is byte-identical to the original vacation.jpg
If you attempt to restore with an incorrect key, AES-256-GCM's authenticated encryption detects the mismatch and throws:
CasError: Decryption failed: Integrity check error
code: 'INTEGRITY_ERROR'
If you attempt to restore encrypted content without providing any key at all:
CasError: Encryption key required to restore encrypted content
code: 'MISSING_KEY'
Keys must be a Buffer or Uint8Array of exactly 32 bytes. Violations
produce clear errors:
- Non-buffer key: INVALID_KEY_TYPE
- Wrong length: INVALID_KEY_LENGTH (includes expected and actual lengths)
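If you want to fail before reaching the library, a defensive pre-check mirroring the rules above is straightforward (the `assertValidKey` helper and the shape of the thrown errors are illustrative; git-cas performs its own validation):

```javascript
// Hypothetical pre-flight check mirroring git-cas's key validation rules.
function assertValidKey(key) {
  if (!(key instanceof Uint8Array)) { // Buffer is a Uint8Array subclass
    throw Object.assign(new TypeError('Key must be a Buffer or Uint8Array'), {
      code: 'INVALID_KEY_TYPE',
    });
  }
  if (key.length !== 32) {
    throw Object.assign(new RangeError(`Expected 32 bytes, got ${key.length}`), {
      code: 'INVALID_KEY_LENGTH',
    });
  }
}

assertValidKey(Buffer.alloc(32)); // passes silently
```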
The full encrypted workflow, from store to tree to restore:
import { readFileSync } from 'node:fs';
import GitPlumbing from '@git-stunts/plumbing';
import ContentAddressableStore from '@git-stunts/git-cas';
const git = new GitPlumbing({ cwd: './assets-repo' });
const cas = new ContentAddressableStore({ plumbing: git });
const encryptionKey = readFileSync('./vacation.key');
// Store with encryption
const manifest = await cas.storeFile({
filePath: './vacation.jpg',
slug: 'photos/vacation',
encryptionKey,
});
// Persist as a Git tree
const treeOid = await cas.createTree({ manifest });
// Later: restore from tree OID (see Section 5 for readTree pattern)
// ...pass encryptionKey to restoreFile()
git-cas installs as a Git subcommand. After installation, git cas is
available in any Git repository.
# Store vacation.jpg and print the manifest JSON
git cas store ./vacation.jpg --slug photos/vacation
Output (manifest JSON):
{
"slug": "photos/vacation",
"filename": "vacation.jpg",
"size": 524288,
"chunks": [
{
"index": 0,
"size": 262144,
"digest": "e3b0c44298fc1c149afbf4c8996fb924...",
"blob": "a1b2c3d4e5f6..."
},
{
"index": 1,
"size": 262144,
"digest": "d7a8fbb307d7809469ca9abcb0082e4f...",
"blob": "f6e5d4c3b2a1..."
}
]
}
# The --tree flag creates a tree and prints its OID instead of the manifest
git cas store ./vacation.jpg --slug photos/vacation --tree
# Output: a1b2c3d4e5f67890...
If you saved the manifest JSON to a file, you can create a tree from it later:
git cas store ./vacation.jpg --slug photos/vacation > manifest.json
git cas tree --manifest manifest.json
# Output: a1b2c3d4e5f67890...
git cas restore a1b2c3d4e5f67890... --out ./restored-vacation.jpg
# Output: 524288 (bytes written)
The restore command reads the tree, finds the manifest entry, decodes it,
reads and verifies all chunks, and writes the reassembled file.
# Generate a 32-byte key
openssl rand -out vacation.key 32
# Store with encryption, get a tree OID
git cas store ./vacation.jpg --slug photos/vacation --key-file ./vacation.key --tree
# Output: a1b2c3d4e5f67890...
# Restore with the same key
git cas restore a1b2c3d4e5f67890... --out ./decrypted-vacation.jpg --key-file ./vacation.key
# Output: 524288
# Enable gzip compression
git cas store ./data.bin --slug my-data --tree --gzip
# Use CDC (content-defined chunking) for sub-file deduplication
git cas store ./data.bin --slug my-data --tree --strategy cdc
# Customize chunk size and enable parallel I/O
git cas store ./data.bin --slug my-data --tree --chunk-size 65536 --concurrency 4
# Use CBOR codec for smaller manifests
git cas store ./data.bin --slug my-data --tree --codec cbor
# CDC with custom parameters
git cas store ./data.bin --slug my-data --tree \
--strategy cdc --target-chunk-size 32768 \
--min-chunk-size 8192 --max-chunk-size 131072
# Restore with parallel I/O
git cas restore --slug my-data --out ./data.bin --concurrency 4
Place a .casrc JSON file at the repository root to set defaults. CLI flags
always take precedence.
{
"chunkSize": 65536,
"strategy": "cdc",
"concurrency": 4,
"codec": "json",
"compression": "gzip",
"merkleThreshold": 500,
"cdc": {
"minChunkSize": 8192,
"targetChunkSize": 32768,
"maxChunkSize": 131072
}
}
By default the CLI operates in the current directory. Use --cwd to point at
a different repository:
git cas store ./vacation.jpg --slug photos/vacation --cwd /path/to/assets-repo --tree
Given a tree OID (from a commit, tag, or ref), you can reconstruct the manifest object with a single call:
const manifest = await cas.readManifest({ treeOid });
console.log(manifest.slug); // "photos/vacation"
console.log(manifest.chunks); // array of Chunk objects
readManifest reads the tree, locates the manifest entry (e.g.
manifest.json or manifest.cbor), decodes it using the configured codec,
and returns a frozen, Zod-validated Manifest. If no manifest entry is found,
it throws CasError('MANIFEST_NOT_FOUND').
Stored assets can be verified at any time without restoring them. This is useful for periodic integrity checks or auditing:
const ok = await cas.verifyIntegrity(manifest);
if (!ok) {
console.error(`Asset ${manifest.slug} has corrupted chunks`);
}
The verifyIntegrity method reads each chunk blob from Git, recomputes its
SHA-256 digest, and compares it against the manifest. It emits either
integrity:pass or integrity:fail events (see Section 9).
inspectAsset returns logical deletion metadata for an asset without
performing any destructive Git operations. The caller is responsible for
removing refs and running git gc --prune to reclaim space:
const { slug, chunksOrphaned } = await cas.inspectAsset({ treeOid });
console.log(`Asset "${slug}" has ${chunksOrphaned} chunks to clean up`);
// Remove the ref pointing to the tree, then:
// git gc --prune=now
This is intentionally non-destructive: git-cas never modifies or deletes Git objects. It only tells you what would become unreachable.
Deprecation note: deleteAsset() is a deprecated alias for inspectAsset(). It will be removed in a future major version.
When you store the same file multiple times with different chunk sizes, or
store overlapping files, some chunk blobs may no longer be referenced by any
manifest. collectReferencedChunks aggregates all referenced chunk blob OIDs
across multiple assets:
const { referenced, total } = await cas.collectReferencedChunks({
treeOids: [treeOid1, treeOid2, treeOid3]
});
console.log(`${referenced.size} unique blobs across ${total} total chunk references`);
If any treeOid lacks a manifest, the call throws
CasError('MANIFEST_NOT_FOUND') (fail closed). This is analysis only -- no
objects are deleted or modified.
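From the referenced set, identifying candidate orphans is a plain set difference. A sketch, where `allBlobOids` is assumed to be gathered by you separately (for example, from older manifests you still have on hand):

```javascript
// Candidate orphans = known chunk blob OIDs minus the still-referenced set.
// Analysis only -- nothing is deleted.
function orphanCandidates(allBlobOids, referenced) {
  return allBlobOids.filter((oid) => !referenced.has(oid));
}

const referenced = new Set(['aaa', 'bbb']); // from collectReferencedChunks
console.log(orphanCandidates(['aaa', 'bbb', 'ccc'], referenced)); // ['ccc']
```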
Deprecation note: findOrphanedChunks() is a deprecated alias for collectReferencedChunks(). It will be removed in a future major version.
A common pattern is to store multiple assets and assemble their trees into a larger Git tree structure using standard Git plumbing:
const photoManifest = await cas.storeFile({
filePath: './vacation.jpg',
slug: 'photos/vacation',
});
const photoTree = await cas.createTree({ manifest: photoManifest });
const videoManifest = await cas.storeFile({
filePath: './clip.mp4',
slug: 'videos/clip',
});
const videoTree = await cas.createTree({ manifest: videoManifest });
// Now photoTree and videoTree are standard Git tree OIDs
// You can compose them into a parent tree, commit them, etc.
CasService extends EventEmitter. Every significant operation emits an
event you can listen to for progress tracking, logging, or monitoring.
| Event | Emitted When | Payload |
|---|---|---|
| chunk:stored | A chunk is written to Git | { index, size, digest, blob } |
| chunk:restored | A chunk is read back from Git | { index, size, digest } |
| file:stored | All chunks for a file have been stored | { slug, size, chunkCount, encrypted } |
| file:restored | A file has been fully restored | { slug, size, chunkCount } |
| integrity:pass | All chunks pass integrity verification | { slug } |
| integrity:fail | A chunk fails integrity verification | { slug, chunkIndex, expected, actual } |
| error | An error occurs (guarded) | { code, message } |
The error event is guarded: it is only emitted if there is at least one
listener attached. This prevents unhandled error event crashes from
EventEmitter.
const service = await cas.getService();
let chunksStored = 0;
service.on('chunk:stored', ({ index, size }) => {
chunksStored++;
console.log(` Stored chunk ${index} (${size} bytes)`);
});
service.on('file:stored', ({ slug, size, chunkCount }) => {
console.log(`Finished: ${slug} -- ${size} bytes in ${chunkCount} chunks`);
});
// Now store -- events fire as chunks are written
const manifest = await cas.storeFile({
filePath: './vacation.jpg',
slug: 'photos/vacation',
});
service.on('chunk:restored', ({ index, size, digest }) => {
console.log(` Restored chunk ${index} (${size} bytes, digest: ${digest.slice(0, 8)}...)`);
});
service.on('file:restored', ({ slug, size, chunkCount }) => {
console.log(`Restored: ${slug} -- ${size} bytes from ${chunkCount} chunks`);
});
await cas.restoreFile({ manifest, outputPath: './restored-vacation.jpg' });
service.on('error', ({ code, message }) => {
console.error(`[CAS ERROR] ${code}: ${message}`);
});
service.on('integrity:pass', ({ slug }) => {
console.log(`Integrity OK: ${slug}`);
});
service.on('integrity:fail', ({ slug, chunkIndex, expected, actual }) => {
console.error(`CORRUPT: ${slug} chunk ${chunkIndex}`);
console.error(` expected: ${expected}`);
console.error(` actual: ${actual}`);
});
await cas.verifyIntegrity(manifest);
New in v2.0.0.
git-cas supports optional gzip compression. When enabled, file content is
compressed before encryption (if any) and before chunking. This reduces storage
size for compressible data without changing the round-trip contract.
Pass the compression option when storing:
const manifest = await cas.storeFile({
filePath: './vacation.jpg',
slug: 'photos/vacation',
compression: { algorithm: 'gzip' },
});
console.log(manifest.compression);
// { algorithm: 'gzip' }
The manifest gains an optional compression field recording the algorithm used.
Compression and encryption compose naturally. Compression runs first (on plaintext), then encryption runs on the compressed bytes:
const manifest = await cas.storeFile({
filePath: './data.csv',
slug: 'reports/q4',
compression: { algorithm: 'gzip' },
encryptionKey,
});
Decompression on restore() is automatic. If the manifest includes a
compression field, the restored bytes are decompressed after decryption
(if encrypted) and after chunk reassembly:
await cas.restoreFile({
manifest,
outputPath: './restored.csv',
});
// restored.csv is byte-identical to the original data.csv
Compression is most useful for text, CSV, JSON, XML, and other compressible formats. For already-compressed data (JPEG, PNG, MP4, ZIP), compression adds CPU cost without meaningful size reduction. Use your judgement.
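The ordering described above -- compress plaintext, then encrypt; decrypt, then decompress -- can be sketched with Node's built-in zlib and crypto modules. This is an illustration of the pipeline order, not git-cas's internals:

```javascript
import { gzipSync, gunzipSync } from 'node:zlib';
import { randomBytes, createCipheriv, createDecipheriv } from 'node:crypto';

const key = randomBytes(32);  // AES-256 key
const nonce = randomBytes(12); // GCM nonce

// store = encrypt(gzip(plaintext))
const plaintext = Buffer.from('a,b,c\n1,2,3\n'.repeat(100));
const cipher = createCipheriv('aes-256-gcm', key, nonce);
const stored = Buffer.concat([cipher.update(gzipSync(plaintext)), cipher.final()]);
const tag = cipher.getAuthTag();

// restore = gunzip(decrypt(ciphertext))
const decipher = createDecipheriv('aes-256-gcm', key, nonce);
decipher.setAuthTag(tag);
const restored = gunzipSync(Buffer.concat([decipher.update(stored), decipher.final()]));

console.log(restored.equals(plaintext)); // true
console.log(stored.length < plaintext.length); // true -- repetitive text compresses well
```

Compressing before encrypting matters: ciphertext looks random and does not compress, so the reverse order would gain nothing.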
New in v2.0.0.
Instead of managing raw 32-byte encryption keys, you can derive keys from
passphrases using standard key derivation functions (KDFs). git-cas supports
PBKDF2 (default) and scrypt.
Pass passphrase instead of encryptionKey:
const manifest = await cas.storeFile({
filePath: './vacation.jpg',
slug: 'photos/vacation',
passphrase: 'my secret passphrase',
});
console.log(manifest.encryption.kdf);
// {
// algorithm: 'pbkdf2',
// salt: 'base64-encoded-salt',
// iterations: 100000,
// keyLength: 32
// }
KDF parameters (salt, iterations, algorithm) are stored in the manifest's
encryption.kdf field. The salt is generated randomly for each store
operation.
Provide the same passphrase on restore. The KDF parameters in the manifest are used to re-derive the key:
await cas.restoreFile({
manifest,
passphrase: 'my secret passphrase',
outputPath: './restored.jpg',
});
A wrong passphrase produces a wrong key, which fails with INTEGRITY_ERROR
(AES-256-GCM detects it).
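The re-derivation step is plain PBKDF2. A sketch using Node's built-in implementation with the default parameters listed below (100,000 iterations, SHA-512, 32-byte key):

```javascript
import { pbkdf2Sync, randomBytes } from 'node:crypto';

// Derive a 32-byte AES key from a passphrase. The salt is random per store;
// re-deriving with the same salt and parameters yields the same key.
const salt = randomBytes(16);
const key = pbkdf2Sync('my secret passphrase', salt, 100000, 32, 'sha512');
console.log(key.length); // 32

const again = pbkdf2Sync('my secret passphrase', salt, 100000, 32, 'sha512');
console.log(key.equals(again)); // true -- deterministic given salt + params
```

This is why only the salt and parameters need to live in the manifest: the passphrase itself is never stored.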
Pass kdfOptions to select scrypt:
const manifest = await cas.storeFile({
filePath: './secret.bin',
slug: 'vault',
passphrase: 'strong passphrase',
kdfOptions: { algorithm: 'scrypt', cost: 16384 },
});
For advanced workflows, derive the key yourself:
const { key, salt, params } = await cas.deriveKey({
passphrase: 'my secret passphrase',
algorithm: 'pbkdf2',
iterations: 200000,
});
// Use the derived key directly
const manifest = await cas.storeFile({
filePath: './vacation.jpg',
slug: 'photos/vacation',
encryptionKey: key,
});
| Algorithm | Default Params | Notes |
|---|---|---|
| pbkdf2 (default) | 100,000 iterations, SHA-512 | Widely supported, good baseline |
| scrypt | N=16384, r=8, p=1 | Memory-hard, stronger against GPU attacks |
New in v5.1.0 (recipients), v5.2.0 (rotation).
Instead of encrypting with a single key, you can encrypt for multiple recipients. A random data-encryption key (DEK) encrypts the content; each recipient's key-encryption key (KEK) wraps the DEK:
const manifest = await cas.store({
source, slug: 'shared', filename: 'shared.bin',
recipients: [
{ label: 'alice', key: aliceKey },
{ label: 'bob', key: bobKey },
],
});
Any recipient can restore independently:
const { buffer } = await cas.restore({ manifest, encryptionKey: bobKey });
When a key is compromised, rotate it without re-encrypting data:
const rotated = await cas.rotateKey({
manifest, oldKey: aliceOldKey, newKey: aliceNewKey, label: 'alice',
});
// Persist the updated manifest
const treeOid = await cas.createTree({ manifest: rotated });
The keyVersion counter increments with each rotation:
console.log(rotated.encryption.keyVersion); // 1
console.log(rotated.encryption.recipients[0].keyVersion); // 1
Rotate the master passphrase for all vault entries at once:
const { commitOid, rotatedSlugs, skippedSlugs } = await cas.rotateVaultPassphrase({
oldPassphrase: 'old-secret', newPassphrase: 'new-secret',
});
Non-envelope entries (direct-key encryption) are skipped -- they require manual re-store.
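The wrap/unwrap mechanics behind recipients and rotation can be sketched with AES-256-GCM directly. This illustrates the envelope pattern in general, not git-cas's exact manifest layout:

```javascript
import { randomBytes, createCipheriv, createDecipheriv } from 'node:crypto';

// Envelope pattern: a random DEK encrypts the payload; each recipient's
// KEK wraps the DEK. Rotating a KEK only re-wraps the DEK -- the payload
// ciphertext is untouched.
function wrap(kek, dek) {
  const nonce = randomBytes(12);
  const c = createCipheriv('aes-256-gcm', kek, nonce);
  const wrapped = Buffer.concat([c.update(dek), c.final()]);
  return { nonce, wrapped, tag: c.getAuthTag() };
}

function unwrap(kek, { nonce, wrapped, tag }) {
  const d = createDecipheriv('aes-256-gcm', kek, nonce);
  d.setAuthTag(tag);
  return Buffer.concat([d.update(wrapped), d.final()]);
}

const dek = randomBytes(32);      // encrypts the data once
const aliceKek = randomBytes(32); // per-recipient key
const entry = wrap(aliceKek, dek); // one such entry stored per recipient
console.log(unwrap(aliceKek, entry).equals(dek)); // true
```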
# Rotate a single recipient's key
git cas rotate --slug shared --old-key-file old.key --new-key-file new.key --label alice
# Rotate vault passphrase
git cas vault rotate --old-passphrase old-secret --new-passphrase new-secret
New in v2.0.0.
When storing very large files, the manifest (which lists every chunk) can itself become large. Merkle manifests solve this by splitting the chunk list into sub-manifests, each stored as a separate Git blob. The root manifest references sub-manifests by OID.
When the chunk count exceeds merkleThreshold (default: 1000), git-cas
automatically:
- Groups chunks into sub-manifests (each containing up to merkleThreshold chunks).
- Stores each sub-manifest as a Git blob.
- Writes a root manifest with version: 2 and a subManifests array referencing the sub-manifest blob OIDs.
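The grouping step above is a simple partition. A sketch (illustrative; the real sub-manifest objects carry more fields):

```javascript
// Partition a flat chunk list into sub-manifests of up to `threshold` chunks.
function toSubManifests(chunks, threshold = 1000) {
  const groups = [];
  for (let i = 0; i < chunks.length; i += threshold) {
    groups.push({ chunks: chunks.slice(i, i + threshold) });
  }
  return groups;
}

console.log(toSubManifests(new Array(2500).fill(0), 1000).length); // 3 sub-manifests
```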
Set merkleThreshold at construction time:
const cas = new ContentAddressableStore({
plumbing: git,
merkleThreshold: 500, // Split at 500 chunks instead of 1000
});
readManifest() transparently handles both v1 (flat) and v2 (Merkle)
manifests. When it encounters a v2 manifest, it reads all sub-manifests,
concatenates their chunk lists, and returns a flat Manifest object:
const manifest = await cas.readManifest({ treeOid });
// Works identically whether the manifest is v1 or v2
console.log(manifest.chunks.length); // Full chunk list, regardless of structure
- v2 code reads v1 manifests without any changes.
- v1 manifests (chunk count below threshold) continue to use the flat format.
- The version field defaults to 1 for existing manifests.
When you call createTree({ manifest }), the resulting tree is a loose Git
object. If nothing references it -- no commit, no tag, no ref -- git gc
will garbage-collect it. This can silently lose stored data.
The vault solves this by maintaining a single Git ref (refs/cas/vault)
pointing to a commit chain. The commit's tree indexes all stored assets by
slug. One ref protects everything from GC, and git log refs/cas/vault
gives you free history of every vault operation.
refs/cas/vault → commit → tree
├── 100644 blob <oid> .vault.json
├── 040000 tree <oid> photos/vacation
└── 040000 tree <oid> models/v3-weights
The .vault.json blob contains versioned metadata. Without encryption:
{ "version": 1 }. With encryption, it includes KDF configuration.
// Plain vault (no encryption)
await cas.initVault();
// Vault with passphrase-based encryption
await cas.initVault({
passphrase: 'my vault passphrase',
kdfOptions: { algorithm: 'pbkdf2' },
});
When initialized with a passphrase, the vault generates a salt and stores
the KDF parameters in .vault.json. The passphrase itself is never stored.
// Store a file and add it to the vault
const manifest = await cas.storeFile({
filePath: './vacation.jpg',
slug: 'photos/vacation',
});
const treeOid = await cas.createTree({ manifest });
await cas.addToVault({ slug: 'photos/vacation', treeOid });
If the vault does not exist yet, addToVault auto-initializes it with
{ version: 1 } metadata (no encryption). If the slug already exists, it
throws VAULT_ENTRY_EXISTS unless you pass force: true.
// List all entries (sorted by slug)
const entries = await cas.listVault();
for (const { slug, treeOid } of entries) {
console.log(`${slug}\t${treeOid}`);
}
// Resolve a slug to its tree OID
const treeOid = await cas.resolveVaultEntry({ slug: 'photos/vacation' });
const manifest = await cas.readManifest({ treeOid });
const { removedTreeOid } = await cas.removeFromVault({ slug: 'photos/vacation' });
After removing the last entry, the vault remains (with an empty tree +
.vault.json). The ref stays alive.
When a vault is initialized with a passphrase, the CLI handles key derivation automatically:
# Initialize an encrypted vault
git cas vault init --vault-passphrase "secret"
# Store with vault-level encryption (key derived from vault config)
git cas store ./vacation.jpg --slug photos/vacation --tree --vault-passphrase "secret"
# Restore using vault slug
git cas restore --slug photos/vacation --out ./restored.jpg --vault-passphrase "secret"
The vault stores the KDF policy (algorithm, salt, iterations). The actual
encryption is still per-entry AES-256-GCM via the existing store()/restore()
paths -- the vault just provides the key-derivation policy.
# Initialize vault (optionally with encryption)
git cas vault init
git cas vault init --vault-passphrase "secret" --algorithm pbkdf2
# List all vault entries (tab-separated slug + tree OID)
git cas vault list
# Inspect a single entry
git cas vault info photos/vacation
# Remove an entry
git cas vault remove photos/vacation
# View vault commit history
git cas vault history
git cas vault history -n 10 # last 10 commits
The restore command now uses explicit flags instead of a positional argument:
# Restore from a vault slug
git cas restore --slug photos/vacation --out ./restored.jpg
# Restore from a direct tree OID (existing behavior)
git cas restore --oid a1b2c3d4... --out ./restored.jpg
Because refs/cas/vault points to a commit whose tree references all stored
asset trees, every blob in the chain is reachable. git gc --prune=now will
not touch any vault data:
git cas vault list # entries exist
git gc --prune=now # aggressive garbage collection
git cas vault list # entries still intact
The vault uses compare-and-swap (CAS) semantics on git update-ref. If
another process updates the vault between your read and write, the operation
retries automatically (up to 3 times with exponential backoff). If all
retries fail, a VAULT_CONFLICT error is thrown.
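The retry loop described above can be sketched as follows. `readRef`, `tryUpdateRef`, and `mutate` are hypothetical stand-ins for the vault's internals (`tryUpdateRef` maps to `git update-ref <ref> <new> <old>`, which fails if the ref moved):

```javascript
// Compare-and-swap with bounded retries and exponential backoff.
async function casUpdate(readRef, tryUpdateRef, mutate, retries = 3) {
  let delay = 100; // ms, doubles each retry
  for (let attempt = 0; attempt <= retries; attempt++) {
    const oldOid = await readRef();            // read the current vault tip
    const newOid = await mutate(oldOid);       // build the new commit
    if (await tryUpdateRef(oldOid, newOid)) {
      return newOid;                           // CAS succeeded
    }
    // Someone else moved the ref -- back off and retry from a fresh read.
    await new Promise((resolve) => setTimeout(resolve, delay));
    delay *= 2;
  }
  throw Object.assign(new Error('VAULT_CONFLICT'), { code: 'VAULT_CONFLICT' });
}
```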
Slugs are validated strictly:
- Must be a non-empty string
- No leading/trailing /
- No empty (a//b), ., or .. segments
- No control characters (NUL, tabs, newlines)
- Each segment <= 255 bytes, total <= 1024 bytes
Invalid slugs throw INVALID_SLUG.
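The rules above translate directly into a predicate. An illustrative sketch, not the library's code:

```javascript
// Check a slug against the rules listed above.
function isValidSlug(slug) {
  if (typeof slug !== 'string' || slug.length === 0) return false;
  if (Buffer.byteLength(slug) > 1024) return false;           // total length cap
  if (/[\x00-\x1f]/.test(slug)) return false;                 // no control characters
  if (slug.startsWith('/') || slug.endsWith('/')) return false;
  return slug.split('/').every(
    (seg) => seg !== '' && seg !== '.' && seg !== '..' && Buffer.byteLength(seg) <= 255
  );
}

console.log(isValidSlug('photos/vacation')); // true
console.log(isValidSlug('a//b'));            // false -- empty segment
```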
git-cas follows a hexagonal (ports and adapters) architecture. The domain
logic in CasService has zero direct dependencies on Node.js, Git, or any
specific crypto library. All platform-specific behavior is injected through
ports.
Facade (ContentAddressableStore)
|
+-- Domain Layer
| +-- CasService (core logic, EventEmitter)
| +-- Manifest (value object, Zod-validated)
| +-- Chunk (value object, Zod-validated)
| +-- CasError (structured errors)
| +-- ManifestSchema (Zod schemas)
|
+-- Ports (interfaces)
| +-- GitPersistencePort (writeBlob, writeTree, readBlob, readTree)
| +-- CodecPort (encode, decode, extension)
| +-- CryptoPort (sha256, randomBytes, encryptBuffer, decryptBuffer, createEncryptionStream)
|
+-- Infrastructure (adapters)
+-- GitPersistenceAdapter (Git plumbing commands)
+-- JsonCodec (JSON serialization)
+-- CborCodec (CBOR serialization)
+-- NodeCryptoAdapter (node:crypto)
+-- BunCryptoAdapter (Bun.CryptoHasher)
+-- WebCryptoAdapter (crypto.subtle)
Each port is an abstract base class with methods that throw Not implemented.
Adapters extend these classes and provide concrete implementations.
GitPersistencePort -- the storage interface:
class GitPersistencePort {
async writeBlob(content) {} // Returns Git OID
async writeTree(entries) {} // Returns tree OID
async readBlob(oid) {} // Returns Buffer
async readTree(treeOid) {} // Returns array of tree entries
}
CodecPort -- the serialization interface:
class CodecPort {
encode(data) {} // Returns Buffer or string
decode(buffer) {} // Returns object
get extension() {} // Returns 'json', 'cbor', etc.
}
CryptoPort -- the cryptographic operations interface:
class CryptoPort {
sha256(buf) {} // Returns hex digest
randomBytes(n) {} // Returns Buffer
encryptBuffer(buffer, key) {} // Returns { buf, meta }
decryptBuffer(buffer, key, meta) {} // Returns Buffer
createEncryptionStream(key) {} // Returns { encrypt, finalize }
deriveKey(options) {} // Returns { key, salt, params } (v2.0.0)
}
To store chunks somewhere other than Git (e.g., S3, a database, or the local
filesystem), implement GitPersistencePort:
import GitPersistencePort from '@git-stunts/git-cas/src/ports/GitPersistencePort.js';
class S3PersistenceAdapter extends GitPersistencePort {
async writeBlob(content) {
const hash = computeHash(content);
await s3.putObject({ Key: hash, Body: content });
return hash;
}
async readBlob(oid) {
const response = await s3.getObject({ Key: oid });
return Buffer.from(await response.Body.transformToByteArray());
}
async writeTree(entries) {
// Implement tree assembly for your storage backend
}
async readTree(treeOid) {
// Implement tree reading for your storage backend
}
}
Then inject it:
import CasService from '@git-stunts/git-cas/service';
const service = new CasService({
persistence: new S3PersistenceAdapter(),
codec: new JsonCodec(),
crypto: new NodeCryptoAdapter(),
});
The GitPersistenceAdapter wraps every Git command in a resilience policy
(provided by @git-stunts/alfred). The default policy is a 30-second timeout
wrapping an exponential-backoff retry (2 retries, 100ms initial delay, 2s max
delay). You can override this:
import { Policy } from '@git-stunts/alfred';
const cas = new ContentAddressableStore({
plumbing: git,
policy: Policy.timeout(60_000).wrap(
Policy.retry({ retries: 5, backoff: 'exponential', delay: 200 })
),
});
JsonCodec is the default codec. It produces human-readable manifest files with pretty-printed indentation.
import { JsonCodec } from '@git-stunts/git-cas';
const codec = new JsonCodec();
const encoded = codec.encode({ slug: 'photos/vacation', chunks: [] });
// '{\n "slug": "photos/vacation",\n "chunks": []\n}'
codec.extension; // 'json'
Manifests are stored in the tree as manifest.json.
CborCodec is a binary codec that produces smaller manifests. It is useful when you are storing many assets and want to minimize overhead, or when the manifest does not need to be human-readable.
import { CborCodec } from '@git-stunts/git-cas';
const cas = new ContentAddressableStore({
plumbing: git,
codec: new CborCodec(),
});
// Or use the factory method:
const cas2 = ContentAddressableStore.createCbor({ plumbing: git });
Manifests are stored in the tree as manifest.cbor.
| Consideration | JSON | CBOR |
|---|---|---|
| Human-readable | Yes | No |
| Manifest size | Larger | Smaller |
| Debugging ease | Easy to inspect | Requires tooling |
| Parse performance | Good | Slightly better |
| Default | Yes | No |
For most use cases, JSON is the right choice. Switch to CBOR if you are storing thousands of assets and the manifest size difference matters, or if you are in a pipeline where human readability is irrelevant.
To implement your own codec (e.g., MessagePack, Protobuf), extend CodecPort:
import CodecPort from '@git-stunts/git-cas/src/ports/CodecPort.js';
import msgpack from 'msgpack-lite';
class MsgPackCodec extends CodecPort {
encode(data) {
return msgpack.encode(data);
}
decode(buffer) {
return msgpack.decode(buffer);
}
get extension() {
return 'msgpack';
}
}
Then pass it to the constructor:
const cas = new ContentAddressableStore({
plumbing: git,
codec: new MsgPackCodec(),
});
The manifest will be stored in the tree as manifest.msgpack.
All errors thrown by git-cas are instances of CasError, which extends
Error with two additional properties:
- code -- a machine-readable string identifier
- meta -- an object with additional context
| Code | Meaning | Typical meta |
|---|---|---|
| INVALID_KEY_TYPE | Encryption key is not a Buffer or Uint8Array | -- |
| INVALID_KEY_LENGTH | Encryption key is not 32 bytes | { expected: 32, actual: N } |
| MISSING_KEY | Encrypted content restored without a key | -- |
| INTEGRITY_ERROR | Chunk digest mismatch or decryption auth failure | { chunkIndex, expected, actual } or { originalError } |
| STREAM_ERROR | Error reading from source stream during store | { chunksWritten, originalError } |
| TREE_PARSE_ERROR | Malformed ls-tree output from Git | { rawEntry } |
import { CasError } from '@git-stunts/git-cas/src/domain/errors/CasError.js';
try {
await cas.restoreFile({
manifest,
outputPath: './restored.jpg',
// Oops, forgot the encryption key
});
} catch (err) {
if (err.code === 'MISSING_KEY') {
console.error('This asset is encrypted. Please provide the encryption key.');
} else if (err.code === 'INTEGRITY_ERROR') {
console.error('Data corruption detected:', err.meta);
} else {
throw err; // unexpected error, re-throw
}
}
Because every CasError has a code, you can build exhaustive error
handlers:
function handleCasError(err) {
switch (err.code) {
case 'INVALID_KEY_TYPE':
case 'INVALID_KEY_LENGTH':
return { status: 400, message: 'Invalid encryption key' };
case 'MISSING_KEY':
return { status: 401, message: 'Encryption key required' };
case 'INTEGRITY_ERROR':
return { status: 500, message: 'Data integrity check failed' };
case 'STREAM_ERROR':
return { status: 502, message: `Stream failed after ${err.meta.chunksWritten} chunks` };
case 'TREE_PARSE_ERROR':
return { status: 500, message: 'Corrupted Git tree' };
default:
return { status: 500, message: err.message };
}
}
Constructing a Manifest or Chunk with invalid data throws a plain Error
(not a CasError) with a descriptive message from Zod validation:
import Manifest from '@git-stunts/git-cas/src/domain/value-objects/Manifest.js';
try {
new Manifest({ slug: '', filename: 'test.jpg', size: 0, chunks: [] });
} catch (err) {
// Error: Invalid manifest data: String must contain at least 1 character(s)
}
Yes. git-cas uses Git plumbing commands (hash-object, mktree, cat-file,
ls-tree) that work identically in bare and non-bare repositories. Point
GitPlumbing at the bare repo path.
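For instance, plain Git plumbing works against a bare repository through --git-dir (the path below is hypothetical):

```shell
# Demonstration: plumbing commands against a bare repository.
BARE=./assets.git                # hypothetical bare repo path
git init --bare --quiet "$BARE"
OID=$(echo "hello" | git --git-dir="$BARE" hash-object -w --stdin)
git --git-dir="$BARE" cat-file blob "$OID"   # prints: hello
```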
You get two manifests, but Git deduplicates the underlying blobs. If the file content has not changed, the blob OIDs will be identical. You are not wasting storage.
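You can verify the deduplication yourself with plain Git plumbing (the directory name below is hypothetical):

```shell
# Identical bytes hash to the same blob OID, so Git stores them once.
git init --quiet demo-dedup && cd demo-dedup
echo "same bytes" > a.bin && cp a.bin b.bin
OID1=$(git hash-object -w a.bin)
OID2=$(git hash-object -w b.bin)
test "$OID1" = "$OID2" && echo "deduplicated"
```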
Yes, but the new store will produce different chunks and different blob OIDs. The old manifest remains valid -- its chunks are still in Git. You will have two sets of blobs: one for each chunk size.
No. The manifest stores only the algorithm, nonce, and authentication tag. The key is never stored in Git. If you lose the key, you cannot decrypt the content. Treat your key files like any other secret.
AES-256-GCM (Galois/Counter Mode). This is an authenticated encryption algorithm -- it provides both confidentiality and integrity. The authentication tag in the manifest ensures that any tampering with the ciphertext is detected during decryption.
Yes. git-cas v1.3.0+ includes runtime detection that automatically selects
the appropriate crypto adapter:
- Node.js: NodeCryptoAdapter (uses node:crypto)
- Bun: BunCryptoAdapter (uses Bun.CryptoHasher)
- Deno: WebCryptoAdapter (uses crypto.subtle)
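The selection amounts to probing runtime globals. A sketch of how such detection can work (illustrative; git-cas's internal logic may differ):

```javascript
// Illustrative runtime probe, not the library's exact detection code.
function detectRuntime() {
  if (typeof Bun !== 'undefined') return 'bun';    // -> BunCryptoAdapter
  if (typeof Deno !== 'undefined') return 'deno';  // -> WebCryptoAdapter
  return 'node';                                   // -> NodeCryptoAdapter
}
```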
Use standard Git plumbing:
TREE_OID=$(git cas store ./vacation.jpg --slug photos/vacation --tree)
COMMIT_OID=$(git commit-tree "$TREE_OID" -m "Store vacation.jpg")
git update-ref refs/heads/assets "$COMMIT_OID"
There is no hard limit imposed by git-cas. The practical limit is determined
by your Git repository's object database and available memory. Files are
streamed in chunks, so memory usage is proportional to chunkSize, not to
file size. However, the restore operation currently concatenates all chunks
into a single buffer, so restoring very large files requires enough memory
to hold the entire file.
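Back-of-envelope arithmetic makes the memory profile concrete (the sizes here are hypothetical):

```javascript
// For a 4 GiB file split into 1 MiB chunks:
const fileSize = 4 * 1024 ** 3;    // 4 GiB
const chunkSize = 1 * 1024 ** 2;   // 1 MiB
const chunkCount = Math.ceil(fileSize / chunkSize);
// Storing streams chunk by chunk (~1 MiB resident at a time);
// restoring currently buffers the full 4 GiB before writing out.
console.log(chunkCount); // 4096
```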
The minimum chunk size is 1 KiB. This prevents pathologically small chunks
that would create excessive Git objects. Increase your chunkSize parameter.
There is also a hard cap at 100 MiB -- values above this are rejected outright.
Setting chunkSize above 10 MiB will trigger a warning, since very large
chunks reduce deduplication benefit and increase memory pressure.
AES-256 requires exactly a 256-bit (32-byte) key. Ensure your key file contains exactly 32 raw bytes. A common mistake is to store the key as a hex string (64 characters) rather than raw bytes.
# Correct: 32 raw bytes
openssl rand -out my.key 32
# Wrong: this creates a hex-encoded file (64 bytes of ASCII)
openssl rand -hex 32 > my.key
The blob field in each chunk is the Git SHA-1 OID returned by
git hash-object -w. It is the address of that chunk in Git's object
database. You can inspect any chunk directly:
git cat-file blob <blob-oid> | sha256sum
The output should match the digest field in the manifest.
Yes. A typical pattern:
# In your build step:
TREE=$(git cas store ./dist/artifact.tar.gz --slug builds/latest --tree)
git commit-tree "$TREE" -p HEAD -m "Build $(date +%s)" | xargs git update-ref refs/builds/latest
git push origin refs/builds/latest
# In your deploy step:
git fetch origin refs/builds/latest
TREE=$(git log -1 --format='%T' FETCH_HEAD)
git cas restore "$TREE" --out ./artifact.tar.gz
Every Git plumbing command is wrapped in a policy from @git-stunts/alfred.
The default policy applies a 30-second timeout and retries up to 2 times with
exponential backoff (100ms, then up to 2s). This handles transient filesystem
errors and lock contention gracefully. You can override the policy at
construction time (see Section 14).
Copyright 2026 James Ross. Licensed under Apache-2.0.

