diff --git a/CHANGELOG.md b/CHANGELOG.md index 194184c..10f3986 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,54 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] +### Added +- **Vault rotate passphrase-file support** — `vault rotate` now accepts `--old-passphrase-file` and `--new-passphrase-file` flags, bringing it to parity with the store/restore passphrase-file support. +- **CLI store flags** — `--gzip`, `--strategy`, `--chunk-size`, `--concurrency`, `--codec`, `--merkle-threshold`, `--target-chunk-size`, `--min-chunk-size`, `--max-chunk-size`. All library-level chunking, compression, codec, and concurrency options are now accessible from the CLI. +- **CLI restore flags** — `--concurrency`, `--max-restore-buffer`. Parallel I/O and the restore buffer limit are now configurable from the CLI. +- **`.casrc` config file** — JSON config file at the repository root provides default values for CLI flags. CLI flags always take precedence. Supports: `chunkSize`, `strategy`, `concurrency`, `codec`, `compression`, `merkleThreshold`, `maxRestoreBufferSize`, and `cdc.*` sub-keys. +- **CODE-EVAL.md** — Forensic architectural audit (zero-knowledge code extraction, critical assessment, roadmap reconciliation, prescriptive blueprint). +- **M16 Capstone** — New milestone in ROADMAP.md addressing all 9 audit flaws and 10 concerns (C1–C10). 13 task cards, ~698 LoC, ~21h estimated. +- **Concerns C8–C10** — Three architectural concerns from the CODE-EVAL.md audit now documented: crypto adapter LSP violation (C8), FixedChunker quadratic allocation (C9), encrypt-then-chunk dedup loss (C10). +- **CasError codes** — `RESTORE_TOO_LARGE` and `ENCRYPTION_BUFFER_EXCEEDED` registered in the canonical error code table. +- **16.2 — Memory restore guard** — `CasService` accepts `maxRestoreBufferSize` (default 512 MiB).
`_restoreBuffered` throws `RESTORE_TOO_LARGE` with `{ size, limit }` meta when an encrypted/compressed restore would exceed the limit. Unencrypted streaming restore is unaffected. +- **16.3 — Web Crypto encryption buffer guard** — `WebCryptoAdapter` accepts `maxEncryptionBufferSize` (default 512 MiB). Throws `ENCRYPTION_BUFFER_EXCEEDED` when streaming encryption exceeds the limit, since Web Crypto AES-GCM is a one-shot API. NodeCryptoAdapter uses true streaming and is unaffected. +- **16.5 — Encrypt-then-chunk dedup warning** — `CasService.store()` now logs a warning when encryption is combined with CDC chunking, since ciphertext is pseudorandom and content-defined boundaries provide no dedup benefit. +- **16.10 — Orphaned blob tracking** — `STREAM_ERROR` now includes `meta.orphanedBlobs` — an array of OIDs for blobs successfully written before the stream failure. The error metric includes an `orphanedBlobs` count for observability. +- **16.11 — Passphrase input security** — New `--vault-passphrase-file` CLI option reads the passphrase from a file (use `-` for stdin). Interactive TTY prompt added as a fallback when no other passphrase source is available. `resolvePassphrase` is now async with priority: file → flag → env → TTY → undefined. Empty passphrases rejected. File permission warning on group/world-readable files. +- **16.12 — KDF brute-force awareness** — `CasService` now emits a `decryption_failed` metric with slug context when decryption fails with `INTEGRITY_ERROR` during encrypted restore. The CLI adds a 1-second delay after `INTEGRITY_ERROR` to slow brute-force attempts. The library API imposes no delay — callers manage their own rate-limiting policy. +- **16.13 — GCM nonce collision docs + encryption counter** — `SECURITY.md` moved to the project root with new sections: GCM nonce bound (2^32 NIST limit), key rotation frequency, KDF parameter guidance, and passphrase entropy recommendations. Vault metadata now tracks `encryptionCount`, incremented per encrypted `addToVault()`.
Observability warning emitted when count exceeds 2^31. `VaultService` accepts optional `observability` port. +- **16.7 — Lifecycle method naming** — Added `inspectAsset()` (replaces `deleteAsset()`) and `collectReferencedChunks()` (replaces `findOrphanedChunks()`) as canonical names on both `CasService` and the facade. Old names are preserved as deprecated aliases that emit observability warnings. Type definitions updated with `@deprecated` JSDoc. + +### Changed +- **`runAction` injectable delay** — `runAction()` now accepts an optional `{ delay }` dependency, replacing the hardcoded `setTimeout` call. Tests inject a spy instead of using `vi.useFakeTimers()`, making INTEGRITY_ERROR rate-limit tests deterministic across Node, Bun, and Deno. +- **Test conventions** — Added `test/CONVENTIONS.md` documenting rules for deterministic, cross-runtime tests: inject time dependencies, use `chmod()` instead of `writeFile({ mode })`, avoid global state patching. +- **VaultService test observability wiring** — `VaultService.test.js` now passes a `mockObservability()` port to all tests instead of relying on the silent no-op default. `rotateVaultPassphrase.test.js` now passes `SilentObserver` explicitly. If observability wiring breaks, the test suite will catch it. +- **`NodeCryptoAdapter.encryptBuffer` JSDoc** — `@returns` annotation corrected to `Promise<...>`, matching the async implementation. +- **`maxRestoreBufferSize` documented** — constructor JSDoc and `#config` type in `ContentAddressableStore` now include the parameter. +- **ROADMAP.md heading level** — added `## Task Cards` heading between `# M16` and `### 16.1` to satisfy MD001 heading-increment rule. +- **16.1 — Crypto adapter behavioral normalization** — `NodeCryptoAdapter.encryptBuffer` now returns a Promise (was sync), matching Bun/Web. `decryptBuffer` validates key on all adapters. `NodeCryptoAdapter.createEncryptionStream` guards `finalize()` with `STREAM_NOT_CONSUMED`. 
New conformance test suite asserts identical contracts across all adapters. +- **16.4 — FixedChunker pre-allocated buffer** — Replaced `Buffer.concat()` loop with a pre-allocated `Buffer.allocUnsafe(chunkSize)` working buffer, eliminating O(n²) copies for many small input buffers. Matches the allocation strategy used by `CdcChunker`. + +### Fixed +- **Post-decompression size guard** — `_restoreBuffered` now enforces `maxRestoreBufferSize` after decompression, not just before. Compressed payloads that inflate beyond the configured limit now throw `RESTORE_TOO_LARGE` instead of silently allocating unbounded memory. +- **CLI passphrase prompt deferral** — `resolveEncryptionKey` now checks vault metadata before calling `resolvePassphrase`, avoiding unnecessary TTY prompts for unencrypted vaults. Store action recipient-conflict check inspects flags/env without consuming stdin. +- **CRLF passphrase normalization** — `readPassphraseFile` now strips trailing `\r\n` (Windows line endings) in addition to `\n`, preventing passphrase mismatches from Windows-edited files. +- **Constructor validation** — `CasService.maxRestoreBufferSize` (integer >= 1024), `CasService.chunkSize` (integer >= 1024), `WebCryptoAdapter.maxEncryptionBufferSize` (finite, positive), and `FixedChunker.chunkSize` (positive integer) are now validated at construction time, preventing silent misconfiguration. +- **Error-path test hardening** — `orphanedBlobs`, `restoreGuard`, `kdfBruteForce`, and `conformance` tests now fail explicitly when expected errors are not thrown (previously silent pass-through). +- **Orphaned blob enrichment on CasError re-throw** — `_chunkAndStore` now attaches `orphanedBlobs` metadata to existing `CasError` instances before re-throwing, instead of discarding the information. +- **VaultService metadata mutation on retry** — `addToVault` now shallow-copies `state.metadata` before mutation, preventing `encryptionCount` from being incremented multiple times across CAS retries. 
+- **16.8 — CasError portability guard** — `Error.captureStackTrace` now guarded with a runtime check. CasError constructs correctly on runtimes where `captureStackTrace` is unavailable (e.g. Firefox, older Deno). +- **16.9 — Pre-commit hook + hooks directory** — `scripts/git-hooks/` renamed to `scripts/hooks/` per CLAUDE.md convention. New `pre-commit` hook runs lint gate. `install-hooks.sh` updated accordingly. +- **16.6 — Chunk size upper bound** — CasService, FixedChunker, and CdcChunker now reject chunk sizes exceeding 100 MiB. CasService logs a warning when chunk size exceeds 10 MiB. +- **ROADMAP.md M16 summary** — Corrected LoC/hours from `~430/~28h` to `~698/~21h` to match the detailed task breakdown. +- **VaultService constructor type** — Added missing `observability?: ObservabilityPort` parameter to `index.d.ts` declaration. +- **Nullish coalescing for config merging** — `strategy` and `codec` in `mergeConfig()` now use `??` instead of `||`, so empty-string CLI values don't fall through to `.casrc` defaults. +- **Empty passphrase rejection** — `readPassphraseFile` rejects files that yield an empty string after newline stripping. `resolvePassphrase` validates `--vault-passphrase` flag and `GIT_CAS_PASSPHRASE` env var. +- **KDF algorithm validation** — `vault init` and `vault rotate` now validate `--algorithm` against the supported set (`pbkdf2`, `scrypt`) before passing to the KDF. +- **`.casrc` config validation** — `loadConfig()` now validates all config values (types, ranges, enum membership) after JSON parsing. +- **Deprecated method names in docs** — Updated `deleteAsset` → `inspectAsset` and `findOrphanedChunks` → `collectReferencedChunks` in README and GUIDE. +- **Missing error codes in SECURITY.md** — Added `RESTORE_TOO_LARGE` and `ENCRYPTION_BUFFER_EXCEEDED` sections. 
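To make the `??`-vs-`||` fix concrete, here is a minimal, hypothetical sketch of the precedence rule (the names `mergeConfig`, `DEFAULTS`, and the option shapes are illustrative, not the project's actual implementation):

```javascript
// Precedence: CLI flag > .casrc > built-in default.
// With ||, an empty-string CLI value ('') would incorrectly fall through to
// the .casrc default; ?? only falls through on null/undefined.
const DEFAULTS = { strategy: 'fixed', codec: 'json' };

function mergeConfig(cliFlags = {}, casrc = {}) {
  return {
    strategy: cliFlags.strategy ?? casrc.strategy ?? DEFAULTS.strategy,
    codec: cliFlags.codec ?? casrc.codec ?? DEFAULTS.codec,
  };
}
```

With `||`, an explicit empty-string flag would have silently become the `.casrc` value; with `??` it survives the merge and can be rejected by flag validation instead.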
+ ## [5.2.4] — Prism polish (2026-03-03) ### Fixed diff --git a/CODE-EVAL.md b/CODE-EVAL.md new file mode 100644 index 0000000..3ff5cce --- /dev/null +++ b/CODE-EVAL.md @@ -0,0 +1,605 @@ +# Forensic Architectural Audit: `@git-stunts/git-cas` + +**Audit Date:** 2026-03-03 +**Repository State:** `0f7f8e658e6cd094176541ac68d33b2a6ec75a91` (HEAD, `main`) +**Auditor:** Claude Opus 4.6, operating under zero-knowledge forensic protocol +**Version Under Audit:** 5.2.4 + +--- + +## Activity Log — Discovery Narrative + +The exploration began at the repository root with a simultaneous five-pronged dive: core domain services, infrastructure adapters, ports/codecs/chunkers, test structure, and type definitions. The first thing that jumped out — before reading a single line of code — was the file tree. Thirty-one source files, twelve bin files, sixty-one test files. Nearly a 2:1 test-to-source file ratio. That alone telegraphs intent: someone cares about correctness here. + +The ports directory was my Rosetta Stone. Six abstract base classes — `CryptoPort`, `CodecPort`, `GitPersistencePort`, `GitRefPort`, `ObservabilityPort`, `ChunkingPort` — each throwing `'Not implemented'`. Textbook hexagonal architecture. I already knew this was a ports-and-adapters system before reading a single service file. + +`CasService.js` at 911 lines is the gravitational center. It imports no infrastructure directly — only ports. Good. `KeyResolver.js` (220 lines) handles all cryptographic key orchestration, recently extracted from CasService (the M15 Prism task card confirmed this). `VaultService.js` (467 lines) operates on a separate Git ref (`refs/cas/vault`) with compare-and-swap concurrency control. + +The three crypto adapters (`NodeCryptoAdapter`, `WebCryptoAdapter`, `BunCryptoAdapter`) are where I started changing my initial opinions.
I expected copy-paste sloppiness — instead I found runtime-specific optimizations (Bun's native `CryptoHasher`, Web Crypto's `subtle` API) all converging on identical cryptographic parameters: AES-256-GCM, 12-byte nonce, 16-byte tag, SHA-256 content hashing. But the behavioral discrepancies between adapters (see Phase 2) tell a more nuanced story. + +The CDC chunker (`CdcChunker.js`) surprised me. A hand-rolled buzhash rolling hash with a 64-byte sliding window, xorshift64-seeded lookup table, and three-phase processing pipeline (fill window, feed pre-minimum, scan boundary). This is not commodity code — it's a bespoke content-defined chunking engine. + +The test suite confirmed the architecture: 833+ unit tests, crypto is never mocked (always real adapters), persistence is always mocked (in-memory maps), integration tests gate on Docker (`GIT_STUNTS_DOCKER=1`). The fuzz testing coverage is noteworthy — 50-iteration fuzz rounds for crypto, chunking, and store/restore. + +The CLI (`bin/git-cas.js`, 657 lines) implements a full TEA (The Elm Architecture) interactive dashboard. That's architecturally ambitious for a storage utility. + +My opinion shifted most dramatically on the vault system. I initially expected a simple key-value store backed by a file. Instead, it's a full commit chain on `refs/cas/vault` with optimistic concurrency control, exponential backoff retries, percent-encoded slug names, and atomic compare-and-swap ref updates. This is distributed-systems thinking applied to a local Git repo. + +--- + +## Phase 1: Zero-Knowledge Code Extraction + +### Deduced Value Proposition + +This system is a **content-addressed storage engine that uses Git's object database as its persistence layer**, with optional AES-256-GCM encryption, gzip compression, content-defined chunking, and a vault-based indexing system backed by Git refs. 
+ +The core problem it solves: **storing, encrypting, versioning, and retrieving binary blobs entirely within Git's native object model** — no external servers, no sidecar databases, no LFS endpoints. Everything lives in `.git/objects` and is transportable via standard Git push/pull/clone. + +### Comprehensive Feature Set (Implemented) + +1. **Store**: Chunk a byte stream (fixed-size or CDC), optionally compress (gzip), optionally encrypt (AES-256-GCM), write chunks as Git blobs, produce a manifest. +2. **Restore**: Read chunks from Git blobs, verify SHA-256 integrity, decrypt, decompress, reassemble. +3. **Streaming Restore**: `restoreStream()` yields chunks as an async iterable — O(chunk_size) memory for unencrypted data. +4. **Content-Defined Chunking (CDC)**: Buzhash rolling hash with configurable min/max/target sizes. Deduplication-friendly. +5. **Fixed-Size Chunking**: Default 256 KiB, configurable. +6. **Merkle Tree Manifests**: Automatic manifest splitting when chunk count exceeds threshold (default 1000). Sub-manifest references with startIndex/chunkCount. +7. **Envelope Encryption**: DEK/KEK model. Random 32-byte DEK encrypts data; each recipient's KEK wraps the DEK independently. +8. **Multi-Recipient Management**: Add/remove recipients without re-encrypting data. +9. **Key Rotation**: Re-wrap DEK with new KEK. No data re-encryption — O(1) key rotation. +10. **Passphrase-Based Encryption**: PBKDF2 or scrypt KDF with configurable parameters. +11. **Vault System**: Git-ref-backed (`refs/cas/vault`) content registry with CAS (compare-and-swap) concurrency control. +12. **Vault Passphrase Rotation**: Re-wrap all envelope-encrypted vault entries with a new passphrase-derived KEK. +13. **Integrity Verification**: Per-chunk SHA-256 + GCM auth tag for encrypted data. +14. **Orphan Detection**: `findOrphanedChunks()` — reference-counting analysis across vault entries. +15. **Codec Pluggability**: JSON (human-readable) or CBOR (compact binary) manifests. +16. 
**Multi-Runtime Support**: Node.js 22, Bun, Deno — with runtime-specific crypto adapters. +17. **Observability**: Structured metrics (`chunk:stored`, `file:stored`, `integrity:pass/fail`), log levels, span tracing. +18. **CLI**: 18 commands including store, restore, verify, inspect, rotate, vault management, and an interactive TEA dashboard. +19. **Parallel I/O**: Semaphore-bounded concurrent blob writes (store) and read-ahead window (restore). +20. **File I/O Helpers**: `storeFile()` / `restoreFile()` for file-to-file convenience. + +### API Surface & Boundary + +**Public entrypoints** (as defined by package.json/jsr.json exports): + +| Entrypoint | Module | Primary Export | +|---|---|---| +| `.` (root) | `index.js` | `ContentAddressableStore` facade class | +| `./service` | `src/domain/services/CasService.js` | `CasService` (direct domain access) | +| `./schema` | `src/domain/schemas/ManifestSchema.js` | Zod schemas (ManifestSchema, ChunkSchema, etc.) | + +**Facade API** (`ContentAddressableStore`): + +| Method | Return | +|---|---| +| `store(options)` | `Promise` | +| `restore(options)` | `Promise<{ buffer, bytesWritten }>` | +| `restoreStream(options)` | `AsyncIterable` | +| `createTree(options)` | `Promise` (tree OID) | +| `readManifest(options)` | `Promise` | +| `verifyIntegrity(options)` | `Promise` | +| `deleteAsset(options)` | `Promise<{ slug, chunksOrphaned }>` | +| `findOrphanedChunks(options)` | `Promise<{ referenced, total }>` | +| `rotateKey(options)` | `Promise` | +| `addRecipient(options)` | `Promise` | +| `removeRecipient(options)` | `Promise` | +| `listRecipients(manifest)` | `string[]` | +| `deriveKey(options)` | `Promise<{ key, salt, params }>` | +| `getVaultService()` | `VaultService` | +| `rotateVaultPassphrase(options)` | `Promise<{ commitOid, rotatedSlugs, skippedSlugs }>` | + +**External system interface:** +- **Ingress**: File paths, byte streams (`AsyncIterable`), encryption keys (32-byte `Buffer`), passphrases (strings), vault slugs 
(strings). +- **Egress**: Git blob/tree OIDs (40-char hex strings), `Manifest` value objects, byte buffers, vault entries. +- **Infrastructure boundary**: All Git operations flow through `@git-stunts/plumbing` → `git` CLI subprocess. + +### Internal Architecture & Components + +``` +┌─────────────────────────────────────────────────────────┐ +│ ContentAddressableStore (index.js) — Facade │ +│ Wires ports, exposes unified API │ +└──────────────────────┬──────────────────────────────────┘ + │ + ┌─────────────┼──────────────┐ + │ │ │ +┌────────▼──────┐ ┌────▼─────┐ ┌─────▼──────────────────┐ +│ CasService │ │ Vault │ │ rotateVaultPassphrase │ +│ (911 lines) │ │ Service │ │ (standalone function) │ +│ │ │(467 lines│ └────────────────────────┘ +│ ┌───────────┐ │ └──────────┘ +│ │KeyResolver│ │ +│ │(220 lines)│ │ +│ └───────────┘ │ +└───────┬───────┘ + │ depends on (ports only) + ┌─────┼──────┬──────────┬────────────┐ + │ │ │ │ │ +┌─▼─┐ ┌▼──┐ ┌─▼──┐ ┌────▼────┐ ┌─────▼─────┐ +│Git│ │Git│ │Cry-│ │Observ- │ │Chunking │ +│Per│ │Ref│ │pto │ │ability │ │Port │ +│sis│ │Port│ │Port│ │Port │ │ │ +│ten│ │ │ │ │ │ │ │ │ +│ce │ │ │ │ │ │ │ │ │ +└─┬─┘ └─┬─┘ └──┬─┘ └────┬───┘ └─────┬─────┘ + │ │ │ │ │ + ▼ ▼ ▼ ▼ ▼ +┌───────────────────────────────────────────────┐ +│ Infrastructure Adapters │ +│ │ +│ GitPersistenceAdapter NodeCryptoAdapter │ +│ GitRefAdapter WebCryptoAdapter │ +│ FileIOHelper BunCryptoAdapter │ +│ EventEmitterObserver │ +│ JsonCodec / CborCodec SilentObserver │ +│ FixedChunker StatsCollector │ +│ CdcChunker │ +└───────────────────────────────────────────────┘ +``` + +The dependency direction is strictly inward: domain depends on ports (interfaces), infrastructure depends on ports (implements). The facade wires them together. No domain module imports any infrastructure module. + +### Mechanics & Internals + +#### Algorithms + +**Content-Defined Chunking (Buzhash):** +- Rolling hash over a 64-byte sliding window. 
+- Lookup table: 256-entry `Uint32Array` generated via xorshift64 PRNG seeded with `0x6a09e667f3bcc908` (the fractional bits of √2, the 64-bit analogue of SHA-256's first initial-hash constant; a nice touch). +- Hash update: `hash = (rotl32(hash, 1) ^ table[outgoing] ^ table[incoming]) >>> 0`. +- Boundary detection: `(hash & mask) === 0` where `mask = (1 << floor(log2(targetChunkSize))) - 1`. +- Three-phase pipeline: fill window (first 64 bytes), feed pre-minimum (accumulate until min chunk size), scan boundary (check on each byte until boundary or max). +- **Complexity**: O(n) where n = input bytes. Each byte requires one table lookup, one XOR, one rotate. The mask test is O(1). + +**Encryption:** +- AES-256-GCM with 12-byte random nonce and 16-byte authentication tag. +- Streaming encryption wraps the chunk pipeline (encrypt-then-chunk: the ciphertext is chunked, not the plaintext). +- DEK wrapping uses the same AES-256-GCM as data encryption — the DEK is treated as a 32-byte plaintext. + +**Key Derivation:** +- PBKDF2-HMAC-SHA-512 (default 100,000 iterations) or scrypt (default N=16384, r=8, p=1). +- Salt: 32 bytes random, stored in manifest. + +**Integrity:** +- SHA-256 digest per chunk (computed at store time, verified at restore time). +- GCM authentication tag for encrypted data (verified during decryption). +- Manifests validated by Zod schemas at construction time. + +#### Storage & Data Structures + +**Git Object Database:** +- Chunks stored as Git blobs via `git hash-object -w --stdin`. +- Manifests stored as Git blobs (JSON or CBOR encoded). +- Trees constructed via `git mktree` with mode `100644 blob` entries. +- Vault state stored as a commit chain on `refs/cas/vault`: - Each commit points to a tree containing: `.vault.json` metadata blob + one `040000 tree` entry per vault slug. + +**In-Memory:** +- `Manifest` and `Chunk` are frozen value objects (immutable after construction). +- `Semaphore` uses a FIFO queue of promise resolvers. +- `StatsCollector` accumulates metrics in private fields.
+- CDC chunker allocates a `Buffer.allocUnsafe(maxChunkSize)` working buffer per `chunk()` invocation. + +#### Memory Management + +**Store path:** +- Semaphore-bounded: at most `concurrency` chunk buffers in flight simultaneously. +- CDC chunker holds one `maxChunkSize` working buffer (~1 MiB default) plus the 64-byte sliding window. +- After chunking, the working buffer is copied via `Buffer.from(subarray)` — no aliasing. + +**Restore path (streaming, unencrypted):** +- Read-ahead window: up to `concurrency` chunk-sized buffers in memory. +- Chunks are yielded and become eligible for GC immediately after consumption. + +**Restore path (buffered, encrypted/compressed):** +- **All chunks are concatenated into a single buffer before decryption.** This is the documented memory amplification concern (Roadmap C1). A 1 GB encrypted file requires ~1 GB in memory for decryption, plus the decrypted result. + +**Web Crypto streaming encryption:** +- The `createEncryptionStream` on `WebCryptoAdapter` **buffers the entire stream** internally because Web Crypto's AES-GCM is a one-shot API. This silently converts O(chunk_size) memory to O(total_file_size) memory on Deno (Roadmap C4). + +#### Performance Characteristics + +| Operation | Time Complexity | Space Complexity | Blocking? 
| +|---|---|---|---| +| Store (fixed chunking) | O(n) | O(concurrency × chunkSize) | Git subprocess I/O | +| Store (CDC chunking) | O(n) | O(maxChunkSize + concurrency × chunkSize) | Git subprocess I/O | +| Restore (streaming, plain) | O(n) | O(concurrency × chunkSize) | Git subprocess I/O | +| Restore (buffered, encrypted) | O(n) | **O(n)** — full file in memory | Git subprocess I/O + decrypt | +| createTree (v1, < threshold) | O(k) where k = chunks | O(k) for tree entries | Git subprocess | +| createTree (v2, Merkle) | O(k) | O(k / threshold) sub-manifests | Git subprocess | +| readManifest (v2) | O(k) | O(sub-manifest count) reads | Git subprocess × sub-manifests | +| Key rotation | O(1) | O(1) — only re-wraps DEK | Constant | +| Vault CAS update | O(entries) | O(entries) for tree rebuild | Git subprocess | +| CDC boundary scan | O(n) per byte | O(1) per byte (table lookup + XOR) | CPU-bound | + +**Critical bottleneck:** Git subprocess spawning. Every `writeBlob`, `readBlob`, `writeTree`, `readTree` operation spawns a `git` child process. For a file with 1000 chunks at concurrency 4, that's ~1000 `git hash-object` invocations + ~1000 `git cat-file` invocations on restore. The `@git-stunts/plumbing` layer mitigates this somewhat but cannot eliminate the per-operation process overhead. + +--- + +## Phase 2: The Critical Assessment + +### Use Cases & Fitness + +**Optimized for:** +- Single-file binary asset storage (firmware images, data bundles, encrypted archives) in the 1 KB to ~500 MB range. +- Git monorepos where binary assets must travel with the code. +- Air-gapped or offline environments where external services are unavailable. +- Multi-recipient access control without re-encrypting data. + +**Where it will break:** +- **Files > 1 GB encrypted**: The `_restoreBuffered` path requires the entire file in memory for decryption. A 4 GB file on a machine with 8 GB RAM will OOM. +- **High-frequency writes**: Each chunk write spawns a Git subprocess. 
With ~5 ms of process-spawn overhead per write, a single-threaded pipeline tops out at roughly 200 chunks/second, so a workload approaching 1000 writes/second is unserviceable. - **Large repositories (>10 GB)**: Git's own performance degrades with ODB size. `git gc` becomes slow, pack files grow. - **Web Crypto runtime (Deno) with large files**: The streaming encryption adapter silently buffers the entire file due to Web Crypto API limitations. - **Concurrent vault mutations from multiple processes**: The CAS retry mechanism (3 attempts, 50-200ms backoff) handles light contention but will fail under sustained concurrent writes. ### Design Trade-offs **1. Git subprocess for every blob operation vs. libgit2/in-process Git** - **Evidence:** - **Claim:** Every blob read/write spawns a `git` child process via `@git-stunts/plumbing`. - **Primary Evidence:** `src/infrastructure/adapters/GitPersistenceAdapter.js:11-17` (`writeBlob` calls `plumbing.execute`) - **Supporting Context:** `plumbing.execute()` and `plumbing.executeStream()` spawn `git` subprocesses. - **Discovery Path:** `index.js` → `GitPersistenceAdapter` → `plumbing.execute` → `git hash-object` - **Cryptographic Proof:** `git hash-object src/infrastructure/adapters/GitPersistenceAdapter.js` = `797be53113174ff8e86104fa97afda0748dd3fce` - **Systemic effect:** Process spawn overhead (~2-10ms per invocation) dominates I/O for small chunks. A 100 MB file with 256 KiB chunks = ~400 subprocess invocations for store + ~400 for restore. The `Policy.timeout(30_000)` wrapper adds resilience but not performance. - **Trade-off rationale:** Using the `git` CLI ensures correctness across all Git configurations (bare repos, worktrees, custom object stores, alternates) without reimplementing Git's object database. It also means zero native dependencies — critical for multi-runtime support. **2. Encrypt-then-chunk vs.
chunk-then-encrypt** + +- **Evidence:** + + - **Claim:** Encryption wraps the source stream before chunking, meaning ciphertext is what gets chunked — not plaintext. + - **Primary Evidence:** `src/domain/services/CasService.js:store()` — encryption stream wraps source before passing to `_chunkAndStore`. + - **Supporting Context:** The encryption stream is created first (`crypto.createEncryptionStream(key)`), then the encrypted output is piped through the chunker. + - **Cryptographic Proof:** `git hash-object src/domain/services/CasService.js` = `9d1370ca88697992847c131bba7d74f726a2cd8c` + +- **Systemic effect:** CDC deduplication is **completely defeated** for encrypted data because AES-GCM ciphertext is pseudorandom — identical plaintext produces different ciphertext (random nonce). This means encrypted CDC-chunked files get zero deduplication benefit. The chunking metadata is still recorded in the manifest, but it serves no dedup purpose. +- **Trade-off rationale:** The alternative (chunk-then-encrypt) would require per-chunk nonces and auth tags, significantly complicating the manifest schema and increasing metadata overhead. The current design keeps crypto simple (one nonce, one tag, one DEK for the whole file). + +**3. Full-buffer decrypt vs. streaming decrypt** + +- **Evidence:** + + - **Claim:** Encrypted/compressed restores buffer the entire file before decryption. + - **Primary Evidence:** `src/domain/services/CasService.js:_restoreBuffered()` — concatenates all chunk buffers then calls `decrypt()`. + - **Cryptographic Proof:** `git hash-object src/domain/services/CasService.js` = `9d1370ca88697992847c131bba7d74f726a2cd8c` + +- **Systemic effect:** Memory usage is O(file_size) for encrypted restores. The `restoreStream()` API exists and is O(chunk_size) for plaintext, but encrypted paths silently degrade to O(n). +- **Trade-off rationale:** AES-256-GCM produces a single authentication tag for the entire ciphertext. 
Verifying the tag requires processing all ciphertext. Streaming authenticated decryption would require a different AEAD construction (e.g., STREAM from libsodium, or chunked AES-GCM with per-chunk tags). + +**4. Vault as Git commit chain vs. flat file** + +- **Evidence:** + + - **Claim:** The vault uses Git commits on `refs/cas/vault` with CAS (compare-and-swap) updates. + - **Primary Evidence:** `src/domain/services/VaultService.js:VAULT_REF`, `#casUpdateRef`, `#retryMutation` + - **Cryptographic Proof:** `git hash-object src/domain/services/VaultService.js` = `d5a1ac2b1a771e9a3a7ac1652c6f40e0f0cbffaa` + +- **Systemic effect:** Every vault mutation (add, remove, init) creates a new Git commit. This provides full audit history but grows the commit graph linearly. Over thousands of vault mutations, `git log refs/cas/vault` becomes slow. The CAS semantics handle concurrent writes gracefully but are limited to 3 retries with short backoff — insufficient for high-contention scenarios. +- **Trade-off rationale:** Using Git's native commit/ref mechanism means the vault is automatically included in `git push/pull/clone`. No separate sync mechanism needed. The audit trail is a natural consequence. + +**5. Semaphore-based concurrency vs. worker pool** + +- **Evidence:** + + - **Claim:** Parallel blob I/O uses a counting semaphore, not a proper worker/thread pool. + - **Primary Evidence:** `src/domain/services/Semaphore.js` — FIFO counting semaphore; `CasService.js:_chunkAndStore` — semaphore-guarded fan-out. + - **Cryptographic Proof:** `git hash-object src/domain/services/Semaphore.js` = `507ed14668364491797a68ed906b346b01ddd488` + +- **Systemic effect:** All concurrency is async I/O multiplexing on the event loop. There's no CPU parallelism for hashing or encryption. SHA-256 and AES-GCM run on the main thread (in Node.js). For CPU-bound workloads this is a bottleneck, but since the dominant cost is Git subprocess I/O, async concurrency is the correct choice. 
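The FIFO counting-semaphore pattern described in trade-off 5 can be sketched as follows. This is illustrative code, not the project's `Semaphore.js`; `mapBounded` is a hypothetical helper standing in for the store path's semaphore-guarded fan-out:

```javascript
// Minimal FIFO counting semaphore: waiters queue as promise resolvers, and
// release() hands the permit directly to the oldest waiter.
class Semaphore {
  #permits;
  #waiters = []; // FIFO queue of promise resolvers

  constructor(permits) { this.#permits = permits; }

  async acquire() {
    if (this.#permits > 0) { this.#permits--; return; }
    await new Promise((resolve) => this.#waiters.push(resolve));
  }

  release() {
    const next = this.#waiters.shift();
    if (next) next();        // permit passes straight to the next waiter
    else this.#permits++;    // no waiters: return the permit to the pool
  }
}

// Fan out async jobs with at most `limit` in flight at once.
async function mapBounded(items, limit, worker) {
  const sem = new Semaphore(limit);
  return Promise.all(items.map(async (item) => {
    await sem.acquire();
    try { return await worker(item); } finally { sem.release(); }
  }));
}
```

Note that this gives only event-loop I/O multiplexing, exactly as the trade-off discussion says: the worker bodies still run on one thread, so it bounds concurrent I/O, not CPU parallelism.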
+ +### Flaws & Limitations + +#### Flaw 1: Crypto Adapter Behavioral Inconsistencies + +- **Evidence:** + + - **Claim:** The three crypto adapters have inconsistent validation and error-handling behavior. + - **Primary Evidence:** `NodeCryptoAdapter.js:26-36`, `BunCryptoAdapter.js:25-44`, `WebCryptoAdapter.js:28-44` + - **Supporting Context:** + - `NodeCryptoAdapter.encryptBuffer` is synchronous; `BunCryptoAdapter.encryptBuffer` and `WebCryptoAdapter.encryptBuffer` are async. + - `BunCryptoAdapter.decryptBuffer` calls `_validateKey(key)`; `NodeCryptoAdapter.decryptBuffer` and `WebCryptoAdapter.decryptBuffer` do not. + - `NodeCryptoAdapter.createEncryptionStream` has no premature-finalize guard; Bun and Web adapters throw `CasError('STREAM_NOT_CONSUMED')`. + - **Cryptographic Proof:** + - `git hash-object src/infrastructure/adapters/NodeCryptoAdapter.js` = `f89898c5ec1892dd965e6ed69ac5373883ed1650` + - `git hash-object src/infrastructure/adapters/BunCryptoAdapter.js` = `1d8b8ce4def9cd8be885e5065041dbe0a0b6d0ac` + - `git hash-object src/infrastructure/adapters/WebCryptoAdapter.js` = `5a70733d945387a8a8101013157811aa654958c6` + +- **Impact:** Liskov Substitution violation. Code that works correctly on Bun (where `decryptBuffer` validates the key type early) may fail with a cryptic `node:crypto` error on Node.js (where the key is passed directly to `createDecipheriv`). The missing premature-finalize guard on Node means a bug in stream consumption produces undefined behavior on Node but a clear error on Bun/Deno. +- **Severity:** Medium. The callers generally `await` all results (which papers over sync-vs-async), and CasService always calls `_validateKey` before encrypting. But the asymmetry is a maintenance hazard. + +#### Flaw 2: Memory Amplification on Encrypted Restore + +- **Evidence:** + + - **Claim:** Encrypted restores load the entire file into memory. 
+ - **Primary Evidence:** `src/domain/services/CasService.js:_restoreBuffered()` — `Buffer.concat(chunkBuffers)` before `this.decrypt()`. + - **Cryptographic Proof:** `git hash-object src/domain/services/CasService.js` = `9d1370ca88697992847c131bba7d74f726a2cd8c` + +- **Impact:** Restoring a 1 GB encrypted file requires ~2 GB of heap (ciphertext buffer + plaintext output). No guard, no warning, no configurable limit. +- **Severity:** High for large files. The roadmap acknowledges this as concern C1 and estimates ~20 LoC to add a `maxRestoreBufferSize` guard. + +#### Flaw 3: Web Crypto Stream Buffering + +- **Evidence:** + + - **Claim:** `WebCryptoAdapter.createEncryptionStream` silently buffers the entire stream. + - **Primary Evidence:** `src/infrastructure/adapters/WebCryptoAdapter.js:64-84` — `const chunks = []; for await (const chunk of source) { chunks.push(chunk); } const buffer = Buffer.concat(chunks);` + - **Cryptographic Proof:** `git hash-object src/infrastructure/adapters/WebCryptoAdapter.js` = `5a70733d945387a8a8101013157811aa654958c6` + +- **Impact:** On Deno, `createEncryptionStream` provides a streaming API but has O(n) memory behavior. Users expect O(chunk_size) memory from a streaming API. This is deceptive. +- **Severity:** Medium. Deno is a secondary runtime, and the roadmap flags this as concern C4. + +#### Flaw 4: FixedChunker Quadratic Buffer Allocation + +- **Evidence:** + + - **Claim:** `FixedChunker.chunk()` uses `Buffer.concat()` in a loop, creating a new buffer allocation per input chunk. + - **Primary Evidence:** `src/infrastructure/chunkers/FixedChunker.js:20` — `buffer = Buffer.concat([buffer, data]);` + - **Cryptographic Proof:** `git hash-object src/infrastructure/chunkers/FixedChunker.js` = `1477e185f16730ad13028454cecb1fb2ac785889` + +- **Impact:** For a source that yields many small buffers (e.g., 4 KB network reads), `Buffer.concat([buffer, data])` is called for each read. 
This copies the accumulated buffer each time, yielding O(n^2/chunkSize) total memory copies where n is file size. In contrast, `CdcChunker` uses a pre-allocated working buffer with zero intermediate copies. +- **Severity:** Low in practice (the source is typically a file stream with 64 KiB reads), but architecturally inconsistent with the CDC chunker's careful buffer management. + +#### Flaw 5: CDC Deduplication Defeated by Encrypt-Then-Chunk + +- **Evidence:** + + - **Claim:** Encryption is applied before chunking, destroying content-addressable deduplication. + - **Primary Evidence:** `src/domain/services/CasService.js:store()` — encryption wraps source before `_chunkAndStore`. + - **Cryptographic Proof:** `git hash-object src/domain/services/CasService.js` = `9d1370ca88697992847c131bba7d74f726a2cd8c` + +- **Impact:** The primary value proposition of CDC is sub-file deduplication. For encrypted files, CDC provides zero dedup benefit over fixed chunking. Users who enable both encryption and CDC chunking get CDC's overhead (rolling hash computation) without its benefit. +- **Severity:** Medium. This is an inherent limitation of the encrypt-then-chunk design. Fixing it would require per-chunk encryption (chunk-then-encrypt), which is a significant architectural change. + +#### Flaw 6: No Upper Bound on Chunk Size + +- **Evidence:** + + - **Claim:** `FixedChunker` accepts any positive `chunkSize` value without an upper bound. + - **Primary Evidence:** `src/infrastructure/chunkers/FixedChunker.js:9` — no validation beyond ChunkingPort base. + - **Supporting Context:** `CdcChunker` has configurable `maxChunkSize` (default 1 MiB) but no hard upper limit either. `resolveChunker` validates `chunkSize > 0` for fixed but has no ceiling. 
+ - **Cryptographic Proof:** `git hash-object src/infrastructure/chunkers/FixedChunker.js` = `1477e185f16730ad13028454cecb1fb2ac785889` + +- **Impact:** A user could set `chunkSize: 10 * 1024 * 1024 * 1024` (10 GB) and the system would attempt to buffer a 10 GB chunk. The roadmap flags this as concern C3. +- **Severity:** Low (user misconfiguration, not a bug in normal usage). + +#### Flaw 7: `deleteAsset` Is Misleadingly Named + +- **Evidence:** + + - **Claim:** `deleteAsset()` does not delete anything — it only reads metadata. + - **Primary Evidence:** `src/domain/services/CasService.js:deleteAsset()` — reads manifest and returns `{ slug, chunksOrphaned }`. + - **Cryptographic Proof:** `git hash-object src/domain/services/CasService.js` = `9d1370ca88697992847c131bba7d74f726a2cd8c` + +- **Impact:** API confusion. Similarly, `findOrphanedChunks()` doesn't find orphans — it finds referenced chunks. Both methods are analysis tools masquerading as lifecycle operations. +- **Severity:** Low (naming issue, not a functional defect). + +#### Flaw 8: Error.captureStackTrace Portability + +- **Evidence:** + + - **Claim:** `CasError` uses `Error.captureStackTrace` which is V8-specific. + - **Primary Evidence:** `src/domain/errors/CasError.js:5` — `Error.captureStackTrace(this, this.constructor);` + - **Cryptographic Proof:** `git hash-object src/domain/errors/CasError.js` = `6acc1da7e28ed698571f861900081d8b044cde57` + +- **Impact:** This is a no-op on non-V8 engines. Since the project targets Node (V8), Bun (JSC), and Deno (V8), it's a no-op on Bun's JavaScriptCore. Not a crash risk (it degrades gracefully), but indicates incomplete multi-runtime awareness. +- **Severity:** Negligible. + +#### Flaw 9: Missing pre-commit Hook + +- **Evidence:** + + - **Claim:** The project has a pre-push hook but no pre-commit hook. + - **Primary Evidence:** `scripts/git-hooks/pre-push` exists; `scripts/git-hooks/pre-commit` does not. 
+ - **Supporting Context:** The CLAUDE.md global instructions specify that pre-commit should run lint. The hooks directory is also named `git-hooks` rather than the conventional `hooks` specified in CLAUDE.md. + +- **Impact:** Lint failures are not caught until push time. A developer can accumulate many unlinted commits before discovering issues. +- **Severity:** Low (process issue, not a code defect). + +### Innovation vs. Commodity + +**Novel or distinctive:** +1. **Git ODB as a CAS backend** — No other library treats Git's native object store as a general-purpose content-addressed storage layer with this level of sophistication (Merkle manifests, codec pluggability, vault indexing). +2. **Buzhash CDC implementation** — Hand-rolled, well-optimized, with a clever xorshift64 seeded table. Not copy-pasted from a library. +3. **DEK/KEK envelope encryption with zero-cost key rotation** — The key rotation model (re-wrap DEK, don't re-encrypt data) is architecturally elegant and matches the patterns used by KMS systems like AWS KMS. +4. **Vault as a Git commit chain** — Using Git refs for an atomic, auditable key-value store is creative. +5. **Multi-runtime JS with runtime-specific crypto** — Three crypto adapters targeting three JS runtimes is uncommon in the Node ecosystem. + +**Commodity:** +1. **AES-256-GCM encryption** — Standard AEAD construction, correctly implemented. +2. **PBKDF2/scrypt KDF** — Standard KDF choices with standard parameters. +3. **Zod schema validation** — Standard validation library, standard usage. +4. **Hexagonal architecture** — Well-known pattern, well-executed. +5. **Commander.js CLI** — Standard CLI framework, standard usage. + +**Assessment:** This codebase introduces genuinely novel abstractions (Git ODB as CAS, vault commit chain, zero-cost key rotation) while building on commodity cryptographic primitives. The combination is the innovation — not any individual component. 
+ +--- + +## Phase 3: The Reality Check + +### Roadmap Reconciliation + +The roadmap lists 9 milestones (M7–M15). **All 9 are marked CLOSED.** There are zero open milestones. + +| Milestone | Roadmap Status | Verified in Code | Reconciliation | +|---|---|---|---| +| M7 Horizon | CLOSED (v2.0.0) | Yes — Merkle manifests (v2), compression, sub-manifests all implemented | Accurate | +| M8 Spit Shine | CLOSED (v4.0.1) | Yes — CryptoPort refactor, verify command, error handler all present | Accurate | +| M9 Cockpit | CLOSED (v4.0.1) | Yes — 18 CLI commands, --json flag, hints system all present | Accurate | +| M10 Hydra | CLOSED (v5.0.0) | Yes — CdcChunker with buzhash, resolveChunker, CDC params in manifest | Accurate | +| M11 Locksmith | CLOSED (v5.1.0) | Yes — addRecipient, removeRecipient, listRecipients, envelope encryption | Accurate | +| M12 Carousel | CLOSED (v5.2.0) | Yes — rotateKey, keyVersion tracking, DEK re-wrapping | Accurate | +| M13 Bijou | CLOSED (v3.1.0) | Yes — dashboard TUI, progress bars, encryption card, manifest view, heatmap | Accurate | +| M14 Conduit | CLOSED (v4.0.0) | Yes — restoreStream, ObservabilityPort, Semaphore, parallel I/O | Accurate | +| M15 Prism | CLOSED | Yes — async sha256 on NodeCryptoAdapter, KeyResolver extracted | Accurate | + +**Verdict: The roadmap is 100% accurate.** Every claimed milestone is verifiable in the codebase. No phantom features, no vaporware. This is unusual — most roadmaps overstate completion. + +### Backlog Triage + +The roadmap identifies 7 concerns (C1–C7) and 6 visions (V1–V6). Cross-referencing against Phase 2 findings: + +**Concerns already identified by the roadmap that Phase 2 confirmed:** + +| Concern | Roadmap Estimate | Phase 2 Finding | Agreement | +|---|---|---|---| +| C1: Memory amplification on encrypted restore | High severity, ~20 LoC | Flaw 2: Confirmed. O(n) memory for encrypted restores. 
| Full agreement | +| C2: Orphaned blob accumulation after STREAM_ERROR | Medium, ~20 LoC | Not independently discovered — the error handling drains promises correctly. Low priority. | Agreement on low urgency | +| C3: No upper bound on chunk size | Medium, ~6 LoC | Flaw 6: Confirmed. FixedChunker accepts any positive value. | Full agreement | +| C4: Web Crypto silent memory buffering | Medium, ~15 LoC | Flaw 3: Confirmed. `createEncryptionStream` buffers everything on Deno. | Full agreement | +| C5: Passphrase exposure in shell history | High, ~90 LoC | Not a code defect; architectural limitation of CLI passphrase flags. | Agreement | +| C6: No KDF brute-force rate limiting | Low, ~10 LoC | Not independently discovered. Low priority. | Agreement | +| C7: GCM nonce collision risk at scale | Low, ~20 LoC | Not practically exploitable. 2^48 encryptions needed for birthday bound on 96-bit nonce. | Agreement on low priority | + +**Critical architectural flaws from Phase 2 that ARE MISSING from the backlog:** + +1. **Crypto adapter behavioral inconsistencies (Flaw 1)** — The three adapters have different validation/error behavior. This is not mentioned in any concern or backlog item. The M15 Prism milestone addressed `sha256` async consistency but left the encrypt/decrypt inconsistencies untouched. + +2. **CDC deduplication defeated by encrypt-then-chunk (Flaw 5)** — The fundamental design decision that encryption wraps the stream before chunking is not flagged as a concern or limitation in the roadmap. The Feature Matrix claims "Sub-file deduplication: Via chunking" without noting it only works for unencrypted data. + +3. **FixedChunker quadratic buffer allocation (Flaw 4)** — Minor but missing from backlog. The CDC chunker received significant optimization attention; the fixed chunker did not. + +**Backlog items that should be deprioritized:** + +- **V1 Snapshot Trees** (~410 LoC, ~19h) — Nice to have but doesn't address any Phase 2 flaw. 
+- **V5 Watch Mode** (~220 LoC, ~10h) — Feature creep for a storage library. +- **V3 Manifest Diff Engine** (~180 LoC, ~8h) — Diagnostic tooling, not a stability concern. + +**Backlog items that should be prioritized:** + +- **C1 Memory amplification guard** — This is the highest-severity technical debt. 20 LoC to add a configurable ceiling. +- **Crypto adapter normalization** — Not in backlog. Needs to be added. ~30 LoC to align all three adapters. +- **V4 CompressionPort** (~180 LoC, ~8h) — Gzip-only compression is a significant limitation. zstd would provide 2-3x better compression ratios with faster decompression. + +--- + +## Phase 4: The Blueprint for Success + +### Month 1: Triage & Foundation + +**Week 1–2: Crypto Adapter Normalization** + +Align all three crypto adapters to identical behavioral contracts: + +1. Add `_validateKey(key)` call to `NodeCryptoAdapter.decryptBuffer()` and `WebCryptoAdapter.decryptBuffer()`. +2. Add premature-finalize guard to `NodeCryptoAdapter.createEncryptionStream()`. +3. Make `NodeCryptoAdapter.encryptBuffer()` explicitly async (return `Promise`). +4. Add a cross-adapter behavioral test suite that asserts identical behavior for all three adapters given the same inputs. + +*Estimated: ~50 LoC changes, ~100 LoC tests.* + +**Week 2: Memory Safety Guards** + +1. Add `maxRestoreBufferSize` option to CasService constructor (default: 512 MiB). Throw `CasError('RESTORE_BUFFER_EXCEEDED')` if the concatenated chunk buffer exceeds this limit in `_restoreBuffered()`. +2. Add buffer size guard to `WebCryptoAdapter.createEncryptionStream()` — throw if accumulated buffer exceeds a configurable limit. +3. Add upper bound validation to `FixedChunker` constructor (e.g., max 100 MiB) and `CdcChunker` (already has `maxChunkSize` but no ceiling on the ceiling). 
+ +*Estimated: ~40 LoC changes, ~30 LoC tests.* + +**Week 3: FixedChunker Buffer Optimization** + +Replace the `Buffer.concat([buffer, data])` loop in `FixedChunker.chunk()` with a pre-allocated working buffer pattern matching `CdcChunker`: + +```js +const buf = Buffer.allocUnsafe(this.#chunkSize); +let offset = 0; +for await (const data of source) { + let srcPos = 0; + while (srcPos < data.length) { + const n = Math.min(data.length - srcPos, this.#chunkSize - offset); + data.copy(buf, offset, srcPos, srcPos + n); + offset += n; + srcPos += n; + if (offset === this.#chunkSize) { + yield Buffer.from(buf); + offset = 0; + } + } +} +if (offset > 0) yield Buffer.from(buf.subarray(0, offset)); +``` + +*Estimated: ~20 LoC change.* + +**Week 4: Missing pre-commit Hook + Process Hygiene** + +1. Add `scripts/git-hooks/pre-commit` that runs `pnpm run lint`. +2. Rename `scripts/git-hooks/` to `scripts/hooks/` to match CLAUDE.md convention (or update CLAUDE.md — choose one). +3. Add `Error.captureStackTrace` guard in `CasError`: `if (Error.captureStackTrace) Error.captureStackTrace(this, this.constructor);` + +*Estimated: ~10 LoC changes.* + +### Month 2: Structural Evolution + +**CompressionPort Abstraction (V4)** + +The current gzip-only compression is hardcoded. Introduce a `CompressionPort` abstract class with `compress(source)` and `decompress(source)` async generator methods. Implement `GzipCompressor` (existing behavior) and `ZstdCompressor` (via `node:zlib` or `zstd-codec`). Update `CompressionSchema` to accept `'gzip' | 'zstd'`. + +*Estimated: ~180 LoC, aligns with V4 vision.* + +**Document the Encrypt-Then-Chunk Limitation** + +This is not fixable without a major architectural change (chunk-then-encrypt with per-chunk AEAD). The correct action is: + +1. Document that CDC deduplication is ineffective for encrypted data. +2. Consider emitting a warning when `encryption + chunking.strategy === 'cdc'` are both specified. +3. 
If the user explicitly opts in, allow it — but make the trade-off visible. + +*Estimated: ~10 LoC (warning), documentation update.* + +**Interactive Passphrase Prompt (V6)** + +Address concern C5 (passphrase exposure in shell history) by adding TTY-based passphrase prompts with echo disabled. Fall back to flag-based input when stdin is not a TTY. + +*Estimated: ~90 LoC, aligns with V6 vision.* + +### Month 3: Strategic Re-alignment + +**Portable Bundles (V2)** + +The air-gapped use case is a key differentiator. Implement `.casb` bundle files that package manifest + chunks for transport without Git. This enables: +- Export: `git cas export --slug --out archive.casb` +- Import: `git cas import --bundle archive.casb` + +*Estimated: ~340 LoC, aligns with V2 vision.* + +**Garbage Collection Automation** + +The `deleteAsset` and `findOrphanedChunks` methods are analysis-only. Complete the lifecycle: +1. Rename `deleteAsset` to `inspectAsset` or `getAssetMetadata` (breaking change). +2. Implement actual GC via `git prune` after vault entry removal. +3. Add `git cas gc` CLI command with `--dry-run` support. + +*Estimated: ~80 LoC.* + +**CI Hardening** + +1. Add `dependabot.yml` for dependency updates. +2. Add `CODEOWNERS` file. +3. Add security scanning (e.g., `npm audit` in CI). +4. Add `SECURITY.md` at project root (currently missing, noted in CLAUDE.md scaffolding requirements). + +--- + +### Executive Conclusion + +**Health: Strong.** This is a well-architected, thoroughly tested codebase with a clear domain model, strict port/adapter boundaries, and an unusually high test-to-code ratio (3.1:1). The 833+ unit tests with real crypto (never mocked) and fuzz coverage demonstrate a commitment to correctness that is rare in the Node.js ecosystem. 
+ +**Intellectual Property Value: Moderate-High.** The novel contributions — Git ODB as CAS, buzhash CDC with xorshift-seeded tables, zero-cost DEK/KEK key rotation, vault commit chains with CAS semantics — represent genuine engineering innovation. These are not reimplementations of existing libraries; they are original abstractions built on well-understood primitives. + +**Technical Debt: Low.** The roadmap's 7 concerns accurately catalog the known issues. Phase 2 surfaced only 3 additional findings (crypto adapter inconsistencies, encrypt-then-chunk dedup limitation, FixedChunker buffer allocation), none of which are critical. The most urgent issue — memory amplification on encrypted restore — is a ~20 LoC fix. + +**Long-term Viability: Good with caveats.** The system is viable for its target niche (Git-native encrypted binary storage). The Git subprocess bottleneck limits throughput for very high-frequency operations, but this is an acceptable trade-off for correctness and portability. The encrypt-then-chunk design is a permanent architectural constraint that limits CDC's value for encrypted data — this should be prominently documented rather than "fixed." + +**The Honest Assessment:** This codebase punches above its weight. A ~3,900 LoC core library with 12,000 LoC of tests, multi-runtime support, envelope encryption, CDC chunking, Merkle manifests, and an interactive TUI — all with zero native dependencies and no external server requirements. The architecture is clean, the test coverage is comprehensive, and the roadmap is honest. The identified flaws are minor and addressable. This is a well-maintained project by someone who takes software engineering seriously. 
+ +--- + +*Audit conducted at commit `0f7f8e658e6cd094176541ac68d33b2a6ec75a91`.* +*All blob hashes verified via `git hash-object` against live repository state.* diff --git a/GUIDE.md b/GUIDE.md index be6ea26..1d8783e 100644 --- a/GUIDE.md +++ b/GUIDE.md @@ -575,6 +575,51 @@ git cas restore a1b2c3d4e5f67890... --out ./decrypted-vacation.jpg --key-file ./ # Output: 524288 ``` +### Compression, Chunking, and Codec Flags + +```bash +# Enable gzip compression +git cas store ./data.bin --slug my-data --tree --gzip + +# Use CDC (content-defined chunking) for sub-file deduplication +git cas store ./data.bin --slug my-data --tree --strategy cdc + +# Customize chunk size and enable parallel I/O +git cas store ./data.bin --slug my-data --tree --chunk-size 65536 --concurrency 4 + +# Use CBOR codec for smaller manifests +git cas store ./data.bin --slug my-data --tree --codec cbor + +# CDC with custom parameters +git cas store ./data.bin --slug my-data --tree \ + --strategy cdc --target-chunk-size 32768 \ + --min-chunk-size 8192 --max-chunk-size 131072 + +# Restore with parallel I/O +git cas restore --slug my-data --out ./data.bin --concurrency 4 +``` + +### Project Config File (`.casrc`) + +Place a `.casrc` JSON file at the repository root to set defaults. CLI flags +always take precedence. + +```json +{ + "chunkSize": 65536, + "strategy": "cdc", + "concurrency": 4, + "codec": "json", + "compression": "gzip", + "merkleThreshold": 500, + "cdc": { + "minChunkSize": 8192, + "targetChunkSize": 32768, + "maxChunkSize": 131072 + } +} +``` + ### Working Directory By default the CLI operates in the current directory. Use `--cwd` to point at @@ -621,14 +666,14 @@ The `verifyIntegrity` method reads each chunk blob from Git, recomputes its SHA-256 digest, and compares it against the manifest. It emits either `integrity:pass` or `integrity:fail` events (see Section 9). 
-### Deleting an Asset +### Inspecting an Asset -`deleteAsset` returns logical deletion metadata for an asset without +`inspectAsset` returns logical deletion metadata for an asset without performing any destructive Git operations. The caller is responsible for removing refs and running `git gc --prune` to reclaim space: ```js -const { slug, chunksOrphaned } = await cas.deleteAsset({ treeOid }); +const { slug, chunksOrphaned } = await cas.inspectAsset({ treeOid }); console.log(`Asset "${slug}" has ${chunksOrphaned} chunks to clean up`); // Remove the ref pointing to the tree, then: @@ -638,24 +683,30 @@ console.log(`Asset "${slug}" has ${chunksOrphaned} chunks to clean up`); This is intentionally non-destructive: CAS never modifies or deletes Git objects. It only tells you what would become unreachable. -### Finding Orphaned Chunks +> **Deprecation note:** `deleteAsset()` is a deprecated alias for +> `inspectAsset()`. It will be removed in a future major version. + +### Collecting Referenced Chunks When you store the same file multiple times with different chunk sizes, or store overlapping files, some chunk blobs may no longer be referenced by any -manifest. `findOrphanedChunks` aggregates all referenced chunk blob OIDs +manifest. `collectReferencedChunks` aggregates all referenced chunk blob OIDs across multiple assets: ```js -const { referenced, total } = await cas.findOrphanedChunks({ +const { referenced, total } = await cas.collectReferencedChunks({ treeOids: [treeOid1, treeOid2, treeOid3] }); console.log(`${referenced.size} unique blobs across ${total} total chunk references`); ``` If any `treeOid` lacks a manifest, the call throws -`CasError('MANIFEST_NOT_FOUND')` (fail closed). This is analysis only -- no +`CasError('MANIFEST_NOT_FOUND')` (fail closed). This is analysis only — no objects are deleted or modified. +> **Deprecation note:** `findOrphanedChunks()` is a deprecated alias for +> `collectReferencedChunks()`. 
It will be removed in a future major version. + ### Working with Multiple Assets A common pattern is to store multiple assets and assemble their trees into @@ -1564,11 +1615,15 @@ file size. However, the restore operation currently concatenates all chunks into a single buffer, so restoring very large files requires enough memory to hold the entire file. -### Q: I get "Chunk size must be at least 1024 bytes" +### Q: I get "Chunk size must be an integer >= 1024 bytes" The minimum chunk size is 1 KiB. This prevents pathologically small chunks that would create excessive Git objects. Increase your `chunkSize` parameter. +There is also a hard cap at 100 MiB — values above this are rejected outright. +Setting `chunkSize` above 10 MiB will trigger a warning, since very large +chunks reduce deduplication benefit and increase memory pressure. + ### Q: I get "Encryption key must be 32 bytes, got N" AES-256 requires exactly a 256-bit (32-byte) key. Ensure your key file diff --git a/README.md b/README.md index 21b3946..6286304 100644 --- a/README.md +++ b/README.md @@ -29,7 +29,7 @@ We use the object database. - **Manifests** a tiny explicit index of chunks + metadata (JSON/CBOR). - **Tree output** generates standard Git trees so assets snap into commits cleanly. - **Full round-trip** store, tree, and restore — get your bytes back, verified. -- **Lifecycle management** `readManifest`, `deleteAsset`, `findOrphanedChunks` — inspect trees, plan deletions, audit storage. +- **Lifecycle management** `readManifest`, `inspectAsset`, `collectReferencedChunks` — inspect trees, plan deletions, audit storage. - **Vault** GC-safe ref-based storage. One ref (`refs/cas/vault`) indexes all assets by slug. No more silent data loss from `git gc`. - **Interactive dashboard** `git cas inspect` with chunk heatmap, animated progress bars, and rich manifest views. - **Verify & JSON output** `git cas verify` checks integrity; `--json` on all commands for CI/scripting. 
@@ -229,9 +229,9 @@ await cas.restoreFile({ manifest, outputPath: './restored.png' }); // Read the manifest back from a tree OID const m = await cas.readManifest({ treeOid }); -// Lifecycle: inspect deletion impact, find orphaned chunks -const { slug, chunksOrphaned } = await cas.deleteAsset({ treeOid }); -const { referenced, total } = await cas.findOrphanedChunks({ treeOids: [treeOid] }); +// Lifecycle: inspect deletion impact, collect referenced chunks +const { slug, chunksOrphaned } = await cas.inspectAsset({ treeOid }); +const { referenced, total } = await cas.collectReferencedChunks({ treeOids: [treeOid] }); // v2.0.0: Compressed + passphrase-encrypted store const manifest2 = await cas.storeFile({ @@ -295,16 +295,47 @@ git cas vault init git cas store ./secret.bin --slug vault-entry --tree git cas restore --slug vault-entry --out ./decrypted.bin +# Compression, chunking, codec, concurrency +git cas store ./data.bin --slug my-data --tree --gzip +git cas store ./data.bin --slug my-data --tree --strategy cdc +git cas store ./data.bin --slug my-data --tree --chunk-size 65536 --concurrency 4 +git cas store ./data.bin --slug my-data --tree --codec cbor + +# Restore with concurrency +git cas restore --slug my-data --out ./data.bin --concurrency 4 + # JSON output on any command (for CI/scripting) git cas store ./data.bin --slug my-data --tree --json ``` +### `.casrc` — Project Config File + +Place a `.casrc` JSON file at the repository root to set defaults for CLI flags. +CLI flags always take precedence over `.casrc` values. 
+ +```json +{ + "chunkSize": 65536, + "strategy": "cdc", + "concurrency": 4, + "codec": "json", + "compression": "gzip", + "merkleThreshold": 500, + "maxRestoreBufferSize": 1073741824, + "cdc": { + "minChunkSize": 8192, + "targetChunkSize": 32768, + "maxChunkSize": 131072 + } +} +``` + ## Documentation - [Guide](./GUIDE.md) — progressive walkthrough - [API Reference](./docs/API.md) — full method documentation - [Architecture](./ARCHITECTURE.md) — hexagonal design overview -- [Security](./docs/SECURITY.md) — crypto design and threat model +- [Security](./SECURITY.md) — crypto design and threat model ## When to use git-cas (and when not to) diff --git a/ROADMAP.md b/ROADMAP.md index 99ddfc4..81fdffc 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -9,7 +9,7 @@ This roadmap is structured as: 3. **Contracts** — Return/throw semantics for all public methods 4. **Version Plan** — Table mapping versions to milestones 5. **Milestone Dependency Graph** — ASCII diagram -6. **Milestones & Task Cards** — 7 milestones (4 closed, 3 open), remaining task cards +6. **Milestones & Task Cards** — 8 milestones (7 closed, 1 open), remaining task cards 7. **Feature Matrix** — Competitive landscape vs. Git LFS, git-annex, Restic, Age, DVC 8. **Competitive Analysis** — When to use git-cas and when not to, with concrete scenarios @@ -56,6 +56,8 @@ Single registry of all error codes used across the codebase. Each code is a stri | `CANNOT_REMOVE_LAST_RECIPIENT` | Cannot remove the last recipient — at least one must remain. | Task 11.2 | | `ROTATION_NOT_SUPPORTED` | Key rotation requires envelope encryption (DEK/KEK model). Legacy manifests must be re-stored. | Task 12.1 | | `STREAM_NOT_CONSUMED` | `finalize()` called on encryption stream before the generator was fully consumed. | v4.0.1 | +| `RESTORE_TOO_LARGE` | Encrypted/compressed file exceeds `maxRestoreBufferSize`. Buffered restore would OOM. Suggest increasing limit or storing without encryption. 
| M16 | +| `ENCRYPTION_BUFFER_EXCEEDED` | Web Crypto adapter accumulated buffer exceeds limit during streaming encryption (Deno-specific). Suggest Node.js/Bun or unencrypted store. | M16 | --- @@ -192,6 +194,7 @@ Return and throw semantics for every public method (current and planned). | v5.0.0 | M10 | Hydra | Content-defined chunking | ✅ | | v5.1.0 | M11 | Locksmith | Multi-recipient encryption | ✅ | | v5.2.0 | M12 | Carousel | Key rotation | ✅ | +| v5.3.0 | M16 | Capstone | Audit remediation — all CODE-EVAL.md findings | 🔲 | --- @@ -206,6 +209,8 @@ M8 Spit Shine + M9 Cockpit (v4.0.1) ✅ M10 Hydra ──────────── ✅ v5.0.0 M11 Locksmith ──────── ✅ v5.1.0 └──► M12 Carousel ── ✅ v5.2.0 +M15 Prism ─────────────── ✅ + └──► M16 Capstone ────── 🔲 v5.3.0 ``` --- @@ -223,6 +228,7 @@ M11 Locksmith ──────── ✅ v5.1.0 | M10| Hydra | Content-defined chunking | v5.0.0 | 4 | ~690 | ~22h | ✅ CLOSED | | M11| Locksmith | Multi-recipient encryption | v5.1.0 | 4 | ~580 | ~20h | ✅ CLOSED | | M12| Carousel | Key rotation | v5.2.0 | 4 | ~400 | ~13h | ✅ CLOSED | +| M16| Capstone | Audit remediation | v5.3.0 | 13 | ~698 | ~21h | 🔲 OPEN | Completed task cards are in [COMPLETED_TASKS.md](./COMPLETED_TASKS.md). Superseded tasks are in [GRAVEYARD.md](./GRAVEYARD.md). @@ -262,6 +268,445 @@ All tasks completed (12.1–12.4). See [COMPLETED_TASKS.md](./COMPLETED_TASKS.md --- +# M16 — Capstone (v5.3.0) 🔲 OPEN + +Remediation milestone addressing all negative findings from the [CODE-EVAL.md](./CODE-EVAL.md) forensic architectural audit. Covers 9 code flaws (Phase 2), 7 pre-existing concerns (C1–C7), and 3 newly identified concerns (C8–C10). No new features — strictly hardening, correctness, and hygiene. + +**Source:** `CODE-EVAL.md` at commit `0f7f8e6` + +**Priority key:** P0 = critical (high severity), P1 = important (medium), P2 = housekeeping (low/negligible). 
+ +--- + +## Task Cards + +### 16.1 — Crypto Adapter Behavioral Normalization *(P0)* — C8 + +**Problem** + +The three CryptoPort adapters (Node, Bun, Web) have inconsistent validation and error-handling behavior — a Liskov Substitution violation. Specifically: + +1. `NodeCryptoAdapter.encryptBuffer()` is synchronous; Bun and Web are async. +2. `BunCryptoAdapter.decryptBuffer()` calls `_validateKey(key)`; Node and Web do not. +3. `NodeCryptoAdapter.createEncryptionStream()` has no premature-finalize guard; Bun and Web throw `CasError('STREAM_NOT_CONSUMED')`. + +Code that works on Bun (early key validation) may produce a cryptic `node:crypto` error on Node. A bug in stream consumption produces undefined behavior on Node but a clear error on Bun/Deno. + +**Fix** + +1. Add `_validateKey(key)` call to `NodeCryptoAdapter.decryptBuffer()` and `WebCryptoAdapter.decryptBuffer()`. +2. Add `streamFinalized` guard + `CasError('STREAM_NOT_CONSUMED')` to `NodeCryptoAdapter.createEncryptionStream()`. +3. Make `NodeCryptoAdapter.encryptBuffer()` explicitly `async` (return `Promise`). +4. Add a cross-adapter behavioral conformance test suite asserting identical behavior for all three adapters given the same inputs. 
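
Step 2 of the 16.1 fix (the premature-finalize guard) might look roughly like this. Everything except the `STREAM_NOT_CONSUMED` code is an assumed shape; the real adapter would wire in an actual cipher and return the IV and auth tag from `finalize()`.

```js
// Sketch of the finalize guard: finalize() must not run until the
// encryption generator has been fully consumed, because the GCM auth tag
// only exists after the cipher is finalized. Shapes here are illustrative.
function createEncryptionStream(key, source) {
  let consumed = false;

  async function* encrypt() {
    for await (const chunk of source) {
      yield chunk; // real impl: yield cipher.update(chunk)
    }
    consumed = true; // real impl: yield cipher.final(), capture auth tag
  }

  return {
    encrypt,
    finalize() {
      if (!consumed) {
        // Mirrors the clear-error behavior of the Bun/Web adapters.
        const err = new Error('Encryption stream not fully consumed');
        err.code = 'STREAM_NOT_CONSUMED';
        throw err;
      }
      return { /* iv, authTag */ };
    },
  };
}
```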
+ +**Files:** +- `src/infrastructure/adapters/NodeCryptoAdapter.js` +- `src/infrastructure/adapters/WebCryptoAdapter.js` +- New: `test/unit/infrastructure/adapters/CryptoAdapter.conformance.test.js` + +**Tests:** +```js +describe('16.1: CryptoPort LSP conformance', () => { + // Run the same assertions against all three adapters + for (const [name, adapter] of adapters) { + it(`${name}.encryptBuffer returns a Promise`, ...); + it(`${name}.decryptBuffer rejects invalid key type before crypto error`, ...); + it(`${name}.decryptBuffer rejects wrong-length key before crypto error`, ...); + it(`${name}.createEncryptionStream.finalize() throws STREAM_NOT_CONSUMED if not consumed`, ...); + } +}); +``` + +| Estimate | ~50 LoC changes, ~100 LoC tests, ~4h | +|----------|---------------------------------------| + +--- + +### 16.2 — Memory Restore Guard *(P0)* — C1 + +**Problem** + +`_restoreBuffered()` concatenates ALL chunk blobs into a single buffer before decryption. A 1 GB encrypted file requires ~2 GB of heap. No guard, no warning, no configurable limit. + +**Fix** + +Add `maxRestoreBufferSize` option to CasService constructor (default 512 MiB). Before `Buffer.concat()` in `_restoreBuffered()`, check `manifest.size` against the limit. Throw `CasError('RESTORE_TOO_LARGE')` with an actionable message. 
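
A minimal sketch of the guard check, using the 512 MiB default and the `{ size, limit }` meta documented in the changelog; a plain `Error` stands in for the real `CasError` here.

```js
// Sketch of the 16.2 guard: reject buffered restores whose manifest size
// exceeds the configured ceiling, before any chunk is concatenated.
const DEFAULT_MAX_RESTORE_BUFFER = 512 * 1024 * 1024; // 512 MiB

function checkRestoreSize(manifest, maxRestoreBufferSize = DEFAULT_MAX_RESTORE_BUFFER) {
  if (manifest.size > maxRestoreBufferSize) {
    const err = new Error(
      `Buffered restore of ${manifest.size} bytes exceeds maxRestoreBufferSize ` +
      `(${maxRestoreBufferSize}). Raise the limit, or store without ` +
      `encryption/compression to enable streaming restore.`);
    err.code = 'RESTORE_TOO_LARGE';
    err.meta = { size: manifest.size, limit: maxRestoreBufferSize };
    throw err;
  }
}
```

Checking `manifest.size` up front (rather than counting bytes during concatenation) fails fast before any memory is committed.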
+ +**Files:** +- `src/domain/services/CasService.js` +- `index.js` (facade wiring) +- `index.d.ts` (type update) + +**Tests:** +```js +describe('16.2: Memory guard on encrypted restore', () => { + it('throws RESTORE_TOO_LARGE when manifest.size exceeds maxRestoreBufferSize', ...); + it('succeeds when manifest.size is within maxRestoreBufferSize', ...); + it('does not apply guard to unencrypted uncompressed restoreStream', ...); + it('includes actionable hint in error message', ...); + it('default maxRestoreBufferSize is 512 MiB', ...); +}); +``` + +| Estimate | ~25 LoC changes, ~40 LoC tests, ~2h | +|----------|--------------------------------------| + +--- + +### 16.3 — Web Crypto Encryption Buffer Guard *(P1)* — C4 + +**Problem** + +`WebCryptoAdapter.createEncryptionStream()` silently buffers the entire stream because Web Crypto AES-GCM is a one-shot API. On Deno, a user calling `store()` with a large encrypted source OOMs without warning. + +**Fix** + +Track accumulated bytes in the `encrypt()` generator. When total exceeds a configurable limit (default 512 MiB), throw `CasError('ENCRYPTION_BUFFER_EXCEEDED')` with an actionable message. + +**Files:** +- `src/infrastructure/adapters/WebCryptoAdapter.js` + +**Tests:** +```js +describe('16.3: Web Crypto buffering guard', () => { + it('throws ENCRYPTION_BUFFER_EXCEEDED when accumulated bytes exceed limit', ...); + it('succeeds for data within buffer limit', ...); + it('NodeCryptoAdapter does NOT throw for large streams (true streaming)', ...); +}); +``` + +| Estimate | ~15 LoC changes, ~30 LoC tests, ~1h | +|----------|--------------------------------------| + +--- + +### 16.4 — FixedChunker Pre-Allocated Buffer *(P2)* — C9 + +**Problem** + +`FixedChunker.chunk()` uses `Buffer.concat([buffer, data])` in a loop. Each call copies the entire accumulated buffer — O(n^2 / chunkSize) total copies for many small input buffers. The CDC chunker uses a pre-allocated working buffer with zero intermediate copies. 
+ +**Fix** + +Replace the concat loop with a pre-allocated `Buffer.allocUnsafe(chunkSize)` working buffer using a copy+offset pattern, matching CdcChunker's approach. + +**Files:** +- `src/infrastructure/chunkers/FixedChunker.js` + +**Tests:** + +Existing tests cover byte-exact correctness. Add: +```js +describe('16.4: FixedChunker buffer efficiency', () => { + it('produces identical output to previous implementation (regression)', ...); + it('handles many small input buffers without excessive allocation', ...); +}); +``` + +| Estimate | ~20 LoC changes, ~15 LoC tests, ~1h | +|----------|--------------------------------------| + +--- + +### 16.5 — Encrypt-Then-Chunk Dedup Warning *(P1)* — C10 + +**Problem** + +Encryption is applied before chunking, destroying content-addressable deduplication. AES-GCM ciphertext is pseudorandom — identical plaintext produces different ciphertext. Users who enable both encryption and CDC chunking get CDC's overhead without its dedup benefit. + +This is an inherent architectural constraint (not fixable without per-chunk encryption). The correct action is documentation + a runtime warning. + +**Fix** + +1. When `store()` is called with both an encryption key/passphrase/recipients AND `chunker.strategy === 'cdc'`, emit `observability.log('warn', 'CDC deduplication is ineffective with encryption — ciphertext is pseudorandom', { strategy: 'cdc' })`. +2. Add a "Known Limitations" section to the README documenting this trade-off. 
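The warning condition reduces to a small predicate. A sketch under assumptions: the option names mirror the public `store()` options referenced in this document, but the helper functions are hypothetical, not the actual `CasService` code:

```javascript
// Sketch: detect the encryption + CDC combination that defeats dedup.
function shouldWarnDedupLoss(options) {
  const encrypted = Boolean(
    options.encryptionKey || options.passphrase || (options.recipients && options.recipients.length),
  );
  return encrypted && options.chunking?.strategy === 'cdc';
}

function warnIfDedupLost(options, log) {
  if (shouldWarnDedupLoss(options)) {
    // Message text matches the roadmap's proposed warning.
    log('warn', 'CDC deduplication is ineffective with encryption — ciphertext is pseudorandom', {
      strategy: 'cdc',
    });
  }
}
```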
+ +**Files:** +- `src/domain/services/CasService.js` (warning in `store()`) + +**Tests:** +```js +describe('16.5: Encrypt-then-chunk dedup warning', () => { + it('emits warning when encryption + CDC chunking are combined', ...); + it('does not warn for encryption + fixed chunking', ...); + it('does not warn for CDC chunking without encryption', ...); +}); +``` + +| Estimate | ~10 LoC changes, ~20 LoC tests, ~1h | +|----------|--------------------------------------| + +--- + +### 16.6 — Chunk Size Upper Bound *(P1)* — C3 + +**Problem** + +`CasService` enforces a minimum chunk size (1024 bytes) but no maximum. A user can configure a 4 GB chunk size. Additionally, `FixedChunker` and `CdcChunker` accept arbitrarily large values without validation. + +**Fix** + +1. Add `if (chunkSize > MAX_CHUNK_SIZE)` guard in `CasService` constructor. 100 MiB is the cap — generous while staying within Git hosting limits. +2. Emit `observability.log('warn', ...)` when chunkSize exceeds 10 MiB. +3. Add matching validation in `FixedChunker` constructor: `if (chunkSize > 100 * 1024 * 1024) throw new RangeError(...)`. +4. Add matching validation in `CdcChunker` constructor for `maxChunkSize`. 
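The bounds check shared by all three constructors can be sketched like this. The constants come from the card above; the standalone `validateChunkSize` helper is an illustration, not the actual constructor code:

```javascript
// Sketch: one bounds check reused by CasService, FixedChunker, and CdcChunker.
const MIN_CHUNK_SIZE = 1024;               // existing floor
const WARN_CHUNK_SIZE = 10 * 1024 * 1024;  // soft warning threshold
const MAX_CHUNK_SIZE = 100 * 1024 * 1024;  // hard cap, within Git hosting limits

function validateChunkSize(chunkSize, warn = () => {}) {
  if (!Number.isInteger(chunkSize) || chunkSize < MIN_CHUNK_SIZE) {
    throw new RangeError(`chunkSize must be an integer >= ${MIN_CHUNK_SIZE}`);
  }
  if (chunkSize > MAX_CHUNK_SIZE) {
    throw new RangeError(`chunkSize must not exceed ${MAX_CHUNK_SIZE} bytes (100 MiB)`);
  }
  if (chunkSize > WARN_CHUNK_SIZE) {
    warn(`chunkSize ${chunkSize} exceeds 10 MiB; large chunks reduce dedup and retry granularity`);
  }
  return chunkSize;
}
```

Exactly 100 MiB passes; one byte more throws, which keeps the test matrix above unambiguous.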
+ +**Files:** +- `src/domain/services/CasService.js` +- `src/infrastructure/chunkers/FixedChunker.js` +- `src/infrastructure/chunkers/CdcChunker.js` + +**Tests:** +```js +describe('16.6: Chunk size upper bound', () => { + it('CasService throws when chunkSize exceeds 100 MiB', ...); + it('CasService accepts chunkSize of exactly 100 MiB', ...); + it('FixedChunker throws when chunkSize exceeds 100 MiB', ...); + it('CdcChunker throws when maxChunkSize exceeds 100 MiB', ...); + it('logs warning when chunkSize exceeds 10 MiB', ...); +}); +``` + +| Estimate | ~15 LoC changes, ~30 LoC tests, ~1h | +|----------|--------------------------------------| + +--- + +### 16.7 — Lifecycle Method Naming *(P2)* + +**Problem** + +`deleteAsset()` does not delete anything — it reads a manifest and returns metadata about what would be orphaned. `findOrphanedChunks()` doesn't find orphans — it collects referenced chunk OIDs. Both names are misleading. + +**Fix** + +1. Add `inspectAsset({ treeOid })` as the canonical name. `deleteAsset` becomes a deprecated alias that delegates to `inspectAsset`. +2. Add `collectReferencedChunks({ treeOids })` as the canonical name. `findOrphanedChunks` becomes a deprecated alias. +3. Emit `observability.log('warn', 'deleteAsset() is deprecated — use inspectAsset()')` on deprecated path. +4. Update `index.d.ts` with `@deprecated` JSDoc on old methods. + +This is a **non-breaking** deprecation. Removal is deferred to a future major version. 
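The deprecated-alias pattern is small enough to sketch directly. The `inspectAsset` body below is a stub; only the delegation and the deprecation warning reflect the plan above:

```javascript
// Sketch: deprecated alias delegating to the canonical method.
function makeService(log) {
  return {
    inspectAsset({ treeOid }) {
      // Read-only: reports what deletion *would* orphan, deletes nothing.
      return { treeOid, chunksOrphaned: 0 }; // stub result for illustration
    },
    deleteAsset(args) {
      log('warn', 'deleteAsset() is deprecated — use inspectAsset()');
      return this.inspectAsset(args);
    },
  };
}
```

Because the alias delegates rather than duplicating logic, the two names can never drift apart before the old one is removed.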
+ +**Files:** +- `src/domain/services/CasService.js` +- `index.js` (facade) +- `index.d.ts` + +**Tests:** +```js +describe('16.7: Lifecycle method naming', () => { + it('inspectAsset returns { slug, chunksOrphaned }', ...); + it('deleteAsset delegates to inspectAsset (deprecated alias)', ...); + it('collectReferencedChunks returns { referenced, total }', ...); + it('findOrphanedChunks delegates to collectReferencedChunks (deprecated alias)', ...); +}); +``` + +| Estimate | ~30 LoC changes, ~25 LoC tests, ~1h | +|----------|--------------------------------------| + +--- + +### 16.8 — CasError Portability Guard *(P2)* + +**Problem** + +`CasError` calls `Error.captureStackTrace(this, this.constructor)` unconditionally. This is V8-specific — it's a no-op on Bun's JavaScriptCore engine. While it doesn't crash (JSC silently ignores it), it indicates incomplete multi-runtime awareness. + +**Fix** + +Guard the call: `if (Error.captureStackTrace) Error.captureStackTrace(this, this.constructor);` + +**Files:** +- `src/domain/errors/CasError.js` + +**Tests:** +```js +describe('16.8: CasError multi-runtime portability', () => { + it('creates CasError with code and meta', ...); + it('does not throw when Error.captureStackTrace is unavailable', ...); +}); +``` + +| Estimate | ~3 LoC changes, ~10 LoC tests, ~0.5h | +|----------|---------------------------------------| + +--- + +### 16.9 — Pre-Commit Hook + Hooks Directory *(P2)* + +**Problem** + +The project has a `pre-push` hook but no `pre-commit` hook. Lint failures are not caught until push time. Additionally, the hooks directory is `scripts/git-hooks/` rather than `scripts/hooks/` per the CLAUDE.md convention. + +**Fix** + +1. Rename `scripts/git-hooks/` to `scripts/hooks/`. +2. Update `scripts/install-hooks.sh` to reference the new path. +3. Add `scripts/hooks/pre-commit` that runs `pnpm run lint`. +4. Update `.git/config` hooksPath if already set. 
+
+**Files:**
+- `scripts/git-hooks/pre-push` → `scripts/hooks/pre-push`
+- New: `scripts/hooks/pre-commit`
+- `scripts/install-hooks.sh`
+
+| Estimate | ~15 LoC, ~0.5h |
+|----------|-----------------|
+
+---
+
+### 16.10 — Orphaned Blob Tracking *(P1)* — C2
+
+**Problem**
+
+When `_chunkAndStore()` throws `STREAM_ERROR`, chunks already written to Git are orphaned. The error meta reports `chunksDispatched` but not the blob OIDs of successful writes. There's no visibility into what was orphaned.
+
+**Fix**
+
+1. After `Promise.allSettled(pending)`, collect blob OIDs from fulfilled results.
+2. Include `orphanedBlobs: string[]` in the `STREAM_ERROR` meta.
+3. Emit `observability.metric('error', { action: 'orphaned_blobs', count, blobs })`.
+
+**Files:**
+- `src/domain/services/CasService.js`
+
+**Tests:**
+```js
+describe('16.10: Orphaned blob tracking on STREAM_ERROR', () => {
+  it('includes orphanedBlobs array in STREAM_ERROR meta', ...);
+  it('orphanedBlobs contains blob OIDs from successful writes before failure', ...);
+  it('orphanedBlobs is empty when stream fails before any writes', ...);
+  it('emits orphaned_blobs metric via observability', ...);
+});
+```
+
+| Estimate | ~20 LoC changes, ~30 LoC tests, ~2h |
+|----------|--------------------------------------|
+
+---
+
+### 16.11 — Passphrase Input Security *(P0)* — C5 + V6
+
+**Problem**
+
+`--vault-passphrase <passphrase>` puts the passphrase in shell history and process listings. The `GIT_CAS_PASSPHRASE` env var is better but still visible in `/proc/<pid>/environ`.
+
+**Fix**
+
+1. **Interactive prompt**: When `--vault-passphrase` is passed without a value and stdin is a TTY, prompt with echo disabled. Confirmation on first use (store/init).
+2. **File-based input**: Add a `--vault-passphrase-file <path>` flag that reads the passphrase from a file.
+3. **Stdin pipe**: `--vault-passphrase-file -` reads from stdin.
+4. **Documentation**: Security warning in `--help` and README.
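The resolution priority (file, then flag, then env var, then TTY) can be sketched as a chain. The real `resolvePassphrase` is async because of the TTY prompt; this synchronous sketch covers only the non-interactive sources and takes an injected `readFile` so it stays self-contained, so treat every name here as an assumption rather than the shipped CLI code:

```javascript
// Sketch: non-interactive part of the passphrase resolution chain.
function resolvePassphraseSync({ passphraseFile, passphrase, env = {}, readFile }) {
  let value;
  if (passphraseFile) {
    value = readFile(passphraseFile); // '-' would mean stdin in the real CLI
  } else if (passphrase !== undefined) {
    value = passphrase;
  } else if (env.GIT_CAS_PASSPHRASE) {
    value = env.GIT_CAS_PASSPHRASE;
  } else {
    return undefined; // real implementation falls through to the TTY prompt
  }
  value = value.replace(/\r?\n$/, ''); // strip one trailing (CR)LF from file input
  if (value === '') {
    throw new Error('Empty passphrase rejected');
  }
  return value;
}
```

Keeping the chain in one function makes the precedence testable in isolation, without a TTY or real files.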
+ +**Files:** +- `bin/git-cas.js` +- New: `bin/ui/passphrase-prompt.js` + +**Tests:** +```js +describe('16.11: Passphrase input security', () => { + it('reads passphrase from file when --vault-passphrase-file is used', ...); + it('errors when no passphrase source is available in non-TTY mode', ...); + it('--vault-passphrase-file trims trailing newline', ...); +}); +``` + +| Estimate | ~90 LoC, ~30 LoC tests, ~4h | +|----------|------------------------------| + +--- + +### 16.12 — KDF Brute-Force Awareness *(P2)* — C6 + +**Problem** + +`deriveKey()` and the restore path have no rate limiting or audit trail. An attacker can brute-force passphrases at full CPU speed. + +**Fix** + +1. Emit `observability.metric('error', { action: 'decryption_failed', slug })` on every `INTEGRITY_ERROR` during passphrase-based restore. +2. In the CLI layer, add a 1-second delay after each failed passphrase attempt. + +**Files:** +- `src/domain/services/CasService.js` (observability metric) +- `bin/git-cas.js` (CLI delay) + +**Tests:** +```js +describe('16.12: KDF brute-force awareness', () => { + it('emits decryption_failed metric on wrong passphrase', ...); + it('emits metric with slug context for audit trail', ...); + it('library API does NOT rate-limit (callers manage their own policy)', ...); +}); +``` + +| Estimate | ~10 LoC changes, ~20 LoC tests, ~1h | +|----------|--------------------------------------| + +--- + +### 16.13 — GCM Nonce Collision Documentation *(P2)* — C7 + +**Problem** + +AES-256-GCM uses a 96-bit random nonce. Birthday bound is ~2^48; NIST recommends limiting to 2^32 invocations per key. There's no tracking, no warning, and no documentation of the bound. + +**Fix** + +1. Add `SECURITY.md` at project root documenting: GCM nonce bound, recommended key rotation frequency, KDF parameter guidance, passphrase entropy recommendations. +2. Add `encryptionCount` field to vault metadata. Increment per `store()` with encryption. 
Emit observability warning when count exceeds 2^31. + +**Files:** +- New: `SECURITY.md` +- `src/domain/services/VaultService.js` (counter increment) + +**Tests:** +```js +describe('16.13: Nonce usage tracking', () => { + it('vault metadata includes encryptionCount after encrypted store', ...); + it('encryptionCount increments per encrypted store', ...); + it('warns via observability when encryptionCount exceeds threshold', ...); +}); +``` + +| Estimate | ~25 LoC changes, ~20 LoC tests, ~2h | +|----------|--------------------------------------| + +--- + +### M16 Summary + +| Task | Theme | Priority | Severity | Audit Ref | Concern Ref | ~LoC | ~Hours | +|------|-------|----------|----------|-----------|-------------|------|--------| +| 16.1 | Crypto adapter normalization | P0 | High | Flaw 1 | C8 | ~150 | ~4h | +| 16.2 | Memory restore guard | P0 | High | Flaw 2 | C1 | ~65 | ~2h | +| 16.3 | Web Crypto buffer guard | P1 | Medium | Flaw 3 | C4 | ~45 | ~1h | +| 16.4 | FixedChunker buffer optimization | P2 | Low | Flaw 4 | C9 | ~35 | ~1h | +| 16.5 | Encrypt-then-chunk dedup warning | P1 | Medium | Flaw 5 | C10 | ~30 | ~1h | +| 16.6 | Chunk size upper bound | P1 | Medium | Flaw 6 | C3 | ~45 | ~1h | +| 16.7 | Lifecycle method naming | P2 | Low | Flaw 7 | — | ~55 | ~1h | +| 16.8 | CasError portability guard | P2 | Negligible | Flaw 8 | — | ~13 | ~0.5h | +| 16.9 | Pre-commit hook + hooks dir | P2 | Low | Flaw 9 | — | ~15 | ~0.5h | +| 16.10 | Orphaned blob tracking | P1 | Medium | — | C2 | ~50 | ~2h | +| 16.11 | Passphrase input security | P0 | High | — | C5+V6 | ~120 | ~4h | +| 16.12 | KDF brute-force awareness | P2 | Low | — | C6 | ~30 | ~1h | +| 16.13 | GCM nonce collision docs + counter | P2 | Low | — | C7 | ~45 | ~2h | +| **Total** | | | | | | **~698** | **~21h** | + +### Recommended Execution Order + +**Phase 1 — Safety nets (P0):** +16.8, 16.9, 16.1, 16.2, 16.11 + +**Phase 2 — Correctness (P1):** +16.6, 16.3, 16.5, 16.10 + +**Phase 3 — Polish (P2):** +16.4, 16.7, 
16.12, 16.13
+
+---
+
 # 7) Feature Matrix
 
 Competitive landscape for content-addressed storage, encrypted binary assets, and large-file Git tooling. Rows represent the union of features across the space — not just what git-cas offers, but what users encounter and expect when evaluating tools in this category.
@@ -653,6 +1098,9 @@ Consistency and DRY fixes surfaced by architecture audit. No new features, no AP
 
 Ideas for future milestones. Not committed, not prioritized — just captured.
 
+### CLI Parity *(recently shipped)*
+All library-level configuration is now accessible from the CLI: `--gzip`, `--strategy`, `--chunk-size`, `--concurrency`, `--codec`, `--merkle-threshold`, CDC parameters, `--max-restore-buffer`. Project config file (`.casrc`) provides repository-level defaults. See V9–V12 for remaining CLI gaps.
+
 ### Named Vaults
 
 Multiple vaults instead of one. Refs move from `refs/cas/vault` to `refs/cas/vaults/<name>`. Default vault is `default`. CLI gets `--vault <name>` flag.
@@ -663,7 +1111,8 @@ Multiple vaults instead of one. Refs move from `refs/cas/vault` to `refs/cas/vau
 ### Vault Management
 
 - **Move into vault** — `git cas vault add --slug <slug> --oid <oid>` to adopt an existing CAS tree into the vault (the API `addToVault()` already supports this; just needs a CLI command).
-- **Purge from CAS** — remove an entry from the vault and run `git gc` to reclaim storage. Tricky because git doesn't delete individual objects — you remove refs and let GC handle it.
+- **Vault status** — `git cas vault status` shows metadata, `encryptionCount`, entry count, nonce health. See V9.
+- **Purge from CAS** — remove an entry from the vault and run `git gc` to reclaim storage. See V10.
 
 ### Publish / Mount
 
 - **Publish to working tree** — `git cas publish --slug assets/hero --to docs/hero.gif` reconstitutes a vault entry into the repo's working tree so it's servable by GitHub (markdown images, Pages, etc.).
@@ -897,30 +1346,96 @@ Zstd alone would give 5-10x faster compression with equal or better ratio. For a --- -## Vision 6: Interactive Passphrase Prompt +## Vision 6: Interactive Passphrase Prompt ✅ DONE + +**Status:** Implemented by Task 16.11. See `bin/ui/passphrase-prompt.js`. + +Passphrase resolution priority: `--vault-passphrase-file` → `--vault-passphrase` → `GIT_CAS_PASSPHRASE` → interactive TTY prompt. Confirmation prompt on first use (vault init). File permission warnings on group/world-readable passphrase files. CRLF normalization for Windows compatibility. + +--- + +## Vision 9: Vault Status Command **The Pitch** -Replace `--vault-passphrase "my secret"` (visible in shell history, `ps` output, and CI logs) with an interactive TTY prompt that reads the passphrase from stdin with echo disabled. Like `gpg`, `ssh-keygen`, and `sudo`. +`git cas vault status` — a single command to show vault health: entry count, encryption state, `encryptionCount` (nonce usage), KDF parameters, and nonce health assessment. The `encryptionCount` and `decryption_failed` metrics from M16 are already tracked in vault metadata but have no CLI surface. ```shell -$ git cas store ./secrets.tar.gz --slug prod-secrets --vault-passphrase -Enter vault passphrase: •••••••••• -Confirm passphrase: •••••••••• +$ git cas vault status +Entries: 42 +Encryption: aes-256-gcm (pbkdf2, 600000 iterations) +Encryption count: 1,247 / 2,147,483,648 (0.00%) +Nonce health: ✅ Safe (rotate key before 2^31) ``` -Falls back to `GIT_CAS_PASSPHRASE` env var for non-interactive contexts (CI). The flag `--vault-passphrase` without a value triggers the prompt; with a value, uses it directly (backward compatible). +| Phase | Work | ~LoC | ~Hours | +|-------|------|------|--------| +| 1. Command + metadata display | Read vault metadata, format output (text + JSON) | ~40 | ~1h | +| 2. Nonce health assessment | Compare `encryptionCount` against thresholds, emit warning levels | ~15 | ~0.5h | +| 3. 
Tests | Mock vault state, verify output format | ~20 | ~0.5h |
+| **Total** | | **~60** | **~2h** |
-
-**Mini Battle Plan**
+
+---
+
+## Vision 10: GC Command
+
+**The Pitch**
+
+`git cas gc` — identify unreferenced chunks across vault entries and optionally trigger `git gc`. Wraps `collectReferencedChunks()` with a user-facing report.
+
+```shell
+$ git cas gc --dry-run
+Referenced chunks: 1,247
+Unreferenced blobs: 23 (estimated 4.2 MiB)
+Run without --dry-run to trigger git gc.
+```
+
+| Phase | Work | ~LoC | ~Hours |
+|-------|------|------|--------|
+| 1. Chunk analysis | Call `collectReferencedChunks` for all vault entries, compare against all CAS blobs | ~40 | ~2h |
+| 2. `git gc` integration | Optionally invoke `git gc` after analysis. `--dry-run` flag for preview. | ~20 | ~1h |
+| 3. Tests + safety | Confirm dry-run default, test output format | ~20 | ~1h |
+| **Total** | | **~80** | **~4h** |
+
+---
+
+## Vision 11: KDF Parameter Tuning via `.casrc`
+
+**The Pitch**
+
+Allow `.casrc` to specify KDF parameters for vault init and rotation: `kdf.algorithm`, `kdf.iterations` (PBKDF2), `kdf.cost`/`kdf.blockSize`/`kdf.parallelization` (scrypt). Reject insecure values below OWASP minimums (210,000 iterations for PBKDF2-SHA-512, N=8192 for scrypt).
 
 | Phase | Work | ~LoC | ~Hours |
 |-------|------|------|--------|
-| 1. TTY reader | `readPassphrase(prompt: string): Promise` — opens `/dev/tty` (Unix) or `CON` (Windows), sets raw mode, reads until Enter, echoes `•` per character. | ~40 | ~2h |
-| 2. CLI integration | When `--vault-passphrase` is passed without a value and stdin is a TTY, call `readPassphrase()`. On store (first use), prompt twice for confirmation. | ~20 | ~1h |
-| 3. Tests | Mock TTY input, verify echo suppression, verify confirmation match/mismatch, verify env var fallback. | ~30 | ~1h |
-| **Total** | | **~90** | **~4h** |
+| 1. Config schema + validation | Add `kdf` key to `.casrc` schema, validate against OWASP minimums | ~25 | ~1h |
+| 2.
Wire into vault init/rotate | `loadConfig` merges KDF params into `kdfOptions` | ~10 | ~0.5h | +| 3. Tests | Reject weak params, merge precedence | ~15 | ~0.5h | +| **Total** | | **~40** | **~2h** | + +--- + +## Vision 12: File-Level Passphrase CLI + +**The Pitch** -**This directly mitigates Concern 5 (shell history exposure) below.** +`--passphrase` flag for standalone encrypted store without requiring vault encryption. The library already supports `passphrase` + `kdfOptions` on `store()`/`storeFile()` — this just wires it through the CLI. + +```shell +# Store with a one-off passphrase (not vault-level) +git cas store ./secrets.bin --slug one-off --passphrase "my secret" + +# Restore with the same passphrase +git cas restore --slug one-off --out ./secrets.bin --passphrase "my secret" +``` + +Mutually exclusive with `--key-file`, `--recipient`, and `--vault-passphrase`. Supports the same resolution chain as vault passphrases: `--passphrase-file`, `--passphrase`, env var `GIT_CAS_FILE_PASSPHRASE`, TTY prompt. + +| Phase | Work | ~LoC | ~Hours | +|-------|------|------|--------| +| 1. CLI flag + store wiring | Add `--passphrase` to store/restore, wire into `storeFile`/`restoreFile` with `passphrase` + `kdfOptions` | ~20 | ~0.5h | +| 2. Mutual exclusion validation | Reject conflicting encryption flags | ~5 | ~0.25h | +| 3. Tests | Store/restore round-trip, conflict validation | ~15 | ~0.5h | +| **Total** | | **~30** | **~1h** | --- @@ -930,7 +1445,9 @@ Architectural and security concerns identified during code review, with proposed --- -## Concern 1: Memory Amplification on Encrypted/Compressed Restore +## Concern 1: Memory Amplification on Encrypted/Compressed Restore ✅ MITIGATED + +**Status:** Task 16.2 implemented `maxRestoreBufferSize` guard (default 512 MiB). Post-decompression guard added. CLI exposes `--max-restore-buffer`. 
**The Problem** @@ -965,7 +1482,9 @@ describe('Concern 1: Memory guard on encrypted restore', () => { --- -## Concern 2: Orphaned Blob Accumulation After STREAM_ERROR +## Concern 2: Orphaned Blob Accumulation After STREAM_ERROR ✅ MITIGATED + +**Status:** Task 16.10 implemented. `STREAM_ERROR` meta includes `orphanedBlobs` array. Observability metric emitted. **The Problem** @@ -1000,7 +1519,9 @@ describe('Concern 2: Orphaned blob tracking on STREAM_ERROR', () => { --- -## Concern 3: No Upper Bound on Chunk Size +## Concern 3: No Upper Bound on Chunk Size ✅ MITIGATED + +**Status:** Task 16.6 implemented. 100 MiB hard cap on CasService, FixedChunker, CdcChunker. Warning at 10 MiB. **The Problem** @@ -1031,7 +1552,9 @@ describe('Concern 3: Chunk size upper bound', () => { --- -## Concern 4: Web Crypto Adapter Silent Memory Buffering +## Concern 4: Web Crypto Adapter Silent Memory Buffering ✅ MITIGATED + +**Status:** Task 16.3 implemented. `ENCRYPTION_BUFFER_EXCEEDED` thrown when accumulated bytes exceed `maxEncryptionBufferSize` (default 512 MiB). **The Problem** @@ -1064,7 +1587,9 @@ describe('Concern 4: Web Crypto buffering guard', () => { --- -## Concern 5: Passphrase Exposure in Shell History and Process Listings +## Concern 5: Passphrase Exposure in Shell History and Process Listings ✅ MITIGATED + +**Status:** Task 16.11 implemented. `--vault-passphrase-file`, interactive TTY prompt, stdin pipe. See Vision 6. **The Problem** @@ -1100,7 +1625,9 @@ describe('Concern 5: Passphrase input security', () => { --- -## Concern 6: No KDF Brute-Force Rate Limiting +## Concern 6: No KDF Brute-Force Rate Limiting ✅ MITIGATED + +**Status:** Task 16.12 implemented. `decryption_failed` observability metric. CLI 1s delay on `INTEGRITY_ERROR`. 
**The Problem** @@ -1135,7 +1662,9 @@ describe('Concern 6: KDF brute-force awareness', () => { --- -## Concern 7: GCM Nonce Collision Risk at Scale +## Concern 7: GCM Nonce Collision Risk at Scale ✅ MITIGATED + +**Status:** Task 16.13 implemented. `SECURITY.md` documents GCM bound. Vault tracks `encryptionCount`. Warning at 2^31. **The Problem** @@ -1170,23 +1699,82 @@ describe('Concern 7: Nonce uniqueness', () => { --- +## Concern 8: Crypto Adapter Liskov Substitution Violation ✅ MITIGATED + +**Source:** CODE-EVAL.md, Flaw 1 + +**Status:** Task 16.1 implemented. All adapters: async `encryptBuffer`, `_validateKey` in `decryptBuffer`, `STREAM_NOT_CONSUMED` guard. Conformance test suite. + +**The Problem** + +The three `CryptoPort` implementations (Node, Bun, Web) differ in observable behavior: + +1. `NodeCryptoAdapter.encryptBuffer()` is synchronous (returns plain object), while Bun and Web return `Promise`. +2. `BunCryptoAdapter.decryptBuffer()` calls `_validateKey(key)` before decryption; Node and Web do not — the invalid key hits `node:crypto` directly, producing a less informative error. +3. `NodeCryptoAdapter.createEncryptionStream()` has no premature-finalize guard. Calling `finalize()` before consuming the stream returns garbage metadata on Node, but throws a clear `CasError('STREAM_NOT_CONSUMED')` on Bun and Deno. + +M15 Prism fixed the `sha256()` async inconsistency but left these three discrepancies untouched. + +**Mitigation:** Task 16.1. + +--- + +## Concern 9: FixedChunker Quadratic Buffer Allocation ✅ MITIGATED + +**Source:** CODE-EVAL.md, Flaw 4 + +**Status:** Task 16.4 implemented. Pre-allocated `Buffer.allocUnsafe(chunkSize)` working buffer. + +**The Problem** + +`FixedChunker.chunk()` uses `Buffer.concat([buffer, data])` inside its async loop. Each call allocates a new buffer and copies the accumulated bytes. For a source yielding many small buffers (e.g., 4 KiB network reads into a 256 KiB chunk), this is O(n^2 / chunkSize) total byte copies. 
The CdcChunker, by contrast, uses a pre-allocated `Buffer.allocUnsafe(maxChunkSize)` with zero intermediate copies. + +**Mitigation:** Task 16.4. + +--- + +## Concern 10: CDC Deduplication Defeated by Encrypt-Then-Chunk ✅ MITIGATED + +**Source:** CODE-EVAL.md, Flaw 5 + +**Status:** Task 16.5 implemented. Runtime warning when encryption + CDC combined. + +**The Problem** + +Encryption is applied to the source stream *before* chunking. AES-GCM ciphertext is pseudorandom — identical plaintext produces different ciphertext (different random nonce each time). This means content-defined chunking (CDC) provides **zero deduplication benefit** for encrypted files. Users who combine `recipients` (or `encryptionKey`) with `chunking: { strategy: 'cdc' }` get CDC's computational overhead without its primary value proposition. + +This is a fundamental architectural constraint of the encrypt-then-chunk design. The alternative (chunk-then-encrypt) would require per-chunk nonces and auth tags, significantly complicating the manifest schema. This is documented as a known limitation, not a fixable bug. + +**Mitigation:** Task 16.5 (runtime warning + documentation). + +--- + ## Summary Table -| # | Type | Severity | Fix Cost | Recommended Action | -|---|------|----------|----------|-------------------| -| C1 | Memory amplification | High | ~20 LoC | Add `maxRestoreBufferSize` guard | -| C2 | Orphaned blobs | Medium | ~20 LoC | Report orphaned blob OIDs in error meta | -| C3 | No chunk size cap | Medium | ~6 LoC | Enforce 100 MiB maximum | -| C4 | Web Crypto buffering | Medium | ~15 LoC | Add buffer size guard in WebCryptoAdapter | -| C5 | Passphrase exposure | High | ~90 LoC | Interactive prompt + file-based input | -| C6 | KDF no rate limit | Low | ~10 LoC | Observability metric + CLI delay | -| C7 | GCM nonce collision | Low | ~20 LoC | Document bound + vault usage counter | - -| # | Type | Theme | Est. 
Cost | -|---|------|-------|-----------| -| V1 | Feature | Snapshot trees (directory store) | ~410 LoC, ~19h | -| V2 | Feature | Portable bundles (air-gap transfer) | ~340 LoC, ~15h | -| V3 | Feature | Manifest diff engine | ~180 LoC, ~8h | -| V4 | Feature | CompressionPort + zstd/brotli/lz4 | ~180 LoC, ~8h | -| V5 | Feature | Watch mode (continuous sync) | ~220 LoC, ~10h | -| V6 | Feature | Interactive passphrase prompt | ~90 LoC, ~4h | +| # | Type | Severity | Fix Cost | Recommended Action | Task | Status | +|---|------|----------|----------|--------------------|------|--------| +| C1 | Memory amplification | High | ~20 LoC | Add `maxRestoreBufferSize` guard | **16.2** | ✅ Done | +| C2 | Orphaned blobs | Medium | ~20 LoC | Report orphaned blob OIDs in error meta | **16.10** | ✅ Done | +| C3 | No chunk size cap | Medium | ~6 LoC | Enforce 100 MiB maximum | **16.6** | ✅ Done | +| C4 | Web Crypto buffering | Medium | ~15 LoC | Add buffer size guard in WebCryptoAdapter | **16.3** | ✅ Done | +| C5 | Passphrase exposure | High | ~90 LoC | Interactive prompt + file-based input | **16.11** | ✅ Done | +| C6 | KDF no rate limit | Low | ~10 LoC | Observability metric + CLI delay | **16.12** | ✅ Done | +| C7 | GCM nonce collision | Low | ~20 LoC | Document bound + vault usage counter | **16.13** | ✅ Done | +| C8 | Crypto adapter LSP violation | Medium | ~50 LoC | Normalize validation + finalize guards | **16.1** | ✅ Done | +| C9 | FixedChunker quadratic alloc | Low | ~20 LoC | Pre-allocated buffer | **16.4** | ✅ Done | +| C10 | Encrypt-then-chunk dedup loss | Medium | ~10 LoC | Runtime warning + documentation | **16.5** | ✅ Done | + +| # | Type | Theme | Est. 
Cost | Status | +|---|------|-------|-----------|--------| +| V1 | Feature | Snapshot trees (directory store) | ~410 LoC, ~19h | 🔲 Open | +| V2 | Feature | Portable bundles (air-gap transfer) | ~340 LoC, ~15h | 🔲 Open | +| V3 | Feature | Manifest diff engine | ~180 LoC, ~8h | 🔲 Open | +| V4 | Feature | CompressionPort + zstd/brotli/lz4 | ~180 LoC, ~8h | 🔲 Open | +| V5 | Feature | Watch mode (continuous sync) | ~220 LoC, ~10h | 🔲 Open | +| V6 | Feature | Interactive passphrase prompt | ~90 LoC, ~4h | ✅ Done — subsumed by **16.11** | +| V7 | Feature | Prometheus/OpenTelemetry ObservabilityPort adapter | ~150 LoC, ~6h | 🔲 Open | +| V8 | Feature | `encryptionCount` auto-rotation | ~120 LoC, ~5h | 🔲 Open | +| V9 | Feature | `vault status` command — show metadata, `encryptionCount`, entry count, nonce health | ~60 LoC, ~2h | 🔲 Open | +| V10 | Feature | `gc` command — `collectReferencedChunks` + `git gc` for orphan cleanup | ~80 LoC, ~4h | 🔲 Open | +| V11 | Feature | KDF parameter tuning via `.casrc` — `kdf.iterations`, `kdf.cost`, `kdf.blockSize`, `kdf.parallelization` with validation (reject insecure values below OWASP minimums) | ~40 LoC, ~2h | 🔲 Open | +| V12 | Feature | File-level passphrase CLI — `--passphrase` flag for standalone encrypted store without vault encryption. Library already supports `passphrase` + `kdfOptions` on `store()`/`storeFile()`. | ~30 LoC, ~1h | 🔲 Open | diff --git a/docs/SECURITY.md b/SECURITY.md similarity index 85% rename from docs/SECURITY.md rename to SECURITY.md index b626c26..12fd81d 100644 --- a/docs/SECURITY.md +++ b/SECURITY.md @@ -4,15 +4,50 @@ This document describes the security architecture, cryptographic design, and lim ## Table of Contents -1. [Threat Model](#threat-model) -2. [Cryptographic Design](#cryptographic-design) -3. [Key Handling](#key-handling) -4. [Encryption Flow](#encryption-flow) -5. [Decryption Flow](#decryption-flow) -6. [Chunk Digest Verification](#chunk-digest-verification) -7. [Limitations](#limitations) -8. 
[Git Object Immutability](#git-object-immutability) -9. [Error Codes for Security Operations](#error-codes-for-security-operations) +1. [Operational Limits](#operational-limits) +2. [Threat Model](#threat-model) +3. [Cryptographic Design](#cryptographic-design) +4. [Key Handling](#key-handling) +5. [Encryption Flow](#encryption-flow) +6. [Decryption Flow](#decryption-flow) +7. [Chunk Digest Verification](#chunk-digest-verification) +8. [Limitations](#limitations) +9. [Git Object Immutability](#git-object-immutability) +10. [Error Codes for Security Operations](#error-codes-for-security-operations) + +--- + +## Operational Limits + +### GCM Nonce Bound + +AES-256-GCM uses a 96-bit random nonce per encryption. NIST SP 800-38D recommends limiting to **2^32 invocations per key** to keep the nonce collision probability below an acceptable threshold. The birthday bound is approximately 2^48 for random 96-bit nonces, but the conservative NIST guidance of 2^32 accounts for the catastrophic consequences of a collision (full plaintext and authentication key recovery). + +git-cas tracks encryption operations via `encryptionCount` in vault metadata. When the count exceeds **2^31** (2,147,483,648), an observability warning is emitted, providing a safety margin before the 2^32 NIST limit. + +**Recommended key rotation frequency**: Rotate the vault passphrase (or encryption key) before `encryptionCount` reaches 2^31, or every 90 days, whichever comes first. + +### KDF Parameter Guidance + +When using passphrase-based encryption, git-cas derives keys using PBKDF2 or scrypt. + +| Algorithm | Recommended Parameters | Notes | +|-----------|----------------------|-------| +| PBKDF2 | iterations ≥ 600,000 (SHA-256) | OWASP 2024 recommendation | +| scrypt | N=2^17, r=8, p=1 | ~128 MiB memory | + +Higher iteration counts / cost parameters increase resistance to brute-force attacks but also increase the time to derive a key. 
Choose parameters based on your threat model and latency tolerance. + +### Passphrase Entropy Recommendations + +| Entropy (bits) | Example | Brute-Force Resistance | +|---------------|---------|----------------------| +| < 40 | `password123` | Trivially crackable | +| 40–60 | 4–5 random dictionary words | Weak against GPU attacks | +| 60–80 | 6+ random dictionary words or 12+ mixed characters | Moderate | +| > 80 | 8+ random dictionary words or 16+ mixed characters | Strong | + +**Minimum recommendation**: 80+ bits of entropy for vault passphrases. Use a random passphrase generator (e.g., Diceware) rather than human-chosen passwords. --- @@ -600,6 +635,55 @@ throw new CasError( - Verify the encryption key is available and passed to `restore()`. - If the key is lost, the content is permanently inaccessible. +### `RESTORE_TOO_LARGE` + +**Thrown when**: +- An encrypted or compressed restore would exceed the configured `maxRestoreBufferSize` limit. +- The post-decompression size exceeds the limit (checked after gunzip). + +**Example**: +```javascript +throw new CasError( + 'Restore buffer exceeds limit', + 'RESTORE_TOO_LARGE', + { size: 1073741824, limit: 536870912 }, +); +``` + +**Possible causes**: +- The asset is larger than the configured buffer limit (default 512 MiB). +- A compressed asset inflates beyond the limit after decompression. + +**Recommended action**: +- Increase `maxRestoreBufferSize` in the `CasService` constructor or `.casrc`. +- For very large assets, consider storing without encryption to enable streaming restore. + +--- + +### `ENCRYPTION_BUFFER_EXCEEDED` + +**Thrown when**: +- Web Crypto AES-GCM encryption is attempted on data exceeding the configured `maxEncryptionBufferSize`. +- Web Crypto is a one-shot API — it cannot stream, so the entire plaintext must fit in memory. 
+ +**Example**: +```javascript +throw new CasError( + 'Streaming encryption buffered 1073741824 bytes (limit: 536870912)...', + 'ENCRYPTION_BUFFER_EXCEEDED', + { accumulated: 1073741824, limit: 536870912 }, +); +``` + +**Possible causes**: +- Large chunks combined with `WebCryptoAdapter` (used in Bun/Deno). +- `NodeCryptoAdapter` uses true streaming and is not affected by this limit. + +**Recommended action**: +- Increase `maxEncryptionBufferSize` in the `WebCryptoAdapter` constructor. +- Switch to `NodeCryptoAdapter` if streaming encryption is needed. +- Split the asset before storing, or store without encryption on the Web Crypto path for very large files. + --- ## Conclusion diff --git a/bin/actions.js b/bin/actions.js index d1cb54a..2a28ce4 100644 --- a/bin/actions.js +++ b/bin/actions.js @@ -56,18 +56,31 @@ function getHint(code) { return undefined; } +/** + * Default delay — real setTimeout for production use. + * @param {number} ms + * @returns {Promise} + */ +function defaultDelay(ms) { + return new Promise((resolve) => { setTimeout(resolve, ms); }); +} + /** * Wrap a command action with structured error handling. * * @param {(...args: any[]) => Promise} fn - The async action function. * @param {() => boolean} getJson - Lazy getter for --json flag value. + * @param {{ delay?: (ms: number) => Promise }} [options] - Injectable dependencies. * @returns {(...args: any[]) => Promise} Wrapped action. */ -export function runAction(fn, getJson) { +export function runAction(fn, getJson, { delay = defaultDelay } = {}) { return async (/** @type {any[]} */ ...args) => { try { await fn(...args); } catch (/** @type {any} */ err) { + if (err?.code === 'INTEGRITY_ERROR') { + await delay(1000); + } writeError(err, getJson()); process.exitCode = 1; } diff --git a/bin/config.js b/bin/config.js new file mode 100644 index 0000000..e92ef3a --- /dev/null +++ b/bin/config.js @@ -0,0 +1,203 @@ +/** + * @fileoverview Loads `.casrc` project config from the Git working directory. 
+ * + * `.casrc` is a JSON file placed at the repository root that provides default + * values for CLI flags. CLI flags always take precedence over `.casrc` values. + * + * Supported keys: + * chunkSize — Chunk size in bytes (integer >= 1024, default 262144) + * strategy — Chunking strategy: "fixed" or "cdc" (default "fixed") + * concurrency — Parallel chunk I/O operations (positive integer, default 1) + * codec — Manifest codec: "json" or "cbor" (default "json") + * compression — Compression algorithm: "gzip" or false (default false) + * merkleThreshold — Chunk count threshold for Merkle sub-manifests (default 1000) + * maxRestoreBufferSize — Max bytes for buffered restore (default 536870912) + * cdc.minChunkSize — CDC minimum chunk size + * cdc.targetChunkSize — CDC target chunk size + * cdc.maxChunkSize — CDC maximum chunk size + */ + +import { readFileSync } from 'node:fs'; +import { resolve } from 'node:path'; + +const FILENAME = '.casrc'; +const MAX_CHUNK_SIZE = 100 * 1024 * 1024; + +/** + * @typedef {Object} CasConfig + * @property {number} [chunkSize] + * @property {string} [strategy] + * @property {number} [concurrency] + * @property {string} [codec] + * @property {string|false} [compression] + * @property {number} [merkleThreshold] + * @property {number} [maxRestoreBufferSize] + * @property {{ minChunkSize?: number, targetChunkSize?: number, maxChunkSize?: number }} [cdc] + */ + +/** + * @param {any} value + * @param {string} name + * @param {{ min: number, max?: number }} range + */ +function assertInt(value, name, { min, max }) { + if (value === undefined) { return; } + if (!Number.isInteger(value) || value < min) { + throw new Error(`${FILENAME}: ${name} must be an integer >= ${min}`); + } + if (max !== undefined && value > max) { + throw new Error(`${FILENAME}: ${name} must not exceed ${max}`); + } +} + +/** + * @param {any} value + * @param {string} name + * @param {string[]} allowed + */ +function assertEnum(value, name, allowed) { + if (value === 
undefined) { return; } + if (!allowed.includes(value)) { + throw new Error(`${FILENAME}: ${name} must be ${allowed.map((v) => `"${v}"`).join(' or ')}`); + } +} + +/** + * @param {{ minChunkSize?: number, targetChunkSize?: number, maxChunkSize?: number }} cdc + */ +function assertCdcOrdering(cdc) { + const { minChunkSize, targetChunkSize, maxChunkSize } = cdc; + if (minChunkSize !== undefined && maxChunkSize !== undefined && minChunkSize > maxChunkSize) { + throw new Error(`${FILENAME}: cdc.minChunkSize must not exceed cdc.maxChunkSize`); + } + if (targetChunkSize !== undefined && minChunkSize !== undefined && targetChunkSize < minChunkSize) { + throw new Error(`${FILENAME}: cdc.targetChunkSize must be >= cdc.minChunkSize`); + } + if (targetChunkSize !== undefined && maxChunkSize !== undefined && targetChunkSize > maxChunkSize) { + throw new Error(`${FILENAME}: cdc.targetChunkSize must be <= cdc.maxChunkSize`); + } +} + +/** + * @param {Record} config + */ +function validateCdc(config) { + if (config.cdc === undefined) { return; } + if (typeof config.cdc !== 'object' || config.cdc === null || Array.isArray(config.cdc)) { + throw new Error(`${FILENAME}: cdc must be an object`); + } + for (const key of ['minChunkSize', 'targetChunkSize', 'maxChunkSize']) { + assertInt(config.cdc[key], `cdc.${key}`, { min: 1, max: MAX_CHUNK_SIZE }); + } + assertCdcOrdering(config.cdc); +} + +/** + * Validates `.casrc` config values after parsing. 
+ * + * @param {Record} config + */ +function validateConfig(config) { + assertInt(config.chunkSize, 'chunkSize', { min: 1024, max: MAX_CHUNK_SIZE }); + assertEnum(config.strategy, 'strategy', ['fixed', 'cdc']); + assertInt(config.concurrency, 'concurrency', { min: 1 }); + assertEnum(config.codec, 'codec', ['json', 'cbor']); + if (config.compression !== undefined && config.compression !== false) { + assertEnum(config.compression, 'compression', ['gzip']); + } + assertInt(config.merkleThreshold, 'merkleThreshold', { min: 1 }); + assertInt(config.maxRestoreBufferSize, 'maxRestoreBufferSize', { min: 1024 }); + validateCdc(config); +} + +/** + * Loads `.casrc` from the given directory, returning an empty object if not found. + * + * @param {string} cwd - Directory to search for `.casrc`. + * @returns {CasConfig} + */ +export function loadConfig(cwd) { + const filePath = resolve(cwd, FILENAME); + try { + const raw = readFileSync(filePath, 'utf8'); + const config = JSON.parse(raw); + if (typeof config !== 'object' || config === null || Array.isArray(config)) { + throw new Error(`${FILENAME}: expected a JSON object`); + } + validateConfig(config); + return config; + } catch (err) { + if (err.code === 'ENOENT') { + return {}; + } + if (err instanceof SyntaxError) { + throw new Error(`${FILENAME}: invalid JSON — ${err.message}`); + } + throw err; + } +} + +/** + * Sets key on target if value is not undefined. + * @param {Record} target + * @param {string} key + * @param {any} value + */ +function setIfDefined(target, key, value) { + if (value !== undefined) { target[key] = value; } +} + +/** + * Resolves chunking config from merged CLI + config values. 
+ * @param {{ strategy?: string, chunkSize?: number, cliOpts: Record, config: CasConfig }} opts + * @returns {Record|undefined} + */ +function resolveChunking({ strategy, chunkSize, cliOpts, config }) { + if (strategy === 'cdc') { + const cdcConf = config.cdc || {}; + return { + strategy: 'cdc', + targetChunkSize: cliOpts.targetChunkSize ?? cdcConf.targetChunkSize, + minChunkSize: cliOpts.minChunkSize ?? cdcConf.minChunkSize, + maxChunkSize: cliOpts.maxChunkSize ?? cdcConf.maxChunkSize, + }; + } + if (strategy === 'fixed' && chunkSize !== undefined) { + return { strategy: 'fixed', chunkSize }; + } + return undefined; +} + +/** + * Merges CLI options over `.casrc` defaults. CLI flags take precedence. + * + * @param {Record} cliOpts - Parsed CLI options. + * @param {CasConfig} config - Loaded `.casrc` config. + * @returns {{ casConfig: Record, storeExtras: Record }} + */ +export function mergeConfig(cliOpts, config) { + const strategy = cliOpts.strategy ?? config.strategy; + const chunkSize = cliOpts.chunkSize ?? config.chunkSize; + + /** @type {Record} */ + const casConfig = {}; + setIfDefined(casConfig, 'concurrency', cliOpts.concurrency ?? config.concurrency); + setIfDefined(casConfig, 'chunkSize', chunkSize); + setIfDefined(casConfig, 'merkleThreshold', cliOpts.merkleThreshold ?? config.merkleThreshold); + setIfDefined(casConfig, 'maxRestoreBufferSize', cliOpts.maxRestoreBufferSize ?? config.maxRestoreBufferSize); + setIfDefined(casConfig, 'chunking', resolveChunking({ strategy, chunkSize, cliOpts, config })); + + const codec = cliOpts.codec ?? 
config.codec; + if (codec === 'cbor') { casConfig.codec = 'cbor'; } + + /** @type {Record} */ + const storeExtras = {}; + if (cliOpts.gzip || config.compression === 'gzip') { + storeExtras.compression = { algorithm: 'gzip' }; + } + + return { casConfig, storeExtras }; +} diff --git a/bin/git-cas.js b/bin/git-cas.js index 5e0f30d..d034386 100755 --- a/bin/git-cas.js +++ b/bin/git-cas.js @@ -1,9 +1,9 @@ #!/usr/bin/env node import { readFileSync } from 'node:fs'; -import { program } from 'commander'; +import { program, Option } from 'commander'; import GitPlumbing, { ShellRunnerFactory } from '@git-stunts/plumbing'; -import ContentAddressableStore, { EventEmitterObserver } from '../index.js'; +import ContentAddressableStore, { EventEmitterObserver, CborCodec } from '../index.js'; import Manifest from '../src/domain/value-objects/Manifest.js'; import { createStoreProgress, createRestoreProgress } from './ui/progress.js'; import { renderEncryptionCard } from './ui/encryption-card.js'; @@ -12,6 +12,8 @@ import { renderManifestView } from './ui/manifest-view.js'; import { renderHeatmap } from './ui/heatmap.js'; import { runAction } from './actions.js'; import { filterEntries, formatTable, formatTabSeparated } from './ui/vault-list.js'; +import { readPassphraseFile, promptPassphrase } from './ui/passphrase-prompt.js'; +import { loadConfig, mergeConfig } from './config.js'; const getJson = () => program.opts().json; @@ -37,16 +39,21 @@ function readKeyFile(keyFilePath) { } /** - * Create a CAS instance for the given working directory with an optional observability adapter. + * Create a CAS instance for the given working directory. 
* * @param {string} cwd - * @param {{ observability?: import('../index.js').ObservabilityPort }} [opts] + * @param {Record} [opts] * @returns {ContentAddressableStore} */ function createCas(cwd, opts = {}) { const runner = ShellRunnerFactory.create(); const plumbing = new GitPlumbing({ runner, cwd }); - return new ContentAddressableStore({ plumbing, observability: opts.observability }); + /** @type {Record} */ + const casOpts = { plumbing, ...opts }; + if (casOpts.codec === 'cbor') { + casOpts.codec = new CborCodec(); + } + return new ContentAddressableStore(casOpts); } /** @@ -75,13 +82,43 @@ async function deriveVaultKey(cas, metadata, passphrase) { } /** - * Resolve passphrase from --vault-passphrase flag or GIT_CAS_PASSPHRASE env var. + * Returns true when a non-interactive passphrase source exists (flag or env). + * Does NOT trigger prompts or consume stdin. * * @param {Record} opts - * @returns {string | undefined} + * @returns {boolean} */ -function resolvePassphrase(opts) { - return opts.vaultPassphrase ?? process.env.GIT_CAS_PASSPHRASE; +function hasPassphraseSource(opts) { + return Boolean(opts.vaultPassphraseFile || opts.vaultPassphrase || process.env.GIT_CAS_PASSPHRASE); +} + +/** + * Resolve passphrase from (in priority order): + * 1. --vault-passphrase-file + * 2. --vault-passphrase + * 3. GIT_CAS_PASSPHRASE env var + * 4. 
Interactive TTY prompt (if stdin is a TTY) + * + * @param {Record} opts + * @param {{ confirm?: boolean }} [extra] + * @returns {Promise<string | undefined>} + */ +async function resolvePassphrase(opts, extra = {}) { + if (opts.vaultPassphraseFile) { + return await readPassphraseFile(opts.vaultPassphraseFile); + } + if (opts.vaultPassphrase) { + if (!opts.vaultPassphrase.trim()) { throw new Error('Passphrase must not be empty'); } + return opts.vaultPassphrase; + } + if (process.env.GIT_CAS_PASSPHRASE) { + if (!process.env.GIT_CAS_PASSPHRASE.trim()) { throw new Error('Passphrase must not be empty'); } + return process.env.GIT_CAS_PASSPHRASE; + } + if (process.stdin.isTTY) { + return await promptPassphrase({ confirm: extra.confirm || false }); + } + return undefined; } /** @@ -95,16 +132,18 @@ async function resolveEncryptionKey(cas, opts) { if (opts.keyFile) { return readKeyFile(opts.keyFile); } - const passphrase = resolvePassphrase(opts); - if (!passphrase) { + const metadata = await cas.getVaultMetadata(); + if (!metadata?.encryption) { + if (hasPassphraseSource(opts)) { + process.stderr.write('warning: passphrase ignored (vault is not encrypted)\n'); + } return undefined; } - const metadata = await cas.getVaultMetadata(); - if (metadata?.encryption) { - return deriveVaultKey(cas, metadata, passphrase); + const passphrase = await resolvePassphrase(opts); + if (!passphrase) { + return undefined; } - process.stderr.write('warning: passphrase ignored (vault is not encrypted)\n'); - return undefined; + return deriveVaultKey(cas, metadata, passphrase); } /** @@ -177,6 +216,14 @@ function parseRecipient(value, previous) { return list; } +/** @param {string} v */ +const parseIntFlag = (v) => { + if (!/^-?\d+$/.test(v)) { throw new Error(`Expected an integer, got "${v}"`); } + const n = Number(v); + if (!Number.isSafeInteger(n)) { throw new Error(`Expected a safe integer, got "${v}"`); } + return n; +}; + program .command('store <file>') .description('Store a file into Git CAS') @@ -186,10 
+233,20 @@ program .option('--tree', 'Also create a Git tree and print its OID') .option('--force', 'Overwrite existing vault entry') .option('--vault-passphrase <passphrase>', 'Vault-level passphrase for encryption (prefer GIT_CAS_PASSPHRASE env var)') + .option('--vault-passphrase-file <path>', 'Read vault passphrase from file (use - for stdin)') + .option('--gzip', 'Enable gzip compression') + .addOption(new Option('--strategy <strategy>', 'Chunking strategy').choices(['fixed', 'cdc'])) + .option('--chunk-size <bytes>', 'Chunk size in bytes', parseIntFlag) + .option('--concurrency <n>', 'Parallel chunk I/O operations', parseIntFlag) + .addOption(new Option('--codec <codec>', 'Manifest codec').choices(['json', 'cbor'])) + .option('--target-chunk-size <bytes>', 'CDC target chunk size', parseIntFlag) + .option('--min-chunk-size <bytes>', 'CDC minimum chunk size', parseIntFlag) + .option('--max-chunk-size <bytes>', 'CDC maximum chunk size', parseIntFlag) + .option('--merkle-threshold <n>', 'Chunk count threshold for Merkle sub-manifests', parseIntFlag) .option('--cwd <dir>', 'Git working directory', '.') .action(runAction(async (/** @type {string} */ file, /** @type {Record} */ opts) => { - if (opts.recipient && (opts.keyFile || resolvePassphrase(opts))) { - throw new Error('Provide --key-file/--vault-passphrase or --recipient, not both'); + if (opts.recipient && (opts.keyFile || hasPassphraseSource(opts))) { + throw new Error('Provide --key-file or a vault passphrase source (--vault-passphrase, --vault-passphrase-file, GIT_CAS_PASSPHRASE), or --recipient — not both'); } if (opts.force && !opts.tree) { throw new Error('--force requires --tree'); @@ -197,9 +254,13 @@ program const json = program.opts().json; const quiet = program.opts().quiet || json; const observer = new EventEmitterObserver(); - const cas = createCas(opts.cwd, { observability: observer }); + + const config = loadConfig(opts.cwd); + const { casConfig, storeExtras } = mergeConfig(opts, config); + const cas = createCas(opts.cwd, { observability: observer, ...casConfig }); 
const storeOpts = await buildStoreOpts(cas, file, opts); + Object.assign(storeOpts, storeExtras); const progress = createStoreProgress({ filePath: file, chunkSize: cas.chunkSize, quiet }); progress.attach(observer); let manifest; @@ -275,12 +336,24 @@ program .option('--oid <oid>', 'Direct tree OID') .option('--key-file <path>', 'Path to 32-byte raw encryption key file') .option('--vault-passphrase <passphrase>', 'Vault-level passphrase for decryption (prefer GIT_CAS_PASSPHRASE env var)') + .option('--vault-passphrase-file <path>', 'Read vault passphrase from file (use - for stdin)') + .option('--concurrency <n>', 'Parallel chunk I/O operations', parseIntFlag) + .option('--max-restore-buffer <bytes>', 'Max bytes for buffered encrypted/compressed restore', parseIntFlag) .option('--cwd <dir>', 'Git working directory', '.') .action(runAction(async (/** @type {Record} */ opts) => { validateRestoreFlags(opts); const quiet = program.opts().quiet || program.opts().json; const observer = new EventEmitterObserver(); - const cas = createCas(opts.cwd, { observability: observer }); + + const config = loadConfig(opts.cwd); + /** @type {Record} */ + const casConfig = {}; + const concurrency = opts.concurrency ?? config.concurrency; + const maxRestoreBufferSize = opts.maxRestoreBuffer ?? 
config.maxRestoreBufferSize; + if (concurrency !== undefined) { casConfig.concurrency = concurrency; } + if (maxRestoreBufferSize !== undefined) { casConfig.maxRestoreBufferSize = maxRestoreBufferSize; } + + const cas = createCas(opts.cwd, { observability: observer, ...casConfig }); const treeOid = opts.oid || await cas.resolveVaultEntry({ slug: opts.slug }); const manifest = await cas.readManifest({ treeOid }); @@ -345,13 +418,14 @@ vault .command('init') .description('Initialize the vault') .option('--vault-passphrase <passphrase>', 'Passphrase for vault-level encryption (prefer GIT_CAS_PASSPHRASE env var)') - .option('--algorithm <name>', 'KDF algorithm (pbkdf2 or scrypt)', 'pbkdf2') + .option('--vault-passphrase-file <path>', 'Read vault passphrase from file (use - for stdin)') + .addOption(new Option('--algorithm <name>', 'KDF algorithm').choices(['pbkdf2', 'scrypt']).default('pbkdf2')) .option('--cwd <dir>', 'Git working directory', '.') .action(runAction(async (/** @type {Record} */ opts) => { const cas = createCas(opts.cwd); /** @type {{ passphrase?: string, kdfOptions?: { algorithm: 'pbkdf2' | 'scrypt' } }} */ const initOpts = {}; - const passphrase = resolvePassphrase(opts); + const passphrase = await resolvePassphrase(opts, { confirm: true }); if (passphrase) { initOpts.passphrase = passphrase; initOpts.kdfOptions = { algorithm: /** @type {'pbkdf2' | 'scrypt'} */ (opts.algorithm) }; @@ -478,19 +552,43 @@ vault // --------------------------------------------------------------------------- // vault rotate // --------------------------------------------------------------------------- +/** + * Resolve old and new passphrases for vault rotate from flags/files. 
+ * + * @param {Record} opts + * @returns {Promise<{ oldPassphrase: string, newPassphrase: string }>} + */ +async function resolveRotatePassphrases(opts) { + if (opts.oldPassphraseFile === '-' && opts.newPassphraseFile === '-') { + throw new Error('Cannot read both old and new passphrase from stdin'); + } + const oldPassphrase = opts.oldPassphraseFile + ? await readPassphraseFile(opts.oldPassphraseFile) + : opts.oldPassphrase; + const newPassphrase = opts.newPassphraseFile + ? await readPassphraseFile(opts.newPassphraseFile) + : opts.newPassphrase; + if (!oldPassphrase) { throw new Error('Old passphrase required (--old-passphrase or --old-passphrase-file)'); } + if (!newPassphrase) { throw new Error('New passphrase required (--new-passphrase or --new-passphrase-file)'); } + return { oldPassphrase, newPassphrase }; +} + vault .command('rotate') .description('Rotate vault-level encryption passphrase') - .requiredOption('--old-passphrase <passphrase>', 'Current vault passphrase') - .requiredOption('--new-passphrase <passphrase>', 'New vault passphrase') - .option('--algorithm <name>', 'KDF algorithm (pbkdf2 or scrypt)') + .option('--old-passphrase <passphrase>', 'Current vault passphrase') + .option('--new-passphrase <passphrase>', 'New vault passphrase') + .option('--old-passphrase-file <path>', 'Read old passphrase from file (- for stdin)') + .option('--new-passphrase-file <path>', 'Read new passphrase from file (- for stdin)') + .addOption(new Option('--algorithm <name>', 'KDF algorithm').choices(['pbkdf2', 'scrypt'])) .option('--cwd <dir>', 'Git working directory', '.') .action(runAction(async (/** @type {Record} */ opts) => { + const { oldPassphrase, newPassphrase } = await resolveRotatePassphrases(opts); const cas = createCas(opts.cwd); /** @type {{ oldPassphrase: string, newPassphrase: string, kdfOptions?: { algorithm: 'pbkdf2' | 'scrypt' } }} */ const rotateOpts = { - oldPassphrase: opts.oldPassphrase, - newPassphrase: opts.newPassphrase, + oldPassphrase, + newPassphrase, }; if (opts.algorithm) { rotateOpts.kdfOptions = { algorithm: 
/** @type {'pbkdf2' | 'scrypt'} */ (opts.algorithm) }; diff --git a/bin/ui/passphrase-prompt.js b/bin/ui/passphrase-prompt.js new file mode 100644 index 0000000..b04a128 --- /dev/null +++ b/bin/ui/passphrase-prompt.js @@ -0,0 +1,100 @@ +import { createInterface } from 'node:readline'; +import { readFile, stat } from 'node:fs/promises'; + +/** + * Prompts for a passphrase on stderr with echo disabled. + * + * @param {Object} [options] + * @param {boolean} [options.confirm=false] - Require confirmation (ask twice). + * @returns {Promise} + */ +export async function promptPassphrase({ confirm = false } = {}) { + if (!process.stdin.isTTY) { + throw new Error( + 'Cannot prompt for passphrase: stdin is not a TTY. ' + + 'Use --vault-passphrase-file or GIT_CAS_PASSPHRASE.', + ); + } + const pass = await readHidden('Passphrase: '); + if (!pass) { + throw new Error('Passphrase must not be empty'); + } + if (confirm) { + const pass2 = await readHidden('Confirm passphrase: '); + if (pass !== pass2) { + throw new Error('Passphrases do not match'); + } + } + return pass; +} + +/** + * Warns to stderr if the file at `filePath` is group- or world-readable. + * + * @param {string} filePath + */ +async function warnInsecurePermissions(filePath) { + try { + const st = await stat(filePath); + if (st.mode & 0o077) { + process.stderr.write( + `warning: ${filePath} has insecure permissions — consider chmod 600\n`, + ); + } + } catch { + // stat may fail on non-Unix or non-existent paths; silently skip. + } +} + +/** + * Reads a passphrase from a file path, or from stdin when path is '-'. + * + * @param {string} filePath - File path, or '-' for stdin. 
+ * @returns {Promise} + */ +export async function readPassphraseFile(filePath) { + if (filePath === '-') { + const chunks = []; + for await (const chunk of process.stdin) { + chunks.push(chunk); + } + const stdinResult = Buffer.concat(chunks).toString('utf8').replace(/\r?\n$/, ''); + if (!stdinResult) { throw new Error('Passphrase must not be empty'); } + return stdinResult; + } + await warnInsecurePermissions(filePath); + const content = await readFile(filePath, 'utf8'); + const trimmed = content.replace(/\r?\n$/, ''); + if (!trimmed) { throw new Error('Passphrase must not be empty'); } + return trimmed; +} + +/** + * Reads a line with echo disabled. + * + * Uses Node.js private API `rl._writeToOutput` to suppress echo — + * this is an intentional access to an undocumented API for password + * input, as there is no public readline API for hidden input. + * + * @param {string} prompt - Prompt text. + * @returns {Promise} + */ +function readHidden(prompt) { + return new Promise((resolve, reject) => { + const rl = createInterface({ + input: process.stdin, + output: process.stderr, + terminal: true, + }); + process.stderr.write(prompt); + rl.on('error', reject); + rl.on('close', () => reject(new Error('readline closed without input'))); + rl.question('', (answer) => { + rl.removeAllListeners('close'); + rl.close(); + process.stderr.write('\n'); + resolve(answer); + }); + rl._writeToOutput = () => {}; + }); +} diff --git a/index.d.ts b/index.d.ts index c59de13..3e685b5 100644 --- a/index.d.ts +++ b/index.d.ts @@ -171,6 +171,8 @@ export interface ContentAddressableStoreOptions { concurrency?: number; chunking?: ChunkingConfig; chunker?: ChunkingPort; + /** Maximum bytes to buffer during encrypted/compressed restore. @default 536870912 (512 MiB) */ + maxRestoreBufferSize?: number; } /** A single vault entry. */ @@ -182,6 +184,8 @@ export interface VaultEntry { /** Vault metadata stored in .vault.json. 
*/ export interface VaultMetadata { version: number; + /** Number of encrypted store operations performed with this vault key. */ + encryptionCount?: number; encryption?: { cipher: string; kdf: { @@ -213,6 +217,7 @@ export declare class VaultService { persistence: GitPersistencePortBase; ref: GitRefPortBase; crypto: CryptoPortBase; + observability?: ObservabilityPort; }); /** Validates a vault slug. Throws CasError with code INVALID_SLUG on failure. */ @@ -341,10 +346,20 @@ export default class ContentAddressableStore { readManifest(options: { treeOid: string }): Promise; + inspectAsset(options: { + treeOid: string; + }): Promise<{ slug: string; chunksOrphaned: number }>; + + /** @deprecated Use {@link inspectAsset} instead. */ deleteAsset(options: { treeOid: string; }): Promise<{ slug: string; chunksOrphaned: number }>; + collectReferencedChunks(options: { + treeOids: string[]; + }): Promise<{ referenced: Set; total: number }>; + + /** @deprecated Use {@link collectReferencedChunks} instead. */ findOrphanedChunks(options: { treeOids: string[]; }): Promise<{ referenced: Set; total: number }>; diff --git a/index.js b/index.js index 1a643fb..85f9154 100644 --- a/index.js +++ b/index.js @@ -64,14 +64,15 @@ export default class ContentAddressableStore { * @param {number} [options.concurrency=1] - Maximum parallel chunk I/O operations. * @param {{ strategy: string, chunkSize?: number, targetChunkSize?: number, minChunkSize?: number, maxChunkSize?: number }} [options.chunking] - Chunking strategy config. * @param {import('./src/ports/ChunkingPort.js').default} [options.chunker] - Pre-built ChunkingPort instance (advanced). + * @param {number} [options.maxRestoreBufferSize=536870912] - Max buffered restore size in bytes for encrypted/compressed restores (default 512 MiB). 
*/ - constructor({ plumbing, chunkSize, codec, policy, crypto, observability, merkleThreshold, concurrency, chunking, chunker }) { - this.#config = { plumbing, chunkSize, codec, policy, crypto, observability, merkleThreshold, concurrency, chunking, chunker }; + constructor({ plumbing, chunkSize, codec, policy, crypto, observability, merkleThreshold, concurrency, chunking, chunker, maxRestoreBufferSize }) { + this.#config = { plumbing, chunkSize, codec, policy, crypto, observability, merkleThreshold, concurrency, chunking, chunker, maxRestoreBufferSize }; this.service = null; this.#servicePromise = null; } - /** @type {{ plumbing: *, chunkSize?: number, codec?: *, policy?: *, crypto?: *, observability?: *, merkleThreshold?: number, concurrency?: number, chunking?: *, chunker?: * }} */ + /** @type {{ plumbing: *, chunkSize?: number, codec?: *, policy?: *, crypto?: *, observability?: *, merkleThreshold?: number, concurrency?: number, chunking?: *, chunker?: *, maxRestoreBufferSize?: number }} */ #config; /** @type {VaultService|null} */ #vault = null; @@ -111,13 +112,14 @@ export default class ContentAddressableStore { merkleThreshold: cfg.merkleThreshold, concurrency: cfg.concurrency, chunker, + maxRestoreBufferSize: cfg.maxRestoreBufferSize, }); const ref = new GitRefAdapter({ plumbing: cfg.plumbing, policy: cfg.policy, }); - this.#vault = new VaultService({ persistence, ref, crypto }); + this.#vault = new VaultService({ persistence, ref, crypto, observability: this.service.observability }); return this.service; } @@ -314,7 +316,18 @@ export default class ContentAddressableStore { } /** - * Returns deletion metadata for an asset stored in a Git tree. + * Reads a manifest from a Git tree and returns inspection metadata. + * @param {Object} options + * @param {string} options.treeOid - Git tree OID of the asset. 
+ * @returns {Promise<{ slug: string, chunksOrphaned: number }>} + */ + async inspectAsset(options) { + const service = await this.#getService(); + return await service.inspectAsset(options); + } + + /** + * @deprecated Use {@link inspectAsset} instead. * @param {Object} options * @param {string} options.treeOid - Git tree OID of the asset. * @returns {Promise<{ slug: string, chunksOrphaned: number }>} @@ -330,6 +343,17 @@ export default class ContentAddressableStore { * @param {string[]} options.treeOids - Git tree OIDs to analyze. * @returns {Promise<{ referenced: Set, total: number }>} */ + async collectReferencedChunks(options) { + const service = await this.#getService(); + return await service.collectReferencedChunks(options); + } + + /** + * @deprecated Use {@link collectReferencedChunks} instead. + * @param {Object} options + * @param {string[]} options.treeOids - Git tree OIDs to analyze. + * @returns {Promise<{ referenced: Set, total: number }>} + */ async findOrphanedChunks(options) { const service = await this.#getService(); return await service.findOrphanedChunks(options); diff --git a/scripts/hooks/pre-commit b/scripts/hooks/pre-commit new file mode 100755 index 0000000..d5e25a7 --- /dev/null +++ b/scripts/hooks/pre-commit @@ -0,0 +1,13 @@ +#!/usr/bin/env bash + +# pre-commit git hook +# Lint must pass cleanly. Zero errors, zero warnings. + +set -e + +echo "Running pre-commit lint gate..." + +echo "→ Linting..." +pnpm run lint + +echo "✅ Lint passed." 
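For reference, a `.casrc` exercising the keys documented in `bin/config.js` above might look like the following. The values are illustrative, not recommendations; CLI flags still override anything set here, and the file must pass the validation rules shown earlier (CDC sizes ordered min ≤ target ≤ max, `maxRestoreBufferSize` ≥ 1024, and so on).

```json
{
  "strategy": "cdc",
  "concurrency": 4,
  "codec": "cbor",
  "compression": "gzip",
  "merkleThreshold": 1000,
  "maxRestoreBufferSize": 536870912,
  "cdc": {
    "minChunkSize": 65536,
    "targetChunkSize": 262144,
    "maxChunkSize": 1048576
  }
}
```

`chunkSize` is omitted because it only applies to the `"fixed"` strategy; with `"cdc"`, the `cdc.*` sizes govern chunk boundaries instead.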
diff --git a/scripts/git-hooks/pre-push b/scripts/hooks/pre-push similarity index 100% rename from scripts/git-hooks/pre-push rename to scripts/hooks/pre-push diff --git a/scripts/install-hooks.sh b/scripts/install-hooks.sh index fe569e9..567f8d9 100644 --- a/scripts/install-hooks.sh +++ b/scripts/install-hooks.sh @@ -6,7 +6,7 @@ set -e SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" -HOOKS_DIR="${SCRIPT_DIR}/git-hooks" +HOOKS_DIR="${SCRIPT_DIR}/hooks" # Make all hooks executable chmod +x "${HOOKS_DIR}"/* diff --git a/src/domain/errors/CasError.js b/src/domain/errors/CasError.js index 6acc1da..54f9ba3 100644 --- a/src/domain/errors/CasError.js +++ b/src/domain/errors/CasError.js @@ -15,6 +15,8 @@ export default class CasError extends Error { this.name = this.constructor.name; this.code = code; this.meta = meta; - Error.captureStackTrace(this, this.constructor); + if (Error.captureStackTrace) { + Error.captureStackTrace(this, this.constructor); + } } } diff --git a/src/domain/services/CasService.d.ts b/src/domain/services/CasService.d.ts index 358579b..069e82a 100644 --- a/src/domain/services/CasService.d.ts +++ b/src/domain/services/CasService.d.ts @@ -46,6 +46,13 @@ export interface ObservabilityPort { span(name: string): { end(meta?: Record<string, unknown>): void }; } +/** Port interface for chunking strategies (fixed, CDC, etc.). */ +export interface ChunkingPort { + chunk(source: AsyncIterable<Buffer>): AsyncIterable<Buffer>; + readonly strategy: string; + readonly params: Record<string, unknown>; +} + /** Constructor options for {@link CasService}. */ export interface CasServiceOptions { persistence: GitPersistencePort; @@ -55,6 +62,8 @@ export interface CasServiceOptions { chunkSize?: number; merkleThreshold?: number; concurrency?: number; + chunker?: ChunkingPort; + maxRestoreBufferSize?: number; } /** Options for key derivation. 
*/ @@ -90,6 +99,7 @@ export default class CasService { readonly chunkSize: number; readonly merkleThreshold: number; readonly concurrency: number; + readonly maxRestoreBufferSize: number; constructor(options: CasServiceOptions); @@ -131,10 +141,20 @@ export default class CasService { readManifest(options: { treeOid: string }): Promise; + inspectAsset(options: { + treeOid: string; + }): Promise<{ slug: string; chunksOrphaned: number }>; + + /** @deprecated Use {@link inspectAsset} instead. */ deleteAsset(options: { treeOid: string; }): Promise<{ slug: string; chunksOrphaned: number }>; + collectReferencedChunks(options: { + treeOids: string[]; + }): Promise<{ referenced: Set<string>; total: number }>; + + /** @deprecated Use {@link collectReferencedChunks} instead. */ findOrphanedChunks(options: { treeOids: string[]; }): Promise<{ referenced: Set<string>; total: number }>; diff --git a/src/domain/services/CasService.js b/src/domain/services/CasService.js index 9d1370c..851fd75 100644 --- a/src/domain/services/CasService.js +++ b/src/domain/services/CasService.js @@ -34,28 +34,48 @@ export default class CasService { * @param {number} [options.merkleThreshold=1000] - Chunk count threshold for Merkle manifests. * @param {number} [options.concurrency=1] - Maximum parallel chunk I/O operations. * @param {import('../../ports/ChunkingPort.js').default} [options.chunker] - Chunking strategy (default FixedChunker). + * @param {number} [options.maxRestoreBufferSize=536870912] - Max bytes for buffered restore (default 512 MiB). 
*/ - constructor({ persistence, codec, crypto, observability, chunkSize = 256 * 1024, merkleThreshold = 1000, concurrency = 1, chunker }) { + constructor({ persistence, codec, crypto, observability, chunkSize = 256 * 1024, merkleThreshold = 1000, concurrency = 1, chunker, maxRestoreBufferSize = 512 * 1024 * 1024 }) { CasService._validateObservability(observability); - if (chunkSize < 1024) { - throw new Error('Chunk size must be at least 1024 bytes'); - } + CasService.#validateConstructorArgs({ chunkSize, merkleThreshold, concurrency, maxRestoreBufferSize }); this.persistence = persistence; this.codec = codec; this.crypto = crypto; this.observability = observability; this.chunkSize = chunkSize; + if (chunkSize > 10 * 1024 * 1024) { + observability.log('warn', `Chunk size ${chunkSize} exceeds 10 MiB — consider a smaller value`, { chunkSize }); + } /** @type {import('../../ports/ChunkingPort.js').default} */ this.chunker = chunker || new FixedChunker({ chunkSize }); + this.merkleThreshold = merkleThreshold; + this.concurrency = concurrency; + this.maxRestoreBufferSize = maxRestoreBufferSize; + this.#keyResolver = new KeyResolver(crypto); + } + + /** + * Validates constructor numeric arguments. 
+ * @private + */ + static #validateConstructorArgs({ chunkSize, merkleThreshold, concurrency, maxRestoreBufferSize }) { + if (!Number.isInteger(chunkSize) || chunkSize < 1024) { + throw new Error('Chunk size must be an integer >= 1024 bytes'); + } + const MAX_CHUNK_SIZE = 100 * 1024 * 1024; + if (chunkSize > MAX_CHUNK_SIZE) { + throw new Error(`Chunk size must not exceed ${MAX_CHUNK_SIZE} bytes (100 MiB)`); + } if (!Number.isInteger(merkleThreshold) || merkleThreshold < 1) { throw new Error('Merkle threshold must be a positive integer'); } - this.merkleThreshold = merkleThreshold; if (!Number.isInteger(concurrency) || concurrency < 1) { throw new Error('Concurrency must be a positive integer'); } - this.concurrency = concurrency; - this.#keyResolver = new KeyResolver(crypto); + if (!Number.isInteger(maxRestoreBufferSize) || maxRestoreBufferSize < 1024) { + throw new Error('maxRestoreBufferSize must be a positive integer >= 1024'); + } } /** @@ -127,15 +147,23 @@ export default class CasService { launchWrite(chunk, nextIndex++); } } catch (err) { - await Promise.allSettled(pending); - if (err instanceof CasError) { throw err; } + const settled = await Promise.allSettled(pending); + const orphanedBlobs = settled + .filter((r) => r.status === 'fulfilled') + .map((r) => r.value.blob); + if (err instanceof CasError) { + err.meta = { ...err.meta, orphanedBlobs }; + throw err; + } const casErr = new CasError( `Stream error during store: ${err.message}`, 'STREAM_ERROR', - { chunksDispatched: nextIndex, originalError: err }, + { chunksDispatched: nextIndex, orphanedBlobs, originalError: err }, ); - await Promise.allSettled(pending); - this.observability.metric('error', { code: casErr.code, message: casErr.message }); + this.observability.metric('error', { + code: casErr.code, message: casErr.message, + orphanedBlobs: orphanedBlobs.length, + }); throw casErr; } @@ -257,6 +285,13 @@ export default class CasService { const manifestData = this._buildManifestData(slug, 
filename, compression); const processedSource = compression ? this._compressStream(source) : source; + if (keyInfo.key && this.chunker.strategy === 'cdc') { + this.observability.log( + 'warn', + 'CDC deduplication is ineffective with encryption — ciphertext is pseudorandom', + { strategy: 'cdc' }, + ); + } if (keyInfo.key) { const { encrypt, finalize } = this.crypto.createEncryptionStream(keyInfo.key); await this._chunkAndStore(encrypt(processedSource), manifestData); @@ -469,14 +504,38 @@ export default class CasService { * @private */ async *_restoreBuffered(manifest, key) { + const totalSize = manifest.chunks.reduce((acc, c) => acc + c.size, 0); + if (totalSize > this.maxRestoreBufferSize) { + throw new CasError( + `Encrypted/compressed restore would buffer ${totalSize} bytes ` + + `(limit: ${this.maxRestoreBufferSize}). Increase maxRestoreBufferSize ` + + 'or store without encryption.', + 'RESTORE_TOO_LARGE', + { size: totalSize, limit: this.maxRestoreBufferSize }, + ); + } let buffer = Buffer.concat(await this._readAndVerifyChunks(manifest.chunks)); if (manifest.encryption?.encrypted) { - buffer = await this.decrypt({ buffer, key, meta: manifest.encryption }); + try { + buffer = await this.decrypt({ buffer, key, meta: manifest.encryption }); + } catch (err) { + if (err instanceof CasError && err.code === 'INTEGRITY_ERROR') { + this.observability.metric('error', { action: 'decryption_failed', slug: manifest.slug }); + } + throw err; + } } if (manifest.compression) { buffer = await this._decompress(buffer); + if (buffer.length > this.maxRestoreBufferSize) { + throw new CasError( + `Decompressed restore is ${buffer.length} bytes (limit: ${this.maxRestoreBufferSize})`, + 'RESTORE_TOO_LARGE', + { size: buffer.length, limit: this.maxRestoreBufferSize }, + ); + } } this.observability.metric('file', { @@ -626,7 +685,7 @@ export default class CasService { } /** - * Returns deletion metadata for an asset stored in a Git tree. 
+ * Reads a manifest from a Git tree and returns inspection metadata. * Does not perform any destructive Git operations. * * @param {Object} options @@ -634,7 +693,7 @@ export default class CasService { * @returns {Promise<{ chunksOrphaned: number, slug: string }>} * @throws {CasError} MANIFEST_NOT_FOUND if the tree has no manifest */ - async deleteAsset({ treeOid }) { + async inspectAsset({ treeOid }) { const manifest = await this.readManifest({ treeOid }); return { slug: manifest.slug, @@ -642,6 +701,17 @@ export default class CasService { }; } + /** + * @deprecated Use {@link inspectAsset} instead. + * @param {Object} options + * @param {string} options.treeOid - Git tree OID of the asset + * @returns {Promise<{ chunksOrphaned: number, slug: string }>} + */ + async deleteAsset(options) { + this.observability.log('warn', 'deleteAsset() is deprecated — use inspectAsset()'); + return await this.inspectAsset(options); + } + /** * Aggregates referenced chunk blob OIDs across multiple stored assets. * Analysis only — does not delete or modify anything. @@ -651,7 +721,7 @@ export default class CasService { * @returns {Promise<{ referenced: Set, total: number }>} * @throws {CasError} MANIFEST_NOT_FOUND if any treeOid lacks a manifest */ - async findOrphanedChunks({ treeOids }) { + async collectReferencedChunks({ treeOids }) { const referenced = new Set(); let total = 0; @@ -666,6 +736,17 @@ export default class CasService { return { referenced, total }; } + /** + * @deprecated Use {@link collectReferencedChunks} instead. + * @param {Object} options + * @param {string[]} options.treeOids - Git tree OIDs to analyze + * @returns {Promise<{ referenced: Set, total: number }>} + */ + async findOrphanedChunks(options) { + this.observability.log('warn', 'findOrphanedChunks() is deprecated — use collectReferencedChunks()'); + return await this.collectReferencedChunks(options); + } + /** * Derives an encryption key from a passphrase using PBKDF2 or scrypt. 
* @param {Object} options diff --git a/src/domain/services/VaultService.js b/src/domain/services/VaultService.js index d5a1ac2..c793a39 100644 --- a/src/domain/services/VaultService.js +++ b/src/domain/services/VaultService.js @@ -80,16 +80,22 @@ function hasControlChars(str) { export default class VaultService { static VAULT_REF = VAULT_REF; + /** @type {number} Nonce usage warning threshold (2^31). */ + static ENCRYPTION_COUNT_WARN = 2 ** 31; + /** * @param {Object} options * @param {import('../../ports/GitPersistencePort.js').default} options.persistence * @param {import('../../ports/GitRefPort.js').default} options.ref * @param {import('../../ports/CryptoPort.js').default} options.crypto + * @param {import('../../ports/ObservabilityPort.js').default} [options.observability] */ - constructor({ persistence, ref, crypto }) { + constructor({ persistence, ref, crypto, observability }) { this.persistence = persistence; this.ref = ref; this.crypto = crypto; + /** @type {import('../../ports/ObservabilityPort.js').default} */ + this.observability = observability || { metric() {}, log() {}, span: () => ({ end() {} }) }; } // --------------------------------------------------------------------------- @@ -389,9 +395,24 @@ export default class VaultService { } const isUpdate = state.entries.has(slug); state.entries.set(slug, treeOid); + // Shallow copy to avoid mutating readState()'s object on CAS retries. + const metadata = { ...(state.metadata || { version: 1 }) }; + if (metadata.encryption) { + // Tracks nonce-relevant operations: every addToVault on an encrypted + // vault implies an encryption occurred at the store layer. 
+ metadata.encryptionCount = (metadata.encryptionCount || 0) + 1; + if (metadata.encryptionCount >= VaultService.ENCRYPTION_COUNT_WARN) { + this.observability.log( + 'warn', + `Vault encryption count (${metadata.encryptionCount}) exceeds ` + + `${VaultService.ENCRYPTION_COUNT_WARN} — rotate your key`, + { encryptionCount: metadata.encryptionCount }, + ); + } + } return { entries: state.entries, - metadata: state.metadata || { version: 1 }, + metadata, message: isUpdate ? `vault: update ${slug}` : `vault: add ${slug}`, }; }); diff --git a/src/infrastructure/adapters/NodeCryptoAdapter.js b/src/infrastructure/adapters/NodeCryptoAdapter.js index f89898c..a317a11 100644 --- a/src/infrastructure/adapters/NodeCryptoAdapter.js +++ b/src/infrastructure/adapters/NodeCryptoAdapter.js @@ -1,6 +1,7 @@ import { createHash, createCipheriv, createDecipheriv, randomBytes, pbkdf2, scrypt } from 'node:crypto'; import { promisify } from 'node:util'; import CryptoPort from '../../ports/CryptoPort.js'; +import CasError from '../../domain/errors/CasError.js'; /** * Node.js implementation of CryptoPort using node:crypto. @@ -28,9 +29,9 @@ export default class NodeCryptoAdapter extends CryptoPort { * @override * @param {Buffer|Uint8Array} buffer - Plaintext to encrypt. * @param {Buffer|Uint8Array} key - 32-byte encryption key. 
- * @returns {{ buf: Buffer, meta: import('../../ports/CryptoPort.js').EncryptionMeta }} + * @returns {Promise<{ buf: Buffer, meta: import('../../ports/CryptoPort.js').EncryptionMeta }>} */ - encryptBuffer(buffer, key) { + async encryptBuffer(buffer, key) { this._validateKey(key); const nonce = randomBytes(12); const cipher = createCipheriv('aes-256-gcm', key, nonce); @@ -50,6 +51,7 @@ export default class NodeCryptoAdapter extends CryptoPort { * @returns {Buffer} */ decryptBuffer(buffer, key, meta) { + this._validateKey(key); const nonce = Buffer.from(meta.nonce, 'base64'); const tag = Buffer.from(meta.tag, 'base64'); const decipher = createDecipheriv('aes-256-gcm', key, nonce); @@ -66,6 +68,7 @@ export default class NodeCryptoAdapter extends CryptoPort { this._validateKey(key); const nonce = randomBytes(12); const cipher = createCipheriv('aes-256-gcm', key, nonce); + let streamFinalized = false; /** @param {AsyncIterable} source */ const encrypt = async function* (source) { @@ -79,9 +82,16 @@ export default class NodeCryptoAdapter extends CryptoPort { if (final.length > 0) { yield final; } + streamFinalized = true; }; const finalize = () => { + if (!streamFinalized) { + throw new CasError( + 'Cannot finalize before the encrypt stream is fully consumed', + 'STREAM_NOT_CONSUMED', + ); + } const tag = cipher.getAuthTag(); return this._buildMeta(nonce.toString('base64'), tag.toString('base64')); }; diff --git a/src/infrastructure/adapters/WebCryptoAdapter.js b/src/infrastructure/adapters/WebCryptoAdapter.js index 5a70733..1032934 100644 --- a/src/infrastructure/adapters/WebCryptoAdapter.js +++ b/src/infrastructure/adapters/WebCryptoAdapter.js @@ -9,6 +9,21 @@ import CasError from '../../domain/errors/CasError.js'; * AES-GCM is a one-shot API (the GCM tag is computed over the entire plaintext). 
*/ export default class WebCryptoAdapter extends CryptoPort { + /** @type {number} */ + #maxEncryptionBufferSize; + + /** + * @param {Object} [options] + * @param {number} [options.maxEncryptionBufferSize=536870912] - Max bytes to buffer during streaming encryption (default 512 MiB). + */ + constructor({ maxEncryptionBufferSize = 512 * 1024 * 1024 } = {}) { + super(); + if (!Number.isFinite(maxEncryptionBufferSize) || maxEncryptionBufferSize <= 0) { + throw new RangeError('maxEncryptionBufferSize must be a finite positive number'); + } + this.#maxEncryptionBufferSize = maxEncryptionBufferSize; + } + /** * @override * @param {Buffer|Uint8Array} buf - Data to hash. @@ -73,6 +88,7 @@ export default class WebCryptoAdapter extends CryptoPort { * @returns {Promise} */ async decryptBuffer(buffer, key, meta) { + this._validateKey(key); const nonce = this.#fromBase64(meta.nonce); const tag = this.#fromBase64(meta.tag); const cryptoKey = await this.#importKey(key); @@ -104,49 +120,61 @@ export default class WebCryptoAdapter extends CryptoPort { this._validateKey(key); const nonce = this.randomBytes(12); const cryptoKeyPromise = this.#importKey(key); + const maxBuf = this.#maxEncryptionBufferSize; + const state = { /** @type {Uint8Array|null} */ tag: null, consumed: false }; - // Web Crypto buffers all data for the one-shot AES-GCM call (GCM tag spans the whole plaintext). 
- /** @type {Buffer[]} */ - const chunks = []; - /** @type {Uint8Array|null} */ - let finalTag = null; - let streamConsumed = false; + const encrypt = WebCryptoAdapter.#makeEncryptGenerator({ cryptoKeyPromise, nonce, maxBuf, state }); + + const finalize = () => { + if (!state.consumed) { + throw new CasError('Cannot finalize before the encrypt stream is fully consumed', 'STREAM_NOT_CONSUMED'); + } + return this._buildMeta(this.#toBase64(nonce), this.#toBase64(/** @type {Uint8Array} */ (state.tag))); + }; + + return { encrypt, finalize }; + } - /** @param {AsyncIterable} source */ - const encrypt = async function* (source) { + /** + * Builds the encrypt async generator for createEncryptionStream. + * + * A static method is used (rather than closures) because `async function*` + * cannot be an arrow function — `this` binding would be lost. The `state` + * object bridges mutable data between the generator and `finalize()`. + * + * @param {{ cryptoKeyPromise: Promise, nonce: Buffer|Uint8Array, maxBuf: number, state: { tag: Uint8Array|null, consumed: boolean } }} ctx + * @returns {(source: AsyncIterable) => AsyncGenerator} + */ + static #makeEncryptGenerator({ cryptoKeyPromise, nonce, maxBuf, state }) { + return async function* (source) { + /** @type {Buffer[]} */ + const chunks = []; + let accumulatedBytes = 0; for await (const chunk of source) { + accumulatedBytes += chunk.length; + if (accumulatedBytes > maxBuf) { + throw new CasError( + `Streaming encryption buffered ${accumulatedBytes} bytes (limit: ${maxBuf}). ` + + 'Web Crypto AES-GCM buffers all data. 
Use Node.js/Bun or store without encryption for large files.', + 'ENCRYPTION_BUFFER_EXCEEDED', + { accumulated: accumulatedBytes, limit: maxBuf }, + ); + } chunks.push(chunk); } const buffer = Buffer.concat(chunks); const cryptoKey = await cryptoKeyPromise; const encrypted = await globalThis.crypto.subtle.encrypt( // @ts-ignore -- Uint8Array satisfies BufferSource at runtime { name: 'AES-GCM', iv: /** @type {Uint8Array} */ (nonce) }, - cryptoKey, - buffer + cryptoKey, buffer, ); const fullBuffer = new Uint8Array(encrypted); const tagLength = 16; - const ciphertext = fullBuffer.slice(0, -tagLength); - finalTag = fullBuffer.slice(-tagLength); - streamConsumed = true; - - yield Buffer.from(ciphertext); + state.tag = fullBuffer.slice(-tagLength); + state.consumed = true; + yield Buffer.from(fullBuffer.slice(0, -tagLength)); }; - - const finalize = () => { - if (!streamConsumed) { - throw new CasError( - 'Cannot finalize before the encrypt stream is fully consumed', - 'STREAM_NOT_CONSUMED', - ); - } - return this._buildMeta(this.#toBase64(nonce), this.#toBase64(/** @type {Uint8Array} */ (finalTag))); - }; - - return { encrypt, finalize }; } /** diff --git a/src/infrastructure/chunkers/CdcChunker.js b/src/infrastructure/chunkers/CdcChunker.js index 0eaac3d..536f65c 100644 --- a/src/infrastructure/chunkers/CdcChunker.js +++ b/src/infrastructure/chunkers/CdcChunker.js @@ -277,6 +277,11 @@ export default class CdcChunker extends ChunkingPort { `targetChunkSize (${targetChunkSize}) must be in [${minChunkSize}, ${maxChunkSize}]`, ); } + if (maxChunkSize > 100 * 1024 * 1024) { + throw new RangeError( + `maxChunkSize must not exceed 104857600 bytes (100 MiB), got ${maxChunkSize}`, + ); + } this.#minChunkSize = minChunkSize; this.#maxChunkSize = maxChunkSize; diff --git a/src/infrastructure/chunkers/FixedChunker.js b/src/infrastructure/chunkers/FixedChunker.js index 1477e18..4444823 100644 --- a/src/infrastructure/chunkers/FixedChunker.js +++
b/src/infrastructure/chunkers/FixedChunker.js @@ -17,6 +17,14 @@ export default class FixedChunker extends ChunkingPort { */ constructor({ chunkSize = 262144 } = {}) { super(); + if (!Number.isInteger(chunkSize) || chunkSize < 1) { + throw new RangeError(`chunkSize must be a positive integer, got ${chunkSize}`); + } + if (chunkSize > 100 * 1024 * 1024) { + throw new RangeError( + `Chunk size must not exceed 104857600 bytes (100 MiB), got ${chunkSize}`, + ); + } this.#chunkSize = chunkSize; } @@ -36,18 +44,26 @@ export default class FixedChunker extends ChunkingPort { * @yields {Buffer} */ async *chunk(source) { - let buffer = Buffer.alloc(0); + const cs = this.#chunkSize; + const buf = Buffer.allocUnsafe(cs); + let offset = 0; for await (const data of source) { - buffer = Buffer.concat([buffer, data]); - while (buffer.length >= this.#chunkSize) { - yield buffer.slice(0, this.#chunkSize); - buffer = buffer.slice(this.#chunkSize); + let srcPos = 0; + while (srcPos < data.length) { + const n = Math.min(cs - offset, data.length - srcPos); + data.copy(buf, offset, srcPos, srcPos + n); + offset += n; + srcPos += n; + if (offset === cs) { + yield Buffer.from(buf); + offset = 0; + } } } - if (buffer.length > 0) { - yield buffer; + if (offset > 0) { + yield Buffer.from(buf.subarray(0, offset)); } } } diff --git a/test/CONVENTIONS.md b/test/CONVENTIONS.md new file mode 100644 index 0000000..f723294 --- /dev/null +++ b/test/CONVENTIONS.md @@ -0,0 +1,57 @@ +# Test Conventions + +Rules for writing deterministic, cross-runtime tests. All tests must pass +on Node.js, Bun, and Deno. + +## Time and Scheduling + +**Never assert wall-clock timing.** `Date.now()` deltas are +nondeterministic — they flake under CI load and vary across runtimes. 
+ +**Inject delay/timer dependencies.** If production code uses `setTimeout` +or similar scheduling, accept the delay function as a parameter: + +```js +// production: injectable dependency with a real default +export function runAction(fn, getJson, { delay = defaultDelay } = {}) { ... } + +// test: inject a spy — no global patching needed +const delaySpy = vi.fn().mockResolvedValue(undefined); +const action = runAction(fn, getJson, { delay: delaySpy }); +await action(); +expect(delaySpy).toHaveBeenCalledWith(1000); +``` + +**Avoid `vi.useFakeTimers()`.** Vitest fake timers rely on +`@sinonjs/fake-timers`, which patches globals differently across runtimes. +Prefer dependency injection over global monkey-patching. + +## File Permissions + +**Use `chmod()` after `writeFile()`, not `writeFile({ mode })`.** The +`mode` parameter is filtered through `process.umask()`. A restrictive +umask (e.g., `0o077`) silently strips the bits you requested, making +permission-sensitive tests environment-dependent. + +```js +// wrong — umask can mask the requested mode +await writeFile(path, 'data', { mode: 0o644 }); + +// correct — chmod sets the exact mode regardless of umask +await writeFile(path, 'data'); +await chmod(path, 0o644); +``` + +This applies to macOS and Linux (our supported platforms). Permission +bits are a Unix concept — `chmod` is a no-op on Windows. + +## General Principles + +- **Test behavior, not timing.** Assert that a function was called, not + how long it took. +- **Inject infrastructure.** Clocks, filesystems, network — anything that + varies across environments should be injectable through constructor + parameters or function arguments. +- **No global state patching when injection is available.** If you control + the code under test, add a parameter. Only patch globals for third-party + code you cannot modify. 
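The injectable-delay convention described in CONVENTIONS.md above can be sketched end to end. This is a minimal illustrative example, not the repository's actual `runAction` implementation: `makeAction`, `fakeDelay`, and the error-code handling are hypothetical stand-ins that show the injection pattern under the stated assumptions.

```javascript
// Sketch of the "inject the delay" convention: production code takes a real
// default, tests swap in a recording stub instead of patching global timers.
const defaultDelay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical wrapper: on a specific error code, await the injected delay
// before reporting the failure (mirrors a brute-force rate-limiting hint).
function makeAction(task, { delay = defaultDelay, report = console.error } = {}) {
  return async function run() {
    try {
      return await task();
    } catch (err) {
      if (err && err.code === 'INTEGRITY_ERROR') {
        await delay(1000); // slow down repeated bad-passphrase attempts
      }
      report(err.message);
      return undefined;
    }
  };
}

// Test-side usage: no fake timers, no global patching — just injection.
const calls = [];
const fakeDelay = (ms) => { calls.push(ms); return Promise.resolve(); };
const boom = Object.assign(new Error('bad key'), { code: 'INTEGRITY_ERROR' });
const action = makeAction(() => { throw boom; }, { delay: fakeDelay, report: () => {} });
```

After awaiting `action()`, the stub has recorded the requested delay, so a test can assert on `calls` deterministically regardless of CI load or runtime timer behavior.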
diff --git a/test/unit/cli/actions.test.js b/test/unit/cli/actions.test.js index 3d4fd3d..6bd4ab4 100644 --- a/test/unit/cli/actions.test.js +++ b/test/unit/cli/actions.test.js @@ -109,6 +109,48 @@ describe('runAction', () => { }); }); +describe('runAction — INTEGRITY_ERROR rate-limiting', () => { + let stderrSpy; + const originalExitCode = process.exitCode; + + beforeEach(() => { + process.exitCode = undefined; + stderrSpy = vi.spyOn(process.stderr, 'write').mockImplementation(() => true); + }); + + afterEach(() => { + process.exitCode = originalExitCode; + stderrSpy.mockRestore(); + }); + + it('awaits delay(1000) before writing INTEGRITY_ERROR output', async () => { + let releaseDelay = () => {}; + const delaySpy = vi.fn().mockImplementation(() => new Promise((resolve) => { + releaseDelay = resolve; + })); + const err = Object.assign(new Error('bad key'), { code: 'INTEGRITY_ERROR' }); + const actionPromise = runAction(() => { throw err; }, () => false, { delay: delaySpy })(); + + expect(delaySpy).toHaveBeenCalledWith(1000); + expect(stderrSpy).not.toHaveBeenCalled(); + expect(process.exitCode).toBeUndefined(); + + releaseDelay(); + await actionPromise; + expect(stderrSpy).toHaveBeenCalled(); + expect(process.exitCode).toBe(1); + }); + + it('does not call delay for non-INTEGRITY_ERROR codes', async () => { + const delaySpy = vi.fn().mockResolvedValue(undefined); + const err = Object.assign(new Error('gone'), { code: 'MISSING_KEY' }); + const action = runAction(async () => { throw err; }, () => false, { delay: delaySpy }); + await action(); + expect(delaySpy).not.toHaveBeenCalled(); + expect(process.exitCode).toBe(1); + }); +}); + describe('HINTS', () => { it('contains expected error codes', () => { expect(HINTS).toHaveProperty('MISSING_KEY'); diff --git a/test/unit/cli/config.test.js b/test/unit/cli/config.test.js new file mode 100644 index 0000000..6a81879 --- /dev/null +++ b/test/unit/cli/config.test.js @@ -0,0 +1,224 @@ +import { describe, it, expect, afterEach 
} from 'vitest'; +import { writeFileSync, mkdirSync, rmSync } from 'node:fs'; +import { join } from 'node:path'; +import { tmpdir } from 'node:os'; +import { loadConfig, mergeConfig } from '../../../bin/config.js'; + +const tmpDir = join(tmpdir(), `casrc-test-${Date.now()}`); + +function setup() { + mkdirSync(tmpDir, { recursive: true }); +} + +function teardown() { + rmSync(tmpDir, { recursive: true, force: true }); +} + +describe('loadConfig', () => { + afterEach(teardown); + + it('returns empty object when .casrc does not exist', () => { + setup(); + expect(loadConfig(tmpDir)).toEqual({}); + }); + + it('loads valid JSON from .casrc', () => { + setup(); + writeFileSync(join(tmpDir, '.casrc'), JSON.stringify({ chunkSize: 65536, strategy: 'cdc' })); + const config = loadConfig(tmpDir); + expect(config.chunkSize).toBe(65536); + expect(config.strategy).toBe('cdc'); + }); + + it('throws on invalid JSON', () => { + setup(); + writeFileSync(join(tmpDir, '.casrc'), '{bad json'); + expect(() => loadConfig(tmpDir)).toThrow(/invalid JSON/); + }); + + it('throws on non-object JSON', () => { + setup(); + writeFileSync(join(tmpDir, '.casrc'), '"just a string"'); + expect(() => loadConfig(tmpDir)).toThrow(/expected a JSON object/); + }); + + it('throws on array JSON', () => { + setup(); + writeFileSync(join(tmpDir, '.casrc'), '[1, 2, 3]'); + expect(() => loadConfig(tmpDir)).toThrow(/expected a JSON object/); + }); +}); + +describe('loadConfig — chunkSize validation', () => { + afterEach(teardown); + + it('rejects non-integer chunkSize', () => { + setup(); + writeFileSync(join(tmpDir, '.casrc'), JSON.stringify({ chunkSize: 'big' })); + expect(() => loadConfig(tmpDir)).toThrow(/chunkSize must be an integer >= 1024/); + }); + + it('rejects chunkSize below 1024', () => { + setup(); + writeFileSync(join(tmpDir, '.casrc'), JSON.stringify({ chunkSize: 512 })); + expect(() => loadConfig(tmpDir)).toThrow(/chunkSize must be an integer >= 1024/); + }); + + it('rejects chunkSize above 100 
MiB', () => { + setup(); + writeFileSync(join(tmpDir, '.casrc'), JSON.stringify({ chunkSize: 200 * 1024 * 1024 })); + expect(() => loadConfig(tmpDir)).toThrow(/chunkSize must not exceed/); + }); +}); + +describe('loadConfig — field validation', () => { + afterEach(teardown); + + it('rejects invalid strategy', () => { + setup(); + writeFileSync(join(tmpDir, '.casrc'), JSON.stringify({ strategy: 'random' })); + expect(() => loadConfig(tmpDir)).toThrow(/strategy must be "fixed" or "cdc"/); + }); + + it('rejects non-positive concurrency', () => { + setup(); + writeFileSync(join(tmpDir, '.casrc'), JSON.stringify({ concurrency: 0 })); + expect(() => loadConfig(tmpDir)).toThrow(/concurrency must be an integer >= 1/); + }); + + it('rejects invalid codec', () => { + setup(); + writeFileSync(join(tmpDir, '.casrc'), JSON.stringify({ codec: 'xml' })); + expect(() => loadConfig(tmpDir)).toThrow(/codec must be "json" or "cbor"/); + }); + + it('accepts a fully valid config', () => { + setup(); + writeFileSync(join(tmpDir, '.casrc'), JSON.stringify({ + chunkSize: 65536, + strategy: 'cdc', + concurrency: 4, + codec: 'cbor', + compression: 'gzip', + merkleThreshold: 500, + maxRestoreBufferSize: 1048576, + cdc: { minChunkSize: 2048, targetChunkSize: 8192, maxChunkSize: 16384 }, + })); + const config = loadConfig(tmpDir); + expect(config.chunkSize).toBe(65536); + expect(config.cdc.targetChunkSize).toBe(8192); + }); +}); + +describe('loadConfig — CDC inter-field ordering', () => { + afterEach(teardown); + + it('rejects cdc.minChunkSize > cdc.maxChunkSize', () => { + setup(); + writeFileSync(join(tmpDir, '.casrc'), JSON.stringify({ + cdc: { minChunkSize: 16384, targetChunkSize: 8192, maxChunkSize: 4096 }, + })); + expect(() => loadConfig(tmpDir)).toThrow(/cdc\.minChunkSize must not exceed cdc\.maxChunkSize/); + }); + + it('rejects cdc.targetChunkSize < cdc.minChunkSize', () => { + setup(); + writeFileSync(join(tmpDir, '.casrc'), JSON.stringify({ + cdc: { minChunkSize: 8192, 
targetChunkSize: 4096, maxChunkSize: 16384 }, + })); + expect(() => loadConfig(tmpDir)).toThrow(/cdc\.targetChunkSize must be >= cdc\.minChunkSize/); + }); + + it('rejects cdc.targetChunkSize > cdc.maxChunkSize', () => { + setup(); + writeFileSync(join(tmpDir, '.casrc'), JSON.stringify({ + cdc: { minChunkSize: 2048, targetChunkSize: 32768, maxChunkSize: 16384 }, + })); + expect(() => loadConfig(tmpDir)).toThrow(/cdc\.targetChunkSize must be <= cdc\.maxChunkSize/); + }); +}); + +describe('mergeConfig — CLI overrides', () => { + it('CLI flags override config', () => { + const { casConfig } = mergeConfig({ chunkSize: 4096, strategy: 'fixed' }, { chunkSize: 65536 }); + expect(casConfig.chunkSize).toBe(4096); + expect(casConfig.chunking).toEqual({ strategy: 'fixed', chunkSize: 4096 }); + }); + + it('config fills in when CLI omits flags', () => { + const { casConfig } = mergeConfig({}, { concurrency: 4, chunkSize: 32768 }); + expect(casConfig.concurrency).toBe(4); + expect(casConfig.chunkSize).toBe(32768); + }); +}); + +describe('mergeConfig — CDC strategy', () => { + it('CDC strategy merges cdc sub-config', () => { + const config = { cdc: { targetChunkSize: 8192, minChunkSize: 2048, maxChunkSize: 16384 } }; + const { casConfig } = mergeConfig({ strategy: 'cdc' }, config); + expect(casConfig.chunking).toEqual({ + strategy: 'cdc', + targetChunkSize: 8192, + minChunkSize: 2048, + maxChunkSize: 16384, + }); + }); + + it('CDC CLI params override cdc sub-config', () => { + const config = { cdc: { targetChunkSize: 8192, minChunkSize: 2048, maxChunkSize: 16384 } }; + const { casConfig } = mergeConfig({ strategy: 'cdc', targetChunkSize: 4096 }, config); + expect(casConfig.chunking.targetChunkSize).toBe(4096); + expect(casConfig.chunking.minChunkSize).toBe(2048); + }); +}); + +describe('mergeConfig — compression', () => { + it('gzip from CLI', () => { + const { storeExtras } = mergeConfig({ gzip: true }, {}); + expect(storeExtras.compression).toEqual({ algorithm: 'gzip' }); + }); 
+ + it('gzip from config', () => { + const { storeExtras } = mergeConfig({}, { compression: 'gzip' }); + expect(storeExtras.compression).toEqual({ algorithm: 'gzip' }); + }); + + it('no compression by default', () => { + const { storeExtras } = mergeConfig({}, {}); + expect(storeExtras.compression).toBeUndefined(); + }); +}); + +describe('mergeConfig — nullish coalescing', () => { + it('empty-string CLI strategy does not fall through to config', () => { + const { casConfig } = mergeConfig({ strategy: '' }, { strategy: 'cdc' }); + expect(casConfig.chunking).toBeUndefined(); + }); + + it('empty-string CLI codec does not fall through to config', () => { + const { casConfig } = mergeConfig({ codec: '' }, { codec: 'cbor' }); + expect(casConfig.codec).toBeUndefined(); + }); +}); + +describe('mergeConfig — codec and thresholds', () => { + it('cbor codec from CLI', () => { + const { casConfig } = mergeConfig({ codec: 'cbor' }, {}); + expect(casConfig.codec).toBe('cbor'); + }); + + it('cbor codec from config', () => { + const { casConfig } = mergeConfig({}, { codec: 'cbor' }); + expect(casConfig.codec).toBe('cbor'); + }); + + it('merkleThreshold from config', () => { + const { casConfig } = mergeConfig({}, { merkleThreshold: 500 }); + expect(casConfig.merkleThreshold).toBe(500); + }); + + it('maxRestoreBufferSize from config', () => { + const { casConfig } = mergeConfig({}, { maxRestoreBufferSize: 1024 * 1024 }); + expect(casConfig.maxRestoreBufferSize).toBe(1024 * 1024); + }); +}); diff --git a/test/unit/cli/passphrase-prompt.test.js b/test/unit/cli/passphrase-prompt.test.js new file mode 100644 index 0000000..2c2866e --- /dev/null +++ b/test/unit/cli/passphrase-prompt.test.js @@ -0,0 +1,84 @@ +import { describe, it, expect, afterEach } from 'vitest'; +import { writeFile, unlink, chmod } from 'node:fs/promises'; +import { tmpdir } from 'node:os'; +import { join } from 'node:path'; +import { readPassphraseFile } from '../../../bin/ui/passphrase-prompt.js'; + +const tmpPath 
= join(tmpdir(), `test-passphrase-${Date.now()}.txt`); + +afterEach(async () => { + try { await unlink(tmpPath); } catch { /* may not exist */ } +}); + +describe('readPassphraseFile', () => { + it('reads from file and trims trailing newline', async () => { + await writeFile(tmpPath, 'my-secret\n', { mode: 0o600 }); + const result = await readPassphraseFile(tmpPath); + expect(result).toBe('my-secret'); + }); + + it('preserves content without trailing newline', async () => { + await writeFile(tmpPath, 'no-newline', { mode: 0o600 }); + const result = await readPassphraseFile(tmpPath); + expect(result).toBe('no-newline'); + }); + + it('preserves internal newlines', async () => { + await writeFile(tmpPath, 'line1\nline2\n', { mode: 0o600 }); + const result = await readPassphraseFile(tmpPath); + expect(result).toBe('line1\nline2'); + }); + + it('strips trailing CRLF (Windows line ending)', async () => { + await writeFile(tmpPath, 'win-secret\r\n', { mode: 0o600 }); + const result = await readPassphraseFile(tmpPath); + expect(result).toBe('win-secret'); + }); +}); + +describe('readPassphraseFile — empty passphrase rejection', () => { + it('rejects empty (0-byte) file', async () => { + await writeFile(tmpPath, '', { mode: 0o600 }); + await expect(readPassphraseFile(tmpPath)).rejects.toThrow('Passphrase must not be empty'); + }); + + it('rejects file containing only LF', async () => { + await writeFile(tmpPath, '\n', { mode: 0o600 }); + await expect(readPassphraseFile(tmpPath)).rejects.toThrow('Passphrase must not be empty'); + }); + + it('rejects file containing only CRLF', async () => { + await writeFile(tmpPath, '\r\n', { mode: 0o600 }); + await expect(readPassphraseFile(tmpPath)).rejects.toThrow('Passphrase must not be empty'); + }); +}); + +describe('readPassphraseFile — permission warnings', () => { + it('warns on group/world-readable file permissions', async () => { + const writeSpy = []; + const origWrite = process.stderr.write; + process.stderr.write = (/** @type 
{any} */ chunk) => { writeSpy.push(String(chunk)); return true; }; + try { + await writeFile(tmpPath, 'secret\n'); + await chmod(tmpPath, 0o644); + await readPassphraseFile(tmpPath); + expect(writeSpy.some((s) => s.includes('permissions'))).toBe(true); + } finally { + process.stderr.write = origWrite; + } + }); + + it('no warning for restricted file permissions', async () => { + const writeSpy = []; + const origWrite = process.stderr.write; + process.stderr.write = (/** @type {any} */ chunk) => { writeSpy.push(String(chunk)); return true; }; + try { + await writeFile(tmpPath, 'secret\n'); + await chmod(tmpPath, 0o600); + await readPassphraseFile(tmpPath); + expect(writeSpy.some((s) => s.includes('permissions'))).toBe(false); + } finally { + process.stderr.write = origWrite; + } + }); +}); diff --git a/test/unit/domain/errors/CasError.test.js b/test/unit/domain/errors/CasError.test.js new file mode 100644 index 0000000..9f7fb99 --- /dev/null +++ b/test/unit/domain/errors/CasError.test.js @@ -0,0 +1,37 @@ +import { describe, it, expect } from 'vitest'; +import CasError from '../../../../src/domain/errors/CasError.js'; + +describe('CasError', () => { + it('sets name, code, and meta properties', () => { + const err = new CasError('boom', 'TEST_CODE', { foo: 'bar' }); + expect(err.name).toBe('CasError'); + expect(err.message).toBe('boom'); + expect(err.code).toBe('TEST_CODE'); + expect(err.meta).toEqual({ foo: 'bar' }); + }); + + it('defaults meta to empty object', () => { + const err = new CasError('msg', 'CODE'); + expect(err.meta).toEqual({}); + }); + + it('is an instance of Error', () => { + const err = new CasError('msg', 'CODE'); + expect(err).toBeInstanceOf(Error); + }); + + it('constructs correctly when Error.captureStackTrace is unavailable', () => { + const original = Error.captureStackTrace; + Error.captureStackTrace = undefined; + try { + const err = new CasError('no-stack', 'NO_STACK', { x: 1 }); + expect(err.name).toBe('CasError'); + 
expect(err.code).toBe('NO_STACK'); + expect(err.meta).toEqual({ x: 1 }); + expect(err.message).toBe('no-stack'); + expect(err).toBeInstanceOf(Error); + } finally { + Error.captureStackTrace = original; + } + }); +}); diff --git a/test/unit/domain/services/CasService.chunkSizeBound.test.js b/test/unit/domain/services/CasService.chunkSizeBound.test.js new file mode 100644 index 0000000..05d6c9b --- /dev/null +++ b/test/unit/domain/services/CasService.chunkSizeBound.test.js @@ -0,0 +1,44 @@ +import { describe, it, expect, vi } from 'vitest'; +import CasService from '../../../../src/domain/services/CasService.js'; +import { getTestCryptoAdapter } from '../../../helpers/crypto-adapter.js'; +import JsonCodec from '../../../../src/infrastructure/codecs/JsonCodec.js'; +import SilentObserver from '../../../../src/infrastructure/adapters/SilentObserver.js'; + +const testCrypto = await getTestCryptoAdapter(); + +const MiB = 1024 * 1024; + +function makeService(chunkSize, observability) { + return new CasService({ + persistence: { writeBlob: vi.fn(), writeTree: vi.fn(), readBlob: vi.fn() }, + crypto: testCrypto, + codec: new JsonCodec(), + chunkSize, + observability: observability || new SilentObserver(), + }); +} + +describe('CasService — chunk size upper bound', () => { + it('throws when chunkSize > 100 MiB', () => { + expect(() => makeService(100 * MiB + 1)).toThrow(/must not exceed/i); + }); + + it('accepts exactly 100 MiB', () => { + const service = makeService(100 * MiB); + expect(service.chunkSize).toBe(100 * MiB); + }); + + it('warns when chunkSize > 10 MiB', () => { + const observability = { + metric: vi.fn(), + log: vi.fn(), + span: vi.fn().mockReturnValue({ end: vi.fn() }), + }; + makeService(11 * MiB, observability); + expect(observability.log).toHaveBeenCalledWith( + 'warn', + expect.stringContaining('exceeds 10 MiB'), + expect.objectContaining({ chunkSize: 11 * MiB }), + ); + }); +}); diff --git a/test/unit/domain/services/CasService.dedupWarning.test.js 
b/test/unit/domain/services/CasService.dedupWarning.test.js new file mode 100644 index 0000000..d2d0342 --- /dev/null +++ b/test/unit/domain/services/CasService.dedupWarning.test.js @@ -0,0 +1,65 @@ +import { describe, it, expect, vi } from 'vitest'; +import CasService from '../../../../src/domain/services/CasService.js'; +import { getTestCryptoAdapter } from '../../../helpers/crypto-adapter.js'; +import JsonCodec from '../../../../src/infrastructure/codecs/JsonCodec.js'; +import CdcChunker from '../../../../src/infrastructure/chunkers/CdcChunker.js'; +import FixedChunker from '../../../../src/infrastructure/chunkers/FixedChunker.js'; + +const testCrypto = await getTestCryptoAdapter(); + +function makeObserver() { + return { + metric: vi.fn(), + log: vi.fn(), + span: vi.fn().mockReturnValue({ end: vi.fn() }), + }; +} + +function makeService(chunker, observability) { + return new CasService({ + persistence: { writeBlob: vi.fn().mockResolvedValue('oid'), writeTree: vi.fn(), readBlob: vi.fn() }, + crypto: testCrypto, + codec: new JsonCodec(), + chunkSize: 1024, + observability, + chunker, + }); +} + +describe('CasService — CDC + encryption dedup warning', () => { + it('emits warning when encryption + CDC', async () => { + const obs = makeObserver(); + const service = makeService(new CdcChunker({ minChunkSize: 1024, targetChunkSize: 2048, maxChunkSize: 4096 }), obs); + const key = Buffer.alloc(32, 0xab); + + async function* source() { yield Buffer.alloc(2048, 0xcc); } + await service.store({ source: source(), slug: 'enc-cdc', filename: 'f.bin', encryptionKey: key }); + + const warnCalls = obs.log.mock.calls.filter((c) => c[0] === 'warn' && c[1].includes('CDC deduplication')); + expect(warnCalls).toHaveLength(1); + expect(warnCalls[0][2]).toEqual({ strategy: 'cdc' }); + }); + + it('does NOT warn for encryption + fixed chunking', async () => { + const obs = makeObserver(); + const service = makeService(new FixedChunker({ chunkSize: 1024 }), obs); + const key = 
Buffer.alloc(32, 0xab); + + async function* source() { yield Buffer.alloc(2048, 0xcc); } + await service.store({ source: source(), slug: 'enc-fixed', filename: 'f.bin', encryptionKey: key }); + + const warnCalls = obs.log.mock.calls.filter((c) => c[0] === 'warn' && c[1].includes('CDC deduplication')); + expect(warnCalls).toHaveLength(0); + }); + + it('does NOT warn for CDC without encryption', async () => { + const obs = makeObserver(); + const service = makeService(new CdcChunker({ minChunkSize: 1024, targetChunkSize: 2048, maxChunkSize: 4096 }), obs); + + async function* source() { yield Buffer.alloc(2048, 0xcc); } + await service.store({ source: source(), slug: 'plain-cdc', filename: 'f.bin' }); + + const warnCalls = obs.log.mock.calls.filter((c) => c[0] === 'warn' && c[1].includes('CDC deduplication')); + expect(warnCalls).toHaveLength(0); + }); +}); diff --git a/test/unit/domain/services/CasService.errors.test.js b/test/unit/domain/services/CasService.errors.test.js index ef1d13a..c06172c 100644 --- a/test/unit/domain/services/CasService.errors.test.js +++ b/test/unit/domain/services/CasService.errors.test.js @@ -26,13 +26,13 @@ describe('CasService – constructor – chunkSize validation', () => { it('throws when chunkSize is 0', () => { expect( () => new CasService({ persistence: mockPersistence, crypto: testCrypto, codec: new JsonCodec(), chunkSize: 0, observability: new SilentObserver() }), - ).toThrow('Chunk size must be at least 1024 bytes'); + ).toThrow('Chunk size must be an integer >= 1024 bytes'); }); it('throws when chunkSize is 512', () => { expect( () => new CasService({ persistence: mockPersistence, crypto: testCrypto, codec: new JsonCodec(), chunkSize: 512, observability: new SilentObserver() }), - ).toThrow('Chunk size must be at least 1024 bytes'); + ).toThrow('Chunk size must be an integer >= 1024 bytes'); }); it('accepts chunkSize of exactly 1024', () => { diff --git a/test/unit/domain/services/CasService.kdfBruteForce.test.js 
b/test/unit/domain/services/CasService.kdfBruteForce.test.js new file mode 100644 index 0000000..063893b --- /dev/null +++ b/test/unit/domain/services/CasService.kdfBruteForce.test.js @@ -0,0 +1,106 @@ +import { describe, it, expect, vi } from 'vitest'; +import CasService from '../../../../src/domain/services/CasService.js'; +import { getTestCryptoAdapter } from '../../../helpers/crypto-adapter.js'; +import JsonCodec from '../../../../src/infrastructure/codecs/JsonCodec.js'; +import Manifest from '../../../../src/domain/value-objects/Manifest.js'; + +const testCrypto = await getTestCryptoAdapter(); + +const CHUNK_DATA = Buffer.alloc(128, 0xaa); +const CHUNK_DIGEST = await testCrypto.sha256(CHUNK_DATA); + +function setup() { + const observability = { + metric: vi.fn(), + log: vi.fn(), + span: vi.fn().mockReturnValue({ end: vi.fn() }), + }; + const mockPersistence = { + writeBlob: vi.fn(), + writeTree: vi.fn(), + readBlob: vi.fn().mockResolvedValue(CHUNK_DATA), + readTree: vi.fn(), + }; + const service = new CasService({ + persistence: mockPersistence, + crypto: testCrypto, + codec: new JsonCodec(), + chunkSize: 1024, + observability, + }); + return { service, observability }; +} + +function encryptedManifest(slug) { + return new Manifest({ + slug, + filename: `${slug}.bin`, + size: 128, + chunks: [ + { index: 0, size: 128, digest: CHUNK_DIGEST, blob: 'blob-0' }, + ], + encryption: { + algorithm: 'aes-256-gcm', + nonce: 'deadbeef', + tag: 'cafebabe', + encrypted: true, + }, + }); +} + +describe('16.12: KDF brute-force — decryption_failed metric', () => { + it('emits metric on wrong key', async () => { + const { service, observability } = setup(); + const manifest = encryptedManifest('secret-file'); + const wrongKey = testCrypto.randomBytes(32); + + try { + await service.restore({ manifest, encryptionKey: wrongKey }); + expect.unreachable('should have thrown'); + } catch (err) { + expect(err.code).toBe('INTEGRITY_ERROR'); + } + + const dfMetrics = 
observability.metric.mock.calls.filter( + (c) => c[0] === 'error' && c[1].action === 'decryption_failed', + ); + expect(dfMetrics.length).toBe(1); + }); + + it('includes slug context for audit trail', async () => { + const { service, observability } = setup(); + const manifest = encryptedManifest('audit-slug'); + const wrongKey = testCrypto.randomBytes(32); + + try { + await service.restore({ manifest, encryptionKey: wrongKey }); + } catch { + // expected + } + + const dfMetrics = observability.metric.mock.calls.filter( + (c) => c[0] === 'error' && c[1].action === 'decryption_failed', + ); + expect(dfMetrics[0][1]).toHaveProperty('slug', 'audit-slug'); + }); +}); + +describe('16.12: KDF brute-force — library rate-limiting', () => { + it('library API does NOT rate-limit', async () => { + const { service } = setup(); + const manifest = encryptedManifest('rate-test'); + const wrongKey = testCrypto.randomBytes(32); + + const start = Date.now(); + let caught; + try { + await service.restore({ manifest, encryptionKey: wrongKey }); + expect.unreachable('should have thrown INTEGRITY_ERROR'); + } catch (err) { + caught = err; + } + const elapsed = Date.now() - start; + expect(caught?.code).toBe('INTEGRITY_ERROR'); + expect(elapsed).toBeLessThan(500); + }); +}); diff --git a/test/unit/domain/services/CasService.lifecycle.test.js b/test/unit/domain/services/CasService.lifecycle.test.js new file mode 100644 index 0000000..acd7630 --- /dev/null +++ b/test/unit/domain/services/CasService.lifecycle.test.js @@ -0,0 +1,119 @@ +import { describe, it, expect, vi } from 'vitest'; +import CasService from '../../../../src/domain/services/CasService.js'; +import { getTestCryptoAdapter } from '../../../helpers/crypto-adapter.js'; +import JsonCodec from '../../../../src/infrastructure/codecs/JsonCodec.js'; +import { digestOf } from '../../../helpers/crypto.js'; + +const testCrypto = await getTestCryptoAdapter(); + +function makeChunk(index, seed, blobOid) { + return { index, size: 1024, 
digest: digestOf(seed), blob: blobOid }; +} + +function setup() { + const mockPersistence = { + writeBlob: vi.fn(), + writeTree: vi.fn(), + readBlob: vi.fn(), + readTree: vi.fn(), + }; + const observability = { + metric: vi.fn(), + log: vi.fn(), + span: vi.fn().mockReturnValue({ end: vi.fn() }), + }; + const service = new CasService({ + persistence: mockPersistence, + crypto: testCrypto, + codec: new JsonCodec(), + chunkSize: 1024, + observability, + }); + return { mockPersistence, observability, service }; +} + +function mockManifest(mockPersistence, manifest) { + const codec = new JsonCodec(); + mockPersistence.readTree.mockResolvedValue([ + { mode: '100644', type: 'blob', oid: 'mf-oid', name: 'manifest.json' }, + ]); + mockPersistence.readBlob.mockResolvedValue(codec.encode(manifest)); +} + +describe('16.7: inspectAsset (canonical name)', () => { + it('returns { slug, chunksOrphaned }', async () => { + const { service, mockPersistence } = setup(); + const manifest = { + slug: 'asset-1', filename: 'f.bin', size: 2048, + chunks: [makeChunk(0, 'c0', 'b0'), makeChunk(1, 'c1', 'b1')], + }; + mockManifest(mockPersistence, manifest); + const result = await service.inspectAsset({ treeOid: 'tree-1' }); + expect(result).toEqual({ slug: 'asset-1', chunksOrphaned: 2 }); + }); +}); + +describe('16.7: deleteAsset (deprecated alias)', () => { + it('delegates to inspectAsset and returns same result', async () => { + const { service, mockPersistence } = setup(); + const manifest = { + slug: 'asset-2', filename: 'g.bin', size: 1024, + chunks: [makeChunk(0, 'd0', 'b0')], + }; + mockManifest(mockPersistence, manifest); + const result = await service.deleteAsset({ treeOid: 'tree-2' }); + expect(result).toEqual({ slug: 'asset-2', chunksOrphaned: 1 }); + }); + + it('emits deprecation warning via observability', async () => { + const { service, mockPersistence, observability } = setup(); + const manifest = { + slug: 'x', filename: 'x.bin', size: 0, chunks: [], + }; + 
mockManifest(mockPersistence, manifest); + await service.deleteAsset({ treeOid: 'tree-x' }); + expect(observability.log).toHaveBeenCalledWith( + 'warn', 'deleteAsset() is deprecated — use inspectAsset()', + ); + }); +}); + +describe('16.7: collectReferencedChunks (canonical name)', () => { + it('returns { referenced, total }', async () => { + const { service, mockPersistence } = setup(); + const manifest = { + slug: 'asset-3', filename: 'h.bin', size: 2048, + chunks: [makeChunk(0, 'e0', 'b0'), makeChunk(1, 'e1', 'b1')], + }; + mockManifest(mockPersistence, manifest); + const result = await service.collectReferencedChunks({ treeOids: ['tree-3'] }); + expect(result.referenced.size).toBe(2); + expect(result.total).toBe(2); + }); +}); + +describe('16.7: findOrphanedChunks (deprecated alias)', () => { + it('delegates to collectReferencedChunks', async () => { + const { service, mockPersistence } = setup(); + const manifest = { + slug: 'asset-4', filename: 'i.bin', size: 1024, + chunks: [makeChunk(0, 'f0', 'b0')], + }; + mockManifest(mockPersistence, manifest); + const result = await service.findOrphanedChunks({ treeOids: ['tree-4'] }); + expect(result.referenced.size).toBe(1); + expect(result.total).toBe(1); + }); + + it('emits deprecation warning via observability', async () => { + const { service, mockPersistence, observability } = setup(); + const manifest = { + slug: 'y', filename: 'y.bin', size: 0, chunks: [], + }; + mockManifest(mockPersistence, manifest); + await service.findOrphanedChunks({ treeOids: ['tree-y'] }); + expect(observability.log).toHaveBeenCalledWith( + 'warn', 'findOrphanedChunks() is deprecated — use collectReferencedChunks()', + ); + }); +}); diff --git a/test/unit/domain/services/CasService.orphanedBlobs.test.js b/test/unit/domain/services/CasService.orphanedBlobs.test.js new file mode 100644 index 0000000..5903ccd --- /dev/null +++ b/test/unit/domain/services/CasService.orphanedBlobs.test.js @@ -0,0 +1,96 @@ +import { describe, it, expect, vi, 
beforeEach } from 'vitest'; +import CasService from '../../../../src/domain/services/CasService.js'; +import { getTestCryptoAdapter } from '../../../helpers/crypto-adapter.js'; +import JsonCodec from '../../../../src/infrastructure/codecs/JsonCodec.js'; + +const testCrypto = await getTestCryptoAdapter(); + +function failingSource(chunksBeforeError, chunkSize = 1024) { + let yielded = 0; + return { + [Symbol.asyncIterator]() { + return { + async next() { + if (yielded >= chunksBeforeError) { + throw new Error('simulated stream failure'); + } + yielded++; + return { value: Buffer.alloc(chunkSize, 0xaa), done: false }; + }, + }; + }, + }; +} + +function buildService() { + let blobCounter = 0; + const mockPersistence = { + writeBlob: vi.fn().mockImplementation(() => Promise.resolve(`blob-${blobCounter++}`)), + writeTree: vi.fn().mockResolvedValue('mock-tree-oid'), + readBlob: vi.fn().mockResolvedValue(Buffer.from('data')), + }; + const observability = { + metric: vi.fn(), + log: vi.fn(), + span: vi.fn().mockReturnValue({ end: vi.fn() }), + }; + const service = new CasService({ + persistence: mockPersistence, + crypto: testCrypto, + codec: new JsonCodec(), + chunkSize: 1024, + observability, + }); + return { service, mockPersistence, observability }; +} + +describe('CasService — orphaned blob tracking in STREAM_ERROR', () => { + let service; + let observability; + + beforeEach(() => { + ({ service, observability } = buildService()); + }); + + it('STREAM_ERROR meta includes orphanedBlobs array', async () => { + try { + await service.store({ source: failingSource(3), slug: 'fail', filename: 'f.bin' }); + expect.unreachable('should have thrown STREAM_ERROR'); + } catch (err) { + expect(err.code).toBe('STREAM_ERROR'); + expect(Array.isArray(err.meta.orphanedBlobs)).toBe(true); + } + }); + + it('orphanedBlobs contain OIDs from successful writes', async () => { + try { + await service.store({ source: failingSource(3), slug: 'fail', filename: 'f.bin' }); + 
expect.unreachable('should have thrown STREAM_ERROR'); + } catch (err) { + expect(err.meta.orphanedBlobs.length).toBe(3); + expect(err.meta.orphanedBlobs).toContain('blob-0'); + expect(err.meta.orphanedBlobs).toContain('blob-1'); + expect(err.meta.orphanedBlobs).toContain('blob-2'); + } + }); + + it('empty array when stream fails before any writes', async () => { + try { + await service.store({ source: failingSource(0), slug: 'fail', filename: 'f.bin' }); + expect.unreachable('should have thrown STREAM_ERROR'); + } catch (err) { + expect(err.meta.orphanedBlobs).toEqual([]); + } + }); + + it('emits metric with orphaned blob count', async () => { + try { + await service.store({ source: failingSource(2), slug: 'fail', filename: 'f.bin' }); + } catch { + // expected + } + const errorMetrics = observability.metric.mock.calls.filter((c) => c[0] === 'error'); + expect(errorMetrics.length).toBeGreaterThan(0); + expect(errorMetrics[0][1]).toHaveProperty('orphanedBlobs', 2); + }); +}); diff --git a/test/unit/domain/services/CasService.restoreGuard.test.js b/test/unit/domain/services/CasService.restoreGuard.test.js new file mode 100644 index 0000000..f4402db --- /dev/null +++ b/test/unit/domain/services/CasService.restoreGuard.test.js @@ -0,0 +1,180 @@ +import { describe, it, expect, vi } from 'vitest'; +import CasService from '../../../../src/domain/services/CasService.js'; +import { getTestCryptoAdapter } from '../../../helpers/crypto-adapter.js'; +import JsonCodec from '../../../../src/infrastructure/codecs/JsonCodec.js'; +import CasError from '../../../../src/domain/errors/CasError.js'; +import SilentObserver from '../../../../src/infrastructure/adapters/SilentObserver.js'; +import Manifest from '../../../../src/domain/value-objects/Manifest.js'; + +const testCrypto = await getTestCryptoAdapter(); + +function setup({ maxRestoreBufferSize } = {}) { + const mockPersistence = { + writeBlob: vi.fn().mockResolvedValue('mock-blob-oid'), + writeTree: 
vi.fn().mockResolvedValue('mock-tree-oid'), + readBlob: vi.fn().mockResolvedValue(Buffer.alloc(1024, 0xaa)), + readTree: vi.fn(), + }; + const opts = { + persistence: mockPersistence, + crypto: testCrypto, + codec: new JsonCodec(), + chunkSize: 1024, + observability: new SilentObserver(), + }; + if (maxRestoreBufferSize !== undefined) { + opts.maxRestoreBufferSize = maxRestoreBufferSize; + } + const service = new CasService(opts); + return { mockPersistence, service }; +} + +function makeEncryptedManifest(chunkSizes) { + const chunks = chunkSizes.map((size, i) => ({ + index: i, + size, + digest: 'a'.repeat(64), + blob: `blob-${i}`, + })); + return new Manifest({ + slug: 'test', + filename: 'test.bin', + size: chunkSizes.reduce((a, b) => a + b, 0), + chunks, + encryption: { + algorithm: 'aes-256-gcm', + nonce: Buffer.alloc(12).toString('base64'), + tag: Buffer.alloc(16).toString('base64'), + encrypted: true, + }, + }); +} + +describe('CasService — RESTORE_TOO_LARGE throws on exceed', () => { + it('throws RESTORE_TOO_LARGE when chunk sizes exceed limit', async () => { + const { service } = setup({ maxRestoreBufferSize: 2000 }); + const manifest = makeEncryptedManifest([1024, 1024, 1024]); + + await expect( + service.restoreStream({ manifest, encryptionKey: Buffer.alloc(32, 0xab) }).next(), + ).rejects.toThrow(CasError); + + try { + await service.restoreStream({ manifest, encryptionKey: Buffer.alloc(32, 0xab) }).next(); + } catch (err) { + expect(err.code).toBe('RESTORE_TOO_LARGE'); + expect(err.meta.size).toBe(3072); + expect(err.meta.limit).toBe(2000); + } + }); +}); + +describe('CasService — RESTORE_TOO_LARGE succeeds within limit', () => { + it('succeeds when within limit', async () => { + const { service, mockPersistence } = setup({ maxRestoreBufferSize: 4096 }); + const key = Buffer.alloc(32, 0xab); + + async function* source() { yield Buffer.alloc(512, 0xaa); } + const manifest = await service.store({ source: source(), slug: 'ok', filename: 'ok.bin', 
encryptionKey: key }); + + const storedBlobArgs = mockPersistence.writeBlob.mock.calls.map((c) => c[0]); + let blobIdx = 0; + mockPersistence.readBlob.mockImplementation(() => Promise.resolve(storedBlobArgs[blobIdx++] || Buffer.alloc(0))); + + const chunks = []; + for await (const chunk of service.restoreStream({ manifest, encryptionKey: key })) { + chunks.push(chunk); + } + expect(chunks.length).toBeGreaterThan(0); + }); +}); + +describe('CasService — RESTORE_TOO_LARGE defaults and meta', () => { + it('default maxRestoreBufferSize is 512 MiB', () => { + const { service } = setup(); + expect(service.maxRestoreBufferSize).toBe(512 * 1024 * 1024); + }); + + it('error meta includes size and limit', async () => { + const { service } = setup({ maxRestoreBufferSize: 2048 }); + const manifest = makeEncryptedManifest([1100, 1100]); + + try { + await service.restoreStream({ manifest, encryptionKey: Buffer.alloc(32, 0xab) }).next(); + expect.unreachable('should have thrown RESTORE_TOO_LARGE'); + } catch (err) { + expect(err.code).toBe('RESTORE_TOO_LARGE'); + expect(err.meta).toHaveProperty('size', 2200); + expect(err.meta).toHaveProperty('limit', 2048); + } + }); +}); + +describe('CasService — RESTORE_TOO_LARGE after decompression', () => { + it('throws when decompressed size exceeds limit', async () => { + const { service, mockPersistence } = setup({ maxRestoreBufferSize: 4096 }); + const key = Buffer.alloc(32, 0xab); + + // Store a small encrypted+compressed manifest that fits pre-decompression + async function* source() { yield Buffer.alloc(2048, 0xaa); } + const manifest = await service.store({ + source: source(), slug: 'bomb', filename: 'bomb.bin', + encryptionKey: key, compression: { algorithm: 'gzip' }, + }); + + // Wire readBlob to return the stored blobs + const storedBlobs = mockPersistence.writeBlob.mock.calls.map((c) => c[0]); + let idx = 0; + mockPersistence.readBlob.mockImplementation(() => Promise.resolve(storedBlobs[idx++] || Buffer.alloc(0))); + + // Mock 
_decompress to return a buffer larger than the limit + service._decompress = vi.fn().mockResolvedValue(Buffer.alloc(8192, 0xbb)); + + await expect( + service.restoreStream({ manifest, encryptionKey: key }).next(), + ).rejects.toMatchObject({ code: 'RESTORE_TOO_LARGE' }); + }); +}); + +describe('CasService — maxRestoreBufferSize validation', () => { + it('throws for non-integer', () => { + expect(() => setup({ maxRestoreBufferSize: 1.5 })).toThrow(); + }); + + it('throws for value below 1024', () => { + expect(() => setup({ maxRestoreBufferSize: 512 })).toThrow(); + }); + + it('throws for NaN', () => { + expect(() => setup({ maxRestoreBufferSize: NaN })).toThrow(); + }); + + it('accepts 1024', () => { + const { service } = setup({ maxRestoreBufferSize: 1024 }); + expect(service.maxRestoreBufferSize).toBe(1024); + }); +}); + +describe('CasService — RESTORE_TOO_LARGE does not affect streaming', () => { + it('does not apply to unencrypted/uncompressed restoreStream', async () => { + const { service, mockPersistence } = setup({ maxRestoreBufferSize: 1024 }); + const manifest = new Manifest({ + slug: 'plain', + filename: 'plain.bin', + size: 2048, + chunks: [ + { index: 0, size: 1024, digest: 'a'.repeat(64), blob: 'blob-0' }, + { index: 1, size: 1024, digest: 'a'.repeat(64), blob: 'blob-1' }, + ], + }); + + mockPersistence.readBlob.mockResolvedValue(Buffer.alloc(1024, 0xcc)); + service._sha256 = vi.fn().mockResolvedValue('a'.repeat(64)); + + const chunks = []; + for await (const chunk of service.restoreStream({ manifest })) { + chunks.push(chunk); + } + expect(chunks).toHaveLength(2); + }); +}); diff --git a/test/unit/domain/services/VaultService.encryptionCount.test.js b/test/unit/domain/services/VaultService.encryptionCount.test.js new file mode 100644 index 0000000..c349149 --- /dev/null +++ b/test/unit/domain/services/VaultService.encryptionCount.test.js @@ -0,0 +1,95 @@ +import { describe, it, expect, vi } from 'vitest'; +import VaultService from 
'../../../../src/domain/services/VaultService.js'; +import { getTestCryptoAdapter } from '../../../helpers/crypto-adapter.js'; + +const testCrypto = await getTestCryptoAdapter(); + +function encryptedMetadata(overrides = {}) { + return { + version: 1, + encryption: { + cipher: 'aes-256-gcm', + kdf: { algorithm: 'pbkdf2', salt: 'c2FsdA==', iterations: 100000, keyLength: 32 }, + }, + ...overrides, + }; +} + +function setup(metadata = encryptedMetadata()) { + const observability = { + metric: vi.fn(), + log: vi.fn(), + span: vi.fn().mockReturnValue({ end: vi.fn() }), + }; + const persistence = { + writeBlob: vi.fn().mockResolvedValue('blob-oid'), + writeTree: vi.fn().mockResolvedValue('tree-oid'), + readBlob: vi.fn().mockResolvedValue(Buffer.from(JSON.stringify(metadata))), + readTree: vi.fn().mockResolvedValue([ + { mode: '100644', type: 'blob', oid: 'meta-oid', name: '.vault.json' }, + ]), + }; + const ref = { + resolveRef: vi.fn().mockResolvedValue('commit-oid'), + resolveTree: vi.fn().mockResolvedValue('root-tree-oid'), + createCommit: vi.fn().mockResolvedValue('new-commit-oid'), + updateRef: vi.fn().mockResolvedValue(undefined), + }; + const vault = new VaultService({ + persistence, ref, crypto: testCrypto, observability, + }); + return { vault, persistence, ref, observability }; +} + +describe('16.13: Nonce usage tracking — encryptionCount', () => { + it('vault metadata includes encryptionCount after add', async () => { + const { vault, persistence } = setup(); + await vault.addToVault({ slug: 'asset-1', treeOid: 'tree-1' }); + + const writtenMetadata = JSON.parse(persistence.writeBlob.mock.calls[0][0]); + expect(writtenMetadata).toHaveProperty('encryptionCount', 1); + }); + + it('encryptionCount increments per encrypted store', async () => { + const meta = encryptedMetadata({ encryptionCount: 5 }); + const { vault, persistence } = setup(meta); + await vault.addToVault({ slug: 'asset-2', treeOid: 'tree-2' }); + + const writtenMetadata = 
JSON.parse(persistence.writeBlob.mock.calls[0][0]); + expect(writtenMetadata.encryptionCount).toBe(6); + }); +}); + +describe('16.13: Nonce usage tracking — threshold warning', () => { + it('warns when encryptionCount exceeds threshold', async () => { + const threshold = VaultService.ENCRYPTION_COUNT_WARN; + const meta = encryptedMetadata({ encryptionCount: threshold - 1 }); + const { vault, observability } = setup(meta); + await vault.addToVault({ slug: 'asset-3', treeOid: 'tree-3' }); + + const warnCalls = observability.log.mock.calls.filter( + (c) => c[0] === 'warn' && c[1].includes('encryption count'), + ); + expect(warnCalls.length).toBe(1); + }); + + it('no warning below threshold', async () => { + const meta = encryptedMetadata({ encryptionCount: 0 }); + const { vault, observability } = setup(meta); + await vault.addToVault({ slug: 'asset-4', treeOid: 'tree-4' }); + + const warnCalls = observability.log.mock.calls.filter( + (c) => c[0] === 'warn' && c[1].includes('encryption count'), + ); + expect(warnCalls.length).toBe(0); + }); + + it('no counter increment for unencrypted vault', async () => { + const meta = { version: 1 }; + const { vault, persistence } = setup(meta); + await vault.addToVault({ slug: 'plain-1', treeOid: 'tree-p' }); + + const writtenMetadata = JSON.parse(persistence.writeBlob.mock.calls[0][0]); + expect(writtenMetadata).not.toHaveProperty('encryptionCount'); + }); +}); diff --git a/test/unit/domain/services/rotateVaultPassphrase.test.js b/test/unit/domain/services/rotateVaultPassphrase.test.js index 4539557..a16c9cc 100644 --- a/test/unit/domain/services/rotateVaultPassphrase.test.js +++ b/test/unit/domain/services/rotateVaultPassphrase.test.js @@ -34,7 +34,7 @@ async function createDeps(repoDir) { const service = new CasService({ persistence, codec: new JsonCodec(), crypto, observability: new SilentObserver(), chunkSize: 1024, }); - const vault = new VaultService({ persistence, ref, crypto }); + const vault = new VaultService({ 
persistence, ref, crypto, observability: new SilentObserver() }); return { service, vault }; } diff --git a/test/unit/infrastructure/adapters/CryptoAdapter.conformance.test.js b/test/unit/infrastructure/adapters/CryptoAdapter.conformance.test.js new file mode 100644 index 0000000..3361a41 --- /dev/null +++ b/test/unit/infrastructure/adapters/CryptoAdapter.conformance.test.js @@ -0,0 +1,55 @@ +import { describe, it, expect } from 'vitest'; +import NodeCryptoAdapter from '../../../../src/infrastructure/adapters/NodeCryptoAdapter.js'; +import WebCryptoAdapter from '../../../../src/infrastructure/adapters/WebCryptoAdapter.js'; + +/** + * Conformance test suite that asserts identical behavioral contracts across + * all crypto adapters that can run in the current environment. + */ + +const adapters = [ + ['NodeCryptoAdapter', new NodeCryptoAdapter()], + ['WebCryptoAdapter', new WebCryptoAdapter()], +]; + +// BunCryptoAdapter is only available in Bun runtime — skip in Node/Deno +if (typeof globalThis.Bun !== 'undefined') { + const { default: BunCryptoAdapter } = await import( + '../../../../src/infrastructure/adapters/BunCryptoAdapter.js' + ); + adapters.push(['BunCryptoAdapter', new BunCryptoAdapter()]); +} + +describe.each(adapters)('%s conformance', (_name, adapter) => { + const key = Buffer.alloc(32, 0xab); + + it('encryptBuffer returns a Promise (thenable)', async () => { + const result = adapter.encryptBuffer(Buffer.from('hello'), key); + expect(typeof result.then).toBe('function'); + const { buf, meta } = await result; + expect(buf).toBeInstanceOf(Buffer); + expect(meta.encrypted).toBe(true); + }); + + it('decryptBuffer rejects INVALID_KEY_TYPE for string key', async () => { + const { buf, meta } = await adapter.encryptBuffer(Buffer.from('test'), key); + await expect( + Promise.resolve().then(() => adapter.decryptBuffer(buf, 'not-a-buffer', meta)), + ).rejects.toMatchObject({ code: 'INVALID_KEY_TYPE' }); + }); + + it('decryptBuffer rejects INVALID_KEY_LENGTH for 
16-byte key', async () => {
+    const shortKey = Buffer.alloc(16, 0xcc);
+    const { buf, meta } = await adapter.encryptBuffer(Buffer.from('test'), key);
+    await expect(
+      Promise.resolve().then(() => adapter.decryptBuffer(buf, shortKey, meta)),
+    ).rejects.toMatchObject({ code: 'INVALID_KEY_LENGTH' });
+  });
+
+  it('createEncryptionStream.finalize() throws STREAM_NOT_CONSUMED before consumption', () => {
+    const { finalize } = adapter.createEncryptionStream(key);
+    expect(() => finalize()).toThrow(
+      expect.objectContaining({ code: 'STREAM_NOT_CONSUMED' }),
+    );
+  });
+});
diff --git a/test/unit/infrastructure/adapters/WebCryptoAdapter.bufferGuard.test.js b/test/unit/infrastructure/adapters/WebCryptoAdapter.bufferGuard.test.js
new file mode 100644
index 0000000..2d9bf62
--- /dev/null
+++ b/test/unit/infrastructure/adapters/WebCryptoAdapter.bufferGuard.test.js
@@ -0,0 +1,85 @@
+import { describe, it, expect } from 'vitest';
+import WebCryptoAdapter from '../../../../src/infrastructure/adapters/WebCryptoAdapter.js';
+import NodeCryptoAdapter from '../../../../src/infrastructure/adapters/NodeCryptoAdapter.js';
+import CasError from '../../../../src/domain/errors/CasError.js';
+
+const key = Buffer.alloc(32, 0xab);
+
+async function* makeSource(totalBytes, chunkSize = 1024) {
+  let remaining = totalBytes;
+  while (remaining > 0) {
+    const size = Math.min(chunkSize, remaining);
+    yield Buffer.alloc(size, 0xcc);
+    remaining -= size;
+  }
+}
+
+async function consumeStream(encrypt, source) {
+  const chunks = [];
+  for await (const chunk of encrypt(source)) {
+    chunks.push(chunk);
+  }
+  return chunks;
+}
+
+describe('WebCryptoAdapter — ENCRYPTION_BUFFER_EXCEEDED', () => {
+  it('throws ENCRYPTION_BUFFER_EXCEEDED when data exceeds limit', async () => {
+    const adapter = new WebCryptoAdapter({ maxEncryptionBufferSize: 2000 });
+    const { encrypt } = adapter.createEncryptionStream(key);
+
+    await expect(
+      consumeStream(encrypt, makeSource(3000)),
+    ).rejects.toThrow(CasError);
+
+    try {
+      const adapter2 = new WebCryptoAdapter({ maxEncryptionBufferSize: 2000 });
+      const { encrypt: encrypt2 } = adapter2.createEncryptionStream(key);
+      await consumeStream(encrypt2, makeSource(3000));
+    } catch (err) {
+      expect(err.code).toBe('ENCRYPTION_BUFFER_EXCEEDED');
+      expect(err.meta.limit).toBe(2000);
+    }
+  });
+
+  it('succeeds within limit', async () => {
+    const adapter = new WebCryptoAdapter({ maxEncryptionBufferSize: 4096 });
+    const { encrypt, finalize } = adapter.createEncryptionStream(key);
+
+    const chunks = await consumeStream(encrypt, makeSource(1024));
+    expect(chunks.length).toBeGreaterThan(0);
+
+    const meta = finalize();
+    expect(meta.encrypted).toBe(true);
+  });
+});
+
+describe('WebCryptoAdapter — maxEncryptionBufferSize validation', () => {
+  it('throws for NaN', () => {
+    expect(() => new WebCryptoAdapter({ maxEncryptionBufferSize: NaN })).toThrow(RangeError);
+  });
+
+  it('throws for 0', () => {
+    expect(() => new WebCryptoAdapter({ maxEncryptionBufferSize: 0 })).toThrow(RangeError);
+  });
+
+  it('throws for negative', () => {
+    expect(() => new WebCryptoAdapter({ maxEncryptionBufferSize: -1 })).toThrow(RangeError);
+  });
+
+  it('throws for Infinity', () => {
+    expect(() => new WebCryptoAdapter({ maxEncryptionBufferSize: Infinity })).toThrow(RangeError);
+  });
+});
+
+describe('NodeCryptoAdapter — no buffer guard for streaming', () => {
+  it('does NOT throw for same-size stream (true streaming)', async () => {
+    const adapter = new NodeCryptoAdapter();
+    const { encrypt, finalize } = adapter.createEncryptionStream(key);
+
+    const chunks = await consumeStream(encrypt, makeSource(3000));
+    expect(chunks.length).toBeGreaterThan(0);
+
+    const meta = finalize();
+    expect(meta.encrypted).toBe(true);
+  });
+});
diff --git a/test/unit/infrastructure/chunkers/ChunkerBounds.test.js b/test/unit/infrastructure/chunkers/ChunkerBounds.test.js
new file mode 100644
index 0000000..2a5a2d6
--- /dev/null
+++ b/test/unit/infrastructure/chunkers/ChunkerBounds.test.js
@@ -0,0 +1,58 @@
+import { describe, it, expect } from 'vitest';
+import FixedChunker from '../../../../src/infrastructure/chunkers/FixedChunker.js';
+import CdcChunker from '../../../../src/infrastructure/chunkers/CdcChunker.js';
+
+const MiB = 1024 * 1024;
+
+describe('FixedChunker — chunk size upper bound', () => {
+  it('throws when chunkSize > 100 MiB', () => {
+    expect(() => new FixedChunker({ chunkSize: 100 * MiB + 1 })).toThrow(RangeError);
+  });
+
+  it('accepts exactly 100 MiB', () => {
+    const chunker = new FixedChunker({ chunkSize: 100 * MiB });
+    expect(chunker.params.chunkSize).toBe(100 * MiB);
+  });
+});
+
+describe('FixedChunker — chunk size lower bound', () => {
+  it('throws when chunkSize is 0', () => {
+    expect(() => new FixedChunker({ chunkSize: 0 })).toThrow(RangeError);
+  });
+
+  it('throws when chunkSize is negative', () => {
+    expect(() => new FixedChunker({ chunkSize: -1 })).toThrow(RangeError);
+  });
+
+  it('throws when chunkSize is NaN', () => {
+    expect(() => new FixedChunker({ chunkSize: NaN })).toThrow(RangeError);
+  });
+
+  it('throws when chunkSize is not an integer', () => {
+    expect(() => new FixedChunker({ chunkSize: 1.5 })).toThrow(RangeError);
+  });
+
+  it('accepts chunkSize of 1', () => {
+    const chunker = new FixedChunker({ chunkSize: 1 });
+    expect(chunker.params.chunkSize).toBe(1);
+  });
+});
+
+describe('CdcChunker — chunk size upper bound', () => {
+  it('throws when maxChunkSize > 100 MiB', () => {
+    expect(() => new CdcChunker({
+      maxChunkSize: 100 * MiB + 1,
+      minChunkSize: 1024,
+      targetChunkSize: 50 * MiB,
+    })).toThrow(RangeError);
+  });
+
+  it('accepts exactly 100 MiB as maxChunkSize', () => {
+    const chunker = new CdcChunker({
+      maxChunkSize: 100 * MiB,
+      minChunkSize: 1024,
+      targetChunkSize: 50 * MiB,
+    });
+    expect(chunker.params.max).toBe(100 * MiB);
+  });
+});
diff --git a/test/unit/infrastructure/chunkers/FixedChunker.test.js b/test/unit/infrastructure/chunkers/FixedChunker.test.js
new file mode 100644
index 0000000..78233f7
--- /dev/null
+++ b/test/unit/infrastructure/chunkers/FixedChunker.test.js
@@ -0,0 +1,63 @@
+import { describe, it, expect } from 'vitest';
+import FixedChunker from '../../../../src/infrastructure/chunkers/FixedChunker.js';
+
+async function* toAsyncIter(buffers) {
+  for (const b of buffers) { yield b; }
+}
+
+async function collect(iter) {
+  const result = [];
+  for await (const chunk of iter) { result.push(chunk); }
+  return result;
+}
+
+describe('16.4: FixedChunker pre-allocated buffer — regression', () => {
+  it('produces byte-exact output for a single large input', async () => {
+    const chunkSize = 64;
+    const chunker = new FixedChunker({ chunkSize });
+    const input = Buffer.alloc(200);
+    for (let i = 0; i < input.length; i++) { input[i] = i & 0xff; }
+
+    const chunks = await collect(chunker.chunk(toAsyncIter([input])));
+    expect(chunks.map((c) => c.length)).toEqual([64, 64, 64, 8]);
+    expect(Buffer.concat(chunks).equals(input)).toBe(true);
+  });
+
+  it('exact multiple of chunkSize produces no partial', async () => {
+    const chunkSize = 128;
+    const chunker = new FixedChunker({ chunkSize });
+    const input = Buffer.alloc(chunkSize * 3, 0xbb);
+    const chunks = await collect(chunker.chunk(toAsyncIter([input])));
+    expect(chunks.length).toBe(3);
+    expect(chunks.every((c) => c.length === chunkSize)).toBe(true);
+  });
+});
+
+describe('16.4: FixedChunker pre-allocated buffer — edge cases', () => {
+  it('many small input buffers reassemble correctly', async () => {
+    const chunkSize = 256;
+    const chunker = new FixedChunker({ chunkSize });
+    const total = 1024;
+    const smallBufs = Array.from({ length: total }, (_, i) => Buffer.from([i & 0xff]));
+
+    const chunks = await collect(chunker.chunk(toAsyncIter(smallBufs)));
+    expect(chunks.length).toBe(4);
+    const reassembled = Buffer.concat(chunks);
+    for (let i = 0; i < total; i++) {
+      expect(reassembled[i]).toBe(i & 0xff);
+    }
+  });
+
+  it('empty source produces no chunks', async () => {
+    const chunker = new FixedChunker({ chunkSize: 64 });
+    const chunks = await collect(chunker.chunk(toAsyncIter([])));
+    expect(chunks.length).toBe(0);
+  });
+
+  it('single byte produces one partial chunk', async () => {
+    const chunker = new FixedChunker({ chunkSize: 64 });
+    const chunks = await collect(chunker.chunk(toAsyncIter([Buffer.from([42])])));
+    expect(chunks.length).toBe(1);
+    expect(chunks[0]).toEqual(Buffer.from([42]));
+  });
+});
diff --git a/test/unit/vault/VaultService.test.js b/test/unit/vault/VaultService.test.js
index a85e219..93d3697 100644
--- a/test/unit/vault/VaultService.test.js
+++ b/test/unit/vault/VaultService.test.js
@@ -34,11 +34,16 @@ function mockCrypto() {
   };
 }
 
+function mockObservability() {
+  return { metric: vi.fn(), log: vi.fn(), span: vi.fn().mockReturnValue({ end: vi.fn() }) };
+}
+
 function createVault(overrides = {}) {
   return new VaultService({
     persistence: overrides.persistence || mockPersistence(),
     ref: overrides.ref || mockRef(),
     crypto: overrides.crypto || mockCrypto(),
+    observability: overrides.observability || mockObservability(),
   });
 }