This document explains the overall architecture of the StateWalker VCS monorepo, covering design philosophy, package organization, and how the components work together.
StateWalker VCS implements Git's object model and protocols entirely in TypeScript. Applications can read and write Git repositories without native Git binaries, making the library portable across Node.js, browsers, edge functions, and any JavaScript runtime.
The implementation produces identical output to native Git: same object IDs, same pack file formats, same protocol messages. Repositories created with StateWalker VCS work with standard Git tools and vice versa.
The VCS model separates concerns into three distinct layers:
```
┌─────────────────────────────────────────────────────────────┐
│ History (immutable)                                         │
│   Content-addressed objects: blobs, trees, commits, tags    │
│   References: branches, tags, HEAD                          │
├─────────────────────────────────────────────────────────────┤
│ Checkout (mutable local state)                              │
│   Staging area, HEAD pointer, operation state               │
│   Stash, transformation state (merge, rebase, etc.)         │
├─────────────────────────────────────────────────────────────┤
│ Worktree (filesystem access)                                │
│   Working directory read/write, file tracking               │
│   Ignore patterns, file modes                               │
└─────────────────────────────────────────────────────────────┘
```
This separation enables:
- Multiple worktrees: Share history across working directories (like `git worktree`)
- Offline-first: History can sync independently of checkout state
- Testing flexibility: Swap the filesystem for memory without affecting logic
Large repositories can contain files of any size. Rather than loading entire files into memory, all content flows through `AsyncIterable<Uint8Array>` streams. This design keeps memory consumption bounded regardless of file size.
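Here is a small sketch of that style. It needs nothing beyond standard JavaScript. `rechunk` is a hypothetical helper, not a library export. It re-slices a stream into fixed-size pieces, the kind of step a chunked storage backend might need. At any moment it holds at most one output buffer plus the current input chunk.

```typescript
// Re-slices a byte stream into chunks of exactly `size` bytes (the last one may be
// shorter). Memory use stays at one output buffer plus the current input chunk.
async function* rechunk(input: AsyncIterable<Uint8Array>, size: number): AsyncGenerator<Uint8Array> {
  if (size <= 0) throw new RangeError("size must be positive");
  let buffer = new Uint8Array(size);
  let filled = 0;
  for await (const chunk of input) {
    let offset = 0;
    while (offset < chunk.length) {
      // Copy as much of the current chunk as fits into the output buffer.
      const n = Math.min(size - filled, chunk.length - offset);
      buffer.set(chunk.subarray(offset, offset + n), filled);
      filled += n;
      offset += n;
      if (filled === size) {
        yield buffer;
        buffer = new Uint8Array(size); // fresh buffer: the consumer may keep the yielded one
        filled = 0;
      }
    }
  }
  if (filled > 0) yield buffer.subarray(0, filled);
}
```

Input chunks of 3, 2 and 5 bytes, re-sliced with `size = 4`, come out as chunks of 4, 4 and 2 bytes. No step ever materializes the whole stream.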
The type system and algorithms align with Eclipse JGit, a mature Java implementation of Git. This alignment provides proven patterns for Git compatibility and clear precedent for edge cases.
```
                     ┌─────────────────┐
                     │    commands     │
                     │    (Git API)    │
                     └────────┬────────┘
                              │
         ┌────────────────────┼────────────────────┐
         │                    │                    │
         ▼                    ▼                    ▼
┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│    transport    │  │    store-mem    │  │    store-sql    │
│   (protocols)   │  │    (testing)    │  │    (persist)    │
└────────┬────────┘  └────────┬────────┘  └────────┬────────┘
         │                    │                    │
         └────────────────────┼────────────────────┘
                              │
                              ▼
                     ┌─────────────────┐
                     │      core       │
                     │(history, stores,│
                     │workspace, packs)│
                     └────────┬────────┘
                              │
                              ▼
                     ┌─────────────────┐
                     │      utils      │
                     │  (algorithms)   │
                     └─────────────────┘
```
@statewalker/vcs-utils provides pure algorithmic implementations with zero VCS-specific dependencies:
- Cryptographic hashing (SHA-1, CRC32, rolling checksums)
- JGit-compatible rolling hash (16-byte blocks with T[]/U[] lookup tables)
- Compression (zlib via pako, optional Node.js native)
- Diff algorithms (Myers text diff, binary delta encoding)
- Git delta format encoding/decoding
- Varint encoding for pack files
- Streaming utilities
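Pack files use varint encoding in each object entry header. The sketch below encodes and decodes that header, following the layout in Git's pack-format documentation:

- The first byte carries a continuation bit, a 3-bit object type and the low 4 bits of the size.
- Each following byte carries 7 more size bits, least significant group first.

The function and constant names are hypothetical, not the vcs-utils API.

```typescript
// Pack entry type codes from Git's pack format (5 is reserved).
const PACK_TYPES = { commit: 1, tree: 2, blob: 3, tag: 4, ofsDelta: 6, refDelta: 7 } as const;

// Encodes the variable-length header that precedes each object in a pack.
function encodePackHeader(type: number, size: number): Uint8Array {
  const bytes: number[] = [];
  let byte = (type << 4) | (size % 16); // bits 6-4: type, bits 3-0: low size bits
  let rest = Math.floor(size / 16);
  while (rest > 0) {
    bytes.push(byte | 0x80); // MSB set: another size byte follows
    byte = rest % 128;
    rest = Math.floor(rest / 128);
  }
  bytes.push(byte);
  return Uint8Array.from(bytes);
}

// Decodes a pack entry header, returning the type, size, and header length in bytes.
function decodePackHeader(data: Uint8Array, offset = 0): { type: number; size: number; length: number } {
  let byte = data[offset];
  const type = (byte >> 4) & 0x07;
  let size = byte & 0x0f;
  let multiplier = 16; // the first byte already supplied 4 size bits
  let pos = offset + 1;
  while (byte & 0x80) {
    byte = data[pos++];
    size += (byte & 0x7f) * multiplier;
    multiplier *= 128;
  }
  return { type, size, length: pos - offset };
}
```

For example, a 100-byte blob gets the two-byte header `0xB4 0x06`:

- `0xB4` is the continuation bit, type 3, and low size bits `0100`.
- `0x06` supplies the remaining 6 × 16 = 96.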
@statewalker/vcs-core defines the VCS contracts and object model:
- History interface (Blobs, Trees, Commits, Tags, Refs)
- Workspace interfaces (Staging, Checkout, Worktree)
- Storage abstractions (RawStorage, ChunkAccess, GitObjectStore)
- Pack file format and delta compression
- TransformationStore for operation state (merge, rebase, cherry-pick)
- ResolutionStore for conflict management with rerere support
Storage backends implement core interfaces for different systems:
| Package | Storage Target | Use Case |
|---|---|---|
| @statewalker/vcs-core | Git .git/ directory | Native Git compatibility |
| @statewalker/vcs-store-mem | Memory | Testing, ephemeral repos |
| @statewalker/vcs-store-sql | SQLite | Server deployments |
| @statewalker/vcs-store-kv | Key-value stores | Custom backends |
| @statewalker/vcs-sandbox | Isolated storage | Safe experimentation |
@statewalker/vcs-transport implements Git's network protocols:
- HTTP smart protocol (v1 and v2)
- Pkt-line encoding
- Capability negotiation
- Pack transfer
- Server-side handlers (UploadPack, ReceivePack)
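Pkt-line framing is simple enough to sketch in a few lines. Per Git's protocol documentation:

- Each line is prefixed with its total length, including the 4-byte prefix, as four lowercase hex digits.
- `0000` is the flush packet.
- A pkt-line may not exceed 65520 bytes.

The helpers below are illustrative, not the vcs-transport API. They assume UTF-8 text payloads.

```typescript
const encoder = new TextEncoder();
const decoder = new TextDecoder();

// The flush packet: a zero length with no payload.
const FLUSH = "0000";

// Frames one payload as a pkt-line: 4 hex digits of total length, then the data.
function pktLine(payload: string): string {
  const length = encoder.encode(payload).length + 4; // length counts bytes, not UTF-16 units
  if (length > 65520) throw new RangeError("pkt-line too long");
  return length.toString(16).padStart(4, "0") + payload;
}

// Splits a buffer of pkt-lines into payload strings; a flush packet becomes null.
// Lengths 1-3 are special packets in protocol v2 (e.g. 0001 delim-pkt) and are rejected here.
function parsePktLines(data: Uint8Array): (string | null)[] {
  const out: (string | null)[] = [];
  let pos = 0;
  while (pos + 4 <= data.length) {
    const length = parseInt(decoder.decode(data.subarray(pos, pos + 4)), 16);
    if (Number.isNaN(length)) throw new Error(`bad pkt-line header at ${pos}`);
    if (length === 0) {
      out.push(null);
      pos += 4;
      continue;
    }
    if (length < 4 || pos + length > data.length) throw new Error(`truncated or special pkt-line at ${pos}`);
    out.push(decoder.decode(data.subarray(pos + 4, pos + length)));
    pos += length;
  }
  return out;
}
```

For example, `pktLine("want abc\n")` returns `"000dwant abc\n"`: 9 payload bytes plus the 4 header bytes is 13, written as `000d`.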
@statewalker/vcs-commands provides high-level operations:
- Clone, fetch, push
- Commit, checkout
- Branch management
- Merge, rebase, cherry-pick
The History interface provides unified access to all immutable repository objects:
```typescript
interface History {
  readonly blobs: Blobs; // File content (streaming)
  readonly trees: Trees; // Directory snapshots
  readonly commits: Commits; // Version history with ancestry
  readonly tags: Tags; // Annotated tags
  readonly refs: Refs; // Branch/tag pointers

  initialize(): Promise<void>;
  close(): Promise<void>;
  collectReachableObjects(wants, exclude): AsyncIterable<ObjectId>;
}
```

All object stores share a common base interface:
```typescript
interface ObjectStorage<V> {
  store(value: V): Promise<ObjectId>; // Store and get content-addressed ID
  load(id: ObjectId): Promise<V | undefined>;
  has(id: ObjectId): Promise<boolean>;
  remove(id: ObjectId): Promise<boolean>;
  keys(): AsyncIterable<ObjectId>;
}
```

Built on ObjectStorage, each store adds domain-specific operations:
- Blobs: Raw file content with size queries
- Trees: Directory snapshots with entry lookup and empty tree ID
- Commits: Ancestry traversal, merge base detection, commit graph walking
- Tags: Annotated tags with target resolution (peeling)
- Refs: Named pointers with symbolic ref support
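To make the shared contract concrete, here is a minimal in-memory sketch of the `ObjectStorage<V>` shape. It is not the library's implementation. The `MemoryObjectStorage` name, the `serialize` callback and treating `ObjectId` as a hex string are assumptions. The ID here is simply the SHA-1 of the serialized bytes; real Git IDs also hash a `<type> <size>\0` header.

```typescript
import { createHash } from "node:crypto";

type ObjectId = string; // assumption: lowercase hex SHA-1

// The ObjectStorage contract from above, re-declared so this block compiles on its own.
interface ObjectStorage<V> {
  store(value: V): Promise<ObjectId>;
  load(id: ObjectId): Promise<V | undefined>;
  has(id: ObjectId): Promise<boolean>;
  remove(id: ObjectId): Promise<boolean>;
  keys(): AsyncIterable<ObjectId>;
}

// Hypothetical Map-backed store. IDs derive from content, so storing equal
// values twice leaves a single entry (deduplication for free).
class MemoryObjectStorage<V> implements ObjectStorage<V> {
  private readonly objects = new Map<ObjectId, V>();
  private readonly serialize: (value: V) => Uint8Array;

  constructor(serialize: (value: V) => Uint8Array) {
    this.serialize = serialize;
  }

  async store(value: V): Promise<ObjectId> {
    const id = createHash("sha1").update(this.serialize(value)).digest("hex");
    if (!this.objects.has(id)) this.objects.set(id, value);
    return id;
  }

  async load(id: ObjectId): Promise<V | undefined> {
    return this.objects.get(id);
  }

  async has(id: ObjectId): Promise<boolean> {
    return this.objects.has(id);
  }

  async remove(id: ObjectId): Promise<boolean> {
    return this.objects.delete(id);
  }

  async *keys(): AsyncIterable<ObjectId> {
    for (const id of this.objects.keys()) yield id;
  }
}
```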
The workspace layer manages mutable local state:
```typescript
interface Staging {
  // Entry management
  getEntry(path: string): StagingEntry | undefined;
  setEntry(path: string, entry: StagingEntry): void;
  removeEntry(path: string): boolean;

  // Conflict handling
  hasConflicts(): boolean;
  getConflictedPaths(): string[];
  resolveConflict(path: string, resolution: ConflictResolution): void;

  // Tree operations
  writeTree(): Promise<ObjectId>;
  readTree(treeId: ObjectId): Promise<void>;
}

interface Checkout {
  readonly staging: Staging;

  // HEAD management
  getHead(): Promise<ObjectId | undefined>;
  setHead(target: ObjectId | string): Promise<void>;
  getCurrentBranch(): Promise<string | undefined>;
  isDetached(): Promise<boolean>;

  // Operation state
  getOperationState(): Promise<CheckoutOperationState>;
  hasOperationInProgress(): Promise<boolean>;
  abortOperation(): Promise<void>;
}

interface Worktree {
  // File operations
  readContent(path: string): AsyncIterable<Uint8Array>;
  writeContent(path: string, content: AsyncIterable<Uint8Array>): Promise<void>;
  exists(path: string): Promise<boolean>;
  isIgnored(path: string): Promise<boolean>;

  // Directory walking
  walk(options?: WalkOptions): AsyncIterable<WalkEntry>;

  // Checkout operations
  checkoutTree(treeId: ObjectId): Promise<void>;
  checkoutPaths(paths: string[], source: ObjectId): Promise<void>;
}
```

The TransformationStore provides unified state management for multi-commit operations:
```typescript
interface TransformationStore {
  readonly merge: MergeStateStore;
  readonly rebase: RebaseStateStore;
  readonly cherryPick: CherryPickStateStore;
  readonly revert: RevertStateStore;
  readonly sequencer: SequencerStore;
  readonly resolution?: ResolutionStore;

  getState(): Promise<TransformationState | undefined>;
  getCapabilities(): Promise<TransformationCapabilities>;
  hasOperationInProgress(): Promise<boolean>;
  abortCurrent(): Promise<void>;
}
```

Conflict tracking with rerere-like functionality:
```typescript
interface ResolutionStore {
  // Conflict detection
  getConflicts(): Promise<ConflictInfo[]>;
  hasConflicts(): Promise<boolean>;

  // Resolution workflow
  markResolved(path: string, strategy: ResolutionStrategy): Promise<void>;
  acceptOurs(path: string): Promise<void>;
  acceptTheirs(path: string): Promise<void>;

  // Rerere (reuse recorded resolution)
  recordResolution(path: string): Promise<void>;
  getSuggestedResolution(path: string): Promise<RecordedResolution | undefined>;
  autoResolve(): Promise<string[]>;
}
```

The HistoryWithOperations interface provides unified access to Git objects and storage operations:
```
┌─────────────────────────────────────────────────────────────┐
│                    HistoryWithOperations                    │
│       Unified entry point for all storage operations        │
├─────────────────────┬─────────────────┬─────────────────────┤
│ History             │ DeltaApi        │ SerializationApi    │
│ (object access)     │ (compression)   │ (pack handling)     │
├─────────────────────┼─────────────────┼─────────────────────┤
│ blobs, trees        │ BlobDeltaApi    │ Pack encoding       │
│ commits, tags       │ Batch ops       │ Object serializing  │
│ refs                │ Chain queries   │ Import/export       │
└─────────────────────┴─────────────────┴─────────────────────┘
```
Use factory functions to create storage:
```typescript
// From a registered backend type
const gitHistory = await createHistory("git-files", { path: ".git" });

// Specific factories
const memoryHistory = createMemoryHistoryWithOperations();
const filesHistory = createGitFilesHistory(config);
```

The lowest layer provides key-value byte storage:
```typescript
interface RawStorage {
  store(key: string, content: AsyncIterable<Uint8Array>): Promise<void>;
  load(key: string, options?: { start?: number; end?: number }): AsyncIterable<Uint8Array>;
  has(key: string): Promise<boolean>;
  remove(key: string): Promise<boolean>;
  keys(): AsyncIterable<string>;
  size(key: string): Promise<number>;
}
```

Implementations:
- MemoryRawStorage: In-memory Map-based storage for testing
- FileRawStorage: Git-compatible two-level directory structure (XX/XXXXXX)
- CompressedRawStorage: Decorator adding zlib compression
- ChunkedRawStorage: Splits large objects into fixed-size chunks
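Here is a hypothetical Map-backed store written against the `RawStorage` interface above, to show what an implementation involves. The interface is re-declared so the block stands alone. This is not the library's `MemoryRawStorage`. Two behaviors are assumptions: the range semantics (`start` inclusive, `end` exclusive) and throwing on a missing key.

```typescript
// The RawStorage interface from above, re-declared so this block compiles on its own.
interface RawStorage {
  store(key: string, content: AsyncIterable<Uint8Array>): Promise<void>;
  load(key: string, options?: { start?: number; end?: number }): AsyncIterable<Uint8Array>;
  has(key: string): Promise<boolean>;
  remove(key: string): Promise<boolean>;
  keys(): AsyncIterable<string>;
  size(key: string): Promise<number>;
}

// Hypothetical Map-backed RawStorage (assumption: `start` inclusive, `end` exclusive).
class MapRawStorage implements RawStorage {
  private readonly data = new Map<string, Uint8Array>();

  async store(key: string, content: AsyncIterable<Uint8Array>): Promise<void> {
    // Copy each chunk in case the producer reuses its buffer, then join them once.
    const parts: Uint8Array[] = [];
    let total = 0;
    for await (const chunk of content) {
      parts.push(chunk.slice());
      total += chunk.length;
    }
    const bytes = new Uint8Array(total);
    let offset = 0;
    for (const part of parts) {
      bytes.set(part, offset);
      offset += part.length;
    }
    this.data.set(key, bytes);
  }

  async *load(key: string, options: { start?: number; end?: number } = {}): AsyncIterable<Uint8Array> {
    const bytes = this.data.get(key);
    if (bytes === undefined) throw new Error(`missing key: ${key}`);
    // subarray() returns a view, so a range read copies nothing.
    yield bytes.subarray(options.start ?? 0, options.end ?? bytes.length);
  }

  async has(key: string): Promise<boolean> {
    return this.data.has(key);
  }

  async remove(key: string): Promise<boolean> {
    return this.data.delete(key);
  }

  async *keys(): AsyncIterable<string> {
    for (const key of this.data.keys()) yield key;
  }

  async size(key: string): Promise<number> {
    const bytes = this.data.get(key);
    if (bytes === undefined) throw new Error(`missing key: ${key}`);
    return bytes.length;
  }
}
```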
For storage backends with size limits (like browser storage):
```typescript
interface ChunkAccess {
  storeChunk(key: string, index: number, data: Uint8Array): Promise<void>;
  loadChunk(key: string, index: number): Promise<Uint8Array>;
  getChunkCount(key: string): Promise<number>;
  removeChunks(key: string): Promise<void>;
  hasKey(key: string): Promise<boolean>;
  keys(): AsyncIterable<string>;
}
```

Every object's ID derives from its content via SHA-1:
```
SHA-1("<type> <size>\0<content>") = ObjectId
```
This provides:
- Automatic deduplication: Identical files produce identical IDs
- Integrity verification: Corrupted objects have wrong IDs
- Efficient sync: Only transfer objects not already present
- Immutability: Changing content changes the ID
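The formula translates directly into code. This sketch uses Node.js's `node:crypto`. As an extra illustration of the two-level loose-object layout that FileRawStorage mirrors, it also derives the `xx/yyyy` path from the ID. Both function names are hypothetical.

```typescript
import { createHash } from "node:crypto";

type ObjectType = "blob" | "tree" | "commit" | "tag";

// SHA-1 over the header "<type> <size>\0" followed by the raw content bytes.
function gitObjectId(type: ObjectType, content: Uint8Array): string {
  return createHash("sha1")
    .update(`${type} ${content.length}\0`)
    .update(content)
    .digest("hex");
}

// Loose-object layout: the first two hex digits name the directory.
function looseObjectPath(id: string): string {
  return `${id.slice(0, 2)}/${id.slice(2)}`;
}
```

The results match the IDs native Git reports:

| Object | ID |
|---|---|
| Empty blob | `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391` |
| `hello world\n` | `3b18e512dba79e4c8300dd08aeb37f8e728b8dad` |
| Empty tree | `4b825dc642cb6eb9a060e54bf8d69288fbee4904` |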
Similar objects are stored as differences from a base object:
```
Source: [----A----][----B----][----C----]
Target: [----B----][--new--][----A----]

Delta:
  COPY from source offset 10, length 10 (block B)
  INSERT [--new--] (new data)
  COPY from source offset 0, length 10 (block A)
```
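The COPY/INSERT scheme maps onto Git's binary delta format. The sketch below applies such a delta, following the layout in Git's pack-format documentation:

- The delta starts with two little-endian base-128 varints: the source size, then the target size.
- Each instruction byte with the high bit set is a COPY. Its low 7 bits flag which offset and size bytes follow, and a size of zero means 0x10000.
- An instruction byte from 1 to 127 is an INSERT of that many literal bytes.

`applyDelta` and `readVarint` are hypothetical names, not the vcs-utils API.

```typescript
// Reads a little-endian base-128 varint (7 bits per byte, MSB = continuation).
function readVarint(data: Uint8Array, pos: number): [value: number, next: number] {
  let value = 0;
  let shift = 1;
  let byte: number;
  do {
    byte = data[pos++];
    value += (byte & 0x7f) * shift;
    shift *= 128;
  } while (byte & 0x80);
  return [value, pos];
}

// Reconstructs the target object from a base (source) object and a Git delta.
function applyDelta(source: Uint8Array, delta: Uint8Array): Uint8Array {
  const [sourceSize, afterSource] = readVarint(delta, 0);
  if (sourceSize !== source.length) throw new Error("delta base size mismatch");
  const [targetSize, afterTarget] = readVarint(delta, afterSource);
  const target = new Uint8Array(targetSize);
  let pos = afterTarget;
  let out = 0;
  while (pos < delta.length) {
    const op = delta[pos++];
    if (op & 0x80) {
      // COPY: bits 0-3 select offset bytes, bits 4-6 select size bytes (little-endian).
      let offset = 0;
      let size = 0;
      for (let i = 0; i < 4; i++) if (op & (1 << i)) offset += delta[pos++] * 2 ** (8 * i);
      for (let i = 0; i < 3; i++) if (op & (0x10 << i)) size += delta[pos++] * 2 ** (8 * i);
      if (size === 0) size = 0x10000; // a size of zero encodes 64 KiB
      if (offset + size > source.length || out + size > targetSize) throw new Error("copy out of range");
      target.set(source.subarray(offset, offset + size), out);
      out += size;
    } else if (op > 0) {
      // INSERT: the next `op` bytes are literal data.
      if (pos + op > delta.length || out + op > targetSize) throw new Error("insert out of range");
      target.set(delta.subarray(pos, pos + op), out);
      pos += op;
      out += op;
    } else {
      throw new Error("reserved delta opcode 0");
    }
  }
  if (out !== targetSize) throw new Error("delta target size mismatch");
  return target;
}
```

The diagram's example works as a test case. Take the source `AAAAAAAAAABBBBBBBBBBCCCCCCCCCC`. The bytes `1e 17 91 0a 0a 03 6e 65 77 90 0a` encode the three instructions shown. Applying them produces `BBBBBBBBBBnewAAAAAAAAAA`.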
Delta compression is applied only to blob objects:
Rationale:
- 90%+ of storage is blobs: File content dominates repository size
- Trees/commits are small: Typically < 1KB, delta overhead exceeds savings
- Simpler GC: No tree delta chains to manage during garbage collection
- Faster access: Commits/trees don't require delta reconstruction
The transport layer implements Git's smart HTTP protocol:
```
Client                                            Server
  │                                                 │
  │ ── GET /info/refs?service=git-upload-pack ────► │  Discover refs
  │ ◄── refs + capabilities ─────────────────────── │
  │                                                 │
  │ ── POST /git-upload-pack ─────────────────────► │  Request objects
  │      want <oid>                                 │
  │      have <oid>                                 │
  │      done                                       │
  │ ◄── pack data (sideband) ────────────────────── │  Receive pack
  │                                                 │
```
Features:
- Protocol v1 and v2 support
- Capability negotiation (multi_ack, thin-pack, side-band-64k)
- Shallow clone support
- Server implementation using Web Standard APIs (Request/Response)
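To tie the diagram to the wire format, here is a sketch of the request body a client might POST to `git-upload-pack` in protocol v1. The order is:

1. `want` lines, where the first one carries the requested capabilities
2. a flush packet ending the wants
3. `have` lines
4. a final `done`

This is a simplification. Real clients negotiate `have`s over several rounds and request only capabilities the server advertised. `buildUploadPackRequest` is a hypothetical helper.

```typescript
// Frames one pkt-line: 4 hex digits of total length (prefix included), then the payload.
function pktLine(payload: string): string {
  const length = new TextEncoder().encode(payload).length + 4;
  return length.toString(16).padStart(4, "0") + payload;
}

// Builds a protocol v1 upload-pack request body: wants (+capabilities), flush, haves, done.
function buildUploadPackRequest(wants: string[], haves: string[], capabilities: string[]): string {
  if (wants.length === 0) throw new Error("at least one want is required");
  const lines = wants.map((oid, i) =>
    pktLine(i === 0 && capabilities.length > 0 ? `want ${oid} ${capabilities.join(" ")}\n` : `want ${oid}\n`),
  );
  lines.push("0000"); // flush-pkt ends the want list
  for (const oid of haves) lines.push(pktLine(`have ${oid}\n`));
  lines.push(pktLine("done\n")); // tells the server to stop negotiating and send the pack
  return lines.join("");
}
```

With a 40-character object ID, each `have` line is framed as `0032have <oid>\n`: 46 payload bytes plus the 4-byte prefix.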
Implement RawStorage for your storage system:
```typescript
class MyRawStorage implements RawStorage {
  async store(key: string, content: AsyncIterable<Uint8Array>): Promise<void> {
    // Your storage logic
  }
  // ... other methods
}
```

Then create a History using factory functions:
```typescript
import { createHistoryWithOperations } from "@statewalker/vcs-core";

const history = await createHistoryWithOperations({ backend: myBackend });
await history.initialize();
```

The compression layer supports pluggable implementations:
```typescript
import { setCompressionUtils } from "@statewalker/vcs-utils/compression";
import { createNodeCompression } from "@statewalker/vcs-utils/compression-node";

setCompressionUtils(createNodeCompression()); // Use native zlib
```

The HTTP server supports flexible authentication:
```typescript
const server = createGitHttpServer({
  authenticate: async (request) => {
    // Validate credentials
  },
  authorize: async (request, repo, operation) => {
    // Check permissions for "fetch" or "push"
  },
});
```

The core packages work in browsers without polyfills:
- @statewalker/vcs-utils: Pure TypeScript algorithms
- @statewalker/vcs-core: Interface definitions and format handling
- @statewalker/vcs-transport: Web Standard APIs (fetch, Request/Response)
Storage backends may have platform requirements:
- core (Git storage): Requires FilesApi implementation
- store-sql: Requires SQLite (Node.js only)
- store-mem: Works everywhere
- store-kv: Works with any key-value backend (IndexedDB, LocalStorage, etc.)
Never buffer entire objects in memory. Use async generators:
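```typescript
// `transform` stands in for any per-chunk processing step.
async function* processContent(
  input: AsyncIterable<Uint8Array>,
  transform: (chunk: Uint8Array) => Uint8Array,
): AsyncGenerator<Uint8Array> {
  for await (const chunk of input) {
    yield transform(chunk); // only one chunk is held in memory at a time
  }
}
```

Load objects only when needed. The has() method checks existence without loading content.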
Configure the delta chain depth to balance compression ratio against reconstruction cost. Deep chains save space but slow random access, because every delta in the chain must be applied to rebuild an object.
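One way to picture the trade-off: if each delta records its base, an object's chain depth is the number of deltas that must be applied to rebuild it. The sketch below computes that depth. It refuses a new delta whose base already sits at the limit, which is roughly how a maximum-depth setting bounds reconstruction cost. `DeltaBases`, `chainDepth` and `canDeltify` are hypothetical names. `DeltaBases` maps each deltified object to its base; full objects are absent from it.

```typescript
// Maps each deltified object ID to the ID of its base; full objects are absent.
type DeltaBases = Map<string, string>;

// Number of delta applications needed to reconstruct `id` (0 for a full object).
function chainDepth(bases: DeltaBases, id: string): number {
  let depth = 0;
  const seen = new Set<string>();
  for (let current = bases.get(id); current !== undefined; current = bases.get(current)) {
    if (seen.has(current)) throw new Error(`delta cycle at ${current}`);
    seen.add(current);
    depth++;
  }
  return depth;
}

// A new delta against `baseId` would sit one level deeper than its base.
function canDeltify(bases: DeltaBases, baseId: string, maxChainDepth: number): boolean {
  return chainDepth(bases, baseId) + 1 <= maxChainDepth;
}
```

Consider the chain `c → b → a` with a limit of 2. A new delta may still use `b` as its base. It may not use `c`, because rebuilding the result would then take three delta applications.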
Large repositories benefit from periodic repacking to optimize delta relationships and reduce file count.