docs: GPU strategy doc and north-star plan update by ChrisLundquist · Pull Request #102 · ChrisLundquist/libpz

ChrisLundquist · 2026-03-02T00:21:30Z

Summary

Add docs/design-docs/gpu-strategy.md — a single document capturing the GPU compression strategy based on empirical results across the project's history
Update PLAN-unified-scheduler-north-star.md to reflect current reality (Phase 1 done but perf gate failed, Phase 3 done via PR #101, "Unify GPU schedulers into a single coordinator thread")

What the strategy doc covers

What GPU is good at: LZ77 cooperative-stitch matching (1,788 parallel probes, 94% quality)
Why hash tables failed on GPU: atomic ordering destroys match quality (6.25% on repetitive data)
Compression ratio gap: PZ-LZR 41% vs gzip 29% — primarily a match-finding quality gap, not entropy
What GPU is bad at: entropy coding (rANS at 0.77x CPU encode, 0.54x decode — serial state machine bottleneck)
Current architecture: unified scheduler with GPU coordinator, try_send() deadlock prevention, GPU-to-CPU fallback
The FusedGpu problem: routing entropy to GPU is currently counterproductive
What would need to change: match quality improvements, on-device chaining (blocked on GPU entropy parity)
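
The `try_send()` deadlock prevention and GPU-to-CPU fallback listed above can be illustrated with a minimal sketch. It assumes a bounded `std::sync::mpsc` channel feeding the GPU coordinator thread; the PR does not say which channel type libpz uses, and `Block`, `Route`, `compress_on_cpu`, and `submit` are hypothetical names made up for this example, not libpz APIs. The idea is that a worker never blocks on a full GPU queue: if the non-blocking send fails, it takes the block back and does the work on the CPU.

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};

/// Hypothetical unit of work (not a libpz type).
pub struct Block(pub Vec<u8>);

/// Where a submitted block ended up.
#[derive(Debug, PartialEq)]
pub enum Route {
    /// Queued for the GPU coordinator thread.
    Gpu,
    /// Processed on the calling thread; carries the number of bytes handled.
    Cpu(usize),
}

/// Stand-in for the CPU compression path (assumption: the real path
/// would return compressed output rather than a byte count).
fn compress_on_cpu(block: &Block) -> usize {
    block.0.len()
}

/// Try to hand a block to the GPU queue without blocking.
/// A blocking `send()` on a full bounded queue could deadlock if the
/// coordinator is itself waiting on this worker; `try_send()` returns
/// immediately and gives the block back so the CPU can take it.
pub fn submit(gpu_queue: &SyncSender<Block>, block: Block) -> Route {
    match gpu_queue.try_send(block) {
        Ok(()) => Route::Gpu,
        // Queue full, or coordinator gone: fall back to the CPU.
        Err(TrySendError::Full(block)) | Err(TrySendError::Disconnected(block)) => {
            Route::Cpu(compress_on_cpu(&block))
        }
    }
}
```

With a queue of capacity 1 and no consumer draining it, the first `submit` is routed to the GPU queue and the second falls back to the CPU. Treating a disconnected coordinator the same as a full queue is a design choice of this sketch, not something the PR states.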

North-star plan changes

Removed stale "Critical gap: No GPU rANS kernels" (they've existed since Feb 17)
Added status table: Phase 1 DONE (perf gate FAIL), Phase 3 DONE (PR #101, "Unify GPU schedulers into a single coordinator thread"), Phase 2/5 DEFERRED
Updated existing assets table with GPU rANS kernels
Added recommended next actions based on current evidence

Test plan

No code changes — docs only
Pre-commit hook passes (fmt, clippy, tests)

🤖 Generated with Claude Code

Add docs/design-docs/gpu-strategy.md documenting the GPU compression strategy: what GPU is good at (LZ77 parallel probes), what it's bad at (serial entropy coding), why hash tables failed, the compression ratio gap vs gzip, and the current unified scheduler architecture.

Update PLAN-unified-scheduler-north-star.md to reflect reality:
- Phase 1 (GPU rANS): kernels exist but perf gate FAILED (0.77x CPU)
- Phase 3 (scheduler): DONE via PR #101 unified scheduler
- Remove "Critical gap: No GPU rANS kernels" (they exist since Feb)
- Add status table and recommended next actions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Key insights for future agents:
- Ratio gap vs gzip is encoding efficiency, not match quality
- GPU rANS kernels exist but are 0.77x CPU (don't re-implement)
- GPU wins on LZ77, loses on entropy; FusedGpu is counterproductive
- Per-stream frequency table overhead is real but small (~9% of gap)
- LzSeq is the right pipeline family for ratio improvements
- GPU hash tables don't work for LZ77 due to atomic ordering

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Document five approaches for future work:
1. dietGPU-style warp-per-segment rANS (proven, needs subgroup ops)
2. Huffman sync-point decode (simplest, plan exists)
3. Recoil-style arbitrary-position rANS decode (best ratio)
4. Sparse frequency tables (small but free win)
5. Match encoding improvements (zstd sequences, repeat offsets, etc.)

Includes comparison table, WebGPU feasibility notes, and risk assessment.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

ChrisLundquist force-pushed the claude/docs-gpu-strategy branch from 040354d to eb061d0 on March 2, 2026 00:27

Chris Lundquist and others added 2 commits March 1, 2026 17:25

ChrisLundquist merged commit 8ff3b19 into master Mar 2, 2026
5 checks passed

ChrisLundquist merged 3 commits into master from claude/docs-gpu-strategy
