docs: GPU strategy doc and north-star plan update #102
Merged
ChrisLundquist merged 3 commits into master on Mar 2, 2026
Add docs/design-docs/gpu-strategy.md documenting the GPU compression strategy: what GPU is good at (LZ77 parallel probes), what it's bad at (serial entropy coding), why hash tables failed, the compression ratio gap vs gzip, and the current unified scheduler architecture.

Update PLAN-unified-scheduler-north-star.md to reflect reality:
- Phase 1 (GPU rANS): kernels exist but perf gate FAILED (0.77x CPU)
- Phase 3 (scheduler): DONE via PR #101 unified scheduler
- Remove "Critical gap: No GPU rANS kernels" (they exist since Feb)
- Add status table and recommended next actions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
040354d to eb061d0
Key insights for future agents:
- Ratio gap vs gzip is encoding efficiency, not match quality
- GPU rANS kernels exist but are 0.77x CPU (don't re-implement)
- GPU wins on LZ77, loses on entropy; FusedGpu is counterproductive
- Per-stream frequency table overhead is real but small (~9% of gap)
- LzSeq is the right pipeline family for ratio improvements
- GPU hash tables don't work for LZ77 due to atomic ordering

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Document five approaches for future work:
1. dietGPU-style warp-per-segment rANS (proven, needs subgroup ops)
2. Huffman sync-point decode (simplest, plan exists)
3. Recoil-style arbitrary-position rANS decode (best ratio)
4. Sparse frequency tables (small but free win)
5. Match encoding improvements (zstd sequences, repeat offsets, etc.)

Includes comparison table, WebGPU feasibility notes, and risk assessment.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
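To make approach 4 (sparse frequency tables) concrete, here is a minimal sketch of the idea: store only the nonzero entries of a per-stream frequency table instead of all 256. Everything about the layout is an assumption for illustration (256 symbols, `u16` frequencies, a `u16` entry count followed by `(symbol, frequency)` triples); the project's actual table format is not shown in this PR, and the function names are hypothetical.

```rust
/// Dense layout (assumed): all 256 frequencies as little-endian u16, always 512 bytes.
fn encode_dense(freqs: &[u16; 256]) -> Vec<u8> {
    freqs.iter().flat_map(|f| f.to_le_bytes()).collect()
}

/// Sparse layout (hypothetical): u16 count of nonzero entries,
/// then one (symbol: u8, freq: u16 LE) triple per nonzero entry.
/// Size is 2 + 3 * n bytes, so it beats the dense form when n < 170.
fn encode_sparse(freqs: &[u16; 256]) -> Vec<u8> {
    let count = freqs.iter().filter(|f| **f != 0).count();
    let mut out = Vec::with_capacity(2 + 3 * count);
    out.extend_from_slice(&(count as u16).to_le_bytes());
    for (sym, &f) in freqs.iter().enumerate() {
        if f != 0 {
            out.push(sym as u8);
            out.extend_from_slice(&f.to_le_bytes());
        }
    }
    out
}

/// Rebuild the full table; returns None if the buffer is truncated.
fn decode_sparse(bytes: &[u8]) -> Option<[u16; 256]> {
    let count = u16::from_le_bytes([*bytes.get(0)?, *bytes.get(1)?]) as usize;
    let body = bytes.get(2..2 + 3 * count)?;
    let mut freqs = [0u16; 256];
    for entry in body.chunks_exact(3) {
        freqs[entry[0] as usize] = u16::from_le_bytes([entry[1], entry[2]]);
    }
    Some(freqs)
}
```

Under these assumptions, a stream that uses only two distinct symbols needs 8 bytes of table instead of 512, which is the kind of per-stream overhead the "small but free win" refers to.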
Summary
- Add docs/design-docs/gpu-strategy.md: a single document capturing the GPU compression strategy, based on empirical results across the project's history
- Update PLAN-unified-scheduler-north-star.md to reflect current reality (Phase 1 done but perf gate failed, Phase 3 done via PR #101, "Unify GPU schedulers into a single coordinator thread")

What the strategy doc covers
try_send() deadlock prevention, GPU-to-CPU fallback

North-star plan changes
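The try_send() deadlock prevention and GPU-to-CPU fallback item listed under the strategy doc's coverage can be illustrated with a minimal sketch. It assumes the GPU work queue is a bounded `std::sync::mpsc` channel; `Block`, `Route`, and `submit` are hypothetical names, and the real scheduler from PR #101 may be structured differently. The point is only that a non-blocking send never stalls the coordinator: when the queue is full or the GPU worker has gone away, the block is handed back for the CPU path.

```rust
use std::sync::mpsc::{SyncSender, TrySendError};

/// Hypothetical unit of work; the scheduler's real job type is not shown in this PR.
#[derive(Debug, PartialEq)]
struct Block(Vec<u8>);

/// Where a submitted block ended up.
#[derive(Debug, PartialEq)]
enum Route {
    Gpu,
    CpuFallback,
}

/// Offer a block to the GPU queue without ever blocking the caller.
/// On a full queue or a disconnected GPU worker, return the block so the
/// caller can compress it on the CPU instead of waiting.
fn submit(gpu_queue: &SyncSender<Block>, block: Block) -> (Route, Option<Block>) {
    match gpu_queue.try_send(block) {
        Ok(()) => (Route::Gpu, None),
        Err(TrySendError::Full(b)) | Err(TrySendError::Disconnected(b)) => {
            (Route::CpuFallback, Some(b))
        }
    }
}
```

With a queue created by `sync_channel::<Block>(1)`, the first `submit` is routed to the GPU and a second one issued before the worker drains the queue comes back as `CpuFallback` with the block intact. A blocking `send()` in the same position would instead park the coordinator until the GPU worker catches up, which is the stall this pattern avoids.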
Test plan
🤖 Generated with Claude Code