| 1 | +# CommitMind |
| 2 | + |
| 3 | +**Semantic search for Git commit history, powered by TurboQuant vector compression (ICLR 2026).** |
| 4 | + |
| 5 | +> Stop searching by keywords. Search by *meaning*. |
| 6 | + |
| 7 | +[PyPI](https://pypi.org/project/commitmind/) |
| 8 | +[Python](https://www.python.org/downloads/) |
| 9 | +[License](LICENSE) |
| 10 | + |
| 11 | +## The Problem |
| 12 | + |
| 13 | +```bash |
| 14 | +# Current: keyword matching only |
| 15 | +git log --grep="memory leak" # Only finds commits whose message contains "memory leak" |
| 16 | + # Misses: "fix kfree_skb double free" |
| 17 | + # Misses: "plug UAF in reset path" |
| 18 | + # Misses: "resolve dangling pointer" |
| 19 | +``` |
| 20 | + |
| 21 | +## The Solution |
| 22 | + |
| 23 | +```bash |
| 24 | +# CommitMind: semantic search |
| 25 | +commitmind search "memory leak" |
| 26 | +# >> #1 [0.94] a3f2c1d Fix kfree_skb double free in netfilter |
| 27 | +# >> #2 [0.91] b7e4a2f Plug use-after-free in device reset path |
| 28 | +# >> #3 [0.87] c9d1b3e Resolve dangling pointer in slab allocator |
| 29 | +``` |
| 30 | + |
| 31 | +CommitMind understands the **meaning** of your query and finds semantically related commits - even when the exact words don't match. |
| 32 | + |
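To make the ranking idea concrete, here is a minimal sketch of a top-k search over embedding vectors. It assumes the bracketed scores above are cosine similarities, which this README does not state; the names `cosine` and `top_k` and the toy two-dimensional vectors are illustrative and are not CommitMind's API.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, index, k=3):
    """Return the k best (score, sha) pairs, highest score first.

    `index` maps a commit SHA to its embedding vector (hypothetical layout).
    """
    scored = ((cosine(query_vec, vec), sha) for sha, vec in index.items())
    return heapq.nlargest(k, scored)


# Toy index: real embeddings would have 384 dimensions.
toy_index = {"a3f2c1d": [1.0, 0.0], "b7e4a2f": [1.0, 1.0], "c9d1b3e": [0.0, 1.0]}
for score, sha in top_k([1.0, 0.0], toy_index, k=2):
    print(f"[{score:.2f}] {sha}")
```

A commit ranks highly when its vector points in roughly the same direction as the query vector, regardless of which words either text contains.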
| 33 | +## How It Works |
| 34 | + |
| 35 | +``` |
| 36 | +Git commits --> Sentence embeddings --> TurboQuant compression --> Semantic search |
| 37 | + (all-MiniLM-L6-v2) (7.6x compression) (asymmetric scoring) |
| 38 | +``` |
| 39 | + |
| 40 | +1. **Extract** commit messages + file change metadata from git history |
| 41 | +2. **Embed** each commit into a 384-dimensional vector (local model, no API needed) |
| 42 | +3. **Compress** vectors with TurboQuant (Google's ICLR 2026 algorithm) - 87% memory savings |
| 43 | +4. **Search** using asymmetric inner-product estimation (no decompression needed) |
| 44 | + |
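As an illustration of the extract step only, the sketch below reads commit messages with `git log` and splits them into records. It is not CommitMind's actual implementation: the function names and the choice of ASCII separator bytes are assumptions, and file change metadata is left out for brevity.

```python
import subprocess

FIELD_SEP = "\x1f"   # ASCII unit separator, emitted by git as %x1f
RECORD_SEP = "\x1e"  # ASCII record separator, emitted by git as %x1e


def parse_log(raw):
    """Split raw `git log` output into dicts with sha, subject, and body."""
    commits = []
    for record in raw.split(RECORD_SEP):
        # Strip only newlines: str.strip() with no argument would also
        # remove the separator bytes, which Python treats as whitespace.
        record = record.strip("\n")
        if not record:
            continue
        sha, subject, body = record.split(FIELD_SEP, 2)
        commits.append({"sha": sha, "subject": subject, "body": body.strip()})
    return commits


def read_commits(repo=".", max_commits=1000):
    """Run `git log` in `repo` and return the parsed commits (hypothetical helper)."""
    fmt = "%H%x1f%s%x1f%b%x1e"  # full hash, subject, body, then record separator
    result = subprocess.run(
        ["git", "-C", repo, "log", f"--max-count={max_commits}",
         f"--pretty=format:{fmt}"],
        capture_output=True, text=True, check=True,
    )
    return parse_log(result.stdout)
```

Using control bytes as separators avoids ambiguity when a commit body itself contains blank lines or pipe characters.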
| 45 | +## Installation |
| 46 | + |
| 47 | +```bash |
| 48 | +pip install commitmind |
| 49 | +``` |
| 50 | + |
| 51 | +Or install from source: |
| 52 | + |
| 53 | +```bash |
| 54 | +git clone https://github.com/wjddusrb03/commitmind.git |
| 55 | +cd commitmind |
| 56 | +pip install -e ".[dev]" |
| 57 | +``` |
| 58 | + |
| 59 | +## Quick Start |
| 60 | + |
| 61 | +```bash |
| 62 | +# 1. Index your repository |
| 63 | +cd your-project |
| 64 | +commitmind index |
| 65 | + |
| 66 | +# Output: |
| 67 | +# Indexing complete! |
| 68 | +# > 3,842 commits indexed |
| 69 | +# > Compressed: 18.2 MB -> 2.4 MB (7.6x) |
| 70 | +# > Saved to .commitmind/index.pkl |
| 71 | + |
| 72 | +# 2. Search by meaning |
| 73 | +commitmind search "authentication bug fix" |
| 74 | + |
| 75 | +# 3. View stats |
| 76 | +commitmind stats |
| 77 | +``` |
| 78 | + |
| 79 | +## CLI Commands |
| 80 | + |
| 81 | +| Command | Description | |
| 82 | +|---|---| |
| 83 | +| `commitmind index` | Index commits with TurboQuant compression | |
| 84 | +| `commitmind search "query"` | Semantic search over commits | |
| 85 | +| `commitmind stats` | Show index statistics | |
| 86 | +| `commitmind update` | Add new commits to existing index | |
| 87 | + |
| 88 | +### Options |
| 89 | + |
| 90 | +```bash |
| 91 | +# Index with options |
| 92 | +commitmind index --max-commits 1000 # Limit to the 1,000 most recent commits |
| 93 | +commitmind index --branch main # Index specific branch |
| 94 | +commitmind index --bits 2 # Use 2-bit quantization (more compression) |
| 95 | + |
| 96 | +# Search with options |
| 97 | +commitmind search "query" -k 10 # Return top 10 results |
| 98 | +``` |
| 99 | + |
| 100 | +## Use Cases |
| 101 | + |
| 102 | +- **New team member**: "What authentication changes were made recently?" |
| 103 | +- **Bug tracking**: "Find commits related to network timeout issues" |
| 104 | +- **Security audit**: "Show all SQL injection related fixes" |
| 105 | +- **Code archaeology**: Search the Linux kernel's 1M+ commits by meaning |
| 106 | +- **Cross-language**: Search English commits with Korean queries (and vice versa) |
| 107 | + |
| 108 | +## Memory Efficiency |
| 109 | + |
| 110 | +Thanks to TurboQuant compression: |
| 111 | + |
| 112 | +| Commits | Uncompressed | CommitMind | Savings | |
| 113 | +|---|---|---|---| |
| 114 | +| 1,000 | 1.5 MB | 0.2 MB | 87% | |
| 115 | +| 10,000 | 15 MB | 2.0 MB | 87% | |
| 116 | +| 100,000 | 150 MB | 20 MB | 87% | |
| 117 | +| 1,000,000 | 1.5 GB | 200 MB | 87% | |
| 118 | + |
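The table follows from simple arithmetic. The sketch below reproduces it under two assumptions that the README implies but does not state: uncompressed vectors are stored as 384 float32 values, and 1 MB means 10^6 bytes. The constant and function names are illustrative.

```python
DIM = 384                 # embedding dimensions (all-MiniLM-L6-v2)
BYTES_PER_FLOAT32 = 4     # assumed uncompressed storage format
COMPRESSION_RATIO = 7.6   # ratio quoted in this README


def index_size_mb(commits):
    """Return (uncompressed_mb, compressed_mb) for a given number of commits."""
    raw_mb = commits * DIM * BYTES_PER_FLOAT32 / 1e6
    return raw_mb, raw_mb / COMPRESSION_RATIO


savings_pct = (1 - 1 / COMPRESSION_RATIO) * 100

for n in (1_000, 10_000, 100_000, 1_000_000):
    raw_mb, packed_mb = index_size_mb(n)
    print(f"{n:>9,} commits: {raw_mb:8.1f} MB -> {packed_mb:6.1f} MB "
          f"({savings_pct:.0f}% saved)")
```

Each vector takes 384 x 4 = 1,536 bytes before compression, which the table rounds to 1.5 MB per 1,000 commits. The figures cover the vectors only, not any commit metadata stored alongside them.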
| 119 | +## How TurboQuant Works |
| 120 | + |
| 121 | +CommitMind uses [TurboQuant](https://openreview.net/forum?id=mMWatwUUkn) (Google Research, ICLR 2026): |
| 122 | + |
| 123 | +1. **PolarQuant**: Random orthogonal rotation + Lloyd-Max scalar quantization (3-bit) |
| 124 | +2. **QJL**: Quantized Johnson-Lindenstrauss residual correction (1-bit) |
| 125 | +3. **Asymmetric scoring**: Compute similarity WITHOUT decompressing vectors |
| 126 | + |
| 127 | +Together the two stages use about 4 bits per dimension instead of the 32 bits of a float32 value, which gives ~7.6x compression with minimal accuracy loss. |
| 128 | + |
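The rotate, quantize, and score flow can be sketched in plain Python. This is a simplified illustration, not TurboQuant itself and not CommitMind's code: it swaps the Lloyd-Max codebook for a uniform one, omits the 1-bit QJL residual correction, and uses a dense rotation matrix built by Gram-Schmidt. All function names are hypothetical.

```python
import math
import random


def random_rotation(dim, seed=0):
    """Random orthogonal matrix: Gram-Schmidt applied to Gaussian rows."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for r in rows:  # remove the components along earlier rows
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:
            rows.append([a / norm for a in v])
    return rows


def rotate(matrix, vec):
    """Matrix-vector product; an orthogonal matrix preserves inner products."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def make_codebook(bits, lo=-1.0, hi=1.0):
    """Uniform centroids on [lo, hi]; a stand-in for a Lloyd-Max codebook."""
    levels = 2 ** bits
    step = (hi - lo) / levels
    return [lo + (i + 0.5) * step for i in range(levels)]


def quantize(vec, codebook):
    """Replace each coordinate by the index of its nearest centroid."""
    return [min(range(len(codebook)), key=lambda i: abs(codebook[i] - x))
            for x in vec]


def asymmetric_score(query_rotated, codes, codebook):
    """Inner product of a float query with a quantized vector.

    Centroids are looked up on the fly, so the stored vector is never
    materialized as a full float array.
    """
    return sum(q * codebook[c] for q, c in zip(query_rotated, codes))


rotation = random_rotation(8, seed=1)
stored = rotate(rotation, [0.5, -0.5, 0.5, -0.5, 0.0, 0.0, 0.0, 0.0])
query = rotate(rotation, [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
codebook = make_codebook(bits=3)
codes = quantize(stored, codebook)  # eight 3-bit codes instead of eight floats
print("estimated:", asymmetric_score(query, codes, codebook), "exact: 0.5")
```

The scoring is asymmetric because only the stored vector is quantized; the query keeps full precision, which keeps the estimate close to the exact inner product.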
| 129 | +## Requirements |
| 130 | + |
| 131 | +- Python 3.9+ |
| 132 | +- Git repository |
| 133 | +- CPU only (no GPU required) |
| 134 | +- ~500 MB disk for embedding model (downloaded once) |
| 135 | + |
| 136 | +## Contributing |
| 137 | + |
| 138 | +Issues and pull requests are welcome! If you find a bug or have suggestions, please [open an issue](https://github.com/wjddusrb03/commitmind/issues). |
| 139 | + |
| 140 | +## License |
| 141 | + |
| 142 | +MIT License |
| 143 | + |
| 144 | +## Citation |
| 145 | + |
| 146 | +If you use CommitMind in your research, please cite it as: |
| 147 | + |
| 148 | +```bibtex |
| 149 | +@software{commitmind2026, |
| 150 | + title={CommitMind: Semantic Git Commit Search with TurboQuant Compression}, |
| 151 | + author={wjddusrb03}, |
| 152 | + year={2026}, |
| 153 | + url={https://github.com/wjddusrb03/commitmind} |
| 154 | +} |
| 155 | +``` |
| 156 | + |
| 157 | +## Related |
| 158 | + |
| 159 | +- [langchain-turboquant](https://github.com/wjddusrb03/langchain-turboquant) - LangChain VectorStore with TurboQuant compression |
| 160 | +- [TurboQuant paper](https://openreview.net/forum?id=mMWatwUUkn) - Original ICLR 2026 paper by Google Research |