A structured, tutorial-style knowledge base covering database internals — from storage engines and data models to distributed consensus, compaction strategies, and vector search.
```mermaid
graph LR
A[Database Foundations] --> B[Relational Databases]
A --> C[Document & Key-Value]
A --> D[Search & Vector]
A --> E[Time-Series]
A --> F[Distributed Internals]
A --> A1[Storage Engines]
A --> A2[Data Models]
A --> A3[Consistency & Guarantees]
B --> B1[PostgreSQL Internals]
B --> B2[Indexing Strategies]
C --> C1[MongoDB Internals]
C --> C2[etcd Internals]
D --> D1[Elasticsearch Internals]
D --> D2[Vector Database Internals]
E --> E1[Time-Series Fundamentals]
E --> E2[Gorilla & Prometheus]
F --> F1[Consensus Algorithms]
F --> F2[Replication & Sharding]
F --> F3[Compaction Strategies]
style A fill:#4a90d9,stroke:#2c5f8a,color:#fff
style B fill:#27ae60,stroke:#1e8449,color:#fff
style C fill:#f39c12,stroke:#c87f0a,color:#fff
style D fill:#7b68ee,stroke:#5a4cb5,color:#fff
style E fill:#e74c3c,stroke:#c0392b,color:#fff
style F fill:#9b59b6,stroke:#8e44ad,color:#fff
style A1 fill:#6aacf0,stroke:#4a8ad0,color:#fff
style A2 fill:#6aacf0,stroke:#4a8ad0,color:#fff
style A3 fill:#6aacf0,stroke:#4a8ad0,color:#fff
style B1 fill:#2ecc71,stroke:#27ae60,color:#fff
style B2 fill:#2ecc71,stroke:#27ae60,color:#fff
style C1 fill:#f5a962,stroke:#d68e4a,color:#fff
style C2 fill:#f5a962,stroke:#d68e4a,color:#fff
style D1 fill:#9a8aee,stroke:#7a6ace,color:#fff
style D2 fill:#9a8aee,stroke:#7a6ace,color:#fff
style E1 fill:#e89a9f,stroke:#c87a7f,color:#fff
style E2 fill:#e89a9f,stroke:#c87a7f,color:#fff
style F1 fill:#b07cc6,stroke:#9a6cb0,color:#fff
style F2 fill:#b07cc6,stroke:#9a6cb0,color:#fff
style F3 fill:#b07cc6,stroke:#9a6cb0,color:#fff
```
- Database Foundations: Section Overview
- Storage Engines — B-trees vs LSM-trees, write/read/space amplification, and why storage engine choice is the most impactful architecture decision
- Data Models — Relational, document, key-value, columnar, and graph models with a decision framework for choosing
- Consistency & Guarantees — ACID vs BASE, CAP theorem limitations, PACELC, isolation levels, and what consistency means in practice
- Relational Databases: Section Overview
- PostgreSQL Internals — MVCC, WAL, query planner, VACUUM, connection pooling, and partitioning
- Indexing Strategies — B-tree, GIN, GiST, BRIN, hash, partial, and expression indexes with an index selection decision tree
- Document & Key-Value: Section Overview
- MongoDB Internals — WiredTiger storage engine, replica sets, sharding, aggregation pipeline, schema design patterns, and transactions
- etcd Internals — Raft consensus implementation, bbolt storage, watch mechanism, and why etcd powers Kubernetes
- Search & Vector: Section Overview
- Elasticsearch Internals — Inverted indexes, BM25 scoring, distributed scatter-gather search, segment management, and shard sizing
- Vector Database Internals — HNSW, IVF, and product quantization algorithms in depth; Qdrant and Milvus architectures compared
- Time-Series: Section Overview
- Time-Series Fundamentals — What makes time-series data different, compression techniques, indexing, retention strategies, and when to use a dedicated TSDB
- Gorilla & Prometheus — Facebook Gorilla's compression algorithms, Prometheus TSDB architecture, the Grafana ecosystem, and long-term storage options
- Distributed Internals: Section Overview
- Consensus Algorithms — Raft, Paxos, and ZAB: leader election, log replication, safety properties, Multi-Raft scaling, and production implementations
- Replication & Sharding — Single-leader, multi-leader, and leaderless replication; range vs hash sharding; consistent hashing; rebalancing strategies
- Compaction Strategies — Size-tiered (STCS), leveled (LCS), time-window (TWCS), and incremental (ICS) compaction, plus Gorilla's block-based approach, framed by the write/read/space amplification tradeoff
This knowledge base is written for:
- Backend Engineers who want to understand the databases they use daily: why queries are slow, how replication works, and what compaction does
- Infrastructure / Platform Engineers operating database clusters who need to reason about sharding, consensus, and compaction tuning
- Software Engineers evaluating database choices who want to go deeper than feature comparison tables
- Students & Researchers looking for a structured, opinionated guide through database internals literature
| Your Goal | Start Here |
|---|---|
| Understand how databases store data | Storage Engines |
| Choose between database types | Data Models |
| Reason about consistency tradeoffs | Consistency & Guarantees |
| Debug slow PostgreSQL queries | PostgreSQL Internals |
| Choose the right PostgreSQL index | Indexing Strategies |
| Understand MongoDB architecture | MongoDB Internals |
| Learn how Kubernetes stores state | etcd Internals |
| Build or improve full-text search | Elasticsearch Internals |
| Choose a vector database for RAG | Vector Database Internals |
| Design a monitoring stack | Gorilla & Prometheus |
| Understand distributed consensus | Consensus Algorithms |
| Design a sharding strategy | Replication & Sharding |
| Tune LSM-tree compaction | Compaction Strategies |