natenberenstein/deep-dive-databases

Deep Dive into Databases

A structured, tutorial-style knowledge base covering database internals — from storage engines and data models to distributed consensus, compaction strategies, and vector search.

Learning Roadmap

graph LR
    A[Database Foundations] --> B[Relational Databases]
    A --> C[Document & Key-Value]
    A --> D[Search & Vector]
    A --> E[Time-Series]
    A --> F[Distributed Internals]

    A --> A1[Storage Engines]
    A --> A2[Data Models]
    A --> A3[Consistency & Guarantees]

    B --> B1[PostgreSQL Internals]
    B --> B2[Indexing Strategies]

    C --> C1[MongoDB Internals]
    C --> C2[etcd Internals]

    D --> D1[Elasticsearch Internals]
    D --> D2[Vector Database Internals]

    E --> E1[Time-Series Fundamentals]
    E --> E2[Gorilla & Prometheus]

    F --> F1[Consensus Algorithms]
    F --> F2[Replication & Sharding]
    F --> F3[Compaction Strategies]

    style A fill:#4a90d9,stroke:#2c5f8a,color:#fff
    style B fill:#27ae60,stroke:#1e8449,color:#fff
    style C fill:#f39c12,stroke:#c87f0a,color:#fff
    style D fill:#7b68ee,stroke:#5a4cb5,color:#fff
    style E fill:#e74c3c,stroke:#c0392b,color:#fff
    style F fill:#9b59b6,stroke:#8e44ad,color:#fff
    style A1 fill:#6aacf0,stroke:#4a8ad0,color:#fff
    style A2 fill:#6aacf0,stroke:#4a8ad0,color:#fff
    style A3 fill:#6aacf0,stroke:#4a8ad0,color:#fff
    style B1 fill:#2ecc71,stroke:#27ae60,color:#fff
    style B2 fill:#2ecc71,stroke:#27ae60,color:#fff
    style C1 fill:#f5a962,stroke:#d68e4a,color:#fff
    style C2 fill:#f5a962,stroke:#d68e4a,color:#fff
    style D1 fill:#9a8aee,stroke:#7a6ace,color:#fff
    style D2 fill:#9a8aee,stroke:#7a6ace,color:#fff
    style E1 fill:#e89a9f,stroke:#c87a7f,color:#fff
    style E2 fill:#e89a9f,stroke:#c87a7f,color:#fff
    style F1 fill:#b07cc6,stroke:#9a6cb0,color:#fff
    style F2 fill:#b07cc6,stroke:#9a6cb0,color:#fff
    style F3 fill:#b07cc6,stroke:#9a6cb0,color:#fff

Table of Contents

1. Database Foundations

  • Section Overview
  • Storage Engines — B-trees vs LSM-trees, write/read/space amplification, and why storage engine choice is the most impactful architecture decision
  • Data Models — Relational, document, key-value, columnar, and graph models with a decision framework for choosing
  • Consistency & Guarantees — ACID vs BASE, CAP theorem limitations, PACELC, isolation levels, and what consistency means in practice

2. Relational Databases

  • PostgreSQL Internals
  • Indexing Strategies

3. Document & Key-Value Databases

  • Section Overview
  • MongoDB Internals — WiredTiger storage engine, replica sets, sharding, aggregation pipeline, schema design patterns, and transactions
  • etcd Internals — Raft consensus implementation, bbolt storage, watch mechanism, and why etcd powers Kubernetes

4. Search & Vector Databases

  • Elasticsearch Internals
  • Vector Database Internals

5. Time-Series Databases

  • Section Overview
  • Time-Series Fundamentals — What makes time-series data different, compression techniques, indexing, retention strategies, and when to use a dedicated TSDB
  • Gorilla & Prometheus — Facebook Gorilla's compression algorithms, Prometheus TSDB architecture, the Grafana ecosystem, and long-term storage options
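
As a taste of the compression techniques listed above, here is a minimal sketch of the delta-of-delta transform that Gorilla-style timestamp compression builds on. It assumes integer timestamps and only shows the arithmetic; the real format also bit-packs the results with variable-length codes, which is omitted here. The function names are hypothetical and chosen for illustration.

```python
from typing import List


def delta_of_delta_encode(timestamps: List[int]) -> List[int]:
    """Return [first timestamp, first delta, then delta-of-deltas]."""
    if not timestamps:
        return []
    encoded = [timestamps[0]]
    prev_ts = timestamps[0]
    prev_delta = 0  # so the second value stored is the first raw delta
    for ts in timestamps[1:]:
        delta = ts - prev_ts
        encoded.append(delta - prev_delta)
        prev_ts, prev_delta = ts, delta
    return encoded


def delta_of_delta_decode(encoded: List[int]) -> List[int]:
    """Invert delta_of_delta_encode by re-accumulating the deltas."""
    if not encoded:
        return []
    timestamps = [encoded[0]]
    delta = 0
    for dod in encoded[1:]:
        delta += dod
        timestamps.append(timestamps[-1] + delta)
    return timestamps


if __name__ == "__main__":
    # Samples scraped roughly every 60 seconds, with a little jitter.
    samples = [1000, 1060, 1120, 1181, 1240]
    print(delta_of_delta_encode(samples))
```

With regularly spaced samples most delta-of-delta values are zero or close to it, which is why they can be stored in very few bits.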

6. Distributed Database Internals

  • Section Overview
  • Consensus Algorithms — Raft, Paxos, and ZAB: leader election, log replication, safety properties, Multi-Raft scaling, and production implementations
  • Replication & Sharding — Single-leader, multi-leader, and leaderless replication; range vs hash sharding; consistent hashing; rebalancing strategies
  • Compaction Strategies — Size-tiered (STCS), leveled (LCS), time-window (TWCS), and incremental (ICS) compaction, plus Gorilla's block-based approach, and the write/read/space amplification tradeoff among them
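
To make one of the sharding topics above concrete, here is a minimal sketch of consistent hashing with virtual nodes. It is an illustration under stated assumptions, not how any particular database implements it: the `HashRing` class, the `vnodes` parameter, and the use of SHA-256 truncated to 64 bits are all hypothetical choices made for this example.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


def _hash(key: str) -> int:
    # Assumption: any stable, well-mixed hash works; SHA-256 truncated to 64 bits.
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Maps keys to nodes; each node owns several points ("virtual nodes") on the ring."""

    def __init__(self, nodes: Iterable[str] = (), vnodes: int = 64) -> None:
        self._vnodes = vnodes
        self._points: List[int] = []      # sorted ring positions
        self._owner: Dict[int, str] = {}  # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self._vnodes):
            point = _hash(f"{node}#{i}")
            if point in self._owner:
                continue  # extremely unlikely collision; keep the existing owner
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def get_node(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        # First point clockwise from the key's hash, wrapping around the ring.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    print(ring.get_node("user:42"))
```

The property worth noticing is that adding a node only takes keys away from existing nodes; keys never shuffle between the nodes that were already present, which is what keeps rebalancing cheap compared with `hash(key) % N`.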

Who Is This For?

  • Backend Engineers who want to understand the databases they use daily — why queries are slow, how replication works, what compaction does
  • Infrastructure / Platform Engineers operating database clusters who need to reason about sharding, consensus, and compaction tuning
  • Software Engineers evaluating database choices who want to go deeper than feature comparison tables
  • Students & Researchers looking for a structured, opinionated guide through database internals literature

Quick-Start Reading Guide

| Your Goal | Start Here |
|---|---|
| Understand how databases store data | Storage Engines |
| Choose between database types | Data Models |
| Reason about consistency tradeoffs | Consistency & Guarantees |
| Debug slow PostgreSQL queries | PostgreSQL Internals |
| Choose the right PostgreSQL index | Indexing Strategies |
| Understand MongoDB architecture | MongoDB Internals |
| Learn how Kubernetes stores state | etcd Internals |
| Build or improve full-text search | Elasticsearch Internals |
| Choose a vector database for RAG | Vector Database Internals |
| Design a monitoring stack | Gorilla & Prometheus |
| Understand distributed consensus | Consensus Algorithms |
| Design a sharding strategy | Replication & Sharding |
| Tune LSM-tree compaction | Compaction Strategies |

About

Just my notes on databases
