AI Learning Gems

Textbook-style chapters on technical/mathematical topics, generated using LLM agents and rendered with Quarto.

How It Works

This project uses agentic LLM prompts to generate high-quality educational content. The prompts are designed for:

Cursor (prompts in .cursor/commands/)
Windsurf (prompts in .agent/workflows/)

Recommended model: Claude Opus 4.5 — works well for generating both prose and D2 diagrams.

Project Structure

AI-Learning-Gems/
├── _quarto.yml                  # Global config (kernel, format)
├── _extensions/pandoc-ext/      # D2 diagram filter
├── .cursor/commands/            # Cursor agent prompts
├── .agent/workflows/            # Windsurf agent prompts
├── scripts/                     # Source extraction tools
│   ├── authenticated_extract.py # Login-gated / JS-heavy pages → MD + images
│   ├── setup_browser_profile.py # One-time login to create browser profiles
│   ├── webpage_to_md.py         # Static pages → MD + images
│   ├── mistral_ocr.py           # PDF → MD + images (via Mistral API)
│   └── .browser-profiles/       # Saved browser sessions (gitignored)
├── sources/                     # Downloaded sources (gitignored)
├── Topic-1/                     # Each topic is a folder with .qmd chapters
├── Topic-2/
└── ...

Setup

Prerequisites (Homebrew — one-time global install)

This project installs these tools system-wide with Homebrew, not inside the conda environment:

brew install --cask quarto    # Document rendering (.qmd → HTML/PDF)
brew install d2               # Diagram generation
brew install pandoc           # Document conversion
brew install imagemagick      # PDF figure → PNG conversion
brew install ghostscript      # Required by ImageMagick for PDF rasterization

# Pin versions to prevent brew auto-upgrading them
brew pin d2 pandoc imagemagick ghostscript

Verify versions (each should be at least the minor version shown; newer releases will likely work):

quarto --version    # 1.7.x (tested with 1.7.31)
d2 --version        # 0.7.x (tested with 0.7.1)
pandoc --version    # 3.8.x (tested with 3.8.3)
magick --version    # 7.1.x (tested with 7.1.2)
gs --version        # 10.x  (tested with 10.06)
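
The version check above can be scripted. The following is a minimal sketch, assuming you have already extracted the bare version number from each tool's output; `same_minor` is a hypothetical helper and is not part of this repository.

```shell
# Succeeds when two version strings share the same major.minor prefix.
# same_minor is a hypothetical helper, not part of this repository.
same_minor() {
  tested=$(printf '%s\n' "$1" | cut -d. -f1,2)
  installed=$(printf '%s\n' "$2" | cut -d. -f1,2)
  [ "$tested" = "$installed" ]
}

# Example: compare a tested version against an installed one.
if same_minor "0.7.1" "0.7.4"; then
  echo "d2: minor version matches"
fi
```

Each tool prints its version in a slightly different format, so pulling the number out of `--version` output is left to the caller.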

Note: Homebrew does not support installing arbitrary versions of a formula (brew install d2@0.7.1 won't work). If brew install gives you a newer version than listed above, it will likely work. The brew pin command prevents brew upgrade from bumping the pinned formulae during routine updates. Quarto is installed as a cask, so it is not covered by the pin.

Python Environment (conda + uv)

Create the project's conda environment and install all dependencies. These commands are idempotent — safe to re-run.

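# Create conda env only if it does not already exist
# (with --yes, conda create replaces an existing env of the same name)
conda env list | grep -q '^ai-learning-gems ' || conda create -n ai-learning-gems python=3.13 --yes
conda activate ai-learning-gems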

# Install uv for fast pip installs (skips if already installed)
pip install uv 2>/dev/null || true

# Install all Python dependencies
cd /path/to/AI-Learning-Gems
uv pip install -r requirements.txt

# Register as Jupyter kernel (used by Quarto for .qmd code execution)
python -m ipykernel install --user --name=ai-learning-gems --display-name="Python (ai-learning-gems)"

Verify:

conda activate ai-learning-gems
python --version                                    # 3.13.x
python -c "import numpy, pandas, matplotlib, scipy, crawl4ai; print('OK')"
jupyter kernelspec list | grep ai-learning-gems     # Should show the kernel
quarto check jupyter                                # Should show ai-learning-gems in kernel list
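
A plain `grep ai-learning-gems` also matches kernels whose name or path merely contains that string. As a stricter alternative, the sketch below matches the kernel name exactly. It assumes `jupyter kernelspec list` prints one kernel per line with the name in the first column; `kernel_registered` is a hypothetical helper.

```shell
# Reads `jupyter kernelspec list` output on stdin and succeeds only if the
# named kernel appears exactly in the first column.
# kernel_registered is a hypothetical helper, not part of this repository.
kernel_registered() {
  awk -v name="$1" '$1 == name { found = 1 } END { exit(found ? 0 : 1) }'
}

# Usage (requires jupyter on PATH):
#   jupyter kernelspec list | kernel_registered ai-learning-gems || echo "kernel missing"
```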

Crawl4AI Browser Setup (for authenticated web extraction)

After installing requirements, set up Playwright browsers for the web extraction tool:

conda activate ai-learning-gems
crawl4ai-setup    # Downloads Chromium (~90MB, one-time)

Then create browser profiles for login-gated sites (one-time per site):

cd /path/to/AI-Learning-Gems

# Substack
python scripts/setup_browser_profile.py "https://substack.com/sign-in" substack
# → A Chromium window opens → log in → press Enter in terminal to save

# Medium
python scripts/setup_browser_profile.py "https://medium.com/m/signin" medium
# → Same flow: log in → press Enter

Profiles are saved to scripts/.browser-profiles/ (gitignored — they contain session cookies). Re-run if sessions expire or you get empty output.
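
Before running an extraction, it can help to confirm that a profile was actually saved. The sketch below is only an illustration: it assumes each profile lives in its own subdirectory named after the second argument passed to setup_browser_profile.py, which this README does not state explicitly. `profile_ready` is a hypothetical helper.

```shell
# Succeeds if <profiles_dir>/<name> exists and contains at least one entry.
# ASSUMPTION: one subdirectory per profile name; the real layout may differ.
profile_ready() {
  dir="$1/$2"
  [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

# Usage from the project root:
#   profile_ready scripts/.browser-profiles substack || echo "run setup_browser_profile.py first"
```

A non-empty directory does not prove the session is still valid; an expired session still needs the re-run described above.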

Quarto Extension (D2 Diagrams)

The pandoc-ext/diagram extension is already committed in _extensions/. To reinstall:

cd /path/to/AI-Learning-Gems
quarto add pandoc-ext/diagram
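
The extension only takes effect in chapters that declare a `filters:` key in their YAML header (see Troubleshooting). A rough way to check a chapter is sketched below. It assumes the front matter starts on the first line with `---`, and it only tests that the key exists, not which filter is listed. `has_filters_key` is a hypothetical helper.

```shell
# Succeeds if the YAML front matter of a .qmd file contains a top-level
# `filters:` key. Lines after the closing `---` are ignored.
# has_filters_key is a hypothetical helper, not part of this repository.
has_filters_key() {
  awk '
    NR == 1 { if ($0 != "---") exit; next }   # no front matter: give up
    $0 == "---" { exit }                      # end of front matter
    /^filters:/ { found = 1; exit }
    END { exit(found ? 0 : 1) }
  ' "$1"
}

# Usage:
#   has_filters_key Topic-1/chapter.qmd || echo "no filters: key in YAML header"
```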

VS Code / Cursor Extensions

Install:

Quarto (quarto.quarto)
Python (ms-python.python)

Source Extraction Tools

The scripts/ folder contains tools for downloading web sources as clean Markdown with local images. See .cursor/rules/web-source-fetching.md for the full decision tree.

Authenticated / JS-Heavy Pages → Markdown + Images

Uses Crawl4AI with persistent browser profiles.

# Substack article (auto-derives output path from URL)
python scripts/authenticated_extract.py "https://substack.com/home/post/p-189051354" --profile substack
# → sources/substack.com/home/post/p-189051354/content.md + images/

# Public blog that needs JS rendering (no profile needed)
python scripts/authenticated_extract.py "https://lilianweng.github.io/posts/2024-11-28-reward-hacking/"
# → sources/lilianweng.github.io/posts/2024-11-28-reward-hacking/content.md + images/

# Custom CSS selector for unknown sites
python scripts/authenticated_extract.py "https://example.com/page" -s "article"

# Skip images
python scripts/authenticated_extract.py "https://example.com/page" --no-images
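
The auto-derived output paths in the examples above follow a simple pattern: drop the URL scheme and any trailing slash, then prefix `sources/`. The sketch below reproduces that mapping for the two examples shown. It is not the script's actual implementation, and dropping the query string and fragment is an assumption. `derive_output_dir` is a hypothetical helper.

```shell
# Maps a URL to the output directory pattern shown in the examples above.
# derive_output_dir is a hypothetical helper; the real script may differ.
derive_output_dir() {
  path=${1#*://}        # drop the scheme (https://)
  path=${path%%\#*}     # drop a fragment, if any (assumption)
  path=${path%%\?*}     # drop a query string, if any (assumption)
  path=${path%/}        # drop one trailing slash
  printf 'sources/%s\n' "$path"
}

derive_output_dir "https://lilianweng.github.io/posts/2024-11-28-reward-hacking/"
# prints: sources/lilianweng.github.io/posts/2024-11-28-reward-hacking
```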

Static Web Pages (public, no JS needed)

python scripts/webpage_to_md.py "https://d2l.ai/chapter_.../section.html" -o sources/d2l.ai/chapter_.../

PDF → Markdown (via Mistral OCR)

Requires MISTRAL_API_KEY in .env.

python scripts/mistral_ocr.py document.pdf -o sources/output/
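
A missing or empty key is an easy mistake to catch before calling the API. The sketch below checks that a `.env` file defines a non-empty value for a given variable. It assumes plain `KEY=value` lines, optionally prefixed with `export`; `has_env_key` is a hypothetical helper.

```shell
# Succeeds if the file defines a non-empty value for the given variable.
# Commented-out lines and a missing file both count as "not set".
# has_env_key is a hypothetical helper, not part of this repository.
has_env_key() {
  grep -Eq "^[[:space:]]*(export[[:space:]]+)?$2=[^[:space:]]" "$1" 2>/dev/null
}

# Usage from the project root:
#   has_env_key .env MISTRAL_API_KEY || echo "MISTRAL_API_KEY is not set in .env"
```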

ArXiv Papers (LaTeX source preferred)

mkdir -p sources/arxiv-2010.11929 && cd sources/arxiv-2010.11929
curl -sL "https://arxiv.org/src/2010.11929" -o source.tar.gz && tar -xzf source.tar.gz
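
The download is not guaranteed to be a tar archive for every paper; as an assumption worth verifying, some submissions may be served as a single compressed file or a PDF instead. A small guard like the one sketched below avoids a confusing tar error. `is_tar_gz` is a hypothetical helper.

```shell
# Succeeds if the file can be listed as a gzip-compressed tar archive.
# is_tar_gz is a hypothetical helper, not part of this repository.
is_tar_gz() {
  tar -tzf "$1" >/dev/null 2>&1
}

# Usage:
#   if is_tar_gz source.tar.gz; then
#     tar -xzf source.tar.gz
#   else
#     echo "not a tar.gz archive; inspect it with: file source.tar.gz"
#   fi
```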

PDF Figure → PNG Conversion

# Single figure (400 DPI, auto-trimmed)
magick -density 400 figure.pdf -trim +repage figure.png

# Batch convert all PDF figures in an arXiv source
find "sources/arxiv-{ID}/" \( -name '*.pdf' \) \( -path '*/images/*' -o -path '*/figs/*' -o -path '*/figures/*' \) | while IFS= read -r f; do
  outfile="${f%.pdf}.png"
  [ ! -f "$outfile" ] && magick -density 400 "$f" -trim +repage "$outfile" && echo "Converted: $f"
done
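
Rasterizing at 400 DPI can be slow, so a dry run that lists what the batch loop would convert may be useful first. The sketch below uses the same `find` filters as the loop above and prints only the PDFs that do not yet have a PNG next to them. `pending_conversions` is a hypothetical helper.

```shell
# Lists PDF figures under the given directory that have no matching PNG yet.
# Uses the same directory filters as the batch loop; converts nothing.
# pending_conversions is a hypothetical helper, not part of this repository.
pending_conversions() {
  find "$1" -name '*.pdf' \( -path '*/images/*' -o -path '*/figs/*' -o -path '*/figures/*' \) |
  while IFS= read -r f; do
    [ -f "${f%.pdf}.png" ] || printf '%s\n' "$f"
  done
}

# Usage:
#   pending_conversions sources/arxiv-2010.11929
```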

Rendering

From Cursor/VS Code

Open a .qmd index file
Click Preview (Quarto extension) or use Command Palette: Quarto: Preview

From Terminal

conda activate ai-learning-gems
cd /path/to/AI-Learning-Gems

# Preview with live reload
quarto preview Statistics/Your-Chapter.qmd

# One-time render
quarto render Statistics/Your-Chapter.qmd --to html

Troubleshooting

Issue	Fix
D2 diagrams show as plain text	Check that the `.qmd` file has a `filters:` entry in its YAML header
Wrong Jupyter kernel	Run `python -m ipykernel install --user --name=ai-learning-gems --display-name="Python (ai-learning-gems)"`
Extension not found	Run `quarto render` from the project root, not a subdirectory
Authenticated extraction returns login page	Re-run `setup_browser_profile.py` — session may have expired
`crawl4ai-setup` hangs	It downloads ~90MB Chromium; wait or check network
Quarto can't find kernel	Run `conda activate ai-learning-gems && jupyter kernelspec list` to verify
D2 command not found	`brew install d2` (cannot be installed via pip/conda)
PyMC installation fails	`conda install -c conda-forge pymc arviz`
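
For the "Extension not found" case, a small helper can locate the project root from any subdirectory. The sketch below walks up the directory tree until it finds `_quarto.yml`, which the Project Structure section places at the root. `find_project_root` is a hypothetical helper.

```shell
# Prints the nearest ancestor directory (or the directory itself) that
# contains _quarto.yml; fails if none is found before reaching /.
# find_project_root is a hypothetical helper, not part of this repository.
find_project_root() {
  dir=$(cd "${1:-.}" && pwd) || return 1
  while [ "$dir" != "/" ]; do
    if [ -f "$dir/_quarto.yml" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
    dir=$(dirname "$dir")
  done
  return 1
}

# Usage:
#   cd "$(find_project_root)" && quarto render Statistics/Your-Chapter.qmd --to html
```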

Environment Reference

Developed and tested with:

Python 3.13.12 (via conda)
Quarto 1.7.31 (via Homebrew)
Pandoc 3.8.3 (via Homebrew)
D2 0.7.1 (via Homebrew)
ImageMagick 7.1.2 + Ghostscript 10.06 (via Homebrew)
macOS arm64 (Apple Silicon)
