This documentation is for developers who want to make complex changes to AnkiLangs. For simple contributions like fixing translations, see CONTRIBUTING.md instead.
- Setup
- Project Structure
- Data Flow
- Working with Data
- Systematic Deck Review
- Code Quality
- Testing
- Learning Hints
## Setup

All platforms require:
- Python 3.10+ - Programming language
- uv - Fast Python package installer and project manager
- Anki - Flashcard application (for testing decks)
- CrowdAnki add-on - Anki add-on for importing/exporting decks
- Just (optional) - Task runner for simplified commands
- ffmpeg (optional) - Required only for systematic deck reviews
### Ubuntu/Debian

```bash
# Install system dependencies
sudo apt update
sudo apt install python3 python3-pip git ffmpeg

# Install uv (Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install Just (task runner) - optional but recommended
curl --proto '=https' --tlsv1.2 -sSf https://just.systems/install.sh | bash -s -- --to ~/.local/bin

# Install Anki
# Download from https://apps.ankiweb.net/#download

# Clone the repository
git clone https://github.com/ankilangs/ankilangs
cd ankilangs/

# Install Python dependencies. uv does this automatically when running
# commands, or run it explicitly:
uv sync
```

### macOS

Follow analogous instructions to the ones for Ubuntu/Debian above.
```bash
# Install Homebrew (if not already installed)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install system dependencies
brew install python git ffmpeg just

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install Anki
# Download from https://apps.ankiweb.net/#download
# Or use Homebrew:
brew install --cask anki

# Clone the repository
git clone https://github.com/ankilangs/ankilangs
cd ankilangs/

# Note: On macOS, you may need to delete .DS_Store files before building:
find src/media -name '.DS_Store' -delete
```

### Windows

Recommended: Use Windows Subsystem for Linux (WSL) for the best development experience on Windows.
- Open PowerShell as Administrator and run: `wsl --install`
- Restart your computer when prompted
- After restart, Ubuntu will open automatically. Create a username and password.
- Once inside WSL (Ubuntu), follow the Linux setup instructions from above.
- Download and install Anki for Windows from https://apps.ankiweb.net/#download
Note: Anki runs on Windows, but development tools run in WSL. You'll build decks in WSL and import them into Anki on Windows.
Regardless of your platform, install the CrowdAnki add-on:
- Open Anki
- Go to Tools → Add-ons → Get Add-ons
- Enter code: `1788670778`
- Click OK and restart Anki
Documentation: https://docs.ankiweb.net/addons.html
## Project Structure

```
ankilangs/
├── src/                                  # Source files (edit these)
│   ├── data/                             # CSV files - source of truth for vocabulary
│   │   ├── 625_words-base-*.csv          # Base vocabulary for each language
│   │   ├── 625_words-pair-*.csv          # Translation pairs and hints
│   │   ├── minimal_pairs-*.csv           # Minimal pair exercises
│   │   └── generated/                    # Auto-generated derived files (don't edit)
│   │
│   ├── note_models/                      # Anki note type definitions (YAML + HTML)
│   │   ├── EN_to_ES_625_Words/           # Note model for EN→ES deck
│   │   └── ...                           # One directory per language pair
│   │
│   ├── headers/                          # Deck descriptions and metadata
│   │   ├── description_en_to_es.md       # Deck description (appears in Anki)
│   │   └── ...
│   │
│   ├── deck_content/                     # Additional deck content
│   │   ├── EN_to_ES_625_Words/           # Changelogs, screenshots per deck
│   │   └── ...
│   │
│   └── media/                            # Media files
│       ├── audio/                        # Audio recordings by language
│       │   ├── en_US/                    # English audio files
│       │   ├── es_ES/                    # Spanish audio files
│       │   └── ...
│       └── images/                       # Shared images
│
├── build/                                # Generated Anki decks
│   ├── EN_to_ES_625_Words/               # Built deck ready for Anki import
│   └── review/                           # Generated review files (xlsx + mp3)
│
├── recipes/                              # Brainbrew build recipes (YAML config)
│   ├── source_to_anki_625_words.yaml     # Recipe for 625 word decks
│   └── source_to_anki_minimal_pairs.yaml # Recipe for minimal pairs
│
├── al_tools/                             # Python CLI tools
│   ├── cli.py                            # Main CLI entry point
│   ├── core.py                           # Core data processing logic
│   └── ...
│
├── tests/                                # Test suite
│   ├── core/                             # Core functionality tests
│   └── content/                          # Content validation tests
│
├── docs/                                 # Documentation
│   ├── development.md                    # This file
│   ├── learning-hints.md                 # Learning hints guide
│   └── adr-*.md                          # Architecture decision records
│
├── website/                              # AnkiLangs.org website (Hugo)
│
├── data.db                               # SQLite cache (git-ignored, regenerable)
├── pyproject.toml                        # Python project configuration
├── uv.lock                               # Locked dependency versions
├── Justfile                              # Task runner commands
└── README.md                             # Project overview
```
- `src/data/` - The single source of truth. All vocabulary, translations, IPA, audio references, and hints live here as CSV files. Everything else is derived from these files.
- `src/note_models/` - Defines how Anki cards look and behave. Each language pair has its own note model with HTML templates, CSS styling, and card type definitions.
- `src/media/` - All audio files and images. Audio files are named systematically (e.g., `al_es_es_the_house.mp3` for Spanish "la casa").
- `build/` - Generated output. After running build commands, this contains importable Anki decks (via the CrowdAnki plugin).
- `recipes/` - Configuration files that tell Brainbrew how to transform source files into CrowdAnki decks.
- `al_tools/` - Command-line tools for data manipulation, validation, audio generation, and more. Invoked via `uv run al-tools <command>`.
- `data.db` - SQLite database cache for efficient data operations. Automatically regenerated from the CSV files. Git-ignored because it is derivable.
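The systematic audio naming can be sketched as a tiny helper. This is an illustration inferred from the single example above (`al_es_es_the_house.mp3`); the authoritative convention is defined in ADR-002, and `audio_filename` is a hypothetical name, not a real al_tools function:

```python
def audio_filename(locale: str, key: str) -> str:
    """Sketch of the audio naming pattern, inferred from the example
    al_es_es_the_house.mp3. The real convention lives in ADR-002;
    this helper is illustrative only."""
    slug = key.strip().lower().replace(" ", "_")
    return f"al_{locale.lower()}_{slug}.mp3"

print(audio_filename("es_ES", "the house"))  # → al_es_es_the_house.mp3
```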
## Data Flow

This diagram shows how data moves from source files to Anki decks:
```
                     EDITING WORKFLOW
              ┌─────────────────────────────┐
              │                             │
              ▼                             │
       ┌─────────────┐    al-tools    ┌─────────────┐
       │  CSV files  │   csv2sqlite   │   SQLite    │
       │ (src/data/) │ ──────────────→│  (data.db)  │
       │             │ ←──────────────│             │
       └──────┬──────┘    al-tools    └─────────────┘
              │          sqlite2csv    Edit with SQL
              │                        tools (DB Browser,
              │                        sqlite3, etc.)
              │
    ┌─────────┴─────────┐
    │                   │
    ▼                   ▼
┌─────────────┐   ┌─────────────┐
│ Note models │   │  al-tools   │
│  + media    │   │  generate   │
│  + recipes  │   └──────┬──────┘
└──────┬──────┘          │
       │                 ▼
       │          ┌─────────────┐
       │          │  Generated  │  (joins license info
       │          │    CSVs     │   into single fields)
       │          └──────┬──────┘
       │                 │
       └────────┬────────┘
                │
                ▼
         ┌─────────────┐
         │  Brainbrew  │  Transforms sources into
         │             │  CrowdAnki JSON format
         └──────┬──────┘
                │
                ▼
         ┌─────────────┐
         │   build/    │  CrowdAnki-compatible
         │             │  deck directories
         └──────┬──────┘
                │
                ▼  CrowdAnki plugin (File → Import from disk)
         ┌─────────────┐
         │    Anki     │
         └─────────────┘
```
| Tool | Command | Purpose |
|---|---|---|
| `al-tools csv2sqlite` | `just csv2sqlite` | Import CSV → SQLite for editing |
| `al-tools sqlite2csv` | `just sqlite2csv` | Export SQLite → CSV after editing |
| `al-tools generate` | (part of `just build`) | Create derived CSVs (license field joins) |
| `al-tools check` | `just check-data` | Validate data, find missing hints |
| Brainbrew | `just build` | Transform sources → CrowdAnki format |
| CrowdAnki | Anki menu | Import `build/` directories into Anki |
- CSV files are the source of truth — they're versioned in git
- SQLite is a convenience layer — easier to query/edit than CSV, but not versioned
- Generated CSVs are derived — don't edit them; they're recreated on each build
- build/ is output only — import into Anki, don't edit directly
## Working with Data

The project uses CSV files as the source of truth, but the tools work with a SQLite database for efficiency.
Import CSV files into SQLite:
```bash
uv run al-tools csv2sqlite -i src/data
# Or: just csv2sqlite
```

This creates `data.db` (which is git-ignored).
- Import CSV to SQLite (if not done already): `just csv2sqlite`
- Edit data: use SQL tools (sqlite3, DB Browser, DBeaver, etc.) to query/update `data.db`. Example queries are in CLAUDE.md.
- Export back to CSV: `just sqlite2csv`
- Commit changes: `git commit -m "feat: add Spanish audio for food vocabulary"`
Safety features:
- Database auto-creates from CSV if missing
- Prompts with options if CSV files are newer than database (overwrite/ignore/cancel)
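As a self-contained illustration of the edit step, the snippet below runs a hint update against a throwaway in-memory database. The `translation_pair` columns mirror the UPDATE example in the Learning Hints section below; the authoritative schema lives in ADR-001, and the real workflow targets `data.db`:

```python
import sqlite3

# Throwaway in-memory stand-in for data.db; column names mirror the
# UPDATE example used elsewhere in this guide (real schema: ADR-001).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE translation_pair ("
    "key TEXT, source_locale TEXT, target_locale TEXT, pronunciation_hint TEXT)"
)
conn.execute(
    "INSERT INTO translation_pair VALUES ('race [sport]', 'en_us', 'de_de', NULL)"
)
# The kind of edit you would normally run in sqlite3 or DB Browser:
conn.execute(
    "UPDATE translation_pair SET pronunciation_hint = 'competition' "
    "WHERE key = 'race [sport]' AND source_locale = 'en_us' "
    "AND target_locale = 'de_de'"
)
(hint,) = conn.execute("SELECT pronunciation_hint FROM translation_pair").fetchone()
print(hint)  # → competition
```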
To test your changes in Anki:
```bash
# Check for data issues
just check-data

# Build all decks (includes sqlite2csv + generate)
just build
```

Then:

- Open Anki
- Go to File → CrowdAnki: Import from disk
- Select a directory from `build/` (e.g., `build/EN_to_ES_625_Words`)
- Review the deck like any other Anki deck
## Systematic Deck Review

Generate a spreadsheet and combined audio file for comprehensive deck review:
```bash
uv run al-tools export-review -s en_us -t fr_fr
```

This creates:

- `build/review/review_en_us_to_fr_fr.xlsx` - Excel spreadsheet with all entries
- `build/review/review_en_us_to_fr_fr.mp3` - Combined audio file
Open the files:
```bash
# Linux
libreoffice build/review/review_en_us_to_fr_fr.xlsx &
vlc build/review/review_en_us_to_fr_fr.mp3

# macOS
open build/review/review_en_us_to_fr_fr.xlsx
open build/review/review_en_us_to_fr_fr.mp3

# Windows (in WSL, access files from Windows)
explorer.exe build/review/
```

Requires: ffmpeg (for audio concatenation).
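ffmpeg is needed because the review mp3 is built by concatenating the individual word recordings into one file. A sketch of how such a concatenation could be invoked (illustrative only; `concat_mp3s` and `ffmpeg_concat_cmd` are hypothetical helpers, not the real al_tools code):

```python
import subprocess
from pathlib import Path

def ffmpeg_concat_cmd(list_file: Path, out_mp3: Path) -> list[str]:
    """Build an ffmpeg concat-demuxer command (lossless stream copy)."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", str(list_file), "-c", "copy", str(out_mp3)]

def concat_mp3s(clips: list[Path], out_mp3: Path) -> None:
    """Hypothetical helper: concatenate mp3 clips into one file."""
    list_file = out_mp3.with_suffix(".txt")
    # The concat demuxer reads one "file '<path>'" line per clip.
    list_file.write_text("".join(f"file '{c.resolve()}'\n" for c in clips))
    subprocess.run(ffmpeg_concat_cmd(list_file, out_mp3), check=True)
```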
For detailed instructions on how to use these files, see CONTRIBUTING.md.
## Code Quality

This project uses Ruff for linting and formatting Python code.
Before committing code:
```bash
just check-code  # Format + lint
just test        # Run tests
```

Or run all checks (code + data):

```bash
just check
```

## Testing

Run the test suite:

```bash
uv run pytest
# Or: just test
```

Follow these guidelines when writing or modifying tests. See ADR-005 for the full rationale.
**Test observable behavior through the public interface.** Call the same functions that users/callers use (e.g., `csv2sqlite()`, `generate_audio()`). Assert on observable outputs — files on disk, database state, return values — not on how the code internally arrived there. Tests that rely on implementation details break on every refactor.

**Do not test private functions.** Functions prefixed with `_` are implementation details tested implicitly through the public functions that use them. Only test a private function directly when there is genuinely no practical way to cover the behavior through the public interface.

**Prefer testing at the highest practical level.** When possible, exercise a meaningful workflow end-to-end rather than testing small pieces in isolation. For example, test the full CSV→SQLite→CSV round-trip rather than individual import/export helpers.

**Minimize mocking.** Use real SQLite databases, real filesystems (`tmp_path`), and real CSV parsing — they're fast and deterministic. For external services (Google Cloud TTS, subprocess calls to `git`/`gh`), use simple fakes or dependency injection (the `tts_client` parameter) rather than `unittest.mock.patch` on internal functions. Fakes for external boundaries live in `tests/fakes.py`.
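An illustrative fake in that spirit (the class and function names here are hypothetical stand-ins; the project's real fakes live in `tests/fakes.py`):

```python
class FakeTTSClient:
    """Records synthesis requests and returns canned bytes instead of
    calling Google Cloud TTS (hypothetical stand-in, not tests/fakes.py)."""
    def __init__(self) -> None:
        self.requests: list[tuple[str, str]] = []

    def synthesize(self, text: str, locale: str) -> bytes:
        self.requests.append((text, locale))
        return b"FAKE_MP3"

def generate_word_audio(text: str, locale: str, tts_client) -> bytes:
    # Production code injects a real client; tests inject FakeTTSClient().
    return tts_client.synthesize(text, locale)

fake = FakeTTSClient()
audio = generate_word_audio("la casa", "es_es", fake)
print(fake.requests)  # → [('la casa', 'es_es')]
```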
Use the `testdata_dir` and `golden_dir` fixtures (from `tests/conftest.py`):

- `testdata_dir`: Checked-in input files for a test. Use when input data is easier to maintain as files than inline strings (e.g., CSV files, YAML configs).
- `golden_dir`: Checked-in expected output files. Use for complex or multi-line output that would clutter the test. Auto-updatable with `--update-golden`.

Both fixtures resolve to a directory scoped to the current test function (e.g., `tests/core/testdata/test_csv_roundtrip/test_roundtrip_preserves_all_files/`).
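A hypothetical test in this style, using pytest's built-in `tmp_path` fixture for a real filesystem and a stand-in `csv_roundtrip` function (the project's actual round-trip goes through `csv2sqlite`/`sqlite2csv`):

```python
import csv
from pathlib import Path

def csv_roundtrip(src: Path, dst: Path) -> None:
    """Stand-in for a real import/export round-trip."""
    with open(src, newline="") as f:
        rows = list(csv.reader(f))
    with open(dst, "w", newline="") as f:
        csv.writer(f).writerows(rows)

def test_roundtrip_preserves_rows(tmp_path):
    # Real files on a real (temporary) filesystem; no mocks.
    src = tmp_path / "in.csv"
    src.write_text("key,word\nhouse,la casa\n")
    dst = tmp_path / "out.csv"
    csv_roundtrip(src, dst)
    with open(dst, newline="") as f:
        assert list(csv.reader(f)) == [["key", "word"], ["house", "la casa"]]
```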
If you intentionally change test output:
```bash
just test-update-golden
```

## Learning Hints

Learning hints disambiguate words with multiple meanings (e.g., "light" = brightness vs. weight). There are four hint types: pronunciation, reading, spelling, and listening.
For complete documentation with examples, see docs/learning-hints.md.
English "race" has two meanings:
- Race (competition) → German "das Rennen"
- Race (ethnicity) → German "die Rasse"
Add a pronunciation hint to clarify:
```sql
UPDATE translation_pair
SET pronunciation_hint = 'competition'
WHERE key = 'race [sport]'
  AND source_locale = 'en_us'
  AND target_locale = 'de_de';
```

Then run `just check-data`; it detects potentially ambiguous words that are missing hints.
To create a new 625-word deck:
```bash
uv run al-tools create-deck <source_locale> <target_locale>
```

Example:

```bash
uv run al-tools create-deck it_it pt_pt  # Italian → Portuguese
```

This automatically:
- Creates note model directory with templates
- Generates localized card types
- Creates deck description and CSV files
- Updates build recipes and deck registry
After creating, add vocabulary data to the generated CSV file.
Generating audio requires a Google Cloud account with authentication. See the Google Cloud docs.
```bash
# Import CSV to SQLite first
just csv2sqlite

# Generate audio for a language
uv run al-tools audio -l es_es

# Export the updated database
just sqlite2csv
```

This project follows Conventional Commits:
```
<type>[optional scope]: <description>

[optional body]

[optional footer]
```
- `feat:` - New feature
- `fix:` - Bug fix
- `refactor:` - Code change that neither fixes a bug nor adds a feature
- `docs:` - Documentation changes
- `test:` - Adding or updating tests
- `chore:` - Maintenance tasks, dependency updates
- `style:` - Code style changes (formatting)
Examples:

```
feat: add pronunciation hints for Spanish bank/bench ambiguity
fix: correct IPA transcription for German "ich"
docs: update SQLite query examples in CLAUDE.md
```
The Justfile is a modern replacement for a Makefile. View all commands:

```bash
just
```

Note: Just is optional. All commands can be run directly with `uv run`.
Releases are mostly automated using `al-tools release` commands.

Prerequisites:
- Deck must be registered in `decks.yaml`
- Clean working directory (no uncommitted changes)
- `gh` CLI tool installed and authenticated (for finalization)
- Update the changelog with a new version entry:

  ```bash
  vim src/deck_content/en_to_es_625/changelog.md
  ```

  Add an entry like:

  ```markdown
  ## 1.0.0 - 2026-02-10
  - Complete audio and IPA for all words
  - Complete hints for ambiguous words
  ```

- Update the description if necessary, e.g. `src/deck_content/en_to_es_625/description.md`
- Commit
- Run the release automation (validates, updates versions, creates commits/tag):

  ```bash
  al-tools release en_to_es_625 --version 1.0.0
  ```

  The command will print the next steps (Anki export instructions and the finalize command).

- Finalize the release (creates the GitHub release, generates the AnkiWeb description):

  ```bash
  al-tools release en_to_es_625 --finalize ~/Downloads/Spanish.EN.to.ES.-.625.Words.-.AnkiLangs.org.-.v1.0.0.apkg
  ```

  The command will print next steps and AnkiWeb publication info (title, tags, description) for easy copy-paste.
Validate without making changes:
```bash
al-tools release en_to_es_625 --version 1.0.0 --dry-run
```

To listen to changed audio files:

```bash
# Files from a specific commit
cvlc `git diff --name-only --diff-filter=d c6d7af^ src/media/audio/`

# Uncommitted changes
cvlc `git diff --name-only --diff-filter=d src/media/audio/`
```

Requires VLC (tested on Linux).
For significant structural changes, please open an issue first to discuss. This avoids wasted effort if the approach doesn't fit the project.
- ADR-001: SQLite Cache - Database schema and design
- ADR-002: Audio Filenames - Audio file naming conventions
- ADR-003: Sentences - Sentences for vocabulary reinforcement
- ADR-004: Replace BrainBrew - Direct SQLite to CrowdAnki export
- ADR-005: Testing Strategy - Guidelines for writing tests
- ADR-006: I18n CSV - CSV-based internationalization
- Learning Hints Guide - Complete guide to using hints