A fully offline Latin keyboard with word completion and next-word prediction, built on n-gram frequency data extracted from a classical Latin corpus.
- Privacy-preserving: no Full Access entitlement, no data collection, all processing on-device
- Word completion: frequency-ranked suggestions as you type (binary search on a 50k-word list)
- Next-word prediction: trigram → bigram → unigram fallback chain
- Macron & ligature input: long-press a vowel for ā ē ī ō ū, æ, or œ
- Diacritic preservation: macrons the user types carry through into completions
- Suggestions on highlight: select a word to see alternative completions
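The trigram → bigram → unigram chain is a classic backoff lookup. The sketch below is written in Python for brevity (the keyboard itself is Swift) and assumes a hypothetical in-memory layout: trigram continuations keyed by the two preceding words, bigram continuations keyed by the preceding word, and a frequency-ordered unigram list. `predict_next` and the toy data are illustrative, not the project's actual API or schema.

```python
from typing import Dict, List, Sequence

# Hypothetical layout (the real ngrams.json schema may differ).
# Each value lists candidate next words, most frequent first.
Trigrams = Dict[str, List[str]]  # key: "w1 w2"
Bigrams = Dict[str, List[str]]   # key: "w1"


def predict_next(
    context: Sequence[str],
    trigrams: Trigrams,
    bigrams: Bigrams,
    unigrams: List[str],
    limit: int = 3,
) -> List[str]:
    """Fill up to `limit` suggestions from trigrams, then bigrams, then unigrams."""
    words = [w.lower() for w in context]
    results: List[str] = []

    def extend(candidates: List[str]) -> None:
        # Append unseen candidates until the suggestion bar is full.
        for word in candidates:
            if len(results) >= limit:
                return
            if word not in results:
                results.append(word)

    if len(words) >= 2:
        extend(trigrams.get(" ".join(words[-2:]), []))
    if words and len(results) < limit:
        extend(bigrams.get(words[-1], []))
    if len(results) < limit:
        extend(unigrams)
    return results


trigrams = {"gallia est": ["omnis"]}
bigrams = {"est": ["in", "enim", "omnis"]}
unigrams = ["et", "in", "est", "non"]

print(predict_next(["Gallia", "est"], trigrams, bigrams, unigrams))
# → ['omnis', 'in', 'enim']
```

This version tops up remaining slots from lower orders; a strict fallback that consults a lower order only when the higher one returns nothing would also match the description above.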
Architecture:

```text
┌───────────────────────────────────────────────────────┐
│ KeyboardViewController                                │
│  ├─ input handling, shift/caps, auto-capitalization   │
│  └─ word-highlight trigger (selected text)            │
├───────────────────────────────────────────────────────┤
│ PredictionEngine                                      │
│  ├─ merges & deduplicates across sources              │
│  └─ macron/ligature preservation                      │
├───────────────────────────────────────────────────────┤
│ PredictionSource chain (queried in order):            │
│  1. FrequencyCompletionSource  (prefix completion)    │
│  2. NGramPredictionSource      (next word)            │
│  3. FallbackPredictionSource   (170 fallback words)   │
└───────────────────────────────────────────────────────┘
```
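The engine's two jobs in the diagram, merging ranked lists from several sources and preserving the user's diacritics, can be sketched as follows. This is a Python sketch under stated assumptions: each source is modeled as a hypothetical callable returning ranked words, duplicates are detected on a lowercased, macron-stripped key, and "preservation" is interpreted as re-applying the prefix exactly as the user typed it. Ligatures (æ, œ) are left alone here; the project's own mappings live in `normalization.py` / `LatinNormalization.swift`.

```python
import unicodedata
from typing import Callable, List

# Hypothetical source interface: typed prefix in, ranked candidate words out.
Source = Callable[[str], List[str]]

MACRON = "\u0304"  # COMBINING MACRON


def strip_macrons(text: str) -> str:
    """Lowercase and drop macrons, e.g. 'Rōma' -> 'roma' (used as a dedupe key)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return unicodedata.normalize("NFC", decomposed.replace(MACRON, ""))


def preserve_prefix(typed: str, completion: str) -> str:
    """Re-apply the prefix as typed so the user's macrons and capitalization
    survive. Assumes precomposed (NFC) macron vowels, which keep stripped and
    unstripped strings the same length."""
    typed = unicodedata.normalize("NFC", typed)
    completion = unicodedata.normalize("NFC", completion)
    if strip_macrons(completion).startswith(strip_macrons(typed)):
        return typed + completion[len(typed):]
    return completion


def merge_predictions(prefix: str, sources: List[Source], limit: int = 3) -> List[str]:
    """Query sources in order, keeping the first occurrence of each word
    (compared without case or macrons) until `limit` suggestions exist."""
    seen = set()
    merged: List[str] = []
    for source in sources:
        for word in source(prefix):
            key = strip_macrons(word)
            if key in seen:
                continue
            seen.add(key)
            merged.append(preserve_prefix(prefix, word))
            if len(merged) == limit:
                return merged
    return merged


def completions(prefix: str) -> List[str]:  # stand-in for prefix completion
    return ["roma", "romanus"]


def fallback(prefix: str) -> List[str]:  # stand-in for a later source
    return ["rōma", "romae"]


print(merge_predictions("Rō", [completions, fallback]))
# → ['Rōma', 'Rōmanus', 'Rōmae']
```

Querying sources strictly in order means a higher-priority source always wins a tie; the duplicate "rōma" from the second source is dropped rather than re-ranked.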
Data files (bundled as JSON, ~3.3 MB combined):

| File | Contents | Lookup |
|---|---|---|
| `word_frequencies.json` | 50k words sorted by corpus frequency | Binary search on an alphabetically sorted array |
| `ngrams.json` | 94k unigrams, 50k bigrams, 30k trigrams | Dictionary indexed by preceding word(s) |
Runtime memory footprint: ~8–14 MB, well within the iOS keyboard extension memory limit of approximately 30 MB.
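The word list's lookup strategy (binary search on an alphabetical array, then frequency ranking) can be sketched with Python's standard `bisect` module. This assumes the frequency-ordered words are re-sorted alphabetically for searching and that counts are kept alongside; `complete`, `freq`, and the toy counts are hypothetical, not values from the real corpus.

```python
import bisect
import heapq
from typing import Dict, List


def complete(prefix: str, sorted_words: List[str], freq: Dict[str, int], limit: int = 3) -> List[str]:
    """Return up to `limit` words that start with `prefix`, most frequent first.

    `sorted_words` must be sorted with the same ordering the search uses
    (here: Python's default code-point order), or bisect gives wrong answers.
    """
    # Binary search for the first word >= prefix; every word sharing the
    # prefix lies in one contiguous run starting at this index.
    i = bisect.bisect_left(sorted_words, prefix)
    matches: List[str] = []
    while i < len(sorted_words) and sorted_words[i].startswith(prefix):
        matches.append(sorted_words[i])
        i += 1
    # Rank the run by corpus frequency and keep the top `limit`.
    return heapq.nlargest(limit, matches, key=lambda w: freq.get(w, 0))


# Toy counts standing in for word_frequencies.json (not real corpus figures).
freq = {"et": 900, "in": 800, "est": 700, "non": 600,
        "ad": 500, "esse": 400, "enim": 300, "eorum": 200}
sorted_words = sorted(freq)

print(complete("e", sorted_words, freq))  # → ['et', 'est', 'esse']
```

The search itself is O(log n), but collecting the run is linear in the number of matches, so very short prefixes touch more entries than long ones.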
Repository layout:

```text
latinum/
├── data_pipeline/
│   ├── clean_corpus.py        # Corpus cleaning & sentence segmentation
│   ├── normalization.py       # Macron/ligature mappings (canonical source)
│   └── extract_ngrams.py      # Generates word_frequencies.json & ngrams.json
├── iOS/
│   ├── Latinum/               # Host app (setup instructions)
│   ├── LatinumKeyboard/       # Keyboard extension
│   │   ├── KeyboardViewController.swift
│   │   ├── KeyboardView.swift
│   │   ├── KeyboardFeedback.swift
│   │   ├── DiacriticMenuView.swift
│   │   ├── PredictionEngine.swift
│   │   ├── FrequencyCompletionSource.swift
│   │   ├── NGramPredictionSource.swift
│   │   ├── LatinNormalization.swift
│   │   └── Resources/         # word_frequencies.json, ngrams.json, key-down.wav
│   ├── LatinumTests/          # Unit tests
│   └── project.yml            # XcodeGen project spec
└── latincorpus.txt            # Raw Latin corpus (~29 MB)
```
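`extract_ngrams.py` is described only as producing the two JSON files. A minimal sketch of the counting step, assuming sentences arrive already tokenized (for example from the segmentation in `clean_corpus.py`) and that each context maps to a ranked list of continuations. The function name, `top_k` pruning, and output keys are hypothetical and may not match the real script or file format.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


def extract_ngrams(sentences: Iterable[List[str]], top_k: int = 3) -> Dict[str, object]:
    """Count unigrams, bigrams, and trigrams within each sentence, then index
    bigrams/trigrams by their preceding word(s), most frequent continuation first."""
    unigrams: Counter = Counter()
    bigrams: Dict[str, Counter] = defaultdict(Counter)
    trigrams: Dict[str, Counter] = defaultdict(Counter)

    for tokens in sentences:
        unigrams.update(tokens)
        # Pairs and triples never cross a sentence boundary.
        for a, b in zip(tokens, tokens[1:]):
            bigrams[a][b] += 1
        for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
            trigrams[f"{a} {b}"][c] += 1

    def ranked(table: Dict[str, Counter]) -> Dict[str, List[str]]:
        # Keep only the top_k continuations per context (ties: first seen wins).
        return {ctx: [w for w, _ in counts.most_common(top_k)] for ctx, counts in table.items()}

    return {
        "unigrams": [w for w, _ in unigrams.most_common()],
        "bigrams": ranked(bigrams),
        "trigrams": ranked(trigrams),
    }


sentences = [["gallia", "est", "omnis", "divisa"], ["gallia", "est", "pacata"]]
data = extract_ngrams(sentences)
print(data["trigrams"]["gallia est"])  # → ['omnis', 'pacata']
```

The returned dictionary is JSON-ready, so `json.dump(data, f, ensure_ascii=False)` would keep macrons and ligatures readable in the output file. The real pipeline's pruning thresholds (which yield 50k bigrams and 30k trigrams) are not documented here.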
Requirements:

- macOS 14+, Xcode 15+
- iOS 17+ deployment target
- Python 3.10+ (data pipeline only)
- XcodeGen (`brew install xcodegen`)
Quick start:

```sh
# 1. Generate prediction data from the corpus
python3 data_pipeline/extract_ngrams.py

# 2. Generate the Xcode project
cd iOS && xcodegen generate

# 3. Build & run
open Latinum.xcodeproj   # Build to device, then:
# Settings → General → Keyboard → Keyboards → Add "Latinum"
```

Running tests:

```sh
# iOS (Xcode)
xcodebuild -project iOS/Latinum.xcodeproj -scheme Latinum -destination 'platform=iOS Simulator,name=iPhone 17 Pro' test

# Python
python3 data_pipeline/normalization.py
```

Copyright 2026 Dylan Walker Brown
Licensed under the Apache License, Version 2.0.
The works in the Latin corpus are in the public domain.
Thanks to William L. Carey for making these works available at The Latin Library.
Thanks to Mathis Van Eetvelde for compiling these works into latincorpus.txt.