walkerbrown/latinum

LATINVM

A fully offline Latin keyboard with word completion and next word prediction, built on n-gram frequency data extracted from a classical Latin corpus.

Features

  • Privacy preserving: no Full Access entitlement, no data collection, all processing on-device
  • Word completion: frequency-ranked suggestions as you type (binary search on 50k-word list)
  • Next word prediction: trigram → bigram → unigram fallback chain
  • Macron & ligature input: long-press vowels for ā ē ī ō ū, or æ œ
  • Diacritic preservation: user-typed macrons carry through into completions
  • Suggestions on highlight: select a word to see alternative completions
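
As a rough illustration of the word-completion feature, the sketch below does a binary search for a prefix in an alphabetically sorted word list and ranks the matches by frequency. It is written in Python for brevity and is not the keyboard's Swift code; the function name `complete`, the `limit` parameter, and the toy word counts are all assumptions made for this example.

```python
from bisect import bisect_left

def complete(prefix, words, freq, limit=3):
    """Return up to `limit` words starting with `prefix`, most frequent first.

    `words` must be sorted alphabetically; `freq` maps word -> count.
    Both names are hypothetical, chosen for this sketch.
    """
    if not prefix:
        return []
    # Binary search for the first word that sorts at or after the prefix.
    start = bisect_left(words, prefix)
    # Words sharing the prefix form one contiguous run starting there.
    end = start
    while end < len(words) and words[end].startswith(prefix):
        end += 1
    # Rank the run by descending frequency; ties break alphabetically.
    return sorted(words[start:end], key=lambda w: (-freq[w], w))[:limit]

# Toy counts, made up for illustration (not taken from the corpus).
freq = {"amat": 40, "amicus": 75, "amo": 120, "amor": 90, "et": 500}
words = sorted(freq)

print(complete("am", words, freq))  # ['amo', 'amor', 'amicus']
```

The binary search keeps the lookup cost logarithmic in the list size; only the (usually short) run of matching words is then sorted by frequency.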

Architecture

┌───────────────────────────────────────────────────────┐
│  KeyboardViewController                               │
│    ├─ input handling, shift/caps, auto-capitalization │
│    └─ word-highlight trigger (selected text)          │
├───────────────────────────────────────────────────────┤
│  PredictionEngine                                     │
│    ├─ merges & deduplicates across sources            │
│    └─ macron/ligature preservation                    │
├───────────────────────────────────────────────────────┤
│  PredictionSource chain (queried in order):           │
│    1. FrequencyCompletionSource  (prefix completion)  │
│    2. NGramPredictionSource      (next word)          │
│    3. FallbackPredictionSource   (fallback 170 words) │
└───────────────────────────────────────────────────────┘
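
The "merges & deduplicates across sources" step could look roughly like the following. This is a Python sketch of the idea only, not the Swift `PredictionEngine`: sources are modelled as plain callables, and `merge_suggestions`, `limit`, and the stand-in sources are hypothetical names. Whether the real engine stops early or compares words case-insensitively is not stated here, so the sketch assumes exact string comparison.

```python
def merge_suggestions(sources, context, limit=3):
    """Query each source in order, keeping the first occurrence of each word."""
    seen = set()
    merged = []
    for source in sources:
        for word in source(context):
            if word in seen:
                continue  # an earlier source already suggested this word
            seen.add(word)
            merged.append(word)
            if len(merged) == limit:
                return merged
    return merged

# Stand-in sources returning fixed lists (illustrative data only).
completion = lambda ctx: ["amo", "amor"]
ngram = lambda ctx: ["amor", "vincit"]
fallback = lambda ctx: ["et", "in", "est"]

chain = [completion, ngram, fallback]
print(merge_suggestions(chain, "am"))           # ['amo', 'amor', 'vincit']
print(merge_suggestions(chain, "am", limit=5))  # ['amo', 'amor', 'vincit', 'et', 'in']
```

Because sources are queried in order, earlier sources take priority in the suggestion bar and later ones only fill the remaining slots.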

Data files (bundled as JSON, ~3.3 MB combined):

| File | Contents | Lookup |
|---|---|---|
| word_frequencies.json | 50k words sorted by corpus frequency | Binary search on alphabetically sorted array |
| ngrams.json | 94k unigrams, 50k bigrams, 30k trigrams | Dictionary indexed by preceding word(s) |

Runtime memory footprint: ~8–14 MB, well within the iOS keyboard extension limit of approximately 30 MB.
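
To make the n-gram lookup concrete, here is a minimal Python sketch of a trigram → bigram → unigram fallback over dictionaries indexed by the preceding word(s). The key format (two preceding words joined by a space), the function name `predict_next`, and the toy tables are assumptions for illustration; the actual layout of ngrams.json may differ.

```python
def predict_next(history, trigrams, bigrams, unigrams, limit=3):
    """Suggest next words, falling back from trigram to bigram to unigram."""
    words = [w.lower() for w in history]
    # 1. Trigram: look up the last two words together.
    if len(words) >= 2:
        hit = trigrams.get(" ".join(words[-2:]))
        if hit:
            return hit[:limit]
    # 2. Bigram: look up the last word alone.
    if words:
        hit = bigrams.get(words[-1])
        if hit:
            return hit[:limit]
    # 3. Unigram: most frequent words overall, independent of context.
    return unigrams[:limit]

# Toy tables, made up for illustration (not taken from ngrams.json).
trigrams = {"amor vincit": ["omnia"]}
bigrams = {"vincit": ["omnia", "qui"], "amor": ["vincit"]}
unigrams = ["et", "in", "est"]

print(predict_next(["Amor", "vincit"], trigrams, bigrams, unigrams))  # ['omnia']
print(predict_next(["semper", "amor"], trigrams, bigrams, unigrams))  # ['vincit']
print(predict_next(["carpe"], trigrams, bigrams, unigrams))           # ['et', 'in', 'est']
```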

Project Structure

latinum/
├── data_pipeline/
│   ├── clean_corpus.py       # Corpus cleaning & sentence segmentation
│   ├── normalization.py      # Macron/ligature mappings (canonical source)
│   └── extract_ngrams.py     # Generates word_frequencies.json & ngrams.json
├── iOS/
│   ├── Latinum/              # Host app (setup instructions)
│   ├── LatinumKeyboard/      # Keyboard extension
│   │   ├── KeyboardViewController.swift
│   │   ├── KeyboardView.swift
│   │   ├── KeyboardFeedback.swift
│   │   ├── DiacriticMenuView.swift
│   │   ├── PredictionEngine.swift
│   │   ├── FrequencyCompletionSource.swift
│   │   ├── NGramPredictionSource.swift
│   │   ├── LatinNormalization.swift
│   │   └── Resources/        # word_frequencies.json, ngrams.json, key-down.wav
│   ├── LatinumTests/         # Unit tests
│   └── project.yml           # XcodeGen project spec
└── latincorpus.txt           # Raw Latin corpus (~29 MB)
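
The macron/ligature mappings and the diacritic-preservation feature can be pictured with the sketch below. It is an assumption-laden illustration, not the contents of normalization.py or LatinNormalization.swift: the names `normalize` and `preserve_diacritics`, the ligature table (including the capitalised forms), and the strategy of re-attaching the typed prefix to the plain completion are all hypothetical.

```python
import unicodedata

# Ligature expansions (assumed mapping; the canonical table lives in normalization.py).
LIGATURES = {"æ": "ae", "œ": "oe", "Æ": "Ae", "Œ": "Oe"}

def normalize(text):
    """Strip macrons and expand ligatures so text can be matched against plain words."""
    # Decompose, drop the combining macron (U+0304), then recompose.
    decomposed = unicodedata.normalize("NFD", text)
    text = unicodedata.normalize("NFC", decomposed.replace("\u0304", ""))
    for ligature, plain in LIGATURES.items():
        text = text.replace(ligature, plain)
    return text

def preserve_diacritics(typed, completion):
    """Keep what the user typed (macrons included) at the start of a completion."""
    plain = normalize(typed)
    if not completion.startswith(plain):
        return completion  # completion does not extend the typed text; leave it alone
    return typed + completion[len(plain):]

print(normalize("Rōma"))                     # Roma
print(preserve_diacritics("Rōm", "Roma"))    # Rōma
print(preserve_diacritics("cæ", "caelum"))   # cælum
```

Looking words up in normalized form means the word list only needs plain spellings, while the user's own macrons and ligatures still appear in the inserted text.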

Requirements

  • macOS 14+, Xcode 15+
  • iOS 17+ deployment target
  • Python 3.10+ (data pipeline only)
  • XcodeGen (brew install xcodegen)

Quick Start

# 1. Generate prediction data from the corpus
python3 data_pipeline/extract_ngrams.py

# 2. Generate the Xcode project
cd iOS && xcodegen generate

# 3. Build & run
open Latinum.xcodeproj   # Build to device, then:
# Settings → General → Keyboard → Keyboards → Add "Latinum"
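
For readers curious what step 1 produces, n-gram counting over segmented sentences can be sketched as follows. This is a minimal illustration and not the code in extract_ngrams.py: the function name `count_ngrams`, the tokenizer regex, and the sample sentences are assumptions, and the real pipeline's cleaning, thresholds, and output format are not shown.

```python
import re
from collections import Counter

def count_ngrams(sentences, n):
    """Count n-grams of length `n`, never crossing a sentence boundary."""
    counts = Counter()
    for sentence in sentences:
        # Lowercase and keep runs of letters (Unicode-aware, so macrons survive).
        tokens = re.findall(r"[^\W\d_]+", sentence.lower())
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

# Sample sentences for illustration only.
sentences = ["Amor vincit omnia.", "Omnia vincit amor."]

unigrams = count_ngrams(sentences, 1)
bigrams = count_ngrams(sentences, 2)
print(unigrams[("vincit",)])        # 2
print(bigrams[("amor", "vincit")])  # 1
```

Counting per sentence keeps the last word of one sentence from being paired with the first word of the next.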

Running Tests

# iOS (Xcode)
xcodebuild -project iOS/Latinum.xcodeproj -scheme Latinum -destination 'platform=iOS Simulator,name=iPhone 17 Pro' test

# Python
python3 data_pipeline/normalization.py

License

Copyright 2026 Dylan Walker Brown
Licensed under the Apache License, Version 2.0.

The works in the Latin corpus are in the public domain.
Thanks to William L. Carey for making these works available at The Latin Library.
Thanks to Mathis Van Eetvelde for compiling these works into latincorpus.txt.
