A working implementation of a transformer language model, built from scratch following *Build a Large Language Model From Scratch* by Sebastian Raschka.
- `src/` — Core implementation (tokenizer, attention, transformer blocks, training loop)
- `notebooks/` — Exploratory work and chapter exercises
- `data/` — Training data (gitignored if large)
- `tests/` — Unit tests for core components
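As a taste of the attention component listed above, here is a minimal NumPy sketch of single-head scaled dot-product attention. This is illustrative only — the function name and shapes are assumptions for the example, not the repo's actual `src/` code:

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Minimal single-head attention: softmax(q k^T / sqrt(d)) @ v."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                         # (seq, seq) similarity scores
    scores = scores - scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ v                                    # weighted sum of value vectors

# Toy self-attention: 3 tokens with 4-dimensional embeddings, q = k = v = x
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (3, 4)
```

The real implementation adds learned query/key/value projections, a causal mask, and multiple heads, but the core computation is the same.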
```sh
python3.12 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

Developed on Apple Silicon (MPS backend). Cross-architecture experiments on NVIDIA (CUDA) and AMD (ROCm) GPUs are documented separately.
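Since the code runs across MPS and CUDA (ROCm builds of PyTorch also report as `cuda`), backend selection might be sketched like this — `pick_device` is a hypothetical helper for illustration, not part of this repo:

```python
import torch

def pick_device():
    """Prefer CUDA (covers NVIDIA and ROCm builds), then Apple MPS, else CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
print(device)  # e.g. mps on Apple Silicon
```

Moving the model and batches with `.to(device)` then keeps the training loop identical across machines.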