These annotated snippets show how to run the three console scripts once you have
installed `t81lib[torch]` (or a local checkout via `.[torch]`) so the
`t81-convert`, `t81-gguf`, and `t81-qat` entry points land on your `PATH`.
```shell
pipx install .[torch]
```

If you prefer a shared environment:

```shell
pip install t81lib[torch]
```

pipx installs the scripts under `~/.local/bin` (or pipx's bin directory); make
sure that directory is on your shell `PATH`.
```shell
t81-convert meta-llama/Llama-3.2-3B-Instruct \
  /tmp/converted-llama3.2-3b --quant TQ1_0 \
  --torch-dtype bfloat16 --force-cpu-device-map
```

- `--quant TQ1_0` forces the ternary quantization lookup we ship with the runtimes.
- `--torch-dtype bfloat16` keeps FP16 buffers while quantizing.
- `--force-cpu-device-map` ensures Accelerate pins tensors to CPU so the new `t81_metadata.json` stays serializable.
- The command rewrites every `nn.Linear` to `t81.nn.Linear` before producing the converted directory with ternary tensors, `t81_metadata.json`, and the stats log.
To emit a GGUF bundle in the same run, add `--output-gguf --gguf-quant TQ1_0`
while pointing `--output-dir` (or the last positional argument) at the location
where the converted directory should live.
```shell
t81-gguf \
  converted-llama3.2-3b/out.3.t81.gguf \
  --from-t81 /tmp/converted-llama3.2-3b \
  --quant TQ2_0 --device-map none --force-cpu-device-map
```

- `--from-t81` reuses the metadata and helpers already stored in the converted directory, so you can skip re-converting the HF checkpoint.
- `--device-map none` keeps everything on the host, and `--force-cpu-device-map` makes the CLI idempotent on macOS/Metal devices with limited RAM.
- The resulting `out.3.t81.gguf` bundle works with llama.cpp, Ollama, or LM Studio.
If you need to re-run a conversion from scratch before exporting, swap
`--from-t81` for `--from-hf meta-llama/...` and reuse the same threshold, dtype,
and force-cpu knobs noted above.
```shell
t81-qat gpt2 \
  --dataset-name wikitext \
  --output-dir ~/ternary-gpt2 \
  --per-device-train-batch-size 4 \
  --learning-rate 5e-5 \
  --max-train-samples 1000 \
  --ternary-threshold 0.45 \
  --ternary-stochastic-rounding \
  --ternary-warmup-steps 500
```

- The CLI mirrors the `t81.trainer` helpers, so you can sweep the ternary threshold, stochastic rounding, and warmup steps just like in the Python API.
- Install `datasets` + `transformers` alongside `torch`; `t81-qat` shows the missing-dependency message when those extras are unavailable.
- Training snapshots and logs land under `~/ternary-gpt2` so you can later convert them with `t81-convert`.