Byte Pair Encoding (BPE) Tokenizer

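This project lets you train a Byte Pair Encoding tokenizer on your own text, store the resulting token data in bpe_merges.json, and then tokenize new text with it.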

How to use it

1. Download all files.
2. Add all of your training text to source.txt.
3. Run train.py.
4. Once the token data is stored in bpe_merges.json, you can delete the text in source.txt and replace it with any new text you want to tokenize.
5. Run use.py to see what the different token data visualization methods do.
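
For readers new to BPE, the train-then-tokenize workflow above can be sketched as follows. This is a minimal, character-level illustration of the general technique, not the code in bpe.py, train.py, or use.py. The function names (`train_bpe`, `apply_merge`, `tokenize`, `merges_to_json`, `merges_from_json`) and the JSON layout are assumptions made for this example; the actual format of bpe_merges.json may differ.

```python
import json
from collections import Counter


def apply_merge(tokens, pair):
    """Replace each non-overlapping occurrence of `pair` with one merged token."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges, starting from single characters."""
    tokens = list(text)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent pair of tokens in the current sequence.
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        best, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so further merges would not compress anything
        merges.append(best)
        tokens = apply_merge(tokens, best)
    return merges


def tokenize(text, merges):
    """Tokenize `text` by replaying the learned merges in training order."""
    tokens = list(text)
    for pair in merges:
        tokens = apply_merge(tokens, pair)
    return tokens


def merges_to_json(merges):
    """Serialize merges as a JSON list of two-element lists (hypothetical layout)."""
    return json.dumps([list(pair) for pair in merges])


def merges_from_json(data):
    """Inverse of `merges_to_json`: restore the merges as tuples."""
    return [tuple(pair) for pair in json.loads(data)]
```

In this sketch, `train_bpe` plays the role of the training step, the JSON helpers stand in for saving and reloading the learned data, and `tokenize` corresponds to applying that data to new text. Joining the tokens returned by `tokenize` always reproduces the input string, since merges only concatenate adjacent tokens.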
