Cicero Tokenizer

A tokenizer created for my custom LLM. Its design is loosely based on Byte Pair Encoding (BPE), and it is optimized for dictionary creation and tokenization speed. The pre-computed dictionaries are based on the Amazon Reviews dataset.
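
For readers unfamiliar with BPE, here is a minimal Python sketch of the generic, textbook merge loop. It is not Cicero Tokenizer's actual implementation, which is only loosely based on BPE. The function names (`learn_bpe_merges`, `merge_pair`, `tokenize`) and the toy word list are hypothetical and chosen only for illustration.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Return a copy of `symbols` with every adjacent `pair` fused into one symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe_merges(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (generic BPE)."""
    # Each distinct word starts as a tuple of single characters, with its frequency.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        # Take the most frequent pair (ties go to the pair seen first).
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Apply the new merge rule to every word in the vocabulary.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def tokenize(word, merges):
    """Tokenize a word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Toy corpus, for illustration only.
merges = learn_bpe_merges(["low", "low", "lower", "lowest"], num_merges=2)
tokens = tokenize("lowest", merges)
```

In this sketch the list of learned merges plays the role of the dictionary: building it is the expensive step, and tokenization only replays it.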

License

Cicero Tokenizer is licensed under the Apache 2.0 license with the Commons Clause.

Commercial use

Commercial use of Cicero Tokenizer is not permitted, as stated in NOTICE. If you would like to use it commercially, please contact me.


NN0X/Cicero-Tokenizer
