Libremodel I is an attempt to democratize LLM creation. We train on NO copyrighted material, so users are free to use Libremodel anywhere and in any field with no chance of copyright-related legal action. Training is kept as affordable as possible, with the initial model costing $500 USD or less to train. This repo is for the foundational model, with no fine-tuning or post-training of any kind.
Libremodel I is codenamed Gigi, after its initial two datasets, Gutenberg and GovReports. Training consists of three phases, with phase boundaries given as a percentage of total training. Phase one (0-8%) used 70% Project Gutenberg and 30% GovReports. Phase two began at 8% and used 50% Gutenberg, 45% Wikipedia, and 5% GovReports. Phase three began at 20% and consists of 40% Gutenberg, 30% Wikipedia, and 30% Harvard legal records from the COLD database. Although we intend for the model to have strong general reasoning, the structured curriculum and the addition of legal cases make it a great candidate for legal applications.
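To make the phase schedule concrete, here is a minimal sketch that maps training progress (0.0 to 1.0) to the active dataset mixture and draws the source of the next document. This is an illustration only, not the project's actual data pipeline: the names `CURRICULUM`, `mixture_at`, and `sample_source`, the dataset keys (such as `cold_legal` for the COLD legal records), and the choice to sample per document are all hypothetical assumptions.

```python
import random

# Hypothetical encoding of the three-phase curriculum described above.
# Each entry is (phase start as a fraction of total training, mixture weights).
CURRICULUM = [
    (0.00, {"gutenberg": 0.70, "govreports": 0.30}),
    (0.08, {"gutenberg": 0.50, "wikipedia": 0.45, "govreports": 0.05}),
    (0.20, {"gutenberg": 0.40, "wikipedia": 0.30, "cold_legal": 0.30}),
]


def mixture_at(progress: float) -> dict[str, float]:
    """Return the mixture weights of the latest phase that has started by `progress`."""
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be in [0, 1]")
    active = CURRICULUM[0][1]
    for start, weights in CURRICULUM:
        if progress >= start:
            active = weights
    return active


def sample_source(progress: float, rng: random.Random) -> str:
    """Pick the dataset for the next document, weighted by the active mixture (assumed per-document sampling)."""
    weights = mixture_at(progress)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    for p in (0.05, 0.10, 0.50):
        print(p, mixture_at(p), sample_source(p, rng))
```

A real pipeline might instead mix at the token or shard level; the step-function lookup above only shows how the phase boundaries and weights fit together.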
Checkpoints are released at every 25% of training so users can experiment with the base model. Currently, the 25% checkpoint is the latest release. Releases: https://github.com/openconstruct/libremodel/releases