A large language model pre-training and fine-tuning framework with an Infini-attention implementation.
The framework provides distributed training for large language models with extended context windows, using Infini-attention mechanisms to process extremely long sequences efficiently.
- Infini-attention Implementation: Enables "infinite-length" context processing with memory-efficient attention mechanisms
- Distributed Training: Multi-GPU and multi-node training support with tensor, pipeline, and data parallelism
- Model Support: LLaMA model family with Infini-attention modifications
- Flexible Configuration: YAML-based configuration system for different training scenarios
- Memory Optimization: Balance factor optimization for managing compressive memory states in long contexts (see the sketch after this list)
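The features above center on a per-head compressive memory that is read alongside standard local attention and blended in by a learned balance factor. The snippet below is a minimal, hedged PyTorch sketch of that mechanism, written for illustration only: the class and parameter names are invented here and do not mirror this repo's code. It retrieves from the memory with an ELU+1 feature map, gates the result against local causal attention, then folds the current segment's keys and values into the memory.

```python
import torch
import torch.nn.functional as F
from torch import nn


class InfiniAttentionSketch(nn.Module):
    """Toy single-segment step of Infini-attention: local causal attention plus
    retrieval from a compressive memory, mixed by a learned per-head gate."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)
        self.proj = nn.Linear(d_model, d_model, bias=False)
        # Learned balance factor: one gating logit per head.
        self.balance = nn.Parameter(torch.zeros(n_heads))

    def forward(self, x, memory=None, norm=None):
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, seq, d_head).
        q, k, v = (z.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
                   for z in (q, k, v))

        if memory is None:  # first segment: empty compressive memory
            memory = q.new_zeros(b, self.n_heads, self.d_head, self.d_head)
            norm = q.new_zeros(b, self.n_heads, self.d_head, 1)

        # 1) Local causal attention within the current segment.
        local = F.scaled_dot_product_attention(q, k, v, is_causal=True)

        # 2) Linear-attention retrieval from memory built on earlier segments.
        sigma_q = F.elu(q) + 1.0
        retrieved = (sigma_q @ memory) / (sigma_q @ norm + 1e-6)

        # 3) Mix the two paths with the sigmoid of the balance factor.
        gate = torch.sigmoid(self.balance).view(1, self.n_heads, 1, 1)
        mixed = gate * retrieved + (1.0 - gate) * local

        # 4) Update the memory and its normalizer with this segment's keys/values.
        sigma_k = F.elu(k) + 1.0
        memory = memory + sigma_k.transpose(-2, -1) @ v
        norm = norm + sigma_k.sum(dim=-2).unsqueeze(-1)

        out = self.proj(mixed.transpose(1, 2).reshape(b, t, -1))
        return out, memory, norm


if __name__ == "__main__":
    attn = InfiniAttentionSketch(d_model=64, n_heads=4)
    mem = nrm = None
    stream = torch.randn(2, 3 * 16, 64)  # a long stream split into 3 segments
    for segment in stream.split(16, dim=1):
        y, mem, nrm = attn(segment, mem, nrm)
    print(y.shape)  # torch.Size([2, 16, 64])
```

Because the memory has a fixed size per head, the cost of attending to all previous segments stays constant no matter how long the stream grows, which is what enables the "infinite-length" context processing described above.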
Set `CUDA_DEVICE_MAX_CONNECTIONS=1` and launch training with `torchrun`:

```bash
export CUDA_DEVICE_MAX_CONNECTIONS=1
torchrun --nproc_per_node=8 run_train.py --config-file fineweb_local_300m_infini_4gpu_config.yaml
```

Generate text from a trained checkpoint:

```bash
python run_generate.py --checkpoint-path /path/to/checkpoint
```
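Multi-node training (see the Distributed Training feature above) uses the same entry point with torchrun's standard multi-node flags. The command below is a sketch only: the node count, rank, master address, port, and config path are placeholders, and the config's data/tensor/pipeline parallelism settings must match the total number of launched processes.

```bash
# Run on every node; set --node_rank per node (0 on the master node).
export CUDA_DEVICE_MAX_CONNECTIONS=1
torchrun --nnodes=2 --node_rank=0 \
    --master_addr=10.0.0.1 --master_port=29500 \
    --nproc_per_node=8 \
    run_train.py --config-file path/to/your_config.yaml
```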
Model evaluation can be performed using the lm-evaluation-harness repository.
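For example, after exporting a checkpoint to a Hugging Face-compatible format (the conversion step itself is not covered here), a typical lm-evaluation-harness run looks like the following; the task list, batch size, and paths are illustrative:

```bash
lm_eval --model hf \
    --model_args pretrained=/path/to/converted_hf_model \
    --tasks hellaswag,arc_easy \
    --device cuda:0 \
    --batch_size 8
```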
For long-context needle-in-a-haystack evaluation (up to 32k):

```bash
bash examples/infinite-context-length/scripts/run_evals.sh [depth_percent]
```

The project includes configuration files for different training scenarios (an illustrative config sketch follows the list below):
- `fineweb_local_*_infini_*gpu_config.yaml`: Infini-attention training configs
- `passkey_finetune_*_optimized_infini_config.yaml`: Fine-tuning configs for long-context tasks
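As an illustration of what these files control, here is an abridged sketch of a nanotron-style training config. The key names below are assumptions modeled on typical nanotron configs, and Infini-attention-specific options (such as segment length or balance-factor settings) are omitted because their exact names are not documented here; refer to the actual YAML files in the repo.

```yaml
# Abridged, illustrative sketch -- not a file from this repo.
general:
  project: infini-attention
  run: fineweb_local_300m_infini

parallelism:
  dp: 2   # data-parallel replicas
  tp: 2   # tensor-parallel degree
  pp: 1   # pipeline-parallel stages

tokens:
  sequence_length: 32768
  micro_batch_size: 1
  batch_accumulation_per_replica: 8
  train_steps: 10000

optimizer:
  learning_rate_scheduler:
    learning_rate: 3.0e-4
```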
- `src/nanotron/`: Core framework implementation
- `examples/infinite-context-length/`: Infini-attention-specific examples and needle-in-a-haystack evaluations
- `scripts/`: Analysis and utility scripts for balance factors and memory content
Licensed under the Apache License, Version 2.0.