AutoAlign

🔥Quick Start • 📪Features • 📈Results • 🐛Issues • 📜Citation • 📄Docs

📣 About

AutoAlign is a package focused on scalable and automated alignment methods. We aim to provide the academic community with a series of classic alignment baselines and ready-to-use automated alignment algorithms. This toolkit is designed to facilitate research in the field of LLM alignment.

The core functionalities of the toolkit include:

  • Implementation of common alignment operators (e.g., SFT, RM, DPO, GRPO)
  • Implementation of various automated model alignment algorithms (e.g., CAI, SPIN, RLCD)
  • Efficient model sampling
  • Automated model evaluation
  • Post-training intervention methods (e.g., model averaging)

This toolkit can seamlessly integrate with popular resource scheduling systems like Slurm.

[Figure: framework overview]

🚀 News

[2025.7.25] Liger Kernel support for supervised fine-tuning (SFT) has been integrated, making SFT 20%-25% faster than the previous version.

[2025.6.19] We've integrated comprehensive safety evaluation datasets and released the Megatron implementation for large-scale distributed training.

[2025.5.23] AutoAlign has been accepted for presentation at the ACL 2025 Demo Track!

[2025.1.1] AutoAlign now supports Self-Rewarding and Constitutional AI.

[2024.8.23] We are excited to announce the release of AutoAlign v0.0.1! In this first version, we have implemented a variety of alignment operators, including SFT, RM, and DPO, along with user-friendly evaluation systems and several auto-alignment algorithms (CAI, PCON, and a variant of RLCD), with more algorithms on the way. Stay tuned for future updates! 🔥🔥🔥

🔥 Quick Start

🔨 Environment Setup

Default

conda create -n ata python=3.10
conda activate ata
pip install .[train]
# alternatively, use uv for improved installation speed
# pip install uv
# uv pip install .[train]

Evaluation (Optional)

conda create -n ata_eval --clone ata
conda activate ata_eval
pip install .[eval]
bash ./scripts/post_install.sh

📂 Data

To provide the community with out-of-the-box alignment data, we publicly release a collection of SFT data here. This toolkit currently uses the format in data/dummy_sft.json for supervised fine-tuning and the format in data/dummy_dpo.json for the reinforcement learning process. Please refer to these files for more details.
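
For illustration only, the sketch below shows the rough shape such records might take; the field names here are assumptions, so treat data/dummy_sft.json and data/dummy_dpo.json as the authoritative schema.

import json

# Hypothetical record shapes (field names are assumptions; the dummy files are authoritative).
sft_record = {
    "id": "example-0",
    "conversations": [
        {"from": "human", "value": "What is model alignment?"},
        {"from": "gpt", "value": "Model alignment is the process of ..."},
    ],
}
dpo_record = {
    "prompt": "What is model alignment?",
    "chosen": "A helpful, well-grounded answer ...",
    "rejected": "A vague or unhelpful answer ...",
}
print(json.dumps(sft_record, indent=2))
print(json.dumps(dpo_record, indent=2))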

📚 Basic Alignment Operators

SFT

autoalign-cli sft \
            --model_name_or_path "Qwen/Qwen2.5-7B" \
            --data_path "data/dummy_sft.json" \
            --bf16 True \
            --output_dir "models/qwen2-7b-sft" \
            --model_max_length 4096 \
            --conv_template_name chatml \
            --deepspeed "configs/zero3.json"

Reward Modeling

First, generate a demo dataset:

python algorithms/rm/prepare_demo_rm.py

Then run the following command:

autoalign-cli rm --model_name_or_path "meta-llama/Meta-Llama-3-8B-Instruct" \
    --data_path data/ultra_binary.jsonl \
    --bf16 True \
    --eval_path data/eval \
    --conv_template_name llama-3-instruct \
    --output_dir models/llama3_rm \
    --deepspeed configs/zero3.json
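
Once trained, the reward model can be used to score candidate responses. Below is a minimal sketch of scoring with the trained checkpoint, assuming it loads as a standard Hugging Face sequence-classification model with a single scalar head (an assumption for illustration; the toolkit may expose its own interface):

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("models/llama3_rm")
reward_model = AutoModelForSequenceClassification.from_pretrained("models/llama3_rm", num_labels=1)

inputs = tokenizer("<prompt and response text here>", return_tensors="pt")
with torch.no_grad():
    reward = reward_model(**inputs).logits[0, 0].item()  # higher = preferred
print(reward)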

DPO

autoalign-cli dpo --model_name_or_path "Qwen/Qwen2.5-7B-Instruct"  \
            --data_path "data/dummy_dpo.json" \
            --bf16 True \
            --output_dir "saved_models/qwen2-7b-dpo" \
            --conv_template_name chatml \
            --deepspeed "configs/zero3.json"

Megatron-based Training

For detailed script configuration, please refer to megatron.md.

  1. Set up the environment:
conda activate ata
pip install -e .
bash scripts/train/megatron/env_install.sh

Setting up the environment may take half an hour as it involves compiling Apex, Flash-Attn, and Transformer Engine. Please be patient.

  2. Convert weights from Hugging Face to Megatron:
bash scripts/train/megatron/convert/qwen2_5/convert_hf_to_mcore.sh
  3. Preprocess the data:
# For SFT
bash scripts/train/megatron/preprocess/sft_conv.sh
# For DPO
bash scripts/train/megatron/preprocess/dpo_conv.sh
  4. Run the training:
# For SFT
bash scripts/train/megatron/train/qwen2_5/sft_conv.sh
# For DPO
bash scripts/train/megatron/train/qwen2_5/dpo_conv.sh
  5. Convert weights from Megatron back to Hugging Face:
# For SFT
bash scripts/train/megatron/convert/qwen2_5/convert_mcore_to_hf.sh
# For DPO
bash scripts/train/megatron/convert/qwen2_5/convert_mcore_to_hf_dpo.sh

Inference

autoalign-cli infer --backend "vllm" \
            --model-name "Qwen2-0.5B-Instruct" \
            --model-path "Qwen/Qwen2-0.5B-Instruct" \
            --test-file "data/dummy_sft.json" \
            --template "chatml" \
            --source "qwen2_0_5b_instruct_dummy"

Serve

autoalign-cli serve --checkpoint-path "Qwen/Qwen2.5-7B-Instruct" \
                    --mode "browser" \
                    --template "chatml"

Merge

autoalign-cli merge --model_paths "psmathur/orca_mini_v3_13b" "WizardLM/WizardLM-13B-V1.2" "garage-bAInd/Platypus2-13B" \
                    --merged_model_path "merged_model" \
                    --merging_method "average"
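
Conceptually, the "average" merging method corresponds to uniform parameter averaging across same-architecture checkpoints. The sketch below illustrates that idea with plain PyTorch state dicts; it is a minimal illustration, not the toolkit's actual implementation:

import torch

def average_state_dicts(paths):
    # Uniformly average the parameters of several same-architecture checkpoints.
    state_dicts = [torch.load(p, map_location="cpu") for p in paths]
    return {
        key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
        for key in state_dicts[0]
    }

# merged = average_state_dicts(["model_a.bin", "model_b.bin", "model_c.bin"])
# torch.save(merged, "merged_model.bin")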

🛠 Automated Alignment Algorithms

The introduction and scripts for each automated alignment algorithm are stored in the algorithms folder.

Currently, we have implemented the following automated alignment algorithms:

Algorithm Name | Description
rlcd_sys | Context-distills the principles into the model using system prompts.
pcon | Treats responses from larger models as positive signals and responses from smaller models as negative signals.
self-rewarding | Uses the language model itself to provide its own rewards during training via LLM-as-a-Judge prompting.
cai | Trains LLMs to be harmless using a set of guiding principles and a process of self-improvement through supervised and reinforcement learning.
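
As a concrete illustration of the pcon idea, the sketch below builds preference pairs by taking a larger model's answer as the chosen response and a smaller model's answer as the rejected one. The function and field names are assumptions for illustration, not the toolkit's API:

def build_pcon_pairs(prompts, large_model_answer, small_model_answer):
    # large_model_answer / small_model_answer: callables mapping a prompt to a response.
    pairs = []
    for prompt in prompts:
        pairs.append({
            "prompt": prompt,
            "chosen": large_model_answer(prompt),    # larger model -> positive signal
            "rejected": small_model_answer(prompt),  # smaller model -> negative signal
        })
    return pairs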

✏️ Model Evaluation

autoalign-cli eval --config-path configs/eval.yaml

You can configure evaluation options in configs/eval.yaml. For objective evaluation, the results will be written to outputs/{model_id}/ordered_res.txt at the root of the repository. For more information, please read docs/eval.md.

📄 Documentation

The documentation for this toolkit is stored in ./docs/.

📪 Features

Supported Models

Model | Template Name
Llama-2 | llama-2-chat
Llama-3/3.1/3.2 | llama-3-instruct
Gemma 2 | gemma
Qwen1.5 | chatml
Qwen2 | chatml
Qwen2.5 | chatml
Qwen3 | chatml-with-empty-think
Mistral v0.1/0.2/0.3 | mistral-instruct / zephyr
Yi1.5 | chatml
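
For reference, chatml denotes the widely used ChatML turn layout, in which each message is wrapped in <|im_start|> / <|im_end|> markers. The sketch below renders a conversation in that layout; it is a simplified illustration, not the toolkit's own template code:

def render_chatml(messages, add_generation_prompt=True):
    # messages: a list of {"role": ..., "content": ...} dicts.
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")  # cue the model to respond
    return "\n".join(parts)

# render_chatml([{"role": "system", "content": "You are helpful."},
#                {"role": "user", "content": "Hi!"}])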

📈 Reference Results

Model | Dataset / Algorithm | MT-Bench | MATH | GSM-8K | HumanEval | MBPP | HumanEval-CN | MBPP-CN | MMLU | GPQA | CMMLU | C-Eval
Llama-3-8b | Base | -- | 13.28 | 55.5 | 26.83 | 44.2 | 20.12 | 41.2 | x | 10.61 | 40.49 | x
Llama-3-8b | Instruct | 7.95 | 25 | 78.62 | 52.44 | 51.2 | 24.39 | 47.8 | 59.27 | 25.76 | 52.06 | 50.93
Llama-3-8b | Ultrachat | 6.47 | 10.88 | 52.08 | 29.27 | 33.6 | 16.46 | 30.2 | 55.37 | 26.77 | 47.19 | 43.81
Llama-3.1-8b | Base | -- | 15.24 | 56.41 | 27.44 | 42.8 | 23.78 | 41.8 | x | 12.63 | 45.43 | x
Llama-3.1-8b | Instruct | 7.73 | 31.56 | 81.12 | 54.27 | 53 | 39.02 | 50.2 | 59.88 | 28.79 | 49.64 | 48.3
Llama-3.1-8b | Ultrachat | 6.51 | 13.6 | 57.39 | 25.61 | 33.4 | 20.73 | 31.4 | 55.66 | 25.76 | 48.44 | 32.99
Llama-3.2-1b | Infinity-Instruct | -- | 12.66 | 48.07 | 28.66 | 31.2 | 27.44 | 26.6 | 44.8 | x | 33.86 | 34.04
Qwen-2-7b | Base | 5.03 | 41.3 | 79.76 | 61.59 | 51 | 60.37 | 48.4 | 62.4 | 31.31 | 67.72 | 42.66
Qwen-2-7b | Instruct | 8.15 | 25.38 | 81.35 | 51.22 | 48.6 | 61.59 | 24.2 | 64.1 | 31.82 | 62.24 | 46.04
Qwen-2-7b | Ultrachat | 7.34 | 37.98 | 77.41 | 20.73 | 34.6 | 11.59 | 32.8 | 61.35 | 31.31 | 72.23 | 63.18
Qwen-2-7b | rlcd_sys | 7.29 | 20.76 | 52.31 | 35.98 | 36 | 29.88 | 35.4 | 52.89 | 21.21 | 68.98 | 71.35
Qwen-2-7b | pcon | 6.6 | 35.37 | 47.43 | 42.54 | 79.83 | 41.46 | 50.4 | 57.32 | 46.8 | 63.31 | 28.28
Qwen-2.5-7b | Infinity-Instruct | 6.85 | 39.44 | 84.08 | 71.95 | 58.40 | 64.02 | 55.20 | 74.51 | 37.88 | 78.79 | 80.23
Qwen-3-1.7b | Infinity-Instruct | -- | 33.76 | 71.87 | 57.93 | 40.2 | 47.56 | 38.4 | 63.37 | 25.25 | 61.03 | 64.1

"--" indicates data unavailable;

"x" indicates inability to obtain valid performance using OpenCompass default prompt.

📅 Future Development Schedule

Benchmarks to be implemented at v0.1.x:

  • Chat-Arena-Hard
  • AlignBench

Features to be supported at v0.2.x:

  • Representation Engineering
  • Tool use integration

🤝 Contributing

If you would like to contribute to this project, please follow these guidelines:

  1. Fork the repository.

  2. Install for development:

    pip install -e .[dev]
    pre-commit install
  3. Create a new branch.

  4. Make your changes.

  5. Submit a pull request.

📜 Citation

@inproceedings{lu-etal-2025-autoalign,
    title = "{A}uto{A}lign: Get Your {LLM} Aligned with Minimal Annotations",
    author = "Lu, Xinyu  and
      Xu, Dong  and
      Zhang, Chunkang  and
      Guan, Xinyan  and
      Wang, Junxiang  and
      Zhang, Qingyu  and
      Wang, Pengbo  and
      Mao, Yingzhi  and
      Xiang, Hao  and
      Wen, Xueru  and
      Li, Zichao  and
      Lu, Yaojie  and
      Lin, Hongyu  and
      Sun, Le  and
      Han, Xianpei",
    editor = "Mishra, Pushkar  and
      Muresan, Smaranda  and
      Yu, Tao",
    booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.acl-demo.19/",
    pages = "189--198",
    ISBN = "979-8-89176-253-4",
}

Acknowledgements

This project takes inspiration from several other projects, including Llama-Factory, verl, and 360-Llama-Factory. Many thanks to the developers of these projects for their contributions to the community.

💳 License

This project is licensed under the Apache-2.0 License.
