AutoAlign

🔥Quick Start • 📪Features • 📈Results • 🐛Issues • 📜Citation • 📄Docs

📣 About

AutoAlign is a package focused on scalable and automated alignment methods. We aim to provide the academic community with a series of classic alignment baselines and ready-to-use automated alignment algorithms. This toolkit is designed to facilitate research in the field of LLM alignment.

The core functionalities of the toolkit include:

  • Implementation of common alignment operators (e.g., SFT, RM, DPO, GRPO)
  • Implementation of various automated model alignment algorithms (e.g., CAI, SPIN, RLCD)
  • Efficient model sampling
  • Automated model evaluation
  • Post-training intervention methods (e.g., model averaging)

This toolkit can seamlessly integrate with popular resource scheduling systems like Slurm.

[Figure: framework overview]

🚀 News

[2025.7.25] Liger Kernel support for supervised fine-tuning (SFT) has been integrated, making SFT 20%-25% faster than the previous version.

[2025.6.19] We've integrated comprehensive safety evaluation datasets and released the Megatron implementation for large-scale distributed training.

[2025.5.23] AutoAlign has been accepted for presentation at the ACL 2025 Demo Track!

[2025.1.1] AutoAlign now supports Self-Rewarding and Constitutional AI.

[2024.8.23] We are excited to announce the release of AutoAlign v0.0.1! In this first version, we have implemented a variety of alignment operators, including SFT, RM, and DPO, along with user-friendly evaluation systems and several auto-alignment algorithms (CAI, PCON, and a variant of RLCD), with more algorithms on the way. Stay tuned for future updates! 🔥🔥🔥

🔥 Quick Start

🔨 Environment Setup

Default

conda create -n ata python=3.10
conda activate ata
pip install .[train]
# alternatively, use uv for improved installation speed
# pip install uv
# uv pip install .[train]

Evaluation (Optional)

conda create -n ata_eval --clone ata
conda activate ata_eval
pip install .[eval]
bash ./scripts/post_install.sh

📂 Data

To provide the community with out-of-the-box alignment data, we publicly release a collection of SFT data here. This toolkit currently uses the format in data/dummy_sft.json for supervised fine-tuning and the format in data/dummy_dpo.json for the reinforcement learning process. Please refer to these files for more details.
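
For illustration only, the sketch below shows the rough shape such records might take; the field names here are assumptions, so treat data/dummy_sft.json and data/dummy_dpo.json as the authoritative schema.

import json

# Hypothetical record shapes (field names are assumptions; the dummy files are authoritative).
sft_record = {
    "id": "example-0",
    "conversations": [
        {"from": "human", "value": "What is model alignment?"},
        {"from": "gpt", "value": "Model alignment is the process of ..."},
    ],
}
dpo_record = {
    "prompt": "What is model alignment?",
    "chosen": "A helpful, well-grounded answer ...",
    "rejected": "A vague or unhelpful answer ...",
}
print(json.dumps(sft_record, indent=2))
print(json.dumps(dpo_record, indent=2))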

📚 Basic Alignment Operators

SFT

autoalign-cli sft \
            --model_name_or_path "Qwen/Qwen2.5-7B" \
            --data_path "data/dummy_sft.json" \
            --bf16 True \
            --output_dir "models/qwen2-7b-sft" \
            --model_max_length 4096 \
            --conv_template_name chatml \
            --deepspeed "configs/zero3.json"

Reward Modeling

First, generate a demo dataset:

python algorithms/rm/prepare_demo_rm.py

Then run the following command:

autoalign-cli rm --model_name_or_path "meta-llama/Meta-Llama-3-8B-Instruct" \
    --data_path data/ultra_binary.jsonl \
    --bf16 True \
    --eval_path data/eval \
    --conv_template_name llama-3-instruct \
    --output_dir models/llama3_rm \
    --deepspeed configs/zero3.json
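
Once trained, the reward model can be used to score candidate responses. Below is a minimal sketch of scoring with the trained checkpoint, assuming it loads as a standard Hugging Face sequence-classification model with a single scalar head (an assumption for illustration; the toolkit may expose its own interface):

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("models/llama3_rm")
reward_model = AutoModelForSequenceClassification.from_pretrained("models/llama3_rm", num_labels=1)

inputs = tokenizer("<prompt and response text here>", return_tensors="pt")
with torch.no_grad():
    reward = reward_model(**inputs).logits[0, 0].item()  # higher = preferred
print(reward)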

DPO

autoalign-cli dpo --model_name_or_path "Qwen/Qwen2.5-7B-Instruct"  \
            --data_path "data/dummy_dpo.json" \
            --bf16 True \
            --output_dir "saved_models/qwen2-7b-dpo" \
            --conv_template_name chatml \
            --deepspeed "configs/zero3.json"

Megatron-based Training

For detailed script configuration, please refer to megatron.md.

  1. Set up the environment:
conda activate ata
pip install -e .
bash scripts/train/megatron/env_install.sh

Setting up the environment may take half an hour as it involves compiling Apex, Flash-Attn, and Transformer Engine. Please be patient.

  2. Convert weights from Hugging Face to Megatron:
bash scripts/train/megatron/convert/qwen2_5/convert_hf_to_mcore.sh
  3. Preprocess the data:
# For SFT
bash scripts/train/megatron/preprocess/sft_conv.sh
# For DPO
bash scripts/train/megatron/preprocess/dpo_conv.sh
  4. Run the training:
# For SFT
bash scripts/train/megatron/train/qwen2_5/sft_conv.sh
# For DPO
bash scripts/train/megatron/train/qwen2_5/dpo_conv.sh
  5. Convert weights from Megatron back to Hugging Face:
# For SFT
bash scripts/train/megatron/convert/qwen2_5/convert_mcore_to_hf.sh
# For DPO
bash scripts/train/megatron/convert/qwen2_5/convert_mcore_to_hf_dpo.sh

Inference

autoalign-cli infer --backend "vllm" \
            --model-name "Qwen2-0.5B-Instruct" \
            --model-path "Qwen/Qwen2-0.5B-Instruct" \
            --test-file "data/dummy_sft.json" \
            --template "chatml" \
            --source "qwen2_0_5b_instruct_dummy"

Serve

autoalign-cli serve --checkpoint-path "Qwen/Qwen2.5-7B-Instruct" \
                    --mode "browser" \
                    --template "chatml"

Merge

autoalign-cli merge --model_paths "psmathur/orca_mini_v3_13b" "WizardLM/WizardLM-13B-V1.2" "garage-bAInd/Platypus2-13B" \
                    --merged_model_path "merged_model" \
                    --merging_method "average"
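
Conceptually, the "average" merging method corresponds to uniform parameter averaging across same-architecture checkpoints. The sketch below illustrates that idea with plain PyTorch state dicts; it is a minimal illustration, not the toolkit's actual implementation:

import torch

def average_state_dicts(paths):
    # Uniformly average the parameters of several same-architecture checkpoints.
    state_dicts = [torch.load(p, map_location="cpu") for p in paths]
    return {
        key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
        for key in state_dicts[0]
    }

# merged = average_state_dicts(["model_a.bin", "model_b.bin", "model_c.bin"])
# torch.save(merged, "merged_model.bin")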

🛠 Automated Alignment Algorithms

The introduction and scripts for each automated alignment algorithm are stored in the algorithms folder.

Currently, we have implemented the following automated alignment algorithms:

Algorithm Name | Description
rlcd_sys | Context-distills the principles into the model using system prompts.
pcon | Treats responses from larger models as positive signals and responses from smaller models as negative signals.
self-rewarding | Uses the language model itself to provide its own rewards during training via LLM-as-a-Judge prompting.
cai | Trains LLMs to be harmless using a set of guiding principles and a process of self-improvement through supervised and reinforcement learning.
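
As a concrete illustration of the pcon idea, the sketch below builds preference pairs by taking a larger model's answer as the chosen response and a smaller model's answer as the rejected one. The function and field names are assumptions for illustration, not the toolkit's API:

def build_pcon_pairs(prompts, large_model_answer, small_model_answer):
    # large_model_answer / small_model_answer: callables mapping a prompt to a response.
    pairs = []
    for prompt in prompts:
        pairs.append({
            "prompt": prompt,
            "chosen": large_model_answer(prompt),    # larger model -> positive signal
            "rejected": small_model_answer(prompt),  # smaller model -> negative signal
        })
    return pairs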

✏️ Model Evaluation

autoalign-cli eval --config-path configs/eval.yaml

You can configure evaluation options in configs/eval.yaml. For objective evaluation, the results will be written to outputs/{model_id}/ordered_res.txt at the root of the repository. For more information, please read docs/eval.md.

📄 Documentation

The documentation for this toolkit is stored in ./docs/.

📪 Features

Supported Models

Model | Template Name
Llama-2 | llama-2-chat
Llama-3/3.1/3.2 | llama-3-instruct
Gemma 2 | gemma
Qwen1.5 | chatml
Qwen2 | chatml
Qwen2.5 | chatml
Qwen3 | chatml-with-empty-think
Mistral v0.1/0.2/0.3 | mistral-instruct / zephyr
Yi1.5 | chatml
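
For reference, chatml denotes the widely used ChatML turn layout, in which each message is wrapped in <|im_start|> / <|im_end|> markers. The sketch below renders a conversation in that layout; it is a simplified illustration, not the toolkit's own template code:

def render_chatml(messages, add_generation_prompt=True):
    # messages: a list of {"role": ..., "content": ...} dicts.
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")  # cue the model to respond
    return "\n".join(parts)

# render_chatml([{"role": "system", "content": "You are helpful."},
#                {"role": "user", "content": "Hi!"}])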

📈 Reference Results

Model | Dataset / Algorithm | MT-Bench | MATH | GSM-8K | HumanEval | MBPP | HumanEval-CN | MBPP-CN | MMLU | GPQA | CMMLU | C-Eval
Llama-3-8b | Base | -- | 13.28 | 55.5 | 26.83 | 44.2 | 20.12 | 41.2 | x | 10.61 | 40.49 | x
Llama-3-8b | Instruct | 7.95 | 25 | 78.62 | 52.44 | 51.2 | 24.39 | 47.8 | 59.27 | 25.76 | 52.06 | 50.93
Llama-3-8b | Ultrachat | 6.47 | 10.88 | 52.08 | 29.27 | 33.6 | 16.46 | 30.2 | 55.37 | 26.77 | 47.19 | 43.81
Llama-3.1-8b | Base | -- | 15.24 | 56.41 | 27.44 | 42.8 | 23.78 | 41.8 | x | 12.63 | 45.43 | x
Llama-3.1-8b | Instruct | 7.73 | 31.56 | 81.12 | 54.27 | 53 | 39.02 | 50.2 | 59.88 | 28.79 | 49.64 | 48.3
Llama-3.1-8b | Ultrachat | 6.51 | 13.6 | 57.39 | 25.61 | 33.4 | 20.73 | 31.4 | 55.66 | 25.76 | 48.44 | 32.99
Llama-3.2-1b | Infinity-Instruct | -- | 12.66 | 48.07 | 28.66 | 31.2 | 27.44 | 26.6 | 44.8 | x | 33.86 | 34.04
Qwen-2-7b | Base | 5.03 | 41.3 | 79.76 | 61.59 | 51 | 60.37 | 48.4 | 62.4 | 31.31 | 67.72 | 42.66
Qwen-2-7b | Instruct | 8.15 | 25.38 | 81.35 | 51.22 | 48.6 | 61.59 | 24.2 | 64.1 | 31.82 | 62.24 | 46.04
Qwen-2-7b | Ultrachat | 7.34 | 37.98 | 77.41 | 20.73 | 34.6 | 11.59 | 32.8 | 61.35 | 31.31 | 72.23 | 63.18
Qwen-2-7b | rlcd_sys | 7.29 | 20.76 | 52.31 | 35.98 | 36 | 29.88 | 35.4 | 52.89 | 21.21 | 68.98 | 71.35
Qwen-2-7b | pcon | 6.6 | 35.37 | 47.43 | 42.54 | 79.83 | 41.46 | 50.4 | 57.32 | 46.8 | 63.31 | 28.28
Qwen-2.5-7b | Infinity-Instruct | 6.85 | 39.44 | 84.08 | 71.95 | 58.40 | 64.02 | 55.20 | 74.51 | 37.88 | 78.79 | 80.23
Qwen-3-1.7b | Infinity-Instruct | -- | 33.76 | 71.87 | 57.93 | 40.2 | 47.56 | 38.4 | 63.37 | 25.25 | 61.03 | 64.1

"--" indicates data unavailable;

"x" indicates inability to obtain valid performance using OpenCompass default prompt.

📅 Future Development Schedule

Benchmarks to be implemented at v0.1.x:

  • Chat-Arena-Hard
  • AlignBench

Features to be supported at v0.2.x:

  • Representation Engineering
  • Tool use integration

🤝 Contributing

If you would like to contribute to this project, please follow these guidelines:

  1. Fork the repository.

  2. Install for development:

    pip install -e .[dev]
    pre-commit install
  3. Create a new branch.

  4. Make your changes.

  5. Submit a pull request.

📜 Citation

@inproceedings{lu-etal-2025-autoalign,
    title = "{A}uto{A}lign: Get Your {LLM} Aligned with Minimal Annotations",
    author = "Lu, Xinyu  and
      Xu, Dong  and
      Zhang, Chunkang  and
      Guan, Xinyan  and
      Wang, Junxiang  and
      Zhang, Qingyu  and
      Wang, Pengbo  and
      Mao, Yingzhi  and
      Xiang, Hao  and
      Wen, Xueru  and
      Li, Zichao  and
      Lu, Yaojie  and
      Lin, Hongyu  and
      Sun, Le  and
      Han, Xianpei",
    editor = "Mishra, Pushkar  and
      Muresan, Smaranda  and
      Yu, Tao",
    booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.acl-demo.19/",
    pages = "189--198",
    ISBN = "979-8-89176-253-4",
}

Acknowledgements

This project takes inspiration from several other projects, including Llama-Factory, verl, and 360-Llama-Factory. Many thanks to the developers of these projects for their contributions to the community.

💳 License

This project is licensed under the Apache-2.0 License.
