🏆 Leaderboard | 🖥️ GitHub | 🤗 Hugging Face | 📑 Paper
Information Capacity evaluates an LLM's efficiency based on its text compression performance relative to its computational complexity, harnessing the inherent correlation between compression and intelligence. Larger models predict the next token more accurately, achieving higher compression gains at the cost of greater computational complexity. Consequently, models of varying sizes within a series exhibit consistent information capacity, which can be used to compare capability across model series and to predict performance within a series. Information capacity also facilitates dynamic routing among different-sized models for efficient handling of tasks of varying difficulty, which is especially relevant to the device-edge-cloud infrastructure detailed in the AI Flow framework. With the rapid evolution of edge intelligence, we believe this hierarchical network will replace the mainstream cloud-centric computing scheme in the near future.
Compared with existing metrics of LLM efficiency, a key distinction of information capacity is that it accounts for tokenizer efficiency. An effective tokenizer represents a given text with fewer tokens, reducing both input and output token counts. This reduction not only lowers computational cost and inference latency but also facilitates long-context memory and in-depth reasoning. Tokenizer efficiency is increasingly significant in light of exploding input lengths and the widespread use of test-time scaling, yet it is often neglected in LLM evaluations. We assess the information capacity of 49 models across 5 heterogeneous datasets and find consistent evidence for the influences of tokenizer efficiency, pretraining data, and the mixture-of-experts (MoE) architecture.
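As a quick illustration of tokenizer efficiency, the sketch below counts how many tokens two publicly available tokenizers need for the same text. The checkpoints are illustrative choices, not necessarily the models evaluated in the paper.

```python
# Minimal sketch: compare how many tokens two tokenizers need for the same text.
from transformers import AutoTokenizer

text = (
    "Information capacity evaluates an LLM's efficiency based on text "
    "compression performance relative to computational complexity."
)

for name in ["gpt2", "Qwen/Qwen2.5-7B"]:  # illustrative tokenizer choices
    tok = AutoTokenizer.from_pretrained(name)
    n_tokens = len(tok.encode(text, add_special_tokens=False))
    # Fewer tokens for the same text -> higher tokenizer efficiency,
    # i.e. lower input/output token counts, cost, and latency.
    print(f"{name}: {n_tokens} tokens for {len(text.encode('utf-8'))} UTF-8 bytes")
```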
The model's intelligence is measured by the data size saving achieved through the LLM's probability predictions. The original size of a text sample in the given dataset is denoted as $C$.
In summary, the information capacity is defined as:

$$ \text{Information Capacity} = \frac{C - \sum_{i} \left[ -\log p(x_i \mid x_{<i}; M) \right]}{\log N_M} . $$
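The sketch below walks through this computation under a few assumptions: $C$ is taken as the original text size in bits (8 bits per UTF-8 byte), the compressed size is the summed token-level negative log-likelihood converted to bits, and $N_M$ is taken as the model's parameter count as a proxy for computational complexity. See `calc_ic.py` and the paper for the exact definitions and units.

```python
# Minimal sketch of the information-capacity computation (assumptions noted above).
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative choice; any causal LM from the Hub works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

text = "Compression performance is closely tied to next-token prediction."
C = 8 * len(text.encode("utf-8"))  # original size in bits (assumption: 8 bits per byte)

ids = tok(text, return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits

# Negative log-likelihood of each token given its prefix (first token excluded).
log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
nll_nats = -log_probs.gather(-1, ids[:, 1:].unsqueeze(-1)).sum()
compressed_bits = nll_nats.item() / math.log(2)  # nats -> bits

# Assumption: N_M is the total parameter count of model M.
N_M = sum(p.numel() for p in model.parameters())
information_capacity = (C - compressed_bits) / math.log(N_M)
print(f"saving: {C - compressed_bits:.1f} bits, information capacity: {information_capacity:.2f}")
```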
Step 1. Set up an environment suitable for model inference.
```bash
pip install numpy torch transformers tqdm flash_attn huggingface_hub
```

Step 2. Clone this repo.
```bash
git clone https://github.com/TeleAI-AI-Flow/InformationCapacity.git
cd InformationCapacity
```

Step 3. Download test datasets.
```bash
hf download TeleAI-AI-Flow/InformationCapacity --repo-type=dataset --include "datasets/**" --local-dir .
```

Step 4. Run evaluation code.
```bash
python calc_ic.py -m path/to/model -d datasets/mixed_text.jsonl -l 1024 -b 1
```

If you find this work helpful, please cite:

```bibtex
@misc{yuan2025informationcapacity,
title={Information Capacity: Evaluating the Efficiency of Large Language Models via Text Compression},
author={Cheng Yuan and Jiawei Shao and Chi Zhang and Xuelong Li},
year={2025},
eprint={2511.08066},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2511.08066},
}
```

```bibtex
@misc{an2025aiflowperspectivesscenarios,
title={AI Flow: Perspectives, Scenarios, and Approaches},
author={Hongjun An and Wenhan Hu and Sida Huang and Siqi Huang and Ruanjun Li and Yuanzhi Liang and Jiawei Shao and Yiliang Song and Zihan Wang and Cheng Yuan and Chi Zhang and Hongyuan Zhang and Wenhao Zhuang and Xuelong Li},
year={2025},
eprint={2506.12479},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2506.12479},
}
```

```bibtex
@misc{shao2025aiflownetworkedge,
title={AI Flow at the Network Edge},
author={Jiawei Shao and Xuelong Li},
year={2025},
eprint={2411.12469},
archivePrefix={arXiv},
primaryClass={eess.SP},
url={https://arxiv.org/abs/2411.12469},
}
```