A Multi-Linguistic Auto-Generated Benchmark for Evaluating Large Language Models on Resume Parsing
Accepted at EMNLP 2025 Main Conference
Efficient resume parsing is essential for global hiring in today's AI era, yet progress has been limited by the absence of a dedicated benchmark for evaluating large language models (LLMs) on multilingual, structure-rich resumes.
We introduce ResumeBench, the first privacy-compliant benchmark featuring:
- 2,500 synthetic resumes across 50 templates,
- Covering 30 career fields and 5 languages (English, Chinese, Spanish, French, German),
- Generated via a human-in-the-loop pipeline emphasizing realism, diversity, and privacy compliance.
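To make the corpus composition concrete, the snippet below sketches how such a collection might be filtered by language or template. The record schema (`id`, `language`, `template`, `field`) is purely illustrative and is not the dataset's actual format:

```python
# Hypothetical metadata records for synthetic resumes; field names are
# illustrative only -- the released dataset's schema may differ.
resumes = [
    {"id": "en_0001", "language": "en", "template": "single-column", "field": "software_engineering"},
    {"id": "de_0042", "language": "de", "template": "double-column", "field": "finance"},
    {"id": "zh_0007", "language": "zh", "template": "designed", "field": "healthcare"},
]

def filter_resumes(records, language=None, template=None):
    """Return the subset matching the given language and/or template."""
    return [
        r for r in records
        if (language is None or r["language"] == language)
        and (template is None or r["template"] == template)
    ]

german = filter_resumes(resumes, language="de")
print(len(german))  # -> 1
```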
Our study evaluates 24 state-of-the-art LLMs, uncovering challenges in structural alignment, multilingual robustness, and semantic reasoning.
- Human-in-the-loop generation: Enhances authenticity while ensuring privacy.
- Multilingual coverage: Five languages, addressing cross-lingual complexities in resumes.
- Template diversity: 50 resume layouts including single-column, double-column, and designed formats.
- Mixed benchmark: Combines synthetic and real-world samples for robust evaluation.
- LLM evaluation: Assessed 24 models including GPT-4o, code-specialized LLMs, and VLMs.
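The paper's exact evaluation metrics are not reproduced here, but a field-level exact-match score of the kind commonly used for structured parsing can be sketched as follows (function name and example fields are illustrative, not the benchmark's actual protocol):

```python
def field_accuracy(gold: dict, pred: dict) -> float:
    """Fraction of gold fields the model reproduced exactly.
    Missing or mismatched predictions count as errors."""
    if not gold:
        return 1.0
    correct = sum(1 for key, value in gold.items() if pred.get(key) == value)
    return correct / len(gold)

gold = {"name": "Jane Doe", "email": "jane@example.com", "degree": "MSc"}
pred = {"name": "Jane Doe", "email": "jane@example.com", "degree": "BSc"}
print(field_accuracy(gold, pred))  # -> 0.6666... (2 of 3 fields match)
```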
The paper is available at https://aclanthology.org/2025.emnlp-main.1626/.
The ResumeBench dataset and associated code are made available only for non-commercial research and educational purposes under the CC BY-NC 4.0 License.
To request access to the dataset, please send your full name, institutional email address (personal email addresses will not be accepted), affiliation, and intended use of the dataset to zijian.ling@applyu.ai.
By submitting an access request, you agree to comply with the CC BY-NC 4.0 license and to cite this associated paper in any publications resulting from the use of this dataset.
- Dataset & Code: Released under CC BY-NC 4.0 (Attribution-NonCommercial 4.0 International).
If you find this dataset useful, please cite:
@inproceedings{ling-etal-2025-beyond,
title = "Beyond Human Labels: A Multi-Linguistic Auto-Generated Benchmark for Evaluating Large Language Models on Resume Parsing",
author = "Ling, Zijian and
Zhang, Han and
Cui, Jiahao and
Wu, Zhequn and
Sun, Xu and
Li, Guohao and
He, Xiangjian",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.emnlp-main.1626/",
pages = "31907--31933",
ISBN = "979-8-89176-332-6"
}
For questions or dataset requests, please contact zijian.ling@applyu.ai