
ResumeBench: Beyond Human Labels

A Multi-Linguistic Auto-Generated Benchmark for Evaluating Large Language Models on Resume Parsing

📄 Accepted at EMNLP 2025 Main Conference


🔍 Overview

Efficient resume parsing is essential for global hiring in today’s AI era, yet progress has been limited by the absence of a dedicated benchmark for evaluating large language models (LLMs) on multilingual, structure-rich resumes.

We introduce ResumeBench, the first privacy-compliant benchmark featuring:

  • 2,500 synthetic resumes across 50 templates,
  • covering 30 career fields and 5 languages (English, Chinese, Spanish, French, German),
  • generated via a human-in-the-loop pipeline emphasizing realism, diversity, and privacy compliance.

Our study evaluates 24 state-of-the-art LLMs, uncovering challenges in structural alignment, multilingual robustness, and semantic reasoning.
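One way to picture the structural-alignment evaluation is field-level scoring of a model's structured output against gold labels. The sketch below is illustrative only: the resume schema (`name`, `education`, and the other keys) and the exact-match metric are assumptions for the example, not the benchmark's actual format or scoring protocol.

```python
# Illustrative field-level scoring for structured resume parsing.
# The schema and metric here are assumptions, not ResumeBench's own.

def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into (dotted-key, leaf-value) pairs."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            yield from flatten(v, f"{prefix}{k}.")
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            yield from flatten(v, f"{prefix}{i}.")
    else:
        yield prefix.rstrip("."), obj

def field_accuracy(gold, pred):
    """Fraction of gold leaf fields that the prediction reproduces exactly."""
    gold_fields = dict(flatten(gold))
    pred_fields = dict(flatten(pred))
    if not gold_fields:
        return 1.0
    hits = sum(1 for k, v in gold_fields.items() if pred_fields.get(k) == v)
    return hits / len(gold_fields)

gold = {"name": "Jane Doe",
        "education": [{"degree": "BSc", "school": "Example University"}]}
pred = {"name": "Jane Doe",
        "education": [{"degree": "B.Sc.", "school": "Example University"}]}
print(round(field_accuracy(gold, pred), 2))  # 2 of 3 leaf fields match -> 0.67
```

Exact string matching like this is deliberately strict ("B.Sc." vs. "BSc" counts as an error), which is one reason semantic reasoning and normalization remain challenging for parsers.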


📊 Key Features

  • Human-in-the-loop generation: Enhances authenticity while ensuring privacy.
  • Multilingual coverage: Five languages, addressing cross-lingual complexities in resumes.
  • Template diversity: 50 resume layouts including single-column, double-column, and designed formats.
  • Mixed benchmark: Combines synthetic and real-world samples for robust evaluation.
  • LLM evaluation: Assessed 24 models including GPT-4o, code-specialized LLMs, and VLMs.

📄 Paper

The paper is available in the ACL Anthology: https://aclanthology.org/2025.emnlp-main.1626/


📂 Dataset & Access

The ResumeBench dataset and associated code are made available only for non-commercial research and educational purposes under the CC BY-NC 4.0 License.

👉 To request access to the dataset, please send your full name, email address (institutional, not personal), affiliation, and intended use of the dataset to zijian.ling@applyu.ai.

By submitting an access request, you agree to comply with the CC BY-NC 4.0 license and to cite the associated paper in any publications resulting from the use of this dataset.


⚖️ License

  • Dataset & Code: Released under CC BY-NC 4.0 (Attribution-NonCommercial 4.0 International).

🔗 Citation

If you find this dataset useful, please cite:

@inproceedings{ling-etal-2025-beyond,
    title = "Beyond Human Labels: A Multi-Linguistic Auto-Generated Benchmark for Evaluating Large Language Models on Resume Parsing",
    author = "Ling, Zijian  and
      Zhang, Han  and
      Cui, Jiahao  and
      Wu, Zhequn  and
      Sun, Xu  and
      Li, Guohao  and
      He, Xiangjian",
    editor = "Christodoulopoulos, Christos  and
      Chakraborty, Tanmoy  and
      Rose, Carolyn  and
      Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.emnlp-main.1626/",
    pages = "31907--31933",
    ISBN = "979-8-89176-332-6"
}

📬 Contact

For questions or dataset requests, please contact zijian.ling@applyu.ai.
