We introduce NewsCorpus, a dataset of contemporary news events curated from AllSides between July 21, 2012 and July 23, 2024. The dataset contains 17,166 articles grouped into 5,722 news roundups (three articles per roundup) across 64 topical categories.
Data were collected using a public scraping tool. The scraping script was adapted to account for updates in the AllSides HTML structure.
Each row (one roundup) contains three parallel news stories representing different political perspectives:
- Left or Left-leaning media
- Center media
- Right-leaning or Right media
If you use this dataset or build upon it in your research, please cite the following paper:
Wang, Q., Khatiwada, P., Chouhan, A., Mahesh, A., Mwaria, J., Tran, D. D., ... & Mauriello, M. L. (2026). "The explanation makes sense": An Empirical Study on LLM Performance in News Classification and its Influence on Judgment in Human-AI Collaborative Annotation. arXiv preprint arXiv:2602.19690.
@article{wang2026explanation,
title={"The explanation makes sense": An Empirical Study on LLM Performance in News Classification and its Influence on Judgment in Human-AI Collaborative Annotation},
author={Wang, Qile and Khatiwada, Prerana and Chouhan, Avinash and Mahesh, Ashrey and Mwaria, Joy and Tran, Duy Duc and Barner, Kenneth E and Mauriello, Matthew Louis},
journal={arXiv preprint arXiv:2602.19690},
year={2026}
}

👉 Check out the preview: NewsCorpus17K_sample100.csv
- Date: publication date of the roundup
- Topic: categorical topic label
- Title of Headline Roundup: AllSides roundup headline
- url_story: AllSides roundup link
- left_story_title: article headline
- left_story_url: article URL
- left_story_source: publisher name
- left_story_leaning: political bias label
- left_story_text: preview or snippet text
- center_story_title: article headline
- center_story_url: article URL
- center_story_source: publisher name
- center_story_leaning: political bias label
- center_story_text: preview or snippet text
- right_story_title: article headline
- right_story_url: article URL
- right_story_source: publisher name
- right_story_leaning: political bias label
- right_story_text: preview or snippet text
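
Because each row stores three articles side by side, it can be convenient to reshape a roundup into one record per article. The following is a minimal sketch using only the Python standard library. It assumes the CSV header matches the column names listed above exactly; the helper names (`roundup_to_articles`, `load_articles`) and the sample row values are hypothetical placeholders, not part of the dataset.

```python
import csv
import io

SIDES = ("left", "center", "right")
FIELDS = ("title", "url", "source", "leaning", "text")


def roundup_to_articles(row):
    """Split one wide roundup row into three per-article dicts."""
    articles = []
    for side in SIDES:
        article = {
            "date": row["Date"],
            "topic": row["Topic"],
            "roundup_title": row["Title of Headline Roundup"],
            "roundup_url": row["url_story"],
            "side": side,
        }
        for field in FIELDS:
            # Column names follow the <side>_story_<field> pattern.
            article[field] = row[f"{side}_story_{field}"]
        articles.append(article)
    return articles


def load_articles(csv_file):
    """Read an open CSV file object and return a flat list of articles."""
    reader = csv.DictReader(csv_file)
    return [a for row in reader for a in roundup_to_articles(row)]


# Build a one-row, in-memory CSV with placeholder values (not real data).
header = ["Date", "Topic", "Title of Headline Roundup", "url_story"] + [
    f"{side}_story_{field}" for side in SIDES for field in FIELDS
]
values = ["placeholder date", "Elections", "Example roundup",
          "https://example.com/roundup"] + [
    f"{side} {field} placeholder" for side in SIDES for field in FIELDS
]
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerow(values)
buffer.seek(0)

articles = load_articles(buffer)
print(len(articles), [a["side"] for a in articles])
```

To run this against the real file, replace the in-memory buffer with `open(path, newline="", encoding="utf-8")`, where `path` points to your local copy of the CSV.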
- Elections and Democracy: 2024 Presidential Election, Elections, Voting Rights and Voter Fraud, Campaign Finance, Polarization
- US Politics and Law: Politics, Supreme Court, Civil Rights, Federal State and Tribal Powers, Free Speech, Privacy
- Justice and Safety: Criminal Justice, Violence in America, Gun Control and Gun Rights, Terrorism, Sexual Misconduct
- Health and COVID: Healthcare, Public Health, Coronavirus, Life During COVID-19, COVID-19 Misinformation, Safety and Sanity During COVID-19
- Economy and Business: Economy and Jobs, Business, Banking and Finance, Taxes, Trade, Housing and Homelessness
- International Affairs: World, Foreign Policy, The Americas, Middle East, China, Russia, Ukraine War, Defense and Security
- Society and Rights: Race and Racism, LGBTQ Issues, Abortion, Education, Family and Marriage, Religion and Faith, Inequality
- Science and Technology: Science, Technology
- Environment and Energy: Environment, Climate Change, Sustainability, Energy, Water and Oceans
- Media and Information: Media Industry, Media Bias, Fake News, Facts and Fact Checking, Common Ground
- Culture and Sports: Culture, Arts and Entertainment, Sports
- People and Figures: Joe Biden, Donald Trump
Political bias is assigned at the publisher level using AllSides ratings. Each article inherits the political leaning of its publisher. Prior work shows approximately 97% agreement between publisher-level and article-level bias.
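
The publisher-level labeling described above amounts to a lookup from publisher name to rating. The sketch below illustrates the idea only: the publisher names, the rating table, and the `inherit_leaning` helper are hypothetical, and the actual ratings come from AllSides.

```python
# Hypothetical publisher ratings for illustration; real values come from AllSides.
PUBLISHER_RATINGS = {
    "Example Left Daily": "Left",
    "Example Wire": "Center",
    "Example Right Post": "Right",
}


def inherit_leaning(article, ratings):
    """Return a copy of the article labeled with its publisher's rating.

    The label is None when the publisher has no rating in the table.
    """
    labeled = dict(article)
    labeled["leaning"] = ratings.get(article["source"])
    return labeled


article = {"title": "Example headline", "source": "Example Wire"}
labeled = inherit_leaning(article, PUBLISHER_RATINGS)
print(labeled["leaning"])
```

Since the label depends only on the source, two articles from the same publisher always receive the same leaning regardless of their content.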
We selected 60 recent news articles across 20 headline round-ups from NewsCorpus17k.csv. Each record includes aligned left, center, and right news perspectives enriched with GPT-generated political ideology predictions and two types of explanations (brief and detailed). The explanations are intended to provide transparent reasoning behind the model’s predicted labels. This subset is designed for use in human annotation tasks and evaluation studies.
- left_story_GPT_pred: GPT-predicted political leaning
- left_story_GPT_explanation: GPT-generated explanation of prediction
- center_story_GPT_pred: GPT-predicted political leaning
- center_story_GPT_explanation: GPT-generated explanation of prediction
- right_story_GPT_pred: GPT-predicted political leaning
- right_story_GPT_explanation: GPT-generated explanation of prediction
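
One possible use of these columns is to measure how often the GPT prediction matches the publisher-level leaning label. The sketch below is an illustration under stated assumptions: it assumes the prediction columns use the same label vocabulary as the `*_story_leaning` columns (compared here case-insensitively), and both the `gpt_agreement` helper and the sample row are hypothetical.

```python
SIDES = ("left", "center", "right")


def gpt_agreement(rows):
    """Fraction of articles whose GPT prediction equals the leaning label."""
    matches = 0
    total = 0
    for row in rows:
        for side in SIDES:
            label = row[f"{side}_story_leaning"].strip().lower()
            pred = row[f"{side}_story_GPT_pred"].strip().lower()
            total += 1
            if label == pred:
                matches += 1
    return matches / total if total else 0.0


# Placeholder row (not real data): two of the three predictions match.
sample_rows = [
    {
        "left_story_leaning": "Left",
        "left_story_GPT_pred": "Left",
        "center_story_leaning": "Center",
        "center_story_GPT_pred": "Lean Left",
        "right_story_leaning": "Right",
        "right_story_GPT_pred": "right",
    }
]

agreement = gpt_agreement(sample_rows)
print(f"{agreement:.2f}")
```

Exact string matching is the simplest choice; if the two columns use different label granularities, a mapping step would be needed before comparing.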
This dataset contains publicly available news metadata collected for research purposes.
Users should respect original publisher copyright and terms of service.
For questions, contact: kylewang@udel.edu
This project is licensed under the MIT License. See LICENSE.md for details.