This is the paper list for the literature summarized in our survey published at EMNLP 2024: The Odyssey of Commonsense Causality: From Foundational Benchmarks to Cutting-Edge Reasoning.
If you find our survey useful for your research on commonsense causality, please support us by citing our work as follows:
@inproceedings{cui-etal-2024-odyssey,
title = "The Odyssey of Commonsense Causality: From Foundational Benchmarks to Cutting-Edge Reasoning",
author = {Cui, Shaobo and
Jin, Zhijing and
Sch{\"o}lkopf, Bernhard and
Faltings, Boi},
editor = "Al-Onaizan, Yaser and
Bansal, Mohit and
Chen, Yun-Nung",
booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.emnlp-main.932",
pages = "16722--16763",
}
Table of Contents
- Main Part of This Paper List
- If You Want to Know More Details
According to the commonsense types (see Appendix for more background on commonsense types), causality can be roughly classified into four categories:
- Physical Causality

  Physical causality refers to cause-effect relationships grounded in the physical world. It typically covers domains such as physics, chemistry, and environmental science. Example datasets include: CRAFT (Ates et al., 2022), e-CARE (Du et al., 2022).

- Social Causality

  Social causality involves understanding social norms, cultures, human behavior, intents, and reactions. For instance, criticism (cause) can lead to depression (effect) in a social context. It covers domains like law, culture, education, and psychology. Example datasets include: ATOMIC (Sap et al., 2019), GLUCOSE (Mostafazadeh et al., 2020), IfQA (Yu et al., 2023), etc.

- Biological Causality

  Biological causality relates to cause-effect pairs that govern biological processes and phenomena, such as how a healthy diet contributes to longevity. Example datasets include: BioCause (Mihăilă et al., 2013), CBND (Boue et al., 2015), etc.

- Temporal Causality

  Temporal causality involves the sequential understanding that a cause must precede an effect in time. Example datasets include: Temporal-Causal (Bethard et al., 2008), CausalTimeBank (Mirza et al., 2014), CaTeRS (Mostafazadeh et al., 2016), etc.
| Dataset | Annotation Unit | #Overall | #Causal | Counterfactual (C.F.) | Commonsense Types | Brief Introduction | License |
|---|---|---|---|---|---|---|---|
| First-Principle Causality | | | | | | | |
| CauseEffectPairs Mooij et al., 2016 | Variable | 108 | 108 | - | General | 108 different cause-effect pairs selected from 37 datasets covering domains like meteorology, economy, medicine, engineering, biology. Focuses on the causal discovery problem (deciding whether X causes Y or Y causes X). | FreeBSD |
| IHDP Shalit et al., 2017 | Variable | 2,000 | 2,000 | ½ | Biological | IHDP is the Infant Health and Development Program dataset, focusing on the effect of home visits on cognitive test scores for infants. | Custom Dataset Terms |
| CRAFT Ates et al., 2022 | Video | 58,000 | - | Full | Physical | A video question-answering dataset requiring comprehension of physical forces and object interactions. Contains descriptive and counterfactual questions. | MIT |
| Commonsense Causality in Text Format | | | | | | | |
| Temporal-Causal Bethard et al., 2008 | Clause | 1,000 | 271 | - | Temporal | A corpus of 1,000 event pairs covering both temporal and causal relations. | Missing |
| CW Ferguson & Sanford, 2008 | Clause | 128 | 128 | Full | General | CW is collected from psycholinguistic experiments and includes counterfactual examples. | Missing |
| SemEval07-T4 Girju et al., 2007 | Phrase | 220 | 114 | - | General | Focuses on semantic analysis and automatic recognition of relations between word pairs, including causal relations. | Missing |
| SemEval10-T8 Hendrickx et al., 2010 | Phrase | 10,717 | 1,331 | - | General | Similar to SemEval07-T4, focuses on classification of semantic relations between pairs of nominals, including cause-effect relations. | CC BY 3.0 Unported |
| COPA Roemmele et al., 2011 | Sentence | 2,000 | 1,000 | - | General | Each question has a premise and two plausible causes/effects, with the correct one being more plausible. | BSD 2-Clause |
| EventCausality Do et al., 2011 | Clause | 583 | 583 | - | General | A causality corpus built by detecting causality between events using discourse connectives. | Missing |
| BioCause Mihăilă et al., 2013 | Clause | 851 | 851 | - | Biological | Contains 851 causal relations from 19 biomedical journal articles in infectious diseases. | Creative Commons |
| CausalTimeBank Mirza et al., 2014 | Sentence | 318 | 318 | - | Temporal | Timebank corpus with causal samples taken from TempEval-3 corpus. | CC BY-NC-SA 3.0 |
| CaTeRS Mostafazadeh et al., 2016 | Sentence | 2,502 | 308 | - | Temporal | Causal and temporal relations annotated from ROCStories corpus. | Missing |
| AltLex Hidey & McKeown, 2016 | Clause | 44,240 | 4,595 | - | General | An open class of markers that contains causality. | Missing |
| BECauSE 2.0 Dunietz et al., 2017 | Sentence | 729 | 554 | - | General | Focuses on causal relations and other co-existing relations. | MIT |
| ESL Caselli & Vossen, 2017 | Sentence | 2,608 | 2,608 | - | Temporal | A corpus for detecting causal and temporal relations. | CC BY 3.0 Unported |
| PDTB Webber et al., 2019 | Clause | 7,991 | 7,991 | - | General | Marks discourse relations, including causation, grounded in explicit words or phrases. | LDC User Agreement |
| TimeTravel Qin et al., 2019 | Sentence | 109,964 | 29,849 | ½ | General | Contains original stories, counterfactual facts, and new storylines compatible with the counterfactual facts. | MIT |
| GLUCOSE Mostafazadeh et al., 2020 | Clause | 670K | 670K | - | Social | Annotates 10 dimensions of causal explanation from short stories, focusing on implicit causes and effects. | Creative Commons Attribution-NonCommercial 4.0 |
| XCOPA Ponti et al., 2020 | Sentence | 11,000 | 11,000 | - | General | Multilingual version of the COPA dataset, spanning 11 languages. | CC BY 4.0 |
| SemEval20-T5 Yang et al., 2020 | Clause | 25,501 | 25,501 | Full | General | Dataset for determining counterfactual statements and extracting antecedents and consequents. | Missing |
| CausalBank Li et al., 2021 | Clause | 314M | 314M | - | General | Cause-effect statements collected from the Common Crawl corpus using causal lexical patterns. | - |
| e-CARE Du et al., 2022 | Sentence | 21,324 | 21,324 | - | Physical | Cause-effect pairs and conceptual explanations for causation. | MIT |
| CoSIm Kim et al., 2022 | Image + Text | 3,500 | 3,500 | Full | General | A multimodal counterfactual reasoning dataset for commonsense scene imagination, with both text and image components. | MIT |
| CRASS Frohberg & Binder, 2022 | Sentence | 274 | 274 | Full | General | Focuses on counterfactual reasoning in a question-answering format. | Apache 2.0 |
| IfQA Yu et al., 2023 | Sentence | 3,800 | 3,800 | Full | Social | Open-domain counterfactual question-answering dataset. | Missing |
| CW-extended Li et al., 2023 | Sentence | 10,848 | 10,848 | Full | General | Augmentation of CW dataset through word replacements, focusing on counterfactual statements. | Missing |
| CausalQuest Ceraolo et al., 2024 | Sentence | 13,500 | 13,500 | ½ | General | A dataset of natural causal questions collected from social networks, search engines, and AI assistants. | Apache 2.0 |
| δ-CAUSAL Cui et al., 2024 | Sentence | 11,245 | 11,245 | ½ | General | Causal dataset exploring defeasibility and uncertainty in commonsense causality. | MIT |
| Commonsense Causality in Knowledge Graph Format | | | | | | | |
| CausalNet Luo et al., 2016 | Word | 11M | 11M | - | General | Vast collection of causal relationships from Bing web pages. | Missing |
| ConceptNet Speer et al., 2017 | Phrase | 473,000 | - | - | General | Knowledge graph version of the Open Mind Common Sense project, including causal relations. | CC BY-SA 4.0 |
| Event2Mind Rashkin et al., 2018 | Phrase | 25,000 | - | - | Social | Annotates intent and reactions to given events, including causal relations. | MIT |
| ATOMIC Sap et al., 2019 | Sentence | 877K | - | ½ | Social | Collects commonsense knowledge in the form of "if-then" relations. | CC BY 4.0 |
| ASER Zhang et al., 2020 | Sentence | 64M | 494K | - | General | Eventuality knowledge graph extracted from large textual data. | MIT |
| CauseNet Heindorf et al., 2020 | Word | 11M | 11M | - | General | Causal relations extracted from web sources with 83% precision. | CC BY 4.0 |
| CEGraph Li et al., 2021 | Phrase | 89.1M | 89.1M | - | General | Large lexical causal knowledge graph associated with CausalBank. | Missing |
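
Several benchmarks in the table, COPA among them, pose causal reasoning as a forced choice: given a premise, pick the more plausible cause or effect out of two alternatives. The sketch below shows how such an item might be represented and scored. The field names (`premise`, `choice1`, `choice2`, `question`, `label`), the example item, and the `toy_score` function are assumptions made for illustration; they are not taken from the official dataset release or from any evaluated model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CopaStyleItem:
    """One forced-choice item (hypothetical field names)."""
    premise: str
    choice1: str
    choice2: str
    question: str  # "cause" or "effect"
    label: int     # 0 if choice1 is more plausible, 1 if choice2


def predict(item: CopaStyleItem, score: Callable[[str, str], float]) -> int:
    """Return the index of the alternative with the higher score(cause, effect)."""
    def pair_score(choice: str) -> float:
        # For "cause" questions the alternative is the candidate cause;
        # for "effect" questions the premise is the cause.
        if item.question == "cause":
            return score(choice, item.premise)
        return score(item.premise, choice)

    return 0 if pair_score(item.choice1) >= pair_score(item.choice2) else 1


def accuracy(items: List[CopaStyleItem], score: Callable[[str, str], float]) -> float:
    """Fraction of items where the predicted alternative matches the label."""
    correct = sum(predict(item, score) == item.label for item in items)
    return correct / len(items)


def toy_score(cause: str, effect: str) -> float:
    """Placeholder scorer; a real system would use a model or causal strength metric."""
    return 1.0 if "rained" in cause and "wet" in effect else 0.0


# Made-up example item, not drawn from any dataset.
example = CopaStyleItem(
    premise="The ground was wet.",
    choice1="It rained overnight.",
    choice2="The sun was shining.",
    question="cause",
    label=0,
)
print(accuracy([example], toy_score))
```

Any function that maps a (cause, effect) pair to a number can be plugged in as `score`, which keeps the evaluation loop independent of the model being tested.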
- Sakaji et al. (2008): "Extracting Causal Knowledge Using Clue Phrases and Syntactic Patterns", paper.
- Cole et al. (2006): "A lightweight tool for automatically extracting causal relationships from text", paper.
- Girju (2003): "Automatic Detection of Causal Relations for Question Answering", paper.
- Blanco et al. (2008): "Causal Relation Extraction", paper.
- Zhao et al. (2016): "Event Extraction Using Causal Connectives", paper.
- Rashkin et al. (2018): "Event2Mind: Commonsense Inference on Events, Intents, and Reactions", paper.
- Choudhry (2020): "Narrative Generation to Support Causal Exploration of Directed Graphs", paper.
- Li et al. (2021): "Guided Generation of Cause and Effect", paper.
- Kim et al. (2023): "SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization", paper.
- Palmer et al. (2005): "The Proposition Bank: An Annotated Corpus of Semantic Roles", paper.
- Mihăilă et al. (2013): "What causes a causal relation? Detecting Causal Triggers in Biomedical Scientific Discourse", paper.
- Dunietz (2018): "Annotating and automatically tagging constructions of causal language", thesis.
- Mostafazadeh et al. (2016): "CaTeRS: Causal and Temporal Relation Scheme for Semantic Annotation of Event Structures", paper.
- Mirza et al. (2014): "Annotating Causality in the TempEval-3 Corpus", paper.
| Method | Accuracy | Cost | Coverage | Explainability |
|---|---|---|---|---|
| Extractive | β β β β β | β β β β β | β β β β β | β β β β β |
| Generative | β β β ββ | β β β β β | β β β ββ | β ββββ |
| Manual Annotation | β β β β β | β β βββ | β β β β β | β β β β β |
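
To make the extractive route concrete, the following is a minimal sketch of pattern-based extraction using clue phrases such as "because" and "causes". The pattern list and the function name `extract_causal_pairs` are hypothetical and far simpler than the systems cited above, which combine larger pattern inventories with syntactic analysis and filtering.

```python
import re
from typing import List, Tuple

# Hypothetical clue-phrase patterns; each exposes named groups "cause" and "effect".
PATTERNS = [
    # "<effect> because (of) <cause>"
    re.compile(
        r"^(?P<effect>.+?)\s+because(?:\s+of)?\s+(?P<cause>.+?)\.?$",
        re.IGNORECASE,
    ),
    # "<cause> causes / caused / leads to / led to <effect>"
    re.compile(
        r"^(?P<cause>.+?)\s+(?:causes|caused|leads to|led to)\s+(?P<effect>.+?)\.?$",
        re.IGNORECASE,
    ),
]


def extract_causal_pairs(sentences: List[str]) -> List[Tuple[str, str]]:
    """Return (cause, effect) pairs for sentences matching a clue-phrase pattern."""
    pairs = []
    for sentence in sentences:
        text = sentence.strip()
        for pattern in PATTERNS:
            match = pattern.match(text)
            if match:
                pairs.append((match.group("cause").strip(), match.group("effect").strip()))
                break  # keep only the first matching pattern per sentence
    return pairs


sentences = [
    "The match was cancelled because of heavy rain.",
    "Smoking causes lung cancer.",
    "The sky is blue.",
]
for cause, effect in extract_causal_pairs(sentences):
    print(f"{cause} -> {effect}")
```

Surface patterns like these are cheap to run over large corpora, but they miss implicit causality and can match non-causal uses of a clue phrase, so extracted pairs usually need further filtering.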
Qualitative causal reasoning focuses on classifying cause-effect relationships in a binary fashion, often bypassing uncertainty through simplification.
- Jin et al. (2023): "CLadder: Assessing Causal Reasoning in Language Models", paper.
- Zhang et al. (2022): "ROCK: Causal Inference Principles for Reasoning about Commonsense Causality", paper.
- Ning et al. (2018): "Joint Reasoning for Temporal and Causal Relations", [paper](https://arxiv.org/abs/1808.09506).
- Zhang and Foo (2001): "Embedding Logic Rules into Causal Reasoning Mechanisms."
Quantitative causal reasoning provides numerical estimates for causal effects, accounting for uncertainty and variability in causal relationships.
- Good (1961): "Log-Likelihood Metric for Causality Strength."
- Suppes (1973): "A Probabilistic Theory of Causality."
- Eells (1991): "Probabilistic Causality and Its Applications."
- Pearl (2009): "Causality: Models, Reasoning, and Inference."
- Luo et al. (2016): "CEQ: A Word-Level Causal Estimation Metric."
- Cui et al. (2024): "CESAR: A Weighted Approach to Measuring Causal Strength in Text."
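
As a rough illustration of the quantitative view, the sketch below estimates two simple probabilistic quantities from co-occurrence counts: the contrast P(E|C) - P(E|not C), and a probability-raising check P(E|C) > P(E). These are generic textbook-style quantities chosen for illustration; they are not the exact metrics defined in any of the works listed above, and the `Counts` structure and the numbers in the example are made up.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical co-occurrence counts for a candidate cause C and effect E."""
    both: int         # C present, E present
    cause_only: int   # C present, E absent
    effect_only: int  # C absent, E present
    neither: int      # C absent, E absent


def probability_contrast(c: Counts) -> float:
    """Estimate P(E | C) - P(E | not C) from counts."""
    with_cause = c.both + c.cause_only
    without_cause = c.effect_only + c.neither
    if with_cause == 0 or without_cause == 0:
        raise ValueError("need observations both with and without the cause")
    return c.both / with_cause - c.effect_only / without_cause


def raises_probability(c: Counts) -> bool:
    """Check whether P(E | C) > P(E), i.e. the cause raises the effect's probability."""
    total = c.both + c.cause_only + c.effect_only + c.neither
    with_cause = c.both + c.cause_only
    if total == 0 or with_cause == 0:
        raise ValueError("need at least one observation with the cause")
    p_effect = (c.both + c.effect_only) / total
    p_effect_given_cause = c.both / with_cause
    return p_effect_given_cause > p_effect


# Made-up counts: P(E | C) = 30/40, P(E | not C) = 20/60, P(E) = 50/100.
counts = Counts(both=30, cause_only=10, effect_only=20, neither=40)
print(round(probability_contrast(counts), 3))
print(raises_probability(counts))
```

Thresholding such a score gives back a binary causal/non-causal decision, which is one way to relate the quantitative setting to the qualitative one. Raw co-occurrence statistics do not control for confounding, so a positive contrast on its own is weak evidence of causation.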

