Paper | Installation | To Run | Results
FOLK is a claim verification method that uses Large Language Models (LLMs) to verify complex claims and generate explanations without requiring annotated evidence.
If our code or data helps your research, please cite us:
@article{wang2023explainable,
title={Explainable Claim Verification via Knowledge-Grounded Reasoning with Large Language Models},
author={Wang, Haoran and Shu, Kai},
journal={arXiv preprint arXiv:2310.05253},
year={2023}
}
Install the conda environment from the environment.yml file:
conda env create -n folk --file environment.yml
conda activate folk
Please add your OpenAI and SerpApi keys to the keys.py file.
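For reference, keys.py could look like the minimal sketch below; the variable names here are assumptions, not taken from the repo, so match them to how keys.py is imported in the scripts.
# keys.py -- minimal sketch; variable names are assumptions, not from the repo
OPENAI_API_KEY = "sk-..."   # your OpenAI API key
SERPAPI_API_KEY = "..."     # your SerpApi key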
To decompose claims:
python decompose.py \
--dataset ["hover", "feverous", "scifact"] \
--hover_num_hop ["two", "three", "four"] \
--feverous_challenge ["numerical", "reasoning", "table"] \
--prompt_strategy ["direct", "cot", "self-ask", "logic"] \
--model ["llama-7b", "llama-13b", "llama-30b", "text-davinci"] \
--version "DEFINE YOUR VERSION" \
--max_token 1024 \
--temperature 0.7
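For example, a three-hop HoVer run with text-davinci and logic prompting (mirroring the evaluation example below; we assume the FEVEROUS-only flag can be omitted when the dataset is hover) would be:
python decompose.py \
--dataset hover \
--hover_num_hop three \
--prompt_strategy logic \
--model text-davinci \
--version "V1.0" \
--max_token 1024 \
--temperature 0.7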
To ground answers:
python grounding.py \
--dataset ["hover", "feverous", "scifact"] \
--hover_num_hop ["two", "three", "four"] \
--feverous_challenge ["numerical", "reasoning", "table"] \
--prompt_strategy ["direct", "cot", "self-ask", "logic"] \
--model ["llama-7b", "llama-13b", "llama-30b", "text-davinci"] \
--version "DEFINE YOUR VERSION"
To make predictions:
python aggregate.py \
--dataset ["hover", "feverous", "scifact"] \
--hover_num_hop ["two", "three", "four"] \
--feverous_challenge ["numerical", "reasoning", "table"] \
--prompt_strategy ["direct", "cot", "self-ask", "logic"] \
--model ["llama-7b", "llama-13b", "llama-30b", "text-davinci"] \
--version "DEFINE YOUR VERSION" \
--max_token 1024 \
--temperature 0.7
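For example, continuing the same HoVer run:
python aggregate.py \
--dataset hover \
--hover_num_hop three \
--prompt_strategy logic \
--model text-davinci \
--version "V1.0" \
--max_token 1024 \
--temperature 0.7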
To evaluate:
python evaluation.py \
--dataset hover \
--hover_num_hop three \
--prompt_strategy logic \
--model text-davinci \
--version "V1.0"The experiment results reported in Table 2 from the paper are listed in Final_Results folder. To evaluate the results, please execute the following script:
./results.sh
The ProgramFC baseline is contained in the ProgramFC folder. The code is modified from the original repo to process the datasets used in the paper.