We used this paper as our base method: Triggering Multi-Hop Reasoning for Question Answering in Language Models using Soft Prompts and Random Walks.
All datasets live under the dataset/ directory.
- Download: Download the 2WikiMultiHopQA Dataset from here: 2WikiMultiHopQA Dataset.
- Extract: Make sure you have:
  - dataset/2wikimultihop/train.json
  - dataset/2wikimultihop/dev.json
  - dataset/2wikimultihop/test.json
- Download: Download the MetaQA Dataset from here: MetaQA. You only need the following files:
  - dataset/metaqa/2-hop/vanilla/qa_train.txt, qa_dev.txt & qa_test.txt
  - dataset/metaqa/2-hop/qa_train_qtype.txt, qa_dev_qtype.txt & qa_test_qtype.txt
  - dataset/metaqa/kb.txt
- Preprocess: Use the metaqa_preprocess.py script to preprocess the MetaQA dataset. The input and output paths can be adjusted at the bottom of the Python file; a rough sketch of the conversion is shown after the file list below.
- Extract: Make sure you have the following files after preprocessing:
  - dataset/metaqa/vanilla/metaqa_train_evidences.json
  - dataset/metaqa/vanilla/metaqa_dev_evidences.json
  - dataset/metaqa/vanilla/metaqa_test_evidences.json
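For orientation, here is a minimal sketch of the kind of conversion metaqa_preprocess.py performs. MetaQA's vanilla files contain one tab-separated question/answer pair per line, with the topic entity in square brackets; the JSON layout below is an assumption, the real schema is defined in the script itself.

```python
# Hypothetical sketch of the MetaQA conversion (the evidence JSON schema is an
# assumption; check metaqa_preprocess.py for the actual layout).
import json

def convert(qa_path: str, out_path: str) -> None:
    examples = []
    with open(qa_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            # Lines look like: "what movies did [Kevin Spacey] act in\tSe7en|American Beauty"
            question, answers = line.rstrip("\n").split("\t")
            # The topic entity is marked with square brackets in the question.
            entity = question[question.index("[") + 1 : question.index("]")]
            examples.append({
                "question": question.replace("[", "").replace("]", ""),
                "topic_entity": entity,
                "answers": answers.split("|"),
            })
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(examples, f, ensure_ascii=False, indent=2)

convert("dataset/metaqa/2-hop/vanilla/qa_train.txt",
        "dataset/metaqa/vanilla/metaqa_train_evidences.json")
```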
- Download: Download the MLPQ Dataset from here: MLPQ.
- Preprocess: Use the mlpq_preprocess.py script to preprocess the MLPQ dataset, for example:
python mlpq_preprocess.py --en_fr dataset/mlpq/Questions/fr-en/2-hop/r2r_en_fr_question_en --fr_en dataset/mlpq/Questions/fr-en/2-hop/r2r_fr_en_question_en --out_dir ./mlpq_json
- Extract: Make sure you have the following files after preprocessing:
  - dataset/mlpq/Questions/fr-en/2-hop/2hop_train_question_evidences.json
  - dataset/mlpq/Questions/fr-en/2-hop/2hop_dev_question_evidences.json
  - dataset/mlpq/Questions/fr-en/2-hop/2hop_test_question_evidences.json
- Download: Download the PQ Dataset from here: PathQuestion.
- Extract: Make sure you have:
  - dataset/pathquestion/PQ-2H.txt
The source directory contains the actual code implementations, which consist of:
- datasets contains all the dataset classes that need to be created, such as KnowledgeIntegrationDataset, RandomWalkDataset, etc.
- train consists of the two training methods: model training (for pretraining) and soft prompt training for the parsing and hopping soft prompts.
- eval provides the evaluation functions that compute EM and F1 scores.
- config manages the hyperparameters.
- models consists of the Hyperbolic T5 and the Soft Prompt Model. The Soft Prompt Model takes a Knit5 model and a soft prompt; a minimal sketch of this wrapper follows below.
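For intuition, here is a minimal, hypothetical sketch of how such a soft prompt wrapper can be built on top of a Hugging Face T5 model: learned prompt embeddings are prepended to the token embeddings before the frozen backbone runs. Class and argument names are illustrative, not the repository's actual API.

```python
# Illustrative soft-prompt wrapper (not the repo's exact SoftPromptModel API).
import torch
import torch.nn as nn
from transformers import T5ForConditionalGeneration

class SoftPromptT5(nn.Module):
    def __init__(self, backbone: T5ForConditionalGeneration, prompt_len: int = 100):
        super().__init__()
        self.backbone = backbone
        emb_dim = backbone.get_input_embeddings().embedding_dim
        # Only the soft prompt is trained; the backbone stays frozen.
        self.prompt = nn.Parameter(torch.randn(prompt_len, emb_dim) * 0.5)
        for p in self.backbone.parameters():
            p.requires_grad = False

    def forward(self, input_ids, attention_mask, labels=None):
        tok_emb = self.backbone.get_input_embeddings()(input_ids)
        batch = input_ids.size(0)
        # Prepend the (shared) soft prompt to every sequence in the batch.
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        inputs_embeds = torch.cat([prompt, tok_emb], dim=1)
        prompt_mask = torch.ones(batch, self.prompt.size(0),
                                 device=attention_mask.device)
        attention_mask = torch.cat([prompt_mask, attention_mask], dim=1)
        return self.backbone(inputs_embeds=inputs_embeds,
                             attention_mask=attention_mask, labels=labels)

# Usage (illustrative):
# model = SoftPromptT5(T5ForConditionalGeneration.from_pretrained("t5-base"))
```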
load_c4_dataset.sh loads the C4 dataset. With --files you can specify how many files should be downloaded, since we don't want the complete dataset.
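As a rough, hypothetical alternative to the shell script, a C4 subset can also be streamed from the Hugging Face Hub; note that the example count below only approximates the script's --files flag, which operates on raw shards.

```python
# Hypothetical alternative to load_c4_dataset.sh: stream a slice of C4 from
# the Hugging Face Hub instead of downloading raw shard files.
from datasets import load_dataset

# Streaming avoids downloading the full (very large) English C4 split.
c4 = load_dataset("allenai/c4", "en", split="train", streaming=True)

# Take a fixed number of documents; 10_000 is an arbitrary stand-in for the
# amount of data contained in a few raw files.
subset = [example["text"] for _, example in zip(range(10_000), c4)]
print(f"Collected {len(subset)} C4 documents.")
```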
Training script for the Knowledge Integration part. Hyperparameters and the model type can be adjusted in the config. For example, for MetaQA:
python train_knowledge_integration.py \
--dataset metaqa \
--epochs 50 \
--checkpoint_save_path checkpoints/metaqa/knowledge_integration/ \
--tboard_logs_save_path tboard_logs/metaqa/knowledge_integration/ \
--batch_size 64 \
--learning_rate 0.001

Training script for the Random Walk part. Hyperparameters and the model type can be adjusted in the config. For example, for MetaQA:
python train_random_walk.py \
--additional_layer hyperbolic \
--learning_rate 0.3 \
--dataset metaqa \
--batch_size 64 \
--epochs 100 \
--curvature 0.1 \
--knit5_checkpoint_path Path/To/Knowledge_Integrated/Model/Of/MetaQA \
--checkpoint_save_path Path/For/Checkpoints \
--tboard_logs_save_path Path/For/Tensorboard_Logs
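The --additional_layer hyperbolic option together with --curvature suggests an extra layer that maps hidden states into hyperbolic space. As a rough illustration (an assumption about the layer, not the repository's actual implementation), such a layer can be the exponential map at the origin of the Poincaré ball with learnable curvature:

```python
# Illustrative sketch of a "hyperbolic" additional layer: project Euclidean
# hidden states onto the Poincare ball via the exponential map at the origin.
# The curvature is learnable and initialised from the --curvature argument.
import torch
import torch.nn as nn

class ExpMapLayer(nn.Module):
    def __init__(self, curvature: float = 0.1):
        super().__init__()
        self.c = nn.Parameter(torch.tensor(curvature))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # exp_0^c(x) = tanh(sqrt(c) * ||x||) * x / (sqrt(c) * ||x||)
        sqrt_c = self.c.clamp_min(1e-6).sqrt()
        norm = x.norm(dim=-1, keepdim=True).clamp_min(1e-6)
        return torch.tanh(sqrt_c * norm) * x / (sqrt_c * norm)

# Example: outputs land inside the ball of radius 1/sqrt(c).
layer = ExpMapLayer(curvature=0.1)
print(layer(torch.randn(2, 4, 512)).norm(dim=-1).max())
```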
Training script for the Parsing part. Hyperparameters and the model type can be adjusted in the config. For example, for MetaQA:
python train_parse_then_hop.py \
--additional_layer hyperbolic \
--learning_rate 0.3 \
--curvature 0.37 \
--dataset metaqa \
--epochs 50 \
--batch_size 64 \
--knit5_checkpoint_path Path/To/Knowledge_Integrated/Model/Of/MetaQA \
--checkpoint_save_path Path/For/Checkpoints \
--tboard_logs_save_path Path/For/Tensorboard_Logs

Testing script for the Parse Then Hop method.
python test_parse_then_hop.py \
--additional_layer_parse hyperbolic \
--additional_layer_hop hyperbolic \
--dataset metaqa \
--batch_size 64 \
--knit5_checkpoint_path Path/To/Knowledge_Integrated/Model \
--parsing_prompt_checkpoint_path Path/To/Parsing_Prompt \
--hopping_prompt_checkpoint_path Path/To/Hopping_Prompt

Functions to compute the delta-hyperbolicity and curvature of each dataset.
Functions to compute the distance accuracy for hyperbolic and Euclidean embeddings.
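For reference, here is a minimal sketch of one standard way to estimate the Gromov delta-hyperbolicity of a dataset's graph from its all-pairs shortest-path distance matrix, using the (min, max) matrix product over Gromov products. This is a generic textbook method, not necessarily the repository's exact implementation.

```python
# Generic Gromov delta-hyperbolicity estimate from a distance matrix
# (not necessarily the repository's exact implementation).
import numpy as np

def delta_hyperbolicity(dist: np.ndarray, base: int = 0) -> float:
    """Gromov delta of a finite metric space, relative to a base point."""
    row = dist[base, :][np.newaxis, :]
    col = dist[:, base][:, np.newaxis]
    # Gromov products (x|y)_base = 0.5 * (d(base,x) + d(base,y) - d(x,y)).
    g = 0.5 * (row + col - dist)
    # (min, max) matrix product: (g*g)[i, j] = max_k min(g[i, k], g[k, j]).
    # Note: the broadcast below is O(n^3) memory; fine for small graphs.
    minmax = np.max(np.minimum(g[:, :, np.newaxis], g[np.newaxis, :, :]), axis=1)
    return float(np.max(minmax - g))

# Toy example: a path graph 0-1-2-3. Tree metrics are 0-hyperbolic.
d = np.array([[0, 1, 2, 3],
              [1, 0, 1, 2],
              [2, 1, 0, 1],
              [3, 2, 1, 0]], dtype=float)
print(delta_hyperbolicity(d))  # 0.0 for a tree metric
```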
- Download the C4 dataset using ./load_c4_dataset.sh --files 5
- Download and preprocess the datasets as described above
- Use the train_knowledge_integration.py script to finetune your model and save the knowledge-integrated model
- Finetune the soft prompts (and optionally the additional layer) with train_random_walk.py and train_parse_then_hop.py
- Parameters and save paths can be adjusted in the config. Some hyperparameters, such as the batch size, number of epochs, learning rate, choice of additional layer, and initial curvature, can also be set via the command-line arguments of the Python scripts.