To extend the remarkable successes of Large Language Models (LLMs), the community has made numerous efforts to generalize them to table reasoning tasks over widely deployed tabular data. Nevertheless, through a probing experiment on our proposed StructQA benchmark, we show in this work that even the most advanced LLMs (such as GPTs) may still fall short in coping with tabular data. More specifically, the prevailing scheme often simply relies on serializing the tabular data together with its meta information and then feeding the result into the LLM. We argue that the loss of structural information and the persistence of incomplete cell values are the root of this shortcoming. We therefore propose TAMO (reimagining the Table representation as an independent Modality), which treats tables as an independent modality integrated with the text tokens. The resulting TAMO model is a multimodal framework consisting of a hypergraph neural network as a global table encoder, seamlessly integrated with a mainstream LLM. Empirical results on various benchmark datasets, including HiTab, WikiTQ, WikiSQL, FeTaQA, and StructQA, demonstrate significant improvements in generalization, with an average relative gain of 42.65%.
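For readers unfamiliar with the serialization scheme criticized above, the snippet below is a minimal, generic illustration of how a table is typically flattened into plain text before being fed to an LLM. It is not the exact prompt format used in this repository, and the `serialize_table` helper is purely illustrative; hierarchical headers and merged cells collapse into a flat string, which is precisely the structural information TAMO is designed to preserve.

```python
# Minimal illustration of the common "serialize-then-prompt" baseline:
# the table is flattened into plain text and concatenated with the question.
# Hierarchical headers and merged cells cannot survive this flattening.
def serialize_table(headers, rows, caption=""):
    lines = [f"Caption: {caption}"] if caption else []
    lines.append(" | ".join(headers))
    for row in rows:
        lines.append(" | ".join(str(cell) for cell in row))
    return "\n".join(lines)

prompt = (
    serialize_table(["Year", "Revenue"], [[2022, "1.2B"], [2023, "1.5B"]])
    + "\n\nQuestion: In which year was revenue higher?"
)
print(prompt)
```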
To evaluate the effectiveness of TAMO, we conduct extensive experiments on our proposed table structure understanding dataset StructQA and on four public table reasoning benchmarks (HiTab, WikiTQ, WikiSQL, and FeTaQA).
The StructQA dataset is located in the "dataset/structProbe" folder. In the early stages of the project we named it structProbe, signifying that it is a probe for testing LLMs' understanding of table structures. For details on its construction, please refer to the paper.
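For convenience, here is a rough sketch of one way to load the StructQA files. The file layout (`test.jsonl`) and the `load_structqa` helper are assumptions made only for illustration; please inspect "dataset/structProbe" for the actual file names and schema.

```python
# Hypothetical loader for StructQA; file names and field names are
# assumptions -- inspect dataset/structProbe for the actual schema.
import json
from pathlib import Path

def load_structqa(split="test", root="dataset/structProbe"):
    path = Path(root) / f"{split}.jsonl"   # assumed file layout
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f]

# examples = load_structqa("test")
# print(examples[0])
```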
- Install Requirements

  ```bash
  conda create --name TAMO python=3.9
  conda activate TAMO
  bash requirements.sh
  ```
- Download Datasets and Models

  Datasets: The code for downloading the Hugging Face versions of the four public table reasoning benchmarks (HiTab, WikiTQ, WikiSQL, and FeTaQA) is located in "download_dataset.py"; a rough sketch of what this download looks like is given below.

  Models: The script for downloading all the models used is contained in "download_models.sh".
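  As an illustration only (this is not the contents of "download_dataset.py"), fetching these benchmarks with the Hugging Face `datasets` library could look like the sketch below. The Hub identifiers are assumptions; use the ones defined in "download_dataset.py".

  ```python
  # A minimal sketch (not the repo's download_dataset.py) of fetching the
  # public benchmarks with the Hugging Face `datasets` library. The Hub
  # identifiers below are assumptions; use the ones in download_dataset.py.
  from datasets import load_dataset

  hub_ids = {
      "wikisql": "wikisql",              # assumed Hub id
      "wikitq": "wikitablequestions",    # assumed Hub id
      # "hitab": ...,                    # take the real id from download_dataset.py
      # "fetaqa": ...,                   # take the real id from download_dataset.py
  }

  for name, hub_id in hub_ids.items():
      ds = load_dataset(hub_id)            # downloads and caches all splits
      ds.save_to_disk(f"dataset/{name}")   # local copy for the preprocessing step
      print(name, {split: len(ds[split]) for split in ds})
  ```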
- Data Preprocess

  Run the "./script/data_preprocesss.sh" script to preprocess the data.
- Run Experiment

  All the experiment run scripts are located in the "./script" folder.