Multiple Languages and Modalities (MLM) is a dataset consisting of text in three languages (EN, FR, DE), images, location data, and triple classes. The resource is designed to evaluate the strengths of multitask learning systems in generalising over diverse data. The paper defines a benchmark evaluation consisting of the following tasks:
- Cross-modal retrieval
- Location estimation
Additional details on the resource and benchmark evaluation are available at the MLM website: http://cleopatra.ijs.si/goal-mlm/

IR+LE is an architecture for a multitask learning system designed as a baseline for the above benchmark. The pipeline for cross-modal retrieval extends an approach proposed by Marin et al.: http://im2recipe.csail.mit.edu/im2recipe-journal.pdf
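As background, cross-modal retrieval embeds images and text into a shared space and ranks one modality against the other by similarity. A minimal NumPy sketch of the standard recall@k evaluation idea (illustrative only; the function and variable names are assumptions, not the repository's code):

```python
import numpy as np

def recall_at_k(img_emb, txt_emb, k=10):
    """Fraction of image queries whose matching text (same row index)
    appears in the top-k results by cosine similarity."""
    # L2-normalise so dot products equal cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    sims = img @ txt.T                      # (n_images, n_texts)
    ranking = np.argsort(-sims, axis=1)     # best match first
    hits = (ranking[:, :k] == np.arange(len(img))[:, None]).any(axis=1)
    return hits.mean()

# toy check: identical embeddings retrieve themselves at rank 1
rng = np.random.default_rng(0)
emb = rng.normal(size=(50, 8))
print(recall_at_k(emb, emb, k=1))  # 1.0
```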
Multitask IR+LE Framework
- Python version >= 3.7
- PyTorch version >= 1.4.0
```
# clone the repository
git clone https://github.com/GOALCLEOPATRA/MLM.git
cd MLM

# install dependencies
pip install -r requirements.txt
```

Download the dataset hdf5 files from here and place them under the data folder.
Train

Multitask learning (IR + LE):

```
python train.py --task mtl
```

Cross-modal retrieval task:

```
python train.py --task ir
```

Location estimation task:

```
python train.py --task le
```

For setting other arguments (e.g. epochs, batch size, dropout), please check args.py.
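The authoritative flag names and defaults live in args.py; as an illustration of the pattern, a hypothetical argparse setup (the epochs, batch size, and dropout flags and defaults below are assumptions, not the repository's actual values):

```python
import argparse

# hypothetical sketch mirroring args.py; check that file for the real flags
parser = argparse.ArgumentParser(description="MLM IR+LE baseline (sketch)")
parser.add_argument("--task", choices=["mtl", "ir", "le"], default="mtl",
                    help="mtl = multitask, ir = retrieval, le = location estimation")
parser.add_argument("--epochs", type=int, default=100)
parser.add_argument("--batch_size", type=int, default=128)
parser.add_argument("--dropout", type=float, default=0.1)

# e.g. the equivalent of `python train.py --task ir --epochs 20`
args = parser.parse_args(["--task", "ir", "--epochs", "20"])
print(args.task, args.epochs)  # ir 20
```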
Test

Multitask learning (IR + LE):

```
python test.py --task mtl
```

Cross-modal retrieval task:

```
python test.py --task ir
```

Location estimation task:

```
python test.py --task le
```

All logs and checkpoints will be saved under the experiments folder.
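For the location estimation task, performance is naturally reported as the geographic distance between predicted and true coordinates. A minimal haversine-distance sketch (the generic great-circle formula, not the repository's metric code, which may differ):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))  # 6371 km: mean Earth radius

# e.g. error of predicting Paris when the ground truth is Berlin
print(round(haversine_km(52.52, 13.405, 48.857, 2.352)))  # ≈ 878 km
```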
The repository is under the MIT License.
@inproceedings{armitage2020mlm,
title={{MLM}: A Benchmark Dataset for Multitask Learning with Multiple Languages and Modalities},
author={Armitage, Jason and Kacupaj, Endri and Tahmasebzadeh, Golsa and Maleshkova, Maria and Ewerth, Ralph and Lehmann, Jens},
booktitle={Proceedings of the 29th ACM International Conference on Information \& Knowledge Management},
pages={2967--2974},
year={2020}
}