A multi-task CNN-RNN model, combined with task-optimized phonetic features, that predicts the Lemma, POS category, Gender, Number, Person, Case, and Tense-aspect-mood (TAM) of Hindi words.
git clone git@github.com:Saurav0074/morph_analyzer.git
cd morph_analyzer
The file main.py takes the following command-line arguments:
| Argument | Values | Required | Specification |
|---|---|---|---|
| lang | hindi, urdu | Yes | Language |
| mode | train, test and predict (i.e., no gold labels required) | Yes | Training, testing and predictions. |
| phonetic | True/1/yes/y/t and False/0/no/n/f | No (default=False) | Use MOO-driven phonological features or not. |
| freezing | True/1/yes/y/t and False/0/no/n/f | No (default=False) | Use progressive freezing for training or not (see FreezeOut). |
The train and test modes operate on the standard train-test split specified by the HDTB and UDTB datasets (see the datasets README), while predict uses the text provided manually in src/[lang]_predict_data/.
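The argument table above can be sketched with `argparse`. This is a minimal illustration, not the actual contents of main.py: the helper names `str2bool` and `build_parser` are hypothetical, and the sketch assumes that `--freezing` accepts the same boolean spellings as `--phonetic`.

```python
import argparse


def str2bool(value):
    """Map the boolean spellings listed in the table to a Python bool.

    Hypothetical helper: main.py may parse these values differently.
    """
    v = value.lower()
    if v in ("true", "1", "yes", "y", "t"):
        return True
    if v in ("false", "0", "no", "n", "f"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean value, got {value!r}")


def build_parser():
    """Build a parser mirroring the documented arguments (assumed layout)."""
    parser = argparse.ArgumentParser(description="Morphological analyzer (sketch)")
    parser.add_argument("--lang", choices=["hindi", "urdu"], required=True)
    parser.add_argument("--mode", choices=["train", "test", "predict"], required=True)
    parser.add_argument("--phonetic", type=str2bool, default=False)
    parser.add_argument("--freezing", type=str2bool, default=False)
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
demo_args = build_parser().parse_args(
    ["--lang", "urdu", "--mode", "train", "--phonetic", "true", "--freezing", "0"]
)
print(demo_args)
```

Using `choices` makes `argparse` reject unsupported languages or modes before any model code runs.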
Training:
>>> python main.py --lang urdu --mode train --phonetic true --freezing true #train
Testing:
>>> python main.py --lang urdu --mode test --phonetic true --freezing true #test
Predicting:
>>> python main.py --lang urdu --mode predict --phonetic true --freezing true #predict
For prediction, the plain text should be provided within src/[lang]_predict_data/test_data.txt.
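Preparing the prediction input can be sketched as follows. The helper names `predict_input_path` and `write_predict_input` are hypothetical, and the one-sentence-per-line layout is an assumption: the text above only says that plain text is expected at this path.

```python
import tempfile
from pathlib import Path


def predict_input_path(lang, root="."):
    """Return src/[lang]_predict_data/test_data.txt under the given repo root."""
    return Path(root) / "src" / f"{lang}_predict_data" / "test_data.txt"


def write_predict_input(sentences, lang, root="."):
    """Write plain-text sentences to the prediction input file.

    Assumption: one sentence per line, UTF-8 encoded.
    """
    path = predict_input_path(lang, root)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(sentences) + "\n", encoding="utf-8")
    return path


# Demo in a temporary directory so no real repository files are touched.
with tempfile.TemporaryDirectory() as tmp_root:
    written = write_predict_input(["first sentence", "second sentence"], "urdu", root=tmp_root)
    print(written.relative_to(tmp_root))
```

In a real checkout, `root` would point at the cloned morph_analyzer directory before running main.py with `--mode predict`.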
For the test mode:
- The predicted roots and features, as well as their gold-labelled counterparts, are written to separate files within output/[lang]/: roots.txt, feature_0.txt, ..., feature_6.txt.
- Micro-averaged precision-recall graphs are stored in graph_outputs/[lang]/.
For the predict mode, all the predictions (i.e., roots + features) are written to: output/[lang]/predictions.txt.
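For readers unfamiliar with the micro-averaged precision-recall graphs mentioned for the test mode, the sketch below shows how one point on such a curve can be computed. It illustrates the general technique only and is not the repository's plotting code; the function name `micro_precision_recall` and the toy scores are hypothetical.

```python
def micro_precision_recall(gold, scores, threshold):
    """Micro-averaged precision and recall at a single score threshold.

    gold:   list of gold class indices, one per sample
    scores: list of per-class score lists, one per sample
    Every (sample, class) pair is treated as a binary decision, and the
    true/false positive and false negative counts are pooled across all
    classes before dividing (this pooling is what "micro" refers to).
    """
    tp = fp = fn = 0
    for gold_class, row in zip(gold, scores):
        for cls, score in enumerate(row):
            predicted = score >= threshold
            actual = cls == gold_class
            if predicted and actual:
                tp += 1
            elif predicted:
                fp += 1
            elif actual:
                fn += 1
    # Convention used here: precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy example: three samples, three classes.
gold = [0, 1, 2]
scores = [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1], [0.3, 0.3, 0.4]]
for t in (0.3, 0.5):
    print(t, micro_precision_recall(gold, scores, t))
```

Sweeping the threshold over the range of scores yields the sequence of (recall, precision) points that make up a curve.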
Micro-averaged precision-recall curves for each class, arranged by increasing F1 scores:
If this repo was helpful in your research, consider citing our work:
@article{jha2018multi,
title={Multi Task Deep Morphological Analyzer: Context Aware Joint Morphological Tagging and Lemma Prediction},
author={Jha, Saurav and Sudhakar, Akhilesh and Singh, Anil Kumar},
journal={arXiv preprint arXiv:1811.08619},
year={2018}
}

