This repo contains scripts for training NLP models on text data. It requires the following packages:
- pytorch
- numpy
- nltk.tokenize
glove.py contains a GloVe model written in PyTorch. dataset.py contains a Dataset class, written so that PyTorch's torch.utils.data.DataLoader utility can be used to batch the data during training.
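To illustrate how a Dataset class can plug into torch.utils.data.DataLoader, here is a minimal sketch. It is not the code in dataset.py: the class name `CooccurrenceDataset`, its `window_size` parameter, and the toy sentence are all hypothetical. It assumes the dataset yields (word id, context id, co-occurrence count) triples, with each pair weighted by 1/distance as described in the GloVe paper.

```python
from collections import Counter

import torch
from torch.utils.data import DataLoader, Dataset


class CooccurrenceDataset(Dataset):
    """Hypothetical map-style dataset of (word_id, context_id, count) triples."""

    def __init__(self, tokens, window_size=2):
        # Map each distinct token to an integer id, in order of first appearance.
        self.vocab = {}
        for tok in tokens:
            self.vocab.setdefault(tok, len(self.vocab))
        ids = [self.vocab[tok] for tok in tokens]

        # Count co-occurrences inside a symmetric window, weighting each
        # pair by 1/distance (an assumption taken from the GloVe paper).
        counts = Counter()
        for i, center in enumerate(ids):
            for j in range(max(0, i - window_size), i):
                weight = 1.0 / (i - j)
                counts[(center, ids[j])] += weight
                counts[(ids[j], center)] += weight

        # Store the triples as tensors so __getitem__ is a cheap lookup.
        pairs = sorted(counts)
        self.word_ids = torch.tensor([p[0] for p in pairs], dtype=torch.long)
        self.context_ids = torch.tensor([p[1] for p in pairs], dtype=torch.long)
        self.counts = torch.tensor([counts[p] for p in pairs], dtype=torch.float32)

    def __len__(self):
        return len(self.counts)

    def __getitem__(self, idx):
        return self.word_ids[idx], self.context_ids[idx], self.counts[idx]


# Toy corpus, for illustration only.
tokens = "the cat sat on the mat".split()
dataset = CooccurrenceDataset(tokens, window_size=2)

# DataLoader handles shuffling and stacks the triples into batches.
loader = DataLoader(dataset, batch_size=4, shuffle=True)
for word_ids, context_ids, counts in loader:
    print(word_ids.shape, context_ids.shape, counts.shape)
```

Because the class only needs `__len__` and `__getitem__`, the training loop can stay independent of how the co-occurrence counts were built.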
$ python3 glove.py --input wiki_data.txt --batch_size 512
Trained word vectors are available on the releases page.
Let's check if the closest words make sense.
$ python3 test_word_vectors.py --word IRA
roth, iras, sep, 401, contribute
$ python3 test_word_vectors.py --word option
call, options, put, exercise, underlying
$ python3 test_word_vectors.py --word stock
shares, share, market, stocks, price
This CPU-only implementation is not yet optimized. For training on CPU, it might be best to download the GloVe software from here.
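As an illustration of the closest-word checks above, here is a minimal sketch of a nearest-neighbor lookup. It assumes the ranking uses cosine similarity, which may differ from what test_word_vectors.py actually does; the function name `closest_words` and the toy 2-d vectors are hypothetical and are not trained embeddings.

```python
import numpy as np


def closest_words(word, vocab, vectors, k=5):
    """Return the k words whose vectors are most cosine-similar to `word`."""
    # Normalize rows so that a dot product equals cosine similarity.
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.clip(norms, 1e-12, None)
    sims = unit @ unit[vocab.index(word)]
    # Sort by descending similarity and drop the query word itself.
    order = np.argsort(-sims)
    return [vocab[i] for i in order if vocab[i] != word][:k]


# Toy 2-d vectors, made up for illustration only.
vocab = ["stock", "shares", "market", "banana"]
vectors = np.array([[1.0, 0.1], [0.9, 0.2], [0.7, 0.6], [-0.2, 1.0]])
print(closest_words("stock", vocab, vectors, k=2))
```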
- GloVe Paper
- TorchGlove repo
MIT