A Python library to extract keywords from text, filtering them by POS tag and lemmatizing them.
- Install TreeTagger;
- Edit your ~/.bashrc file, adding the treetagger/cmd path:
nano ~/.bashrc
export TREETAGGER_HOME='/path/to/your/TreeTagger/cmd/'
- Install NLTK and its data:
sudo pip install nltk
sudo python -c "import nltk; nltk.download('punkt')"
sudo python -c "import nltk; nltk.download('stopwords')"

A simple use of the library
from keyword_extractor import cleaner, hapax, freqlist, freqplot
List = open("MOTORI.txt").readlines()
doc_clean=[]
#Extract keywords
for doc in List:
    doc_clean.extend(cleaner(doc))
#Remove Hapax
myhapaxlist=hapax(doc_clean)
#Filter freqlist
myfreqlist=freqlist(myhapaxlist)
#Print plot
freqplot(myhapaxlist)
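
For readers unfamiliar with the terms used in the example above: a hapax is a word that occurs only once in the corpus, and a frequency list pairs each word with its count. The sketch below illustrates both ideas using only the Python standard library. It is not the code behind `hapax` or `freqlist`, whose internals are not shown here; the names `remove_hapaxes` and `frequency_list` and the sample tokens are hypothetical and exist only for this illustration.

```python
from collections import Counter


def remove_hapaxes(tokens):
    """Drop tokens that occur exactly once in the whole list (hypothetical helper)."""
    counts = Counter(tokens)
    return [tok for tok in tokens if counts[tok] > 1]


def frequency_list(tokens):
    """Return (token, count) pairs, most frequent first (hypothetical helper)."""
    return Counter(tokens).most_common()


# Made-up sample tokens standing in for already cleaned keywords
tokens = ["engine", "valve", "engine", "piston", "valve", "engine", "gasket"]

kept = remove_hapaxes(tokens)
print(kept)                  # ['engine', 'valve', 'engine', 'valve', 'engine']
print(frequency_list(kept))  # [('engine', 3), ('valve', 2)]
```

Removing hapaxes before counting keeps one-off words, which are often noise or typos, out of the frequency list and the plot.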