
Programs to find document similarity using word embeddings and cosine similarity. Also includes a plagiarism check against a local corpus.


Reginasabs/NLP-DocSimilarity


These programs find document similarity using word embeddings and cosine similarity. TF-IDF, a co-occurrence matrix, Word2Vec, fastText and GloVe are used to obtain the word embeddings. The repository also contains a program that checks a PDF file for plagiarism against a local corpus. Programs that use GloVe need the glove.6B.50d.txt file in the working directory; it is not provided in the repository and must be downloaded separately. The programs are grouped into three levels. The basic-level programs check the similarity of short sentences. The medium-level programs check the similarity of two PDF files. The advanced-level programs check the similarity of documents in the 20newsgroups dataset.
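As a rough illustration of the general approach (not the repository's actual code), the sketch below parses vectors in the GloVe text format, averages the word vectors of each document, and compares the two averages with cosine similarity. The function names `load_glove`, `document_vector` and `cosine_similarity` are hypothetical, averaging is only one possible way to combine word vectors into a document vector, and the three-dimensional sample vectors are made up so the example runs without glove.6B.50d.txt.

```python
import math
from typing import Dict, Iterable, List, Optional


def load_glove(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse GloVe text format: each line is a token followed by floats."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def document_vector(text: str, vectors: Dict[str, List[float]]) -> Optional[List[float]]:
    """Average the vectors of the known words; None if no word is known."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional vectors standing in for the real GloVe file.
sample_lines = [
    "cat 1.0 0.0 0.0",
    "dog 0.9 0.1 0.0",
    "car 0.0 0.0 1.0",
]
vectors = load_glove(sample_lines)

doc_a = document_vector("The cat", vectors)
doc_b = document_vector("A dog", vectors)
if doc_a is not None and doc_b is not None:
    print(cosine_similarity(doc_a, doc_b))
```

To try the same functions with the real embeddings, the file can be passed in directly, for example `with open("glove.6B.50d.txt", encoding="utf-8") as f: vectors = load_glove(f)`. Words missing from the vocabulary are simply skipped here, which is a simplification.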
