Collection of stopwords, frequent words and other things.
To help build applications that use NLP (Natural Language Processing), such as:
- Stemming
- Text simplification
- Text-to-speech
- Text-proofing
- Natural language search
- Query expansion
- Automated essay scoring
- Truecasing

or search engines.

| Language (ISO 639-1) | Name | Stopwords | Frequent Words | Notes |
|---|---|---|---|---|
| bg | Bulgarian | Yes | No | UTF-8 |
| cz | Czech | Yes | No | UTF-8; the ISO 639-1 code for Czech is `cs` |
| de | German | Yes | Yes | |
| en | English | Yes | Yes | |
| es | Spanish | Yes + | Yes | |
| fi | Finnish | Yes | Yes | |
| fr | French | Yes | Yes | |
| hu | Hungarian | Yes | No | UTF-8 |
| it | Italian | Yes | Yes | UTF-8 |
| pl | Polish | Yes | No | UTF-8 |
| pt | Portuguese | Yes + | No | |
| ru | Russian | Yes | No | UTF-8 |
| sv | Swedish | Yes | Yes | |

Almost everything was extracted from http://members.unine.ch/jacques.savoy/clef/
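As a rough illustration of how a stopword list might be used, here is a minimal Python sketch. It assumes the list is plain text with one word per line. That layout, the sample words, and the function names `parse_stopwords`, `load_stopwords` and `remove_stopwords` are assumptions made for this example, not something this collection defines.

```python
import re

# Hypothetical sample data; in practice the words would come from one of the
# stopword lists in this collection (file names and layout are assumptions).
SAMPLE_EN_STOPWORDS = """
a
an
and
of
the
to
"""


def parse_stopwords(text):
    """Turn a one-word-per-line list into a set of lowercase words."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def load_stopwords(path, encoding="utf-8"):
    """Read a stopword file from disk (the path is supplied by the caller)."""
    with open(path, encoding=encoding) as handle:
        return parse_stopwords(handle.read())


def remove_stopwords(sentence, stopwords):
    """Lowercase the sentence, split it into word tokens, drop the stopwords."""
    tokens = re.findall(r"\w+", sentence.lower())
    return [token for token in tokens if token not in stopwords]


stopwords = parse_stopwords(SAMPLE_EN_STOPWORDS)
print(remove_stopwords("The history of the search engine", stopwords))
# prints ['history', 'search', 'engine']
```

Keeping the words in a set makes each membership check cheap, which matters when filtering large amounts of text. For the lists marked UTF-8 in the table, opening the file with `encoding="utf-8"` should be the right choice; the encoding of the other lists is not stated here, so check before loading them.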
Fork the repository, make your changes and open a pull request.
Please also update this README file to reflect your changes!
Thanks for your help!