pipeline

An example of how to preprocess textual content using NLTK with Python or PySpark. The code was used in the following technical report, which compares different tools and resources for preprocessing textual content:

Diaz, A. K. R., de Lima, A. P., Silva, A. M., da Silva Costa, F. H., Pagnossim, J. L. M., & Peres, S. M. (2018). Uma análise comparativa das ferramentas de pré-processamento de dados textuais: NLTK, PreTexT e R [A comparative analysis of textual data preprocessing tools: NLTK, PreTexT and R] (Relatório Técnico PPgSI-001/2018 [Technical Report PPgSI-001/2018]). Available at: https://tinyurl.com/y2tt7j2o
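For orientation only, here is a minimal sketch of a typical NLTK preprocessing sequence: lowercasing, tokenization, removal of punctuation and stopwords, then stemming. It is not the code from this repository and may differ from the steps the report evaluates. It assumes NLTK is installed and uses only components that need no corpus downloads (`wordpunct_tokenize` and `SnowballStemmer`). The `preprocess` function and the tiny `STOPWORDS` set are hypothetical stand-ins; a real pipeline would usually load `nltk.corpus.stopwords.words(...)` after running `nltk.download("stopwords")`.

```python
from nltk.stem import SnowballStemmer
from nltk.tokenize import wordpunct_tokenize

# Hypothetical, deliberately tiny stopword list so the sketch runs without
# downloading NLTK corpora. In practice one would typically use
# nltk.corpus.stopwords.words("english") after nltk.download("stopwords").
STOPWORDS = {"a", "an", "and", "are", "is", "of", "the", "to", "with"}

# Snowball stemmers are implemented in code and need no extra data files.
stemmer = SnowballStemmer("english")


def preprocess(text):
    """Lowercase, tokenize, drop punctuation and stopwords, then stem."""
    # Regex-based tokenizer: splits into runs of word characters and
    # runs of punctuation.
    tokens = wordpunct_tokenize(text.lower())
    # Keep purely alphabetic tokens; this drops punctuation and numbers.
    words = [t for t in tokens if t.isalpha()]
    # Remove stopwords.
    words = [w for w in words if w not in STOPWORDS]
    # Reduce each remaining word to its stem.
    return [stemmer.stem(w) for w in words]


if __name__ == "__main__":
    print(preprocess("The tools are compared with NLTK, PreTexT and R."))
```

For Portuguese text, `SnowballStemmer("portuguese")` together with a Portuguese stopword list could replace the English choices above. In PySpark, a per-document function like this could be applied to each record, for example through an RDD `map`; that variant is not shown here.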