- NLTK only supports ASCII characters, so any others should be converted or stripped.
  - NLTK might already have something to do this.
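
If it turns out that nothing suitable exists, the convert/strip step could be sketched with Python's standard library alone. This is a minimal sketch, not an NLTK API: the helper name `to_ascii` is hypothetical, and it assumes that dropping characters with no ASCII equivalent is acceptable for the text being processed.

```python
import unicodedata


def to_ascii(text: str) -> str:
    """Convert characters to their closest ASCII form and drop the rest.

    Hypothetical helper for illustration; not part of NLTK.
    """
    # NFKD splits characters such as "é" into a base letter plus a
    # combining accent, and expands compatibility forms such as ligatures.
    decomposed = unicodedata.normalize("NFKD", text)
    # errors="ignore" drops whatever still has no ASCII representation,
    # including the combining accents produced above.
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    print(to_ascii("naïve café"))
```

Note that this silently removes text in non-Latin scripts, so it only suits input that is mostly English or accented Latin text.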