Description
Hello, I am trying to go through a huge data file using machine learning software that requires the text to be in Unicode. I have been working on this problem for a number of hours and it is quite urgent. Here is my code:
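import codecs
import unicodecsv

TEST_SENTENCES = []
# Open in binary mode; unicodecsv decodes each cell as UTF-8 by default.
with open('Book2.csv', 'rb') as csvfile:
    reader = unicodecsv.DictReader(csvfile)
    for row in reader:
        TEST_SENTENCES.append(row["Tweet"])

# Encode each collected sentence as UTF-8.
for x in TEST_SENTENCES:
    codecs.encode(x, 'utf-8')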
Here is the error I am receiving:
Traceback (most recent call last):
  File "C:\Users\pjame\Desktop\DeepMoji-master\examples\score_texts_emojis.py", line 25, in <module>
    for row in reader:
  File "C:\Python27\lib\site-packages\unicodecsv\py2.py", line 217, in next
    row = csv.DictReader.next(self)
  File "C:\Python27\lib\csv.py", line 108, in next
    row = self.reader.next()
  File "C:\Python27\lib\site-packages\unicodecsv\py2.py", line 128, in next
    for value in row]
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe2 in position 254: unexpected end of data
The error is always the same, and is always in the same position no matter what data I use. It seems like it could be a problem with the copying and pasting of large amounts of data into the spreadsheet, but I am not sure.
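For reference, the same error message can be reproduced in isolation. This is a minimal sketch with made-up data, not the contents of Book2.csv: it assumes the failing cell ends right after the byte 0xe2, which is the first byte of a three-byte UTF-8 sequence (the curly apostrophe U+2019, for example, encodes as e2 80 99). If that assumption holds, a value cut off at a fixed length during the copy and paste into the spreadsheet might explain why the position is always 254, though that is only a guess.

```python
# Made-up data: 254 ASCII bytes followed by the first byte of a
# three-byte UTF-8 character, as if the value had been cut off mid-character.
full = u'\u2019'.encode('utf-8')       # curly apostrophe, three bytes: e2 80 99
truncated = b'a' * 254 + full[:1]      # ends right after the 0xe2 lead byte

error = None
try:
    truncated.decode('utf-8')
except UnicodeDecodeError as exc:
    error = exc                        # keep the exception for inspection

# error.start is the byte offset of the failure, error.reason the message text.
print(error)
```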
Does anyone have any idea how I can get around this error? I see that for some reason csv is being called in one of the error messages, but I am not sure why. It could be in the parts of the file I did not write which can be found here: (https://github.com/bfelbo/DeepMoji/blob/master/examples/score_texts_emojis.py)
If anyone has any ideas, or fixes to this, I would be so so appreciative. Even just being able to find and delete the lines in the CSV file that are errors would be super useful.
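The idea of finding the offending lines could be sketched as follows, using only the standard library. `find_bad_lines` is a hypothetical helper name, and the demo rows are made up. The sketch assumes one CSV record per physical line (no quoted newlines inside cells) and decodes whole lines rather than individual cells, so the reported reason may differ from the one unicodecsv prints even when the line is the same.

```python
import os
import tempfile


def find_bad_lines(path, encoding='utf-8'):
    """Return (line_number, byte_offset, reason) for each undecodable line."""
    bad = []
    with open(path, 'rb') as f:                  # read raw bytes, no decoding yet
        for line_number, raw in enumerate(f, start=1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError as exc:
                bad.append((line_number, exc.start, exc.reason))
    return bad


# Demo on a throwaway file: the third line ends in a lone 0xe2 byte,
# the fourth line contains a complete three-byte character.
sample = b'Tweet\nhello\nbroken \xe2\nfine \xe2\x80\x99\n'
fd, demo_path = tempfile.mkstemp(suffix='.csv')
with os.fdopen(fd, 'wb') as f:
    f.write(sample)

bad_lines = find_bad_lines(demo_path)
os.remove(demo_path)
print(bad_lines)
```

Calling `find_bad_lines('Book2.csv')` would list the line numbers to inspect or delete. If dropping the damaged characters is acceptable instead, `raw.decode('utf-8', 'replace')` substitutes a placeholder character rather than raising.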