Receiving UnicodeDecodeError on CSV file #86

@PJAMESR

Hello, I am trying to process a huge data file with machine learning software that requires the text to be Unicode. I have been working on this problem for a number of hours and it is quite urgent. Here is my code:

import codecs
import unicodecsv

TEST_SENTENCES = []
with open('Book2.csv', 'rb') as csvfile:
    # unicodecsv decodes each field as UTF-8 by default
    reader = unicodecsv.DictReader(csvfile)
    for row in reader:
        TEST_SENTENCES.append(row["Tweet"])
    for x in TEST_SENTENCES:
        codecs.encode(x, 'utf-8')

Here is the error I am receiving:

Traceback (most recent call last):
  File "C:\Users\pjame\Desktop\DeepMoji-master\examples\score_texts_emojis.py", line 25, in <module>
    for row in reader:
  File "C:\Python27\lib\site-packages\unicodecsv\py2.py", line 217, in next
    row = csv.DictReader.next(self)
  File "C:\Python27\lib\csv.py", line 108, in next
    row = self.reader.next()
  File "C:\Python27\lib\site-packages\unicodecsv\py2.py", line 128, in next
    for value in row]
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe2 in position 254: unexpected end of data

The error is always the same, and it is always at the same position no matter what data I use. It could be a problem caused by copying and pasting large amounts of data into the spreadsheet, but I am not sure.
Does anyone have any idea how I can get around this error? I see that the csv module is being called in the traceback, but I am not sure why. It could be coming from the parts of the file I did not write, which can be found here: https://github.com/bfelbo/DeepMoji/blob/master/examples/score_texts_emojis.py
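
For readers hitting the same message, here is a minimal sketch of what the error text describes. In UTF-8, 0xE2 is the lead byte of a three-byte sequence, and "unexpected end of data" is reported when the bytes being decoded stop before that sequence is complete. The sample bytes and the helper name `describe_decode_failure` are made up for illustration and are not taken from Book2.csv or DeepMoji.

```python
# 0xE2 starts a three-byte UTF-8 sequence; for example the right single
# quotation mark U+2019 is encoded as E2 80 99.
complete = b"don\xe2\x80\x99t"
truncated = complete[:4]  # keeps the lead byte 0xE2 but drops 80 99


def describe_decode_failure(data):
    """Return (byte_offset, reason) for the first UTF-8 error, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, exc.reason
    return None


print(describe_decode_failure(complete))
print(describe_decode_failure(truncated))
```

The first call finds nothing wrong, while the second reports the offset of the dangling 0xE2 byte. This only illustrates the message; it does not show why a field in the actual file ends mid-character.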

If anyone has any ideas or fixes for this, I would be very grateful. Even just being able to find and delete the lines in the CSV file that cause the errors would be really useful.
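
One possible way to find and drop the offending lines is sketched below, using only the standard library. It reads the file as raw bytes, tries to decode each physical line as UTF-8, and either reports or skips the lines that fail. The function names `find_undecodable_lines` and `write_clean_copy` are hypothetical, and the sketch assumes that no quoted CSV field contains an embedded line break; if one does, a "line" here is not a full CSV row.

```python
import io


def find_undecodable_lines(path, encoding="utf-8"):
    """Return a list of (line_number, byte_offset_in_line, reason)."""
    problems = []
    with io.open(path, "rb") as handle:
        for line_number, raw in enumerate(handle, start=1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError as exc:
                problems.append((line_number, exc.start, exc.reason))
    return problems


def write_clean_copy(src, dst, encoding="utf-8"):
    """Copy src to dst, skipping lines that fail to decode.

    Returns the number of lines that were dropped.
    """
    dropped = 0
    with io.open(src, "rb") as fin, io.open(dst, "wb") as fout:
        for raw in fin:
            try:
                raw.decode(encoding)
            except UnicodeDecodeError:
                dropped += 1
                continue
            fout.write(raw)  # bytes are copied unchanged
    return dropped
```

A call such as `find_undecodable_lines('Book2.csv')` would list the suspect line numbers, and `write_clean_copy('Book2.csv', 'Book2_clean.csv')` would write a copy without them (the output filename is only an example). Dropping lines loses data, so it may be worth inspecting the reported lines first.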
