Receiving UnicodeDecodeError on CSV file #86

Hello, I am trying to go through a huge data file using machine learning software that requires the text to be in Unicode. I have been working on this problem for a number of hours and it is quite urgent. Here is my code:
```
TEST_SENTENCES = []
with open('Book2.csv', 'rb') as csvfile:
    reader = unicodecsv.DictReader(csvfile)
    for row in reader:
        TEST_SENTENCES.append(row["Tweet"])
    for x in [TEST_SENTENCES]:
        codecs.encode(x, 'utf-8')
```

Here is the error I am receiving:

```
Traceback (most recent call last):
  File "C:\Users\pjame\Desktop\DeepMoji-master\examples\score_texts_emojis.py", line 25, in <module>
    for row in reader:
  File "C:\Python27\lib\site-packages\unicodecsv\py2.py", line 217, in next
    row = csv.DictReader.next(self)
  File "C:\Python27\lib\csv.py", line 108, in next
    row = self.reader.next()
  File "C:\Python27\lib\site-packages\unicodecsv\py2.py", line 128, in next
    for value in row]
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe2 in position 254: unexpected end of data
```

The error is always the same, and it always occurs at the same position (254) no matter what data I use. It seems like it could be a problem with copying and pasting large amounts of data into the spreadsheet, but I am not sure.
Does anyone have any idea how I can get around this error? I see that `csv` is being called in the traceback, but I am not sure why. It could be coming from the parts of the file I did not write, which can be found here: https://github.com/bfelbo/DeepMoji/blob/master/examples/score_texts_emojis.py
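
For reference, here is a minimal sketch of what this error message means. It is not taken from my actual data: the 254-character string and the curly quote are made up for illustration. Byte `0xe2` is the first byte of a three-byte UTF-8 sequence (for example, a curly apostrophe is `E2 80 99`), and "unexpected end of data" means the value ended before that sequence was complete. One way to get that, though I have not confirmed it is what happened here, is a field being cut off at 255 bytes:

```python
# -*- coding: utf-8 -*-
# Hypothetical data: 254 ASCII characters followed by a right single quote.
text = u"a" * 254 + u"\u2019"
data = text.encode("utf-8")   # the quote is stored as the 3 bytes E2 80 99
truncated = data[:255]        # cut at 255 bytes: only the lead byte E2 survives

try:
    truncated.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc               # byte 0xe2 at position 254, unexpected end of data

# Decoding with the "replace" error handler substitutes U+FFFD instead of raising.
recovered = truncated.decode("utf-8", "replace")
```

If this is the cause, the broken character would always sit at the very end of a field that is exactly 255 bytes long.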

If anyone has any ideas or fixes for this, I would be so so appreciative. Even just being able to find and delete the lines in the CSV file that cause the error would be super useful.
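
Something along these lines is what I have in mind for locating the bad lines. This is only a sketch: `find_bad_lines` is a hypothetical helper name, and it checks physical lines of the file, which will not match CSV rows exactly if any quoted field contains a line break.

```python
import io


def find_bad_lines(path, encoding="utf-8"):
    """Return (line_number, raw_bytes) for every line that fails to decode."""
    bad = []
    # Read raw bytes so that nothing is decoded before we check it ourselves.
    with io.open(path, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError:
                bad.append((lineno, raw))
    return bad
```

Calling `find_bad_lines('Book2.csv')` would then list the line numbers to inspect or delete by hand.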
