@artwr artwr commented Aug 11, 2015

@jdunck
I ran into issue #36 and was wondering whether we could simply encode the parameters using the provided encoding. Of course, the encoding has to map each of those characters to a single byte for the csv module to accept them as single characters, but the change passes the doctests. Note that I have to encode them before calling the init methods of DictWriter and DictReader, because those error out otherwise.

Here is an example of a test that fails in master but passes in this branch. I was not sure where to provide it.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from unicodecsv import DictReader, DictWriter
from cStringIO import StringIO

d = u'à'

f = StringIO()
w = DictWriter(f, delimiter=d, fieldnames=['name', 'place'], encoding='latin1')
w.writerow({'name': 'Cary Grant', 'place': 'hollywood'})
w.writerow({'name': 'Nathan Brillstone', 'place': u'øLand'})
w.writerow({'name': u'Will ø. Unicoder', 'place': u'éSpandland'})
f.seek(0)
r1 = DictReader(f, fieldnames=['name', 'place'], encoding='latin1', delimiter=d)
print r1.next() == {'name': 'Cary Grant', 'place': 'hollywood'}
print r1.next() == {'name': 'Nathan Brillstone', 'place': u'øLand'}
print r1.next() == {'name': u'Will ø. Unicoder', 'place': u'éSpandland'}

Let me know what you think.

Owner

jdunck commented Sep 21, 2015

I'm honestly happy to see these kinds of issues -- people are starting to work with unicode literals more, which means I actually need to write some docs. ;-)

I'd prefer to change the module to expect the native string (bytestring under py2, str under py3), and document that. Then it would be pretty simple to raise a helpful error when misused.

Do you agree that this would address the problem?
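A minimal sketch of what such a check might look like. The helper name check_native and the wording of the error message are assumptions made for illustration; nothing like this exists in unicodecsv as far as this thread shows.

```python
# Native string type: bytestring under Python 2, text under Python 3.
NATIVE_STR = str


def check_native(name, value):
    """Return value unchanged if it is None or a native string.

    Hypothetical helper: raises a TypeError that names the offending
    parameter instead of letting the csv module fail with a vaguer message.
    """
    if value is None or isinstance(value, NATIVE_STR):
        return value
    raise TypeError(
        "%s must be a native string (%s), not %s; under Python 2, "
        "encode unicode values with the file's encoding first"
        % (name, NATIVE_STR.__name__, type(value).__name__)
    )
```

Under Python 2, check_native('delimiter', u'à') would raise with this message, while check_native('delimiter', ',') would pass the value through.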

Author

artwr commented Sep 21, 2015

Sounds like a very reasonable solution.

Our use case for python-unicodecsv was making another package Py 2/3 compatible. In that case, from __future__ import unicode_literals turns py2 string literals into unicode, which would conflict with what you are proposing...
One could recommend calling str on the parameter before passing it (as recommended here: #36 (comment)), but again, this does not necessarily propagate well. Another option might be to use some of the python-future helpers documented here: http://python-future.org/unicode_literals.html.

I am honestly not completely sure about what the best approach would be. Any thoughts on the mix of future and python-unicodecsv? It was my understanding that the csv library in Python 3 can handle unicode delimiters, and therefore that importing python-unicodecsv was only useful in Python 2...
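For comparison, here is a small sketch of the Python 3 behaviour mentioned above, using only the stdlib csv module with the same non-ASCII delimiter as the test case in the first comment. It assumes Python 3 and is not meant to run under Python 2.

```python
import csv
import io

FIELDS = ['name', 'place']
rows = [
    {'name': 'Cary Grant', 'place': 'hollywood'},
    {'name': 'Nathan Brillstone', 'place': 'øLand'},
    {'name': 'Will ø. Unicoder', 'place': 'éSpandland'},
]

# In Python 3 the csv module works on text, so a non-ASCII delimiter is
# accepted as long as it is a single character.
buf = io.StringIO(newline='')
writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter='à')
for row in rows:
    writer.writerow(row)

# Read the same buffer back with the same delimiter.
buf.seek(0)
reader = csv.DictReader(buf, fieldnames=FIELDS, delimiter='à')
read_back = [dict(row) for row in reader]
```

Encoding only comes into play when the text is written to or read from a file, which is handled by the file object rather than by csv.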

Best,
Arthur

Owner

jdunck commented Sep 21, 2015

Well, surprisingly people are using unicodecsv under both 2 and 3 and expecting it to work transparently. My intention for the library was just to make CSVs less painful under 2, and I expected people to use the stdlib csv when running under 3. See, for example, #58.

I think long-term I'd like to see unicodecsv retired, because csv under py3 "just works", but I see the utility of a compatibility library for code which needs to work under 2 and 3.

OK, I'm reversing my opinion here - let's support your use case, but with a different implementation. (Format keyword arguments are used to override dialects, but dialects, too, have potentially-unicode quotechar and delimiter.)

   dialect, fmtparams = encode_arguments(dialect, fmtparams, encoding, errors)

Would you mind updating the PR to support the dialect attributes?

A test case would be nice, as well, thanks.
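One way such an encode_arguments helper could be sketched is shown below. Only the call signature comes from the line above; the body, the _to_native helper, the DIALECT_ATTRS list and the EncodedDialect name are assumptions made for illustration, not the implementation that was eventually merged.

```python
import csv
import sys

PY2 = sys.version_info[0] == 2

# Dialect attributes copied by this sketch (an assumed list).
DIALECT_ATTRS = ('delimiter', 'quotechar', 'escapechar', 'doublequote',
                 'skipinitialspace', 'lineterminator', 'quoting')


def _to_native(value, encoding, errors):
    """Encode unicode to bytes under Python 2; leave everything else alone."""
    if PY2 and isinstance(value, type(u'')):
        return value.encode(encoding, errors)
    return value


def encode_arguments(dialect, fmtparams, encoding='utf-8', errors='strict'):
    """Return (dialect, fmtparams) with text values as native strings."""
    fmtparams = dict((name, _to_native(value, encoding, errors))
                     for name, value in fmtparams.items())
    if dialect is None:
        return dialect, fmtparams
    if isinstance(dialect, (str, type(u''))):
        # A registered dialect name such as 'excel'.
        dialect = csv.get_dialect(dialect)
    # Copy the attributes into a fresh class so the caller's dialect
    # object is never mutated.
    attrs = dict((name, _to_native(getattr(dialect, name), encoding, errors))
                 for name in DIALECT_ATTRS if hasattr(dialect, name))
    return type('EncodedDialect', (object,), attrs), fmtparams
```

The returned pair can then be handed on as csv.writer(f, dialect=dialect, **fmtparams). Under Python 3 the values pass through unchanged, since the stdlib csv module already expects text there.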
