add helper function for unicode delimiter and quotechar #60

artwr · 2015-08-11T18:04:41Z

@jdunck
I ran into issue #36, and I was wondering if we could just encode the delimiter and quotechar parameters using the encoding provided. Of course, the encoding has to be one in which those characters are recognized as single characters, but the change passes the doctests. Note that I have to encode them before calling the __init__ methods of DictWriter and DictReader, because it errors out otherwise.

Here is an example of a test that fails in master but passes in this branch. I was not sure where to provide it.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from unicodecsv import DictReader, DictWriter
from cStringIO import StringIO

d = u'à'

f = StringIO()
w = DictWriter(f, delimiter=d, fieldnames=['name', 'place'], encoding='latin1')
w.writerow({'name': 'Cary Grant', 'place': 'hollywood'})
w.writerow({'name': 'Nathan Brillstone', 'place': u'øLand'})
w.writerow({'name': u'Will ø. Unicoder', 'place': u'éSpandland'})
f.seek(0)
r1 = DictReader(f, fieldnames=['name', 'place'], encoding='latin1', delimiter=d)
print r1.next() == {'name': 'Cary Grant', 'place': 'hollywood'}
print r1.next() == {'name': 'Nathan Brillstone', 'place': u'øLand'}
print r1.next() == {'name': u'Will ø. Unicoder', 'place': u'éSpandland'}

Let me know what you think.

jdunck · 2015-09-21T04:16:57Z

I'm honestly happy to see these kinds of issues -- people are starting to work with unicode literals more, which means I actually need to write some docs. ;-)

I'd prefer to change the module to expect the native string (bytestring under py2, str under py3), and document that. Then it would be pretty simple to raise a helpful error when misused.

Do you agree that this would address the problem?
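For illustration, the native-string check described above could be sketched roughly as follows. The name require_native_str is hypothetical and not part of unicodecsv; this is only one way such a helpful error might be raised.

```python
def require_native_str(name, value):
    """Hypothetical check: accept only the interpreter's native str type."""
    # `str` is the byte string on Python 2 and the text string on Python 3,
    # so this single isinstance test covers both interpreters.
    if value is not None and not isinstance(value, str):
        raise TypeError(
            "%s must be a native str (bytes on Python 2, text on Python 3), "
            "got %s" % (name, type(value).__name__)
        )
    return value
```

Under Python 2 this would reject delimiter=u'à' with a readable message instead of failing somewhere inside csv.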

artwr · 2015-09-21T19:51:56Z

Sounds like a very reasonable solution.

Our use case for python-unicodecsv was trying to make another package Py 2/3 compatible. In this case, from __future__ import unicode_literals makes the default py2 string literal behave like unicode, and this would conflict with what you are proposing...
One could recommend a call to str before passing the parameter (as recommended here: #36 (comment)), but again, this does not necessarily propagate well. Maybe some of the python-future helpers documented here would work: http://python-future.org/unicode_literals.html.

I am honestly not completely sure about what the best approach would be. Any thoughts on the mix of future and python-unicodecsv? It was my understanding that the csv library in Python 3 can handle unicode delimiters, and therefore that importing python-unicodecsv was only useful in Python 2...
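As a point of comparison, here is a small sketch of the same round trip using only the standard-library csv module under Python 3, where a non-ASCII delimiter is passed as an ordinary str. The field names and row values are borrowed from the test above; nothing here uses unicodecsv.

```python
import csv
import io

delimiter = u'à'

# newline='' keeps the writer's line endings untouched, as the csv docs advise.
buf = io.StringIO(newline='')
writer = csv.DictWriter(buf, fieldnames=['name', 'place'], delimiter=delimiter)
writer.writerow({'name': 'Nathan Brillstone', 'place': u'øLand'})

buf.seek(0)
reader = csv.DictReader(buf, fieldnames=['name', 'place'], delimiter=delimiter)
row = next(reader)
```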

Best,
Arthur

jdunck · 2015-09-21T20:23:47Z

Well, surprisingly people are using unicodecsv under both 2 and 3 and expecting it to work transparently. My intention for the library was just to make CSVs less painful under 2, and I expected people to use the stdlib csv when running under 3. See, for example, #58.

I think long-term I'd like to see unicodecsv retired, because csv under py3 "just works", but I see the utility of a compatibility library for code which needs to work under 2 and 3.

OK, I'm reversing my opinion here - let's support your use case, but with a different implementation. (Format keyword arguments are used to override dialects, but dialects, too, have potentially-unicode quotechar and delimiter.)

   dialect, fmtparams = encode_arguments(dialect, fmtparams, encoding, errors)

Would you mind updating the PR to support the dialect attributes?

A test case would be nice, as well, thanks.
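To make the suggestion concrete, here is a rough sketch of what a helper with that signature might look like. It is an assumption about the shape of the function, not the implementation that was merged: the body, the ENCODED_ATTRS tuple, and the choice to leave the dialect object untouched and push encoded values into fmtparams (relying on format keyword arguments overriding dialect attributes) are all hypothetical. Encoding to bytes is only meaningful on the Python 2 code path, where csv expects byte strings.

```python
TEXT_TYPE = type(u'')  # unicode on Python 2, str on Python 3

# Only the two attributes named in this PR; a hypothetical, minimal list.
ENCODED_ATTRS = ('delimiter', 'quotechar')


def encode_arguments(dialect, fmtparams, encoding='utf-8', errors='strict'):
    """Sketch: return (dialect, fmtparams) with text delimiters encoded."""
    fmtparams = dict(fmtparams)  # do not mutate the caller's dict
    for key in ENCODED_ATTRS:
        if key in fmtparams:
            value = fmtparams[key]
        else:
            # Fall back to the dialect's attribute; a dialect given by
            # name (a plain string) or None has no such attribute.
            value = getattr(dialect, key, None)
        if isinstance(value, TEXT_TYPE):
            # Explicit keyword arguments win; dialect values are only
            # copied in when they need encoding.
            fmtparams[key] = value.encode(encoding, errors)
    return dialect, fmtparams
```

A caller would then pass the returned pair on to the underlying csv reader or writer, e.g. csv.writer(f, dialect=dialect, **fmtparams).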

add helper function for unicode delimiter and quotechar

a1bbc2f
