executemany has an option for 'bulk'-insert on Impala #96
nonsleepr wants to merge 3 commits into cloudera:master
|
I'm not sure I understand what this does. Could you explain it in a bit more detail? |
|
The idea here is to INSERT data in big chunks instead of doing it row-by-row. |
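As a rough illustration of the chunking idea (not the code from this PR), the sketch below splits rows into fixed-size chunks, so that each chunk could be sent as one multi-row INSERT. The helper name `chunked` and the chunk size of 40 are assumptions made for the example.

```python
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` rows (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


rows = [(i, "name-%d" % i) for i in range(100)]
chunks = list(chunked(rows, 40))
# Row-by-row insertion would issue len(rows) statements;
# chunked insertion would issue only len(chunks).
statement_counts = (len(rows), len(chunks))
```

With 100 rows and a chunk size of 40, the last chunk is simply shorter than the others, so no rows are dropped.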
|
Ah, I was unfamiliar with That said, I'm generally uneasy about having impyla rewrite people's queries, and anyway, using an |
|
In my project, I'm using impyla as one of several database drivers that can be accessed via DB API 2.0. Users should be able to upload/insert small datasets/tables (probably 100 rows max) into the database. Right now, this produces hundreds of files in HDFS, whereas this PR avoids that. According to PEP 249:
Impala docs also give following recommendations on
I didn't have a chance to look at ibis until now. Interestingly, it uses impyla and hdfs (which my project is based on) and a bunch of other packages underneath. One of the goals of my project is to make it lean and preferably pure (and that's what #91 is for). |
|
Ok, as long as the default is the same, shouldn't be a problem. I'll make some additional comments for changes as well. |
Nit: insert addl line per PEP8
|
@laserson Can you merge this change into master? |
|
I'm no longer involved with this project. Try a more recent committer. |
|
Sorry this got neglected. This is interesting but I'm unsure if the interface is quite right. Is there a reason not to do the rewrite transparently? |
|
Hi, I achieved the same thing via #460, and it is working well, using allToBeInserted.to_sql('xx', engine, if_exists='append', index=False, chunksize=2000, method='multi') |
Impala supports multiple row inserts. This pull request adds an option to use this feature.
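To make the difference concrete, here is a minimal sketch of an opt-in multi-row insert over a DB API 2.0 cursor. It is not the implementation from this pull request: the function name `insert_rows`, its `bulk` and `chunk_size` parameters, and the use of the standard-library `sqlite3` module (with `?` placeholders) as a stand-in for an Impala connection are all assumptions for illustration.

```python
import sqlite3


def insert_rows(cursor, table, columns, rows, bulk=False, chunk_size=1000):
    """Insert rows one statement per row, or in multi-row chunks if bulk=True.

    Hypothetical helper: the name and the `bulk`/`chunk_size` parameters are
    assumptions for this sketch, not impyla's interface. `table` and `columns`
    are treated as trusted identifiers. Returns the number of statements run.
    """
    rows = list(rows)
    row_placeholder = "(" + ", ".join("?" for _ in columns) + ")"
    prefix = "INSERT INTO %s (%s) VALUES " % (table, ", ".join(columns))
    step = chunk_size if bulk else 1
    statements = 0
    for start in range(0, len(rows), step):
        chunk = rows[start:start + step]
        sql = prefix + ", ".join([row_placeholder] * len(chunk))
        # Flatten the chunk so the parameters line up with the placeholders.
        params = [value for row in chunk for value in row]
        cursor.execute(sql, params)
        statements += 1
    return statements


conn = sqlite3.connect(":memory:")  # stand-in for an Impala connection
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER, name TEXT)")
data = [(i, "name-%d" % i) for i in range(100)]

row_by_row_count = insert_rows(cur, "t", ["id", "name"], data)
bulk_count = insert_rows(cur, "t", ["id", "name"], data, bulk=True, chunk_size=50)
total = cur.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```

In this sketch the default `bulk=False` keeps the one-statement-per-row behavior, so callers get multi-row inserts only when they ask for them.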