-
Notifications
You must be signed in to change notification settings - Fork 999
Fix csv diff compared to Spark CPU #20907
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Chong Gao <res_life@163.com> Add test case Signed-off-by: Chong Gao <res_life@163.com>
|
/ok to test |
@res-life, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/ |
|
/ok to test b76068c |
|
It's draft, not sure if this PR will fails other cases. |
|
pre-commit.ci autofix |
|
Hi @vuule , we tried to fix the csv diff issue by Cursor. We tried on the cases, it seems working. But we don't have confidence about other cases. Can you review this PR to feedback if it makes sense or not? Thanks. |
|
/ok to test dc4c828 |
|
I'll take a look, thanks for the PR! |
|
One of the new tests fails with the changes in this PR. So the fix here is about as far as I got on my end. Looking into fixing the second test case. |
|
The new test is failing for me |
|
/ok to test 0098d49 |
|
The core issue seems to be that cuDF always applies some string processing steps that should only be done when the string is actually quoted. I think I have a fix (all tests pass), but it seems to lower the performance quite a bit in some cases. |
|
@vuule Thanks very much for helping this issue which is from a customer. |
|
I'm fine with pushing to this PR, no need to close |
Add test case
Description
Fix csv diff compared to Spark CPU
closes #20812
Checklist