After I ran
cargo run --release -- generate --benchmark tpcds \
--scale 1000 \
--partitions 48 \
--generator-path /path/to/DSGen-software-code-3.2.0rc1/tools \
--output /tmp/tpcds/sf1000/
the data were generated in /tmp/tpcds/sf1000/. Then I ran
mkdir /tmp/tpcds/sf1000-parquet
cargo run --release -- convert --benchmark tpcds \
--input /tmp/tpcds/sf1000/ \
--output /tmp/tpcds/sf1000-parquet/
and got the following error:
ArrowError(CsvError("incorrect number of fields for line 1, expected 31 got more than 31"))
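To see which rows have an unexpected number of columns, a small stdlib-only Rust sketch like the one below can scan a `.dat` file. It assumes the files are `|`-delimited. `field_count_mismatches` is a hypothetical helper for diagnosis only and is not part of this repo. One thing it makes visible: splitting on `|` yields one more piece than there are delimiters, so a row that ends with a trailing `|` reports one extra, empty field.

```rust
/// Hypothetical diagnostic helper (not part of this repo): splits each line
/// on `|` and returns (line_number, field_count) for every line whose field
/// count differs from `expected`. Line numbers are 1-based.
fn field_count_mismatches(data: &str, expected: usize) -> Vec<(usize, usize)> {
    data.lines()
        .enumerate()
        .filter_map(|(i, line)| {
            // `split('|')` yields (number of delimiters + 1) pieces, so a row
            // ending in `|` gets an extra, empty final field.
            let count = line.split('|').count();
            if count != expected {
                Some((i + 1, count))
            } else {
                None
            }
        })
        .collect()
}

fn main() {
    // Toy input with 3 columns per row; the second row ends with a trailing `|`.
    let sample = "1|a|b\n2|c|d|\n";
    for (line, count) in field_count_mismatches(sample, 3) {
        println!("line {line}: expected 3 fields, got {count}");
    }
}
```

To check a real file, the contents of `call_center.dat/part-1.dat` could be loaded with `std::fs::read_to_string` and passed in with `expected = 31`.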
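I think the error is raised by this line in lib.rs (presumably the CSV parsing only surfaces here because the DataFrame is not executed until it is written):
df.write_parquet(&output_filename, Some(props)).await?;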
After I deleted the first number in call_center.dat/part-1.dat, the error changed to:
ArrowError(CsvError("incorrect number of fields for line 2, expected 31 got 32"))
However, converting the TPC-H data works fine. Both the TPC-H and TPC-DS generators were obtained as described in your repo.
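If the extra 32nd field comes from each row ending with a `|` delimiter (an assumption that this report does not confirm), one possible workaround would be to strip a single trailing `|` from every line before the CSV reader sees it. The sketch below shows the idea in stdlib-only Rust. `strip_trailing_delimiter` is a hypothetical helper and is not part of this repo's convert code.

```rust
/// Hypothetical pre-processing helper (not part of this repo): removes at most
/// one trailing `|` from each line so that a row written as `a|b|c|` parses as
/// three fields instead of four. Lines without a trailing `|` are unchanged.
fn strip_trailing_delimiter(data: &str) -> String {
    let mut out = String::with_capacity(data.len());
    for line in data.lines() {
        // `strip_suffix` removes only a single occurrence of the suffix, so a
        // row like `1|a||` (empty last column) keeps that empty column.
        out.push_str(line.strip_suffix('|').unwrap_or(line));
        out.push('\n');
    }
    out
}

fn main() {
    // Toy rows: the first ends with a trailing delimiter, the second does not.
    let raw = "1|a|b|\n2|c|d\n";
    print!("{}", strip_trailing_delimiter(raw));
}
```

Removing exactly one delimiter, rather than trimming all trailing `|` characters, matters because a row whose last column is empty would otherwise lose that column and end up one field short.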