@jayshrivastava (Collaborator) commented Nov 16, 2025

This change adds an Arrow Flight SQL server that SQLancer can target. The test tables are generated in memory using DML.

These are the results:

  • Ran ~3k queries (3,109 in total)
  • 73% success rate (i.e. 27% of queries returned errors)
    • These errors can be ignored because SQLancer can generate invalid queries
  • Importantly, nothing was written to logs/datafusion_custom_log/error_report.log, meaning all validations passed
➜  target git:(main) java --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED -jar sqlancer-*.jar --random-seed 0 --num-threads 1 --max-generated-databases 1 --num-tries 10 --num-queries 500 datafusion
WARNING: Unknown module: org.apache.arrow.memory.core specified to --add-opens
WARNING: A terminally deprecated method in sun.misc.Unsafe has been called
WARNING: sun.misc.Unsafe::objectFieldOffset has been called by com.github.benmanes.caffeine.cache.UnsafeAccess (file:/Users/jayant.shrivastava/code/datafusion-sqllancer/target/lib/caffeine-2.9.3.jar)
WARNING: Please consider reporting this to the maintainers of class com.github.benmanes.caffeine.cache.UnsafeAccess
WARNING: sun.misc.Unsafe::objectFieldOffset will be removed in a future release
Nov 16, 2025 4:27:36 PM org.apache.arrow.driver.jdbc.shaded.org.apache.arrow.memory.BaseAllocator <clinit>
INFO: Debug mode disabled. Enable with the VM option -Darrow.memory.debug.allocator=true.
Nov 16, 2025 4:27:36 PM org.apache.arrow.driver.jdbc.shaded.org.apache.arrow.memory.DefaultAllocationManagerOption getDefaultAllocationManagerFactory
INFO: allocation manager type not specified, using netty as the default type
Nov 16, 2025 4:27:36 PM org.apache.arrow.driver.jdbc.shaded.org.apache.arrow.memory.CheckAllocator reportResult
INFO: Using DefaultAllocationManager at memory/netty/DefaultAllocationManagerFactory.class
[2025/11/16 16:27:41] Executed 296 queries (59 queries/s; 0.40/s dbs, successful statements: 66%). Threads shut down: 1.
[2025/11/16 16:27:46] Executed 804 queries (101 queries/s; 0.20/s dbs, successful statements: 70%). Threads shut down: 2.
[2025/11/16 16:27:51] Executed 1389 queries (117 queries/s; 0.40/s dbs, successful statements: 73%). Threads shut down: 4.
[2025/11/16 16:27:56] Executed 1969 queries (116 queries/s; 0.20/s dbs, successful statements: 74%). Threads shut down: 5.
[2025/11/16 16:28:01] Executed 2510 queries (108 queries/s; 0.40/s dbs, successful statements: 73%). Threads shut down: 7.
[2025/11/16 16:28:06] Executed 3109 queries (119 queries/s; 0.40/s dbs, successful statements: 73%). Threads shut down: 9.

➜  target git:(main) cat logs/datafusion_custom_log/error_report.log

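The pass/fail check in this run is done by eye: read the last progress line, then `cat` the error report. A small script could automate this for repeated runs. This is an illustration only and is not part of this change.

It assumes two things:
- SQLancer's stdout has been saved to a text file.
- The progress lines keep the `Executed N queries (... successful statements: P%)` format shown in the run output.

`summarize_progress` and `error_report_is_clean` are hypothetical helper names.

```python
import re
from pathlib import Path

# Matches SQLancer progress lines such as:
# [2025/11/16 16:28:06] Executed 3109 queries (119 queries/s; 0.40/s dbs, successful statements: 73%). Threads shut down: 9.
# (format assumed from the output of this run)
PROGRESS_RE = re.compile(
    r"Executed (?P<queries>\d+) queries .*successful statements: (?P<success>\d+)%"
)


def summarize_progress(lines):
    """Return (total_queries, success_pct) from the last progress line, or None if there is none."""
    last = None
    for line in lines:
        match = PROGRESS_RE.search(line)
        if match:
            last = (int(match.group("queries")), int(match.group("success")))
    return last


def error_report_is_clean(path):
    """True only if the error report exists and contains nothing but whitespace.

    A missing file is treated as a failure, so a wrong path is not
    mistaken for a clean run.
    """
    report = Path(path)
    return report.is_file() and report.read_text(encoding="utf-8").strip() == ""
```

For example, if the stdout of this run were saved to a hypothetical `sqlancer-stdout.txt`, then `summarize_progress(Path("sqlancer-stdout.txt").read_text().splitlines())` would return `(3109, 73)`. `error_report_is_clean("logs/datafusion_custom_log/error_report.log")` should be `True` when no validation failed.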
Example DMLs

➜  target git:(main) cat logs/datafusion/database0-cur.log | grep DML
/*DML*/CREATE TABLE t0(v0 BIGINT, v1 STRING, v2 BOOLEAN, v3 DOUBLE);
/*DML*/INSERT INTO t0(v3, v2, v0) VALUES ('+Inf'::Double, true, -81839914);
/*DML*/INSERT INTO t0(v0) VALUES (-81839914);
/*DML*/INSERT INTO t0(v1, v0, v2) VALUES ('-81839914', -81839914, false);
/*DML*/INSERT INTO t0(v0, v1, v3) VALUES (-81839914, '', 0.9620689182075386), (412901507, '', '-Inf'::Double);
/*DML*/INSERT INTO t0(v0, v2) VALUES (43703267, false);
/*DML*/INSERT INTO t0(v2) VALUES (false);
/*DML*/INSERT INTO t0(v2, v3, v1) VALUES (false, 0.3227677982432213, '2v');
/*DML*/INSERT INTO t0(v3) VALUES (0.6808360271610692);
/*DML*/INSERT INTO t0(v3) VALUES ('NaN'::Double);
/*DML*/INSERT INTO t0(v0, v2) VALUES (3, true), (-4, true);
/*DML*/CREATE TABLE t0_stringview AS SELECT v0, arrow_cast(v1, 'Utf8View') as v1, v2, v3 FROM t0;
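Setup statements carry a `/*DML*/` prefix in `database0-cur.log`. That means the log can be split into the table-setup script and the generated queries. This is handy when replaying a single query against freshly built tables while reducing a failure.

This is a minimal sketch. It assumes one statement per line, and it assumes that lines starting with `--` are comments; both are assumptions about the log format, not something this PR defines. `split_log` is a hypothetical helper.

```python
DML_PREFIX = "/*DML*/"


def split_log(lines):
    """Split a SQLancer database log into (setup_statements, queries).

    Assumes one statement per line. Setup statements are tagged with the
    /*DML*/ prefix, which is stripped. Blank lines and lines starting with
    "--" (assumed to be comments) are skipped.
    """
    setup, queries = [], []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("--"):
            continue
        if line.startswith(DML_PREFIX):
            setup.append(line[len(DML_PREFIX):])
        else:
            queries.append(line)
    return setup, queries
```

The setup list, run in order, should rebuild `t0` and `t0_stringview` before any query from the second list is re-executed.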

Example queries

➜  target git:(main) tail logs/datafusion/database0-cur.log
select * from tt;
SELECT ALL LAST_VALUE(SQRT(tt0.v0)) OVER (PARTITION BY tt0.v2, tt0.v1) FROM t0_stringview AS tt0 WHERE tt0.v2;
SELECT tt1.v3, tt0.v3, ARROW_CAST(tt1.v1, 'Utf8'), tt1.v0, ARROW_CAST(tt0.v1, 'Utf8'), tt0.v2 FROM t0_stringview AS tt0 FULL OUTER JOIN t0_stringview AS tt1 ON false WHERE ((TRANSLATE(REGEXP_MATCH(tt0.v1, tt0.v1, tt1.v0), tt1.v1, ((tt1.v1)~*(tt0.v1))))<(((REGEXP_REPLACE(tt1.v1, tt0.v1))!~(((tt0.v1)!~~(tt1.v1))))));
SELECT COUNT(*) FROM t0_stringview AS tt0 WHERE ((false)OR(STARTS_WITH((('!喐') ILIKE ('24\n')), LTRIM(1))));
SELECT tt0.v0, tt1.v3, tt0.v2, tt0.v3, ARROW_CAST(tt1.v1, 'Utf8'), tt1.v2 FROM t0_stringview AS tt0 FULL OUTER JOIN t0 AS tt1 ON tt0.v2 GROUP BY FIND_IN_SET('Zm', tt1.v3) ORDER BY tt0.v0 ASC;
select * from tt;
SELECT ALL 1.7323877647274428E308 OVER () FROM t0_stringview AS tt0 WHERE ((0.48131556370595674)IS DISTINCT FROM(CEIL(ABS(tt0.v2))));
SELECT DISTINCT ((((-4)|(-5)))<<((- -2.091572074E9))) FROM t0_stringview AS tt0 WHERE false GROUP BY '2046108624', -1;
SELECT COUNT(*) FROM t0_stringview AS tt0 INNER JOIN t0 AS tt1 ON ((SUBSTR(tt1.v3, tt1.v0))=(tt0.v1)) WHERE ((((tt1.v1)!~~(RTRIM(tt0.v1))))<(((tt0.v1)||(tt0.v1))));
SELECT DISTINCT tt1.v3, tt1.v0 FROM t0 AS tt0 CROSS JOIN t0 AS tt1 GROUP BY false, (NOT (((('oV')>('-81839914')))IS DISTINCT FROM(false)));

Example log file:
database0-cur.log