feat: add audit-comet-expression Claude Code skill by andygrove · Pull Request #3793 · apache/datafusion-comet

andygrove · 2026-03-25T19:53:07Z

Summary

Adds a new Claude Code skill .claude/skills/audit-comet-expression/SKILL.md for auditing existing Comet expression implementations for correctness and test coverage.

The skill (/audit-comet-expression <expression-name>) performs a structured audit:

- Studies the Spark implementation across versions 3.4.3, 3.5.8, and 4.0.1, cloning each tag to /tmp/ and diffing behavioral changes across versions
- Reviews the Comet Scala serde, shims, and Rust/DataFusion implementation for correctness (null handling, type dispatch, ANSI mode, getSupportLevel accuracy)
- Inventories existing Comet tests: SQL file tests and Scala tests
- Produces a coverage gap matrix against Spark's own test suite (nulls, boundary values, NaN, multibyte UTF-8, ANSI mode, all input types, Parquet dictionary encoding, cross-version differences)
- Flags implementation gaps such as missing shims or incorrect Incompatible/Unsupported markings
- Offers to implement the missing tests (preferring the SQL file test framework)

This complements the existing review-comet-pr skill, which focuses on reviewing incoming PRs. The audit skill is for proactively checking the quality of expressions already in the codebase.

martin-g · 2026-04-01T19:55:21Z

.claude/skills/audit-comet-expression/SKILL.md

+
+```bash
+# Check if there's a DataFusion built-in function with this name
+find native/ -name "Cargo.lock" -exec grep -A2 "datafusion" {} \; | grep "version" | head -5


What is the purpose of this?
It just prints version = "..." lines, which says nothing about whether a function with this name exists.
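
The point can be reproduced with a small sketch. The Cargo.lock below is a made-up fixture with placeholder versions, and `show_versions` is a hypothetical wrapper around the same pipeline as in the snippet under review; none of these values come from the Comet repository.

```bash
# Build a throwaway Cargo.lock with made-up package entries.
lock_dir="$(mktemp -d)"
cat > "$lock_dir/Cargo.lock" <<'EOF'
[[package]]
name = "datafusion"
version = "0.0.0"
checksum = "0000"

[[package]]
name = "serde"
version = "1.0.0"
EOF

# Same pipeline as in the reviewed snippet, pointed at the temporary directory.
show_versions() {
  find "$1" -name "Cargo.lock" -exec grep -A2 "datafusion" {} \; | grep "version" | head -5
}

# Only the version line survives the second grep.
show_versions "$lock_dir"
```

The output contains a version string and nothing that depends on the expression name, which matches the reviewer's observation.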

martin-g · 2026-04-01T19:59:40Z

.claude/skills/audit-comet-expression/SKILL.md

+```bash
+# Check if there's a DataFusion built-in function with this name
+find native/ -name "Cargo.lock" -exec grep -A2 "datafusion" {} \; | grep "version" | head -5
+grep -r "$ARGUMENTS" ~/.cargo/registry/src/ --include="*.rs" -l 2>/dev/null | head -10


This looks too broad.
It greps in all crates you have cargo fetched for any of your local builds.

Idea: the Bash snippet could check for existence of some predefined env var, e.g. $DATAFUSION_SRC and grep inside it. Every developer will have to export this env var in his/her shell.
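
A minimal sketch of that idea, assuming a developer-exported `DATAFUSION_SRC` that points at a local DataFusion checkout. The function name `find_in_datafusion` is hypothetical and is not part of the skill.

```bash
# Search only inside the checkout named by DATAFUSION_SRC (assumed env var).
find_in_datafusion() {
  local name="$1"
  if [ -z "${DATAFUSION_SRC:-}" ]; then
    echo "DATAFUSION_SRC is not set; skipping DataFusion source search" >&2
    return 1
  fi
  if [ ! -d "$DATAFUSION_SRC" ]; then
    echo "DATAFUSION_SRC is not a directory: $DATAFUSION_SRC" >&2
    return 1
  fi
  # -F matches the name as a fixed string; -l lists matching files only.
  grep -rlF --include="*.rs" -- "$name" "$DATAFUSION_SRC" | head -10
}
```

Called as `find_in_datafusion "$ARGUMENTS"`, this would fail fast with a message when the variable is missing instead of silently scanning every crate in the local cargo registry.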

martin-g · 2026-04-01T20:02:28Z

.claude/skills/audit-comet-expression/SKILL.md

+
+## Step 6: Recommendations
+
+Summarize findings as a prioritized list:


The colon at the end of the sentence suggests that something follows.

martin-g · 2026-04-01T20:13:58Z

.claude/skills/audit-comet-expression/SKILL.md

+```bash
+# Find the serde object
+grep -r "$ARGUMENTS" spark/src/main/scala/org/apache/comet/serde/ --include="*.scala" -l
+grep -r "$ARGUMENTS" spark/src/main/scala/org/apache/comet/ --include="*.scala" -l | grep -v test


What is the purpose of | grep -v test at the end?
The search runs under src/main, so tests are not expected there. I would also expect -i or Test.
As written, it would drop any file or folder whose path contains "latest", for example.
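
The substring problem can be shown with a short sketch. The three paths are hypothetical and exist only to illustrate the match; `filter_test_dir` is one possible narrower filter, not something proposed in the PR.

```bash
# Hypothetical paths, used only to illustrate the substring match.
paths='spark/src/main/scala/org/apache/comet/serde/strings.scala
spark/src/main/scala/org/apache/comet/latest/Shim.scala
spark/src/test/scala/org/apache/comet/FooSuite.scala'

# Substring filter: drops anything containing "test", including "latest".
filter_substring() { grep -v test; }

# Narrower filter: drops only paths with a directory named exactly "test".
filter_test_dir() { grep -v -E '(^|/)test(/|$)'; }

printf '%s\n' "$paths" | filter_substring   # keeps only serde/strings.scala
printf '%s\n' "$paths" | filter_test_dir    # also keeps latest/Shim.scala
```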

martin-g · 2026-04-01T20:19:53Z

.claude/skills/audit-comet-expression/SKILL.md

+Clone specific Spark version tags (use shallow clones to avoid polluting the workspace). Only clone a version if it is not already present.
+
+```bash
+for tag in v3.4.3 v3.5.8 v4.0.1; do


I have never written a Claude skill before...
Does it need some kind of error handling?
E.g. if git clone ... fails, then the rest of the script should not be executed.

Maybe add set -eu -o pipefail at the beginning?
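
For reference, a small sketch of what these options change, using `false | cat` as a stand-in for a failing step such as a clone. The three helper function names are hypothetical; each runs its commands in a child `bash` so the options do not leak into the calling shell.

```bash
# Without pipefail, a pipeline's status is that of its last command,
# so a failure on the left side of the pipe goes unnoticed.
without_pipefail() { bash -c 'false | cat; echo "status=$?"'; }

# With pipefail, the failing command's status becomes the pipeline's status.
with_pipefail() { bash -c 'set -o pipefail; false | cat; echo "status=$?"'; }

# With -e as well, the script stops at the failing pipeline,
# so the echo after it never runs.
strict_mode() { bash -c 'set -eu -o pipefail; false | cat; echo "not reached"'; }

without_pipefail                                    # prints status=0
with_pipefail                                       # prints status=1
strict_mode || echo "script aborted with status $?" # prints the abort message
```

The `-u` flag additionally turns a reference to an unset variable into an error, which would also catch an empty `$ARGUMENTS` if the snippet expanded it unguarded.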

martin-g · 2026-04-01T20:25:19Z

.claude/skills/audit-comet-expression/SKILL.md

+After implementing tests, tell the user how to run them:
+
+```bash
+./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite $ARGUMENTS" -Dtest=none


Shouldn't $ARGUMENTS be passed as -DwildcardSuites="$ARGUMENTS" ?

Suggested change

-./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite $ARGUMENTS" -Dtest=none
+./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite" -DwildcardSuites="$ARGUMENTS" -Dtest=none

kazuyukitanimura

Thanks @andygrove

I have not tried this skill myself, but I have listed some edge cases below.

kazuyukitanimura · 2026-04-01T22:09:29Z

.claude/skills/audit-comet-expression/SKILL.md

+| Empty string / empty array / empty map                  |                |                |                  |      |
+| Zero, negative values (numeric)                         |                |                |                  |      |
+| Boundary values (INT_MIN, INT_MAX, Long.MinValue, etc.) |                |                |                  |      |
+| NaN, Infinity, -Infinity (float/double)                 |                |                |                  |      |


Would you add negative zero as well as subnormal float/double?

kazuyukitanimura · 2026-04-01T22:13:53Z

.claude/skills/audit-comet-expression/SKILL.md

+| Zero, negative values (numeric)                         |                |                |                  |      |
+| Boundary values (INT_MIN, INT_MAX, Long.MinValue, etc.) |                |                |                  |      |
+| NaN, Infinity, -Infinity (float/double)                 |                |                |                  |      |
+| Multibyte / special UTF-8 characters                    |                |                |                  |      |


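I would like to make sure these UTF-8 edge cases are covered:
val edgeCases = Seq(
  "e\u0301", // decomposed: 'e' followed by the combining acute accent U+0301
  "\u00e9", // precomposed: 'é' as the single code point U+00E9
  "తెలుగు") // Telugu, a non-Latin script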
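
To make the first two cases concrete, here is a quick sketch showing that the two spellings of "é" render the same but differ at the byte level. The helper name `byte_len` is hypothetical, and the bytes are written as octal escapes so the example does not depend on how an editor normalizes the characters.

```bash
# Precomposed "é" is U+00E9: two UTF-8 bytes (0xC3 0xA9).
composed="$(printf '\303\251')"
# Decomposed "é" is "e" followed by U+0301 (0xCC 0x81): three bytes in total.
decomposed="$(printf 'e\314\201')"

# Count the bytes of a string, independent of locale.
byte_len() { printf '%s' "$1" | wc -c | tr -d ' '; }

byte_len "$composed"     # 2
byte_len "$decomposed"   # 3
[ "$composed" = "$decomposed" ] || echo "byte-wise different"
```

Functions that work on bytes, code points, or characters (length, substring, reverse, comparison) can therefore give different answers for the two inputs, which is why both forms are worth a row in the gap matrix.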

kazuyukitanimura · 2026-04-01T22:16:17Z

.claude/skills/audit-comet-expression/SKILL.md

+| Column reference argument(s)                            |                |                |                  |      |
+| Literal argument(s)                                     |                |                |                  |      |
+| NULL input                                              |                |                |                  |      |
+| Empty string / empty array / empty map                  |                |                |                  |      |


What about array with "null" elements?

kazuyukitanimura · 2026-04-01T22:16:49Z

.claude/skills/audit-comet-expression/SKILL.md

+Read the Rust implementation and check:
+
+- Null handling (does it propagate nulls correctly?)
+- Overflow / error handling (returns `Err` vs panics)


And underflow

kazuyukitanimura · 2026-04-01T22:17:44Z

.claude/skills/audit-comet-expression/SKILL.md

+| NULL input                                              |                |                |                  |      |
+| Empty string / empty array / empty map                  |                |                |                  |      |
+| Zero, negative values (numeric)                         |                |                |                  |      |
+| Boundary values (INT_MIN, INT_MAX, Long.MinValue, etc.) |                |                |                  |      |


Should we specifically say minimum positive number?

- Add set -eu -o pipefail for error handling in bash snippets
- Remove unnecessary grep -v test filter on src/main path
- Replace broad ~/.cargo/registry search with $DATAFUSION_SRC env var
- Add underflow, negative zero, subnormal, minimum positive to gap matrix
- Add array/map with NULL elements row
- Expand UTF-8 row with composed vs decomposed and non-Latin examples
- Fix trailing colon after "prioritized list"
- Add -DwildcardSuites to test command

andygrove · 2026-04-02T22:03:17Z

Thanks for the approval, @martin-g.

@kazuyukitanimura I have addressed your feedback and plan to merge this PR once CI is green.

andygrove added 2 commits March 25, 2026 12:52

feat: add audit-comet-expression skill

15e6f4d

chore: run prettier on audit-comet-expression SKILL.md

6bd0519

andygrove marked this pull request as ready for review March 25, 2026 20:28

andygrove requested review from kazuyukitanimura and martin-g April 1, 2026 11:15

martin-g reviewed Apr 1, 2026


kazuyukitanimura reviewed Apr 1, 2026


martin-g approved these changes Apr 2, 2026


prettier

5637162

andygrove merged commit fb180b0 into apache:main Apr 2, 2026
4 checks passed

andygrove deleted the audit-comet-expression-skill branch April 2, 2026 22:03

vaibhawvipul pushed a commit to vaibhawvipul/datafusion-comet that referenced this pull request Apr 4, 2026

feat: add audit-comet-expression Claude Code skill (apache#3793)

199b337


