Skip to content

feat: add audit-comet-expression Claude Code skill#3793

Merged
andygrove merged 4 commits intoapache:mainfrom
andygrove:audit-comet-expression-skill
Apr 2, 2026
Merged

feat: add audit-comet-expression Claude Code skill#3793
andygrove merged 4 commits intoapache:mainfrom
andygrove:audit-comet-expression-skill

Conversation

@andygrove
Copy link
Copy Markdown
Member

Summary

Adds a new Claude Code skill .claude/skills/audit-comet-expression/SKILL.md for auditing existing Comet expression implementations for correctness and test coverage.

The skill (/audit-comet-expression <expression-name>) performs a structured audit:

  • Studies the Spark implementation across versions 3.4.3, 3.5.8, and 4.0.1, cloning each tag to /tmp/ and diffing behavioral changes across versions
  • Reviews the Comet Scala serde, shims, and Rust/DataFusion implementation for correctness (null handling, type dispatch, ANSI mode, getSupportLevel accuracy)
  • Inventories existing Comet tests: SQL file tests and Scala tests
  • Produces a coverage gap matrix against Spark's own test suite (nulls, boundary values, NaN, multibyte UTF-8, ANSI mode, all input types, Parquet dictionary encoding, cross-version differences)
  • Flags implementation gaps such as missing shims or incorrect Incompatible/Unsupported markings
  • Offers to implement the missing tests (preferring the SQL file test framework)

This complements the existing review-comet-pr skill, which focuses on reviewing incoming PRs. The audit skill is for proactively checking the quality of expressions already in the codebase.

@andygrove andygrove marked this pull request as ready for review March 25, 2026 20:28

```bash
# Check if there's a DataFusion built-in function with this name
find native/ -name "Cargo.lock" -exec grep -A2 "datafusion" {} \; | grep "version" | head -5
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of this ?
It just prints version = "..."

```bash
# Check if there's a DataFusion built-in function with this name
find native/ -name "Cargo.lock" -exec grep -A2 "datafusion" {} \; | grep "version" | head -5
grep -r "$ARGUMENTS" ~/.cargo/registry/src/ --include="*.rs" -l 2>/dev/null | head -10
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks too broad.
It greps in all crates you have cargo fetched for any of your local builds.

Idea: the Bash snippet could check for existence of some predefined env var, e.g. $DATAFUSION_SRC and grep inside it. Every developer will have to export this env var in his/her shell.


## Step 6: Recommendations

Summarize findings as a prioritized list:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The colon at at end of the sentence suggests that something follows

```bash
# Find the serde object
grep -r "$ARGUMENTS" spark/src/main/scala/org/apache/comet/serde/ --include="*.scala" -l
grep -r "$ARGUMENTS" spark/src/main/scala/org/apache/comet/ --include="*.scala" -l | grep -v test
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of | grep -v test at the end ?
It greps in src/main, so tests are not expected there. Also I would expect -i or Test
Currently it would ignore any file/folder containing latest, for example.

Clone specific Spark version tags (use shallow clones to avoid polluting the workspace). Only clone a version if it is not already present.

```bash
for tag in v3.4.3 v3.5.8 v4.0.1; do
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have never written a Claude skill before...
Does it need some kind of error handling ?
E.g. if git clone ... fails then the rest of the scripts should not be executed.

Maybe add set -eu -o pipefail in the beginning ?!

After implementing tests, tell the user how to run them:

```bash
./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite $ARGUMENTS" -Dtest=none
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't $ARGUMENTS be passed as -DwildcardSuites="$ARGUMENTS" ?

Suggested change
./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite $ARGUMENTS" -Dtest=none
./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite" -DwildcardSuites="$ARGUMENTS" -Dtest=none

Copy link
Copy Markdown
Contributor

@kazuyukitanimura kazuyukitanimura left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @andygrove

I did not try this skill by myself but I listed some edge cases below

| Empty string / empty array / empty map | | | | |
| Zero, negative values (numeric) | | | | |
| Boundary values (INT_MIN, INT_MAX, Long.MinValue, etc.) | | | | |
| NaN, Infinity, -Infinity (float/double) | | | | |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you add negative zero as well as subnormal float/double?

| Zero, negative values (numeric) | | | | |
| Boundary values (INT_MIN, INT_MAX, Long.MinValue, etc.) | | | | |
| NaN, Infinity, -Infinity (float/double) | | | | |
| Multibyte / special UTF-8 characters | | | | |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to make sure this edge cases for UTF-8
val edgeCases = Seq(
"é", // unicode 'e\u{301}'
"é", // unicode '\u{e9}'
"తెలుగు")

| Column reference argument(s) | | | | |
| Literal argument(s) | | | | |
| NULL input | | | | |
| Empty string / empty array / empty map | | | | |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about array with "null" elements?

Read the Rust implementation and check:

- Null handling (does it propagate nulls correctly?)
- Overflow / error handling (returns `Err` vs panics)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And underflow

| NULL input | | | | |
| Empty string / empty array / empty map | | | | |
| Zero, negative values (numeric) | | | | |
| Boundary values (INT_MIN, INT_MAX, Long.MinValue, etc.) | | | | |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we specifically say minimum positive number?

- Add set -eu -o pipefail for error handling in bash snippets
- Remove unnecessary grep -v test filter on src/main path
- Replace broad ~/.cargo/registry search with $DATAFUSION_SRC env var
- Add underflow, negative zero, subnormal, minimum positive to gap matrix
- Add array/map with NULL elements row
- Expand UTF-8 row with composed vs decomposed and non-Latin examples
- Fix trailing colon after "prioritized list"
- Add -DwildcardSuites to test command
@andygrove
Copy link
Copy Markdown
Member Author

Thanks the the approval @martin-g.

@kazuyukitanimura I address your feedback and plan on merging this PR once CI is green.

@andygrove andygrove merged commit fb180b0 into apache:main Apr 2, 2026
4 checks passed
@andygrove andygrove deleted the audit-comet-expression-skill branch April 2, 2026 22:03
vaibhawvipul pushed a commit to vaibhawvipul/datafusion-comet that referenced this pull request Apr 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants