feat: add audit-comet-expression Claude Code skill#3793
feat: add audit-comet-expression Claude Code skill#3793andygrove merged 4 commits intoapache:mainfrom
Conversation
|
|
||
| ```bash | ||
| # Check if there's a DataFusion built-in function with this name | ||
| find native/ -name "Cargo.lock" -exec grep -A2 "datafusion" {} \; | grep "version" | head -5 |
There was a problem hiding this comment.
What is the purpose of this ?
It just prints version = "..."
| ```bash | ||
| # Check if there's a DataFusion built-in function with this name | ||
| find native/ -name "Cargo.lock" -exec grep -A2 "datafusion" {} \; | grep "version" | head -5 | ||
| grep -r "$ARGUMENTS" ~/.cargo/registry/src/ --include="*.rs" -l 2>/dev/null | head -10 |
There was a problem hiding this comment.
This looks too broad.
It greps in all crates you have cargo fetched for any of your local builds.
Idea: the Bash snippet could check for existence of some predefined env var, e.g. $DATAFUSION_SRC and grep inside it. Every developer will have to export this env var in his/her shell.
|
|
||
| ## Step 6: Recommendations | ||
|
|
||
| Summarize findings as a prioritized list: |
There was a problem hiding this comment.
The colon at at end of the sentence suggests that something follows
| ```bash | ||
| # Find the serde object | ||
| grep -r "$ARGUMENTS" spark/src/main/scala/org/apache/comet/serde/ --include="*.scala" -l | ||
| grep -r "$ARGUMENTS" spark/src/main/scala/org/apache/comet/ --include="*.scala" -l | grep -v test |
There was a problem hiding this comment.
What is the purpose of | grep -v test at the end ?
It greps in src/main, so tests are not expected there. Also I would expect -i or Test
Currently it would ignore any file/folder containing latest, for example.
| Clone specific Spark version tags (use shallow clones to avoid polluting the workspace). Only clone a version if it is not already present. | ||
|
|
||
| ```bash | ||
| for tag in v3.4.3 v3.5.8 v4.0.1; do |
There was a problem hiding this comment.
I have never written a Claude skill before...
Does it need some kind of error handling ?
E.g. if git clone ... fails then the rest of the scripts should not be executed.
Maybe add set -eu -o pipefail in the beginning ?!
| After implementing tests, tell the user how to run them: | ||
|
|
||
| ```bash | ||
| ./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite $ARGUMENTS" -Dtest=none |
There was a problem hiding this comment.
Shouldn't $ARGUMENTS be passed as -DwildcardSuites="$ARGUMENTS" ?
| ./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite $ARGUMENTS" -Dtest=none | |
| ./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite" -DwildcardSuites="$ARGUMENTS" -Dtest=none |
kazuyukitanimura
left a comment
There was a problem hiding this comment.
Thanks @andygrove
I did not try this skill by myself but I listed some edge cases below
| | Empty string / empty array / empty map | | | | | | ||
| | Zero, negative values (numeric) | | | | | | ||
| | Boundary values (INT_MIN, INT_MAX, Long.MinValue, etc.) | | | | | | ||
| | NaN, Infinity, -Infinity (float/double) | | | | | |
There was a problem hiding this comment.
Would you add negative zero as well as subnormal float/double?
| | Zero, negative values (numeric) | | | | | | ||
| | Boundary values (INT_MIN, INT_MAX, Long.MinValue, etc.) | | | | | | ||
| | NaN, Infinity, -Infinity (float/double) | | | | | | ||
| | Multibyte / special UTF-8 characters | | | | | |
There was a problem hiding this comment.
I would like to make sure this edge cases for UTF-8
val edgeCases = Seq(
"é", // unicode 'e\u{301}'
"é", // unicode '\u{e9}'
"తెలుగు")
| | Column reference argument(s) | | | | | | ||
| | Literal argument(s) | | | | | | ||
| | NULL input | | | | | | ||
| | Empty string / empty array / empty map | | | | | |
There was a problem hiding this comment.
What about array with "null" elements?
| Read the Rust implementation and check: | ||
|
|
||
| - Null handling (does it propagate nulls correctly?) | ||
| - Overflow / error handling (returns `Err` vs panics) |
| | NULL input | | | | | | ||
| | Empty string / empty array / empty map | | | | | | ||
| | Zero, negative values (numeric) | | | | | | ||
| | Boundary values (INT_MIN, INT_MAX, Long.MinValue, etc.) | | | | | |
There was a problem hiding this comment.
Should we specifically say minimum positive number?
- Add set -eu -o pipefail for error handling in bash snippets - Remove unnecessary grep -v test filter on src/main path - Replace broad ~/.cargo/registry search with $DATAFUSION_SRC env var - Add underflow, negative zero, subnormal, minimum positive to gap matrix - Add array/map with NULL elements row - Expand UTF-8 row with composed vs decomposed and non-Latin examples - Fix trailing colon after "prioritized list" - Add -DwildcardSuites to test command
|
Thanks the the approval @martin-g. @kazuyukitanimura I address your feedback and plan on merging this PR once CI is green. |
Summary
Adds a new Claude Code skill
.claude/skills/audit-comet-expression/SKILL.mdfor auditing existing Comet expression implementations for correctness and test coverage.The skill (
/audit-comet-expression <expression-name>) performs a structured audit:/tmp/and diffing behavioral changes across versionsgetSupportLevelaccuracy)Incompatible/UnsupportedmarkingsThis complements the existing
review-comet-prskill, which focuses on reviewing incoming PRs. The audit skill is for proactively checking the quality of expressions already in the codebase.