Add deduplicate suffix to OPTIMIZE query explain output #114
Merged
Handle the Dedupe field in explainOptimizeQuery by appending the "_deduplicate" suffix to the table name, matching ClickHouse's EXPLAIN AST format. This fixes 15 previously failing explain tests for OPTIMIZE TABLE with a DEDUPLICATE clause.
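A minimal Go sketch of the suffix rule. The Dedupe field and the "_deduplicate" suffix come from the change; the trimmed-down OptimizeQuery struct and the optimizeTableLabel helper are hypothetical, and the rest of the EXPLAIN line is omitted.

```go
package main

import "fmt"

// OptimizeQuery is a hypothetical, trimmed-down AST node: only the fields
// needed to show the suffix logic are included.
type OptimizeQuery struct {
	Table  string
	Dedupe bool // true when the statement has a DEDUPLICATE clause
}

// optimizeTableLabel returns the table name as printed in the EXPLAIN AST
// line for the query, with "_deduplicate" appended when Dedupe is set.
func optimizeTableLabel(q OptimizeQuery) string {
	if q.Dedupe {
		return q.Table + "_deduplicate"
	}
	return q.Table
}

func main() {
	fmt.Println(optimizeTableLabel(OptimizeQuery{Table: "t", Dedupe: true})) // t_deduplicate
	fmt.Println(optimizeTableLabel(OptimizeQuery{Table: "t"}))               // t
}
```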
- Fix parseColumnDeclaration to use token.COMMENT instead of checking for an IDENT with value "COMMENT", since COMMENT is a keyword token
- Add comment field handling in explain's Column function to output comments as Literal nodes
This fixes 15 previously failing explain tests related to column comments in CREATE TABLE and ALTER TABLE statements.
Add peekPeek field to Parser for three-token lookahead, enabling proper distinction between:
- INTERVAL '2' AS n minute (alias on value, unit follows)
- INTERVAL '1 MONTH 1 DAY' AS e4 (alias for outer WITH expression)
Also add isIntervalUnit helper with support for all interval units, including NANOSECOND, MICROSECOND, and MILLISECOND.
This fixes 5 previously failing explain tests in 02457_tuple_of_intervals.
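A sketch of how three tokens of lookahead can separate the two INTERVAL forms above. Tokens are modeled as plain strings, and aliasBindsToValue is a hypothetical helper; the unit list covers only singular keywords, which is an assumption about what the real isIntervalUnit accepts.

```go
package main

import (
	"fmt"
	"strings"
)

// intervalUnits lists the singular unit keywords recognized by this sketch.
var intervalUnits = map[string]bool{
	"NANOSECOND": true, "MICROSECOND": true, "MILLISECOND": true,
	"SECOND": true, "MINUTE": true, "HOUR": true, "DAY": true,
	"WEEK": true, "MONTH": true, "QUARTER": true, "YEAR": true,
}

// isIntervalUnit reports whether tok names an interval unit (case-insensitive).
func isIntervalUnit(tok string) bool {
	return intervalUnits[strings.ToUpper(tok)]
}

// aliasBindsToValue looks at three tokens (cur, peek, peekPeek) after an
// INTERVAL value: "AS <name> <unit>" means the alias belongs to the value and
// a unit still follows; otherwise the alias belongs to the outer expression.
func aliasBindsToValue(cur, peek, peekPeek string) bool {
	return strings.EqualFold(cur, "AS") && peek != "" && isIntervalUnit(peekPeek)
}

func main() {
	// INTERVAL '2' AS n minute
	fmt.Println(aliasBindsToValue("AS", "n", "minute")) // true
	// INTERVAL '1 MONTH 1 DAY' AS e4, ...
	fmt.Println(aliasBindsToValue("AS", "e4", ",")) // false
}
```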
Add isExplainFollowedByStatement helper to distinguish between:
- EXPLAIN as a statement keyword (followed by SELECT, WITH, AST, etc.)
- EXPLAIN as an identifier/column name (followed by LIKE, =, etc.)
This fixes WHERE clauses like "(explain LIKE '%...')", where 'explain' is a column alias that was mistakenly parsed as an EXPLAIN statement.
Fixes 19 previously failing explain tests, including those in 02354_vector_search_rescoring and related tests.
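A sketch of the one-token check, assuming the parser can see the token after EXPLAIN as a string. The starter set is an illustrative subset of EXPLAIN kinds, not the full list the real helper may use.

```go
package main

import (
	"fmt"
	"strings"
)

// explainStarters is an illustrative subset of tokens that can follow
// EXPLAIN when it begins a statement.
var explainStarters = map[string]bool{
	"SELECT": true, "WITH": true, "AST": true, "SYNTAX": true,
	"PLAN": true, "PIPELINE": true, "ESTIMATE": true,
}

// isExplainFollowedByStatement reports whether the token after EXPLAIN makes
// EXPLAIN a statement keyword rather than an identifier.
func isExplainFollowedByStatement(next string) bool {
	return explainStarters[strings.ToUpper(next)]
}

func main() {
	fmt.Println(isExplainFollowedByStatement("SELECT")) // true
	fmt.Println(isExplainFollowedByStatement("LIKE"))   // false: explain is a column here
}
```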
- Add Format field to UndropQuery AST
- Parse FORMAT clause in parseUndrop function
- Update explainUndropQuery to output Format as child
- Reorder SYNC and FORMAT parsing in parseDrop to handle both orderings
Fixes 5 explain tests in 02681_undrop_query.
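One way to accept SYNC and FORMAT in either order is a loop over the trailing tokens. This is a sketch over string tokens; DropTail and parseDropTail are hypothetical names, not the project's actual parser functions.

```go
package main

import (
	"fmt"
	"strings"
)

// DropTail holds the hypothetical trailing options of a DROP/UNDROP statement.
type DropTail struct {
	Sync   bool
	Format string
}

// parseDropTail accepts SYNC and FORMAT <name> in any order.
func parseDropTail(toks []string) (DropTail, error) {
	var t DropTail
	for i := 0; i < len(toks); i++ {
		switch strings.ToUpper(toks[i]) {
		case "SYNC":
			t.Sync = true
		case "FORMAT":
			if i+1 >= len(toks) {
				return t, fmt.Errorf("FORMAT requires a format name")
			}
			t.Format = toks[i+1]
			i++ // skip the format name
		default:
			return t, fmt.Errorf("unexpected token %q", toks[i])
		}
	}
	return t, nil
}

func main() {
	a, _ := parseDropTail([]string{"SYNC", "FORMAT", "Null"})
	b, _ := parseDropTail([]string{"FORMAT", "Null", "SYNC"})
	fmt.Println(a == b) // true: both orderings give the same result
}
```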
- Add MultipleUsers field to ShowQuery to track SHOW CREATE USER with multiple users
- Parse and detect commas in SHOW CREATE USER to identify multiple users
- Output "SHOW CREATE USERS query" (plural) when multiple users are specified
- Capture REALM and SERVER authentication values (for Kerberos/LDAP)
- Output each authentication method as a separate AuthenticationData child
Fixes 4 explain tests in 01292_create_user and 2 in 03174_multiple_authentication_methods_show_create.
- Add Parenthesized field to Literal struct
- Mark literals inside parentheses as parenthesized in parseGroupedOrTuple
- In explainUnaryExpr, only fold to a negative literal when the operand is NOT parenthesized
- This distinguishes "-1" (folded to Int64_-1) from "-(1)" (negate function with UInt64_1)
Fixes 4 explain tests in 01881_negate_formatting.
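A sketch of the folding decision. The Parenthesized flag comes from the change; the Literal shape and explainNegate are hypothetical, the tree layout follows the Function/ExpressionList style used elsewhere in this PR, and the -0 case is not handled here.

```go
package main

import "fmt"

// Literal is a hypothetical, trimmed-down literal node.
type Literal struct {
	Value         uint64
	Parenthesized bool // set when the literal was written inside ( )
}

// explainNegate renders unary minus applied to lit. A bare literal folds into
// a negative Int64 literal; a parenthesized one stays a negate() call.
func explainNegate(lit Literal) string {
	if !lit.Parenthesized {
		return fmt.Sprintf("Literal Int64_-%d", lit.Value)
	}
	return fmt.Sprintf("Function negate (children 1)\n ExpressionList (children 1)\n  Literal UInt64_%d", lit.Value)
}

func main() {
	fmt.Println(explainNegate(Literal{Value: 1}))                      // -1
	fmt.Println(explainNegate(Literal{Value: 1, Parenthesized: true})) // -(1)
}
```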
- Add Filter field to FunctionCall struct
- Store FILTER(WHERE condition) in parser
- Transform function name to functionNameIf when filter is present
- Add filter condition as extra argument in explain output
Fixes 4 explain tests in 02001_select_with_filter, plus:
- 1 in 03003_count_asterisk_filter
- 2 in 02025_nested_func_for_if_combinator
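The FILTER rewrite can be sketched as a pure transformation on a call node. FunctionCall and its Filter field mirror the description above, but representing arguments as strings and the withFilterCombinator name are assumptions for illustration.

```go
package main

import "fmt"

// FunctionCall is a hypothetical, trimmed-down call node.
type FunctionCall struct {
	Name   string
	Args   []string
	Filter string // condition from FILTER (WHERE ...); empty if absent
}

// withFilterCombinator rewrites f(args) FILTER (WHERE cond) as fIf(args, cond).
func withFilterCombinator(fc FunctionCall) FunctionCall {
	if fc.Filter == "" {
		return fc
	}
	// Copy the arguments so the original call is left untouched.
	args := append(append([]string{}, fc.Args...), fc.Filter)
	return FunctionCall{Name: fc.Name + "If", Args: args}
}

func main() {
	out := withFilterCombinator(FunctionCall{Name: "sum", Args: []string{"x"}, Filter: "x > 0"})
	fmt.Println(out.Name, out.Args)
}
```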
Statements stmt2-5 of 03273_select_from_explain_ast_non_select are marked as clientError in the query file, meaning ClickHouse produces no output for them. They should not be in explain_todo, since there is no expected output to match.
Add support for the ALL modifier in aggregate functions (e.g., sum(ALL number)), which is silently ignored, as in ClickHouse. Also distinguish DISTINCT/ALL used as modifiers from DISTINCT/ALL used as column names: a keyword followed by ) or , is treated as a column reference.
Fixes: 01632_select_all_syntax (stmt6, stmt8, stmt11, stmt15)
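The modifier-versus-column rule only needs one token of lookahead. In this sketch the tokens after the aggregate's opening parenthesis (including the closing one) are plain strings, and splitAggModifier is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// splitAggModifier inspects the tokens after an aggregate's "(" (including
// the closing ")"). A leading ALL or DISTINCT is a modifier only when it is
// not directly followed by ")" or ","; otherwise it is a column name.
func splitAggModifier(tokens []string) (modifier string, rest []string) {
	if len(tokens) >= 2 {
		kw := strings.ToUpper(tokens[0])
		next := tokens[1]
		if (kw == "ALL" || kw == "DISTINCT") && next != ")" && next != "," {
			return kw, tokens[1:]
		}
	}
	return "", tokens
}

func main() {
	mod, rest := splitAggModifier([]string{"ALL", "number", ")"})
	fmt.Println(mod, rest) // ALL [number )]
	mod, rest = splitAggModifier([]string{"all", ")"})
	fmt.Println(mod == "", rest) // true [all )]
}
```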
- Add SpacedCommas field to Literal to track original comma spacing
- Preserve original source formatting in CAST expressions (e.g., [1,2,3] vs [1, 2, 3])
- Handle arrays with non-literal elements (identifiers) by outputting them as Function array
- Update containsOnlyLiterals to recognize negated literals (e.g., -1 in arrays)
Fixes: 01852_cast_operator_4 and 27 additional statements across multiple tests
Parse INSERT column list expressions (*, table.*, COLUMNS(...)) with their transformers (EXCEPT, APPLY, REPLACE). Added ColumnExpressions field to InsertQuery to store these parsed expressions. Fixes: 01470_test_insert_select_asterisk (stmt6, stmt7, stmt8, stmt9)
…n list Extended CREATE MATERIALIZED VIEW parsing to handle INDEX, PROJECTION, and PRIMARY KEY definitions inside the column list parentheses, similar to how CREATE TABLE handles them. Fixes: 02982_create_mv_inner_extra (stmt8, stmt9, stmt10, stmt11)
Large integers (128-bit, 256-bit) that overflow int64/uint64 are stored as LiteralString in the AST. When formatting these in arrays, they should not be quoted like regular strings. Added IsBigInt field to Literal struct to distinguish between actual strings and numeric overflow cases.
Parse IDENTIFIED WITH ssh_key BY KEY ... TYPE ... syntax and count SSH keys for EXPLAIN output. Each SSH key is displayed as a PublicSSHKey child under AuthenticationData.
ClickHouse normalizes -0 to 0 in integer arrays, so we should format it as UInt64_0 instead of Int64_0. This fixes formatting for arrays containing -0 like [-0, 1, 2, ...].
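A sketch of the normalization for integer array elements, assuming the parser has already split each element into a sign flag and an unsigned magnitude (formatIntElement is a hypothetical helper):

```go
package main

import "fmt"

// formatIntElement renders an integer array element; -0 is normalized to
// UInt64_0, while other negative values keep the Int64_ prefix.
func formatIntElement(negative bool, magnitude uint64) string {
	if negative && magnitude != 0 {
		return fmt.Sprintf("Int64_-%d", magnitude)
	}
	return fmt.Sprintf("UInt64_%d", magnitude)
}

func main() {
	// Elements of [-0, 1, -2]
	fmt.Println(formatIntElement(true, 0), formatIntElement(false, 1), formatIntElement(true, 2))
	// UInt64_0 UInt64_1 Int64_-2
}
```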
…ith UUID
- Fix MATERIALIZED keyword detection using token.MATERIALIZED instead of IDENT check
- Fix INNER keyword detection using token.INNER instead of IDENT check
- Add UUID clause parsing to parseCreateTable (skips UUID but continues parsing)
- Add TO INNER UUID clause parsing to parseAttach for materialized views
- Update explainAttachQuery to handle ViewTargets for materialized views
Fixes: 01153_attach_mv_uuid (all 38 statements now pass)
Also fixes: 02990_rmt_replica_path_uuid and 03541_table_without_insertable_columns
- Parse ADD INDEX expressions without parentheses (e.g., ADD INDEX idx u64 * i32 TYPE minmax)
- Add AfterIndex field to AlterCommand for ADD INDEX ... AFTER name
- Update explain to output AfterIndex identifier for ADD_INDEX commands
Fixes: 00836_indices_alter_replicated_zookeeper_long and 20+ other tests with ADD INDEX
Parse the RESET SETTING command in ALTER TABLE statements. This command resets table settings to their default values.
Fixes tests:
- 00980_merge_alter_settings (4 statements)
- 00980_zookeeper_merge_tree_alter_settings (3 statements)
- 02252_reset_non_existing_setting (1 statement)
- 02097_remove_sample_by (1 statement)
- 03164_materialize_skip_index_on_merge (1 statement)
- 03261_minmax_indices_by_default (2 statements)
Parse and skip RECOMPRESS CODEC(...) clauses in TTL expressions, and support multiple comma-separated TTL elements. This allows TTL clauses like:
TTL dt + INTERVAL 1 MONTH RECOMPRESS CODEC(ZSTD(17)), dt + INTERVAL 1 YEAR RECOMPRESS CODEC(LZ4HC(10))
Also fixes ALTER TABLE MODIFY TTL with RECOMPRESS CODEC and SETTINGS.
Fixes 01465_ttl_recompression (4 statements).
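The key point for multiple TTL elements is that only commas at parenthesis depth zero separate elements; the commas inside CODEC(...) do not. The real parser works on tokens; this string-based splitTopLevelCommas is a simplified, hypothetical illustration that ignores commas inside quoted strings.

```go
package main

import (
	"fmt"
	"strings"
)

// splitTopLevelCommas splits s on commas that are not nested inside
// parentheses. Quoted strings are not handled in this simplified sketch.
func splitTopLevelCommas(s string) []string {
	var parts []string
	depth, start := 0, 0
	for i, r := range s {
		switch r {
		case '(':
			depth++
		case ')':
			depth--
		case ',':
			if depth == 0 {
				parts = append(parts, strings.TrimSpace(s[start:i]))
				start = i + 1
			}
		}
	}
	return append(parts, strings.TrimSpace(s[start:]))
}

func main() {
	ttl := "dt + INTERVAL 1 MONTH RECOMPRESS CODEC(ZSTD(17)), dt + INTERVAL 1 YEAR RECOMPRESS CODEC(LZ4HC(10))"
	for _, elem := range splitTopLevelCommas(ttl) {
		fmt.Println(elem)
	}
}
```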
Keywords like DEFAULT can be used as codec names (e.g., CODEC(T64, Default)). Modified parseCodecExpr to accept both identifiers and keywords as codec names.
Fixes:
- 01504_compression_multiple_streams (4 statements)
- 01455_default_compression (3 statements)
- Add IN PARTITION parsing for ALTER UPDATE mutations
- Fix the expression parser consuming "IN PARTITION" as an IN expression, by detecting and unwrapping the case where the last assignment value is an InExpr with PARTITION as its list
- Add Partition_ID (empty) output for PARTITION ALL in:
  - AlterUpdate: UPDATE ... IN PARTITION ALL
  - OptimizeQuery: OPTIMIZE TABLE ... PARTITION ALL
  - AlterClearIndex/AlterDropIndex: CLEAR INDEX IN PARTITION ALL
  - AlterClearColumn: CLEAR COLUMN IN PARTITION ALL
Fixes test 00753_alter_attach (4 statements).
- Parse comma-separated PRIMARY KEY columns without parentheses (e.g., PRIMARY KEY id, id_key)
- Fix EXPLAIN output order for dictionary definitions: LAYOUT should come before RANGE, not after
Fixes 01852_dictionary_query_count_long and 47 other dictionary tests (56 statements total).
- Fix OVER (name clauses...) so it no longer returns early after parsing the window name
- Add named window reference handling in WINDOW clause definitions (e.g., w1 AS (w0 ORDER BY ...))
Fixes 01591_window_functions (4 statements) and 02378_analyzer_projection_names.
Add ShowSetting as a separate ShowType from ShowSettings (plural). When parsing SHOW queries, detect "SETTING" as an IDENT and set the appropriate type. This fixes 3 statements in 02905_show_setting_query and 3 statements in 00405_output_format_pretty_color.
Add CreateNamedCollectionQuery, AlterNamedCollectionQuery, and DropNamedCollectionQuery as separate AST types with their own parsers and explain handlers, so that CREATE/ALTER/DROP NAMED COLLECTION statements are parsed properly.
Fixes 6 statements across 3 tests:
- 02908_empty_named_collection (3 statements)
- 02908_filesystem_cache_as_collection (1 statement)
- 02918_fuzzjson_table_function (2 statements)
Handle two cases that were not working:
1. WITH clause followed by FROM-first syntax: `WITH 1 as n FROM t SELECT n`
2. Nested FROM-first syntax in subqueries: `FROM (FROM t SELECT *) SELECT x`
In parseSelect(), check for a FROM token after the WITH clause and parse the table expression before expecting SELECT.
In parseTableExpression(), add a FROM token check to recognize FROM-first subqueries.
Fixes 3 statements in 02417_from_select_syntax.
Add Temporary field to ExistsQuery and ShowQuery AST types. Parse the TEMPORARY keyword in:
- EXISTS TEMPORARY TABLE statements
- SHOW TEMPORARY TABLES statements
Fixes 5 statements across 2 tests:
- 00564_temporary_table_management (3 statements)
- 00492_drop_temporary_table (2 statements)
When a tuple contains array literals, render as Function tuple format
instead of Literal Tuple_ format. This matches ClickHouse's EXPLAIN AST
output which shows tuples with arrays as:
Function tuple (alias a) (children 1)
 ExpressionList (children 2)
  Literal UInt64_456
  Literal Array_[...]
Fixes 3 statements in 00300_csv.
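The rendering choice reduces to a scan over the tuple's element kinds. Modeling kinds as strings and the tupleHeader name are assumptions of this sketch; only the "array forces Function tuple" rule comes from the change.

```go
package main

import "fmt"

// tupleHeader picks the EXPLAIN AST node used for a tuple, given the kinds
// of its elements (e.g. "uint", "string", "array").
func tupleHeader(elemKinds []string) string {
	for _, k := range elemKinds {
		if k == "array" {
			return "Function tuple"
		}
	}
	return "Literal Tuple_"
}

func main() {
	fmt.Println(tupleHeader([]string{"uint", "array"}))  // Function tuple
	fmt.Println(tupleHeader([]string{"uint", "string"})) // Literal Tuple_
}
```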
- Store inherited WITH clause in InsertQuery.With instead of propagating it to SelectQuery.With
- Add recursive explain functions to handle inherited WITH in select trees
- Output inherited WITH at the end of each SelectQuery's children (after tables)
- Handle SelectWithUnionQuery and SelectIntersectExceptQuery with inherited WITH
Fixes 3 statements in 03248_with_insert and 5 additional statements in other tests.
- Add Settings field to DropQuery AST type
- Parse SETTINGS clause in parseDrop
- Output Set child in explainDropQuery when settings are present
Fixes 3 statements in 03013_ignore_drop_queries_probability and 2 statements in 02932 tests.
- Add parseCreateOrderByExpressions to handle ASC/DESC in ORDER BY
- Add OrderByHasModifiers flag to CreateQuery to track modifiers
- Swap PRIMARY KEY and ORDER BY output order in storage definition explain
- Output "Function tuple" without children when ORDER BY has modifiers
- Output "Function tuple (children N)" for regular ORDER BY tuples
Fixes 3 statements in 03286_reverse_sorting_key_final2 and 21 additional statements in other tests.
Large integers stored as BigInt are converted to Float64 in scientific notation when negated, matching ClickHouse's EXPLAIN AST behavior.
Parse and output SETTINGS clause for DELETE FROM queries, matching ClickHouse's EXPLAIN AST format.
Parse the IF EMPTY modifier in addition to IF EXISTS for DROP statements.
When the lexer produces a single NUMBER token for chained dot-number sequences like .1.2.3, split by dots and create nested TupleAccess nodes.
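A sketch of the split-and-nest step. The TupleAccess nodes are rendered as text for readability; nestTupleAccess is a hypothetical helper and the real code builds AST nodes instead of strings.

```go
package main

import (
	"fmt"
	"strings"
)

// nestTupleAccess expands a chained token such as ".1.2.3" following base
// into nested tuple accesses, innermost first, rendered here as text.
func nestTupleAccess(base, chained string) string {
	expr := base
	for _, idx := range strings.Split(strings.TrimPrefix(chained, "."), ".") {
		expr = fmt.Sprintf("TupleAccess(%s, %s)", expr, idx)
	}
	return expr
}

func main() {
	fmt.Println(nestTupleAccess("t", ".1.2.3"))
	// TupleAccess(TupleAccess(TupleAccess(t, 1), 2), 3)
}
```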
Matches ClickHouse behavior where FREEZE without a PARTITION clause outputs FREEZE_ALL in EXPLAIN AST.
Track whether there's whitespace after the opening [ bracket in array literals and use this to format arrays with outer spaces when appropriate. This matches ClickHouse's EXPLAIN AST behavior for multi-line arrays.
Handle FORMAT and SETTINGS ordering in EXPLAIN output:
- Extract the FORMAT clause to be a child of the Explain node
- Extract SETTINGS that follow FORMAT (SettingsAfterFormat flag) to the Explain level
- Keep SETTINGS that precede FORMAT within the SelectQuery
Fixed tests:
- 02989_join_using_parent_scope (stmt32)
- 02798_explain_settings_not_applied_bug (stmt8)
Output the empty grouping set () as 'ExpressionList' without a children count, matching ClickHouse's expected EXPLAIN format.
Fixed tests (10 statements):
- 02293_grouping_function (stmt7)
- 01883_with_grouping_sets (stmt9)
- 02315_grouping_constant_folding (stmt4, stmt6)
- 02416_grouping_function_compatibility (stmt4)
- 03611_uniqExact_bug (stmt10)
- 03708_analyzer_convert_any_outer_to_inner_2 (stmt12)
- 03654_grouping_sets_any_min_max (stmt2, stmt12, stmt14)
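The header rule is a single branch on the child count. expressionListHeader is a hypothetical helper; the "(children N)" suffix follows the EXPLAIN AST format shown elsewhere in this PR.

```go
package main

import "fmt"

// expressionListHeader omits the children count for an empty list, such as
// the empty grouping set ().
func expressionListHeader(children int) string {
	if children == 0 {
		return "ExpressionList"
	}
	return fmt.Sprintf("ExpressionList (children %d)", children)
}

func main() {
	// GROUPING SETS ((a, b), ())
	fmt.Println(expressionListHeader(2)) // ExpressionList (children 2)
	fmt.Println(expressionListHeader(0)) // ExpressionList
}
```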
1. Parse COLUMNS as identifier when not followed by (
- Previously COLUMNS was always parsed as column matcher
- Now it can be used as identifier (e.g., table name in Distributed())
2. Add parseEngineParameters to skip implicit alias parsing
- Engine parameters like Distributed('cluster', db, table) should
not treat 'table' as an alias for 'db'
Fixed tests (7 statements):
- 00821_distributed_storage_with_join_on (stmt3)
- 03550_analyzer_remote_view_columns (stmt7)
- 03310_index_hints_read_columns (stmt12, stmt26)
- 02735_parquet_encoder (stmt70, stmt72, stmt74)
QBit is a ClickHouse data type that stores vectors in a bit-sliced layout for approximate vector search. Adding it to isDataTypeName() ensures it's parsed as a DataType instead of a FunctionCall when used in column declarations.
Fixed: 03374_qbit_nullable (stmt4)
When a FILTER clause is present:
- count(name) FILTER (WHERE cond) -> countIf(name, cond) (keeps args)
- count(*) FILTER (WHERE cond) -> countIf(cond) (drops the asterisk)
Fixed 2 tests: 03003_count_asterisk_filter, 03705_count_if_asterisk
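A sketch of the argument rule, with arguments modeled as strings; filterToIf is a hypothetical helper and treating only a lone "*" as droppable is an assumption drawn from the two cases listed.

```go
package main

import "fmt"

// filterToIf builds the -If call for name(args) FILTER (WHERE cond):
// regular arguments are kept, while a lone * argument is dropped.
func filterToIf(name string, args []string, cond string) (string, []string) {
	kept := []string{}
	if !(len(args) == 1 && args[0] == "*") {
		kept = append(kept, args...)
	}
	return name + "If", append(kept, cond)
}

func main() {
	n1, a1 := filterToIf("count", []string{"name"}, "cond")
	n2, a2 := filterToIf("count", []string{"*"}, "cond")
	fmt.Println(n1, a1) // countIf [name cond]
	fmt.Println(n2, a2) // countIf [cond]
}
```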
When the RHS of IN contains a mix of primitive literals (integers, strings, etc.) and tuple literals that only contain primitives, output it as a single Literal Tuple_ instead of Function tuple.
Example: (number, tuple) IN (3, (2, 3)) now correctly outputs:
Literal Tuple_(UInt64_3, Tuple_(UInt64_2, UInt64_3))
Fixed test: 00132_sets/stmt12
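A sketch of the folding check and the resulting literal text for the example above. The Elem type only models unsigned integers and tuples, and foldsToLiteralTuple and formatTuple are hypothetical helpers, not the project's code.

```go
package main

import (
	"fmt"
	"strings"
)

// Elem is a hypothetical RHS element: an unsigned literal, a tuple, or
// anything else (e.g. an identifier).
type Elem struct {
	Kind  string // "uint", "tuple", or another kind such as "ident"
	Value uint64 // used when Kind == "uint"
	Items []Elem // used when Kind == "tuple"
}

func isPrimitive(e Elem) bool { return e.Kind == "uint" }

// foldsToLiteralTuple reports whether every element is a primitive or a
// tuple whose items are all primitives.
func foldsToLiteralTuple(elems []Elem) bool {
	for _, e := range elems {
		switch {
		case isPrimitive(e):
		case e.Kind == "tuple":
			for _, inner := range e.Items {
				if !isPrimitive(inner) {
					return false
				}
			}
		default:
			return false
		}
	}
	return true
}

// formatTuple renders the elements as a Tuple_(...) literal body.
func formatTuple(elems []Elem) string {
	parts := make([]string, len(elems))
	for i, e := range elems {
		if e.Kind == "tuple" {
			parts[i] = formatTuple(e.Items)
		} else {
			parts[i] = fmt.Sprintf("UInt64_%d", e.Value)
		}
	}
	return "Tuple_(" + strings.Join(parts, ", ") + ")"
}

func main() {
	// RHS of: (number, tuple) IN (3, (2, 3))
	rhs := []Elem{
		{Kind: "uint", Value: 3},
		{Kind: "tuple", Items: []Elem{{Kind: "uint", Value: 2}, {Kind: "uint", Value: 3}}},
	}
	if foldsToLiteralTuple(rhs) {
		fmt.Println("Literal " + formatTuple(rhs))
	}
}
```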
Parse TEMPORARY keyword in TRUNCATE statements and skip it when determining the table name. Fixed test: 00670_truncate_temporary_table/stmt7
DEC is a SQL standard alias for DECIMAL. Without this, DEC inside Nullable() was being parsed as a function call instead of a data type. Fixed test: 00700_decimal_null/stmt2
Handle LIMIT offset, count BY expr and LIMIT n OFFSET m BY expr syntax by storing the offset separately from the regular OFFSET clause.
- Added LimitByOffset field to SelectQuery AST
- Updated parser to extract offset for LIMIT BY clauses
- Updated explain output to include LimitByOffset
Fixed test: 00939_limit_by_offset/stmt7
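A sketch of extracting the LIMIT BY count and offset from both syntaxes, assuming pre-split string tokens. LimitBy and parseLimitBy are hypothetical; the point is that the offset is kept in its own field and that only a following BY makes this a LIMIT BY clause.

```go
package main

import (
	"fmt"
	"strings"
)

// LimitBy is a hypothetical holder for the LIMIT part of a LIMIT BY clause.
type LimitBy struct {
	Count  string
	Offset string // stored separately from the query-level OFFSET
}

// parseLimitBy reads "LIMIT count", "LIMIT offset , count" or
// "LIMIT count OFFSET offset" and reports whether BY follows.
func parseLimitBy(toks []string) (LimitBy, bool) {
	if len(toks) < 2 || !strings.EqualFold(toks[0], "LIMIT") {
		return LimitBy{}, false
	}
	lb := LimitBy{Count: toks[1]}
	next := 2
	if len(toks) > 3 && toks[2] == "," {
		lb = LimitBy{Offset: toks[1], Count: toks[3]}
		next = 4
	} else if len(toks) > 3 && strings.EqualFold(toks[2], "OFFSET") {
		lb = LimitBy{Count: toks[1], Offset: toks[3]}
		next = 4
	}
	return lb, next < len(toks) && strings.EqualFold(toks[next], "BY")
}

func main() {
	// Tokens are assumed to be whitespace-separated in these examples.
	for _, q := range []string{"LIMIT 1 , 2 BY x", "LIMIT 2 OFFSET 1 BY x", "LIMIT 5"} {
		lb, isBy := parseLimitBy(strings.Fields(q))
		fmt.Printf("%s -> count=%s offset=%s by=%v\n", q, lb.Count, lb.Offset, isBy)
	}
}
```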
The CHECK TABLE statement supports PARTITION and PART clauses to check specific partitions or parts. Added these fields to the AST and parser. Fixed test: 00961_check_table/stmt20
This fixes parsing of SimpleAggregateFunction(sum, Double), where Double should be recognized as a data type, not an identifier.
Fixed 6 statements:
- 00915_simple_aggregate_function/stmt3
- 00915_simple_aggregate_function_summing_merge_tree/stmt3
- 01392_column_resolve/stmt1, stmt2
- 02775_show_columns_called_from_clickhouse/stmt4
- 02790_fix_coredump_when_compile_expression/stmt1
Handle unary expressions (like -1) in data type parameters by formatting them as operator + value, instead of falling through to the default fmt.Sprintf, which printed the Go struct representation.
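A sketch of the fix: give unary expressions their own case so they never reach the %v fallback. NumberLit, UnaryExpr and formatTypeParam are hypothetical stand-ins for the project's AST types and formatter.

```go
package main

import "fmt"

// NumberLit and UnaryExpr are hypothetical, trimmed-down AST nodes.
type NumberLit struct{ Text string }

type UnaryExpr struct {
	Op      string
	Operand interface{}
}

// formatTypeParam renders a data type parameter. UnaryExpr is handled
// explicitly, so "-1" is printed instead of a Go struct dump.
func formatTypeParam(e interface{}) string {
	switch v := e.(type) {
	case NumberLit:
		return v.Text
	case UnaryExpr:
		return v.Op + formatTypeParam(v.Operand)
	default:
		return fmt.Sprintf("%v", v) // fallback that previously caught unary expressions
	}
}

func main() {
	fmt.Println(formatTypeParam(UnaryExpr{Op: "-", Operand: NumberLit{Text: "1"}})) // -1
}
```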