yarn # Install dependencies
yarn build # Compile TypeScript (tsup + tsc)
yarn test # Run all tests (6,100+ tests)
yarn test:watch # Run tests in watch mode
yarn typecheck # Type-check without emitting
yarn lint # Run ESLint
yarn lint:fix # Auto-fix lint issues
yarn generate:cst # Regenerate CST type definitions from parser grammar
yarn clean # Remove dist/ and coverage/
Every SQL string flows through this pipeline:
SQL String ──> Lexer (tokens.ts/lexer.ts) ──> Token[]
│
Token[] ──────> Parser (parser.ts) ───────> CST (Concrete Syntax Tree)
│
CST ──────────> Visitor (visitor.ts) ──────> AST (typed, clean)
│
AST ──────────> toSql (toSql.ts) ──────────> SQL String (round-trip)
The CST is Chevrotain's lossless tree that preserves every token. The visitor transforms it into a clean, typed AST that is easy to work with. toSql() converts any AST node back to valid SQL.
For autocomplete, the flow is:
SQL + cursor offset ──> content-assist.ts ──> parser.computeContentAssist()
│
nextTokenTypes + tablesInScope + cteColumns
│
suggestion-builder.ts ──> Suggestion[] (filtered, prioritized)
Grammar arrays (src/grammar/keywords.ts, dataTypes.ts, constants.ts) are the source of truth. src/parser/tokens.ts auto-generates Chevrotain tokens from them:
- Each keyword string is converted to a PascalCase token name ("select" → Select, "data_page_size" → DataPageSize)
- Each token gets a case-insensitive regex pattern with word boundary (e.g., /select\b/i)
- Non-reserved keywords are assigned to the IdentifierKeyword category, which lets the parser accept them as table/column names via a single CONSUME(IdentifierKeyword) rule
The IDENTIFIER_KEYWORD_NAMES set in tokens.ts controls which keywords are non-reserved. Reserved keywords (SELECT, FROM, WHERE, JOIN, etc.) are not in this set and cannot be used as unquoted identifiers.
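The naming and pattern rules above can be sketched in a few lines. This is a minimal illustration, not the code in tokens.ts: the helper names toTokenName, keywordPattern, and isNonReserved are hypothetical, and the one-entry set is a stand-in for IDENTIFIER_KEYWORD_NAMES.

```typescript
// Hypothetical helpers illustrating the rules described above;
// the actual logic in src/parser/tokens.ts may differ.

// "data_page_size" -> "DataPageSize"
function toTokenName(keyword: string): string {
  return keyword
    .split("_")
    .filter((part) => part.length > 0)
    .map((part) => part.charAt(0).toUpperCase() + part.slice(1))
    .join("")
}

// Case-insensitive pattern with a trailing word boundary, so that
// "select" matches the keyword but not the start of "selected".
function keywordPattern(keyword: string): RegExp {
  const escaped = keyword.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")
  return new RegExp(`${escaped}\\b`, "i")
}

// Stand-in for IDENTIFIER_KEYWORD_NAMES with a single assumed entry.
const identifierKeywordNames = new Set<string>(["MaxUncommittedRows"])

// A keyword is non-reserved when its token name is in the set.
function isNonReserved(keyword: string): boolean {
  return identifierKeywordNames.has(toTokenName(keyword))
}
```

With these definitions, isNonReserved("select") is false because Select is not in the set, which mirrors how reserved keywords are excluded from the identifier rule.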
Example: adding a hypothetical RETENTION keyword.
1. Add to grammar — src/grammar/keywords.ts:
export const keywords: string[] = [
// ...existing keywords in alphabetical order...
"retention",
// ...
]
This auto-generates a Retention token in tokens.ts.
2. If non-reserved, mark it — src/parser/tokens.ts:
If the keyword can be used as an identifier (table/column name), add it to IDENTIFIER_KEYWORD_NAMES:
export const IDENTIFIER_KEYWORD_NAMES = new Set([
// ...
"Retention",
])
Skip this step if the keyword is reserved (i.e., it introduces structural ambiguity as an identifier).
3. Use in parser grammar — src/parser/parser.ts:
Reference the token in a grammar rule:
private retentionClause = this.RULE("retentionClause", () => {
this.CONSUME(Retention)
this.CONSUME(NumberLiteral)
this.SUBRULE(this.partitionPeriod) // DAY, MONTH, etc.
})
Make sure to import the token from lexer.ts at the top of parser.ts. The token is available by its PascalCase name.
4. Regenerate CST types:
yarn generate:cst
This reads the parser's grammar rules and regenerates src/parser/cst-types.d.ts. The new rule's CST children type will appear automatically (e.g., RetentionClauseCstChildren).
5. Add visitor method — src/parser/visitor.ts:
Import the new CST type from cst-types.d.ts, then add a visitor method:
retentionClause(ctx: RetentionClauseCstChildren): AST.RetentionClause {
return {
type: "retentionClause",
value: parseInt(ctx.NumberLiteral[0].image, 10),
unit: this.visit(ctx.partitionPeriod[0]),
}
}
6. Add AST type — src/parser/ast.ts:
export interface RetentionClause extends AstNode {
type: "retentionClause"
value: number
unit: string
}
7. Add toSql serialization — src/parser/toSql.ts:
function retentionClauseToSql(clause: AST.RetentionClause): string {
return `RETENTION ${clause.value} ${clause.unit}`
}
Wire it into the parent statement's toSql function.
8. Add tests — tests/parser.test.ts:
it("should parse RETENTION clause", () => {
const result = parseToAst("CREATE TABLE t (x INT) RETENTION 30 DAY")
expect(result.errors).toHaveLength(0)
// assert AST structure...
})
it("should round-trip RETENTION clause", () => {
const sql = "CREATE TABLE t (x INT) RETENTION 30 DAY"
const result = parseToAst(sql)
const roundtrip = toSql(result.ast[0])
const result2 = parseToAst(roundtrip)
expect(result2.errors).toHaveLength(0)
})
9. Run tests:
yarn test
Adding a new statement follows the same steps as adding a keyword, but the scope is larger:
- Grammar: add all tokens to src/grammar/keywords.ts (and src/parser/tokens.ts if non-reserved)
- Parser: add a new top-level rule in parser.ts, register it in the statement rule's alternatives
- CST types: yarn generate:cst
- AST: add the statement interface to ast.ts, add it to the Statement union type
- Visitor: add visitor method in visitor.ts
- toSql: add serializer in toSql.ts, add the case to the statementToSql switch
- Tests: parse tests, AST structure assertions, and round-trip tests
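The AST and toSql items in this checklist can be sketched as follows. ArchiveStatement and its SQL syntax are hypothetical, and the node shapes are trimmed down for illustration. The never check in the default branch is one possible way to make the compiler flag a missing case; it is not necessarily what toSql.ts does.

```typescript
// Hypothetical, trimmed-down node shapes; the real interfaces live in ast.ts.
interface SelectStatement {
  type: "select"
  table: string
}

// The new statement being added (illustrative only).
interface ArchiveStatement {
  type: "archive"
  table: string
}

// Adding the new interface to the union lets the compiler find every
// switch that does not handle it yet.
type Statement = SelectStatement | ArchiveStatement

function statementToSql(stmt: Statement): string {
  switch (stmt.type) {
    case "select":
      return `SELECT * FROM ${stmt.table}`
    case "archive":
      return `ARCHIVE TABLE ${stmt.table}`
    default: {
      // Exhaustiveness check: stops compiling if a union member is unhandled.
      const unhandled: never = stmt
      throw new Error(`Unhandled statement: ${JSON.stringify(unhandled)}`)
    }
  }
}
```

Removing the "archive" case while keeping ArchiveStatement in the union turns the assignment to never into a compile error, so a forgotten serializer is caught before the tests run.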
Autocomplete has four layers:
- content-assist.ts: determines what the parser expects at the cursor position. Extracts tables in scope (FROM/JOIN clauses), CTE definitions, and qualified references (e.g., t1.). You rarely need to modify this unless you're changing how scope is detected.
- token-classification.ts: classifies tokens into categories: SKIP_TOKENS (never suggested), EXPRESSION_OPERATORS (lower priority), IDENTIFIER_KEYWORD_TOKENS (trigger schema suggestions). When adding a new token, decide which category it belongs to.
- suggestion-builder.ts: converts parser token types + schema into Suggestion[]. Controls priority (columns > keywords > functions > tables), handles qualified references, and manages deduplication.
- provider.ts: orchestrates the above and adds context detection: after FROM → suggest tables, after SELECT → suggest columns, after * → suppress columns (alias position), etc. The getIdentifierSuggestionScope() function is the main context switcher.
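The priority and deduplication behavior described for suggestion-builder.ts can be illustrated with a small sketch. The Suggestion shape, the KIND_RANK table, and the prioritize function are assumptions made for this example. In particular, deduplicating by case-insensitive label is a guess at a reasonable policy, not a description of the real implementation.

```typescript
// Hypothetical shapes; the real Suggestion type may carry more fields.
type SuggestionKind = "column" | "keyword" | "function" | "table"

interface Suggestion {
  label: string
  kind: SuggestionKind
}

// Lower rank sorts first: columns > keywords > functions > tables.
const KIND_RANK: Record<SuggestionKind, number> = {
  column: 0,
  keyword: 1,
  function: 2,
  table: 3,
}

function prioritize(suggestions: Suggestion[]): Suggestion[] {
  // Deduplicate by case-insensitive label (assumed policy), keeping the
  // highest-priority entry and, on ties, the first one seen.
  const best = new Map<string, Suggestion>()
  for (const s of suggestions) {
    const key = s.label.toLowerCase()
    const existing = best.get(key)
    if (existing === undefined || KIND_RANK[s.kind] < KIND_RANK[existing.kind]) {
      best.set(key, s)
    }
  }
  // Order by kind rank, then alphabetically within a kind.
  return Array.from(best.values()).sort(
    (a, b) => KIND_RANK[a.kind] - KIND_RANK[b.kind] || a.label.localeCompare(b.label),
  )
}
```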
Reserved vs. non-reserved keywords: QuestDB has ~60 reserved keywords. Everything else (data types, time units, config keys like maxUncommittedRows) is non-reserved and can be used as an unquoted identifier. The IdentifierKeyword token category in Chevrotain handles this — the parser's identifier rule accepts any IdentifierKeyword token.
CST vs. AST: The CST preserves every token (including keywords, punctuation, whitespace position). The AST is a clean semantic representation. The visitor decides what to keep. For example, the CST has separate Select, Star, From tokens; the AST just has { type: "select", columns: [{ type: "star" }], from: [...] }.
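A toy version of that transformation is sketched below. The SelectCst and SelectAst shapes are simplified stand-ins (the generated types in cst-types.d.ts are richer), and selectToAst is a hypothetical function, not a method of the real visitor.

```typescript
// Simplified stand-in for a Chevrotain token.
interface Token {
  image: string
}

// Simplified CST node: every token is preserved, keyed by token name.
interface SelectCst {
  Select: Token[]
  Star?: Token[]
  From: Token[]
  Identifier: Token[]
}

// The AST keeps only what matters semantically.
interface SelectAst {
  type: "select"
  columns: { type: "star" }[]
  from: { type: "table"; name: string }[]
}

// The Select and From keyword tokens are dropped; Star and the table
// identifiers become typed nodes.
function selectToAst(cst: SelectCst): SelectAst {
  const columns: SelectAst["columns"] = cst.Star ? [{ type: "star" }] : []
  const from: SelectAst["from"] = cst.Identifier.map((t) => ({
    type: "table",
    name: t.image,
  }))
  return { type: "select", columns, from }
}
```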
Round-trip correctness: toSql(parseToAst(sql).ast) must produce SQL that parses to an equivalent AST. This is verified against 1,726 real queries in docs-roundtrip.test.ts. When adding new features, always test round-trip.
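The round-trip property can be expressed as a small generic check. In the sketch below, parseRetention and retentionToSql are toy stand-ins for parseToAst() and toSql() that handle only the hypothetical RETENTION clause, and comparing ASTs through JSON.stringify is an assumption that works only when both ASTs are built with the same key order.

```typescript
interface RetentionClause {
  type: "retentionClause"
  value: number
  unit: string
}

// Toy parser for "RETENTION <number> <unit>"; not the real parser.
function parseRetention(sql: string): RetentionClause {
  const match = /^\s*retention\s+(\d+)\s+(\w+)\s*$/i.exec(sql)
  if (match === null) throw new Error(`Cannot parse: ${sql}`)
  const [, digits = "", unit = ""] = match
  return { type: "retentionClause", value: parseInt(digits, 10), unit: unit.toUpperCase() }
}

function retentionToSql(clause: RetentionClause): string {
  return `RETENTION ${clause.value} ${clause.unit}`
}

// Round-trip holds when re-parsing the serialized SQL yields an equal AST.
function roundTrips<T>(
  sql: string,
  parse: (s: string) => T,
  serialize: (ast: T) => string,
): boolean {
  const first = parse(sql)
  const second = parse(serialize(first))
  return JSON.stringify(first) === JSON.stringify(second)
}
```

Note that the check compares ASTs rather than SQL strings: "retention  30 day" and "RETENTION 30 DAY" differ as text but are equivalent once parsed.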
Error recovery: The parser uses Chevrotain's semicolon-based error recovery. When a statement fails to parse, it skips to the next semicolon and continues. The visitor handles incomplete CST nodes with try-catch. This means parseToAst() can return both ast (partial) and errors simultaneously.
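The observable effect, a partial ast alongside errors, can be modeled with a toy sketch. It imitates the behavior only: Chevrotain's recovery operates on the token stream inside the parser, whereas this example naively splits the input on semicolons (and so would mishandle semicolons inside string literals). parseStatement and parseAll are hypothetical names.

```typescript
interface ParseResult {
  ast: string[] // stand-in for AST nodes
  errors: string[]
}

// Hypothetical single-statement "parser": accepts anything starting with SELECT.
function parseStatement(sql: string): string {
  if (!/^select\b/i.test(sql)) throw new Error(`Unexpected input: ${sql}`)
  return sql
}

function parseAll(input: string): ParseResult {
  const result: ParseResult = { ast: [], errors: [] }
  for (const raw of input.split(";")) {
    const sql = raw.trim()
    if (sql.length === 0) continue
    try {
      result.ast.push(parseStatement(sql))
    } catch (e) {
      // Record the failure and resume with the statement after the semicolon.
      result.errors.push(e instanceof Error ? e.message : String(e))
    }
  }
  return result
}
```

Callers of an API shaped like this should check errors even when ast is non-empty, since a failed statement does not prevent later statements from parsing.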