
Conversation

@diegoimbert
Contributor

@diegoimbert diegoimbert commented Dec 15, 2025

No description provided.

@windmill-labs windmill-labs deleted a comment from claude bot Dec 19, 2025
@diegoimbert
Contributor Author

/ai review this PR. no need for praises just find the potential issues

@claude
Contributor

claude bot commented Dec 19, 2025

Claude finished @diegoimbert's task in 1m 39s


PR Review: Potential Issues Found

  • Analyze backend parser changes (Rust)
  • Analyze frontend changes (Svelte/TypeScript)
  • Review new SQL type plugin worker
  • Review client library changes
  • Check for edge cases and potential bugs
  • Compile findings

1. SQL Comment Removal - Doesn't Handle String Literals Correctly

File: backend/windmill-worker/src/sql_utils.rs (lines 2-28)

The remove_comments function has a known bug mentioned in the test but not fixed:

#[test]
fn test_remove_comments_comment_in_string() {
    let sql = "SELECT '-- not a comment' FROM table;";
    let result = remove_comments(sql);
    assert_eq!(result, "SELECT '-- not a comment' FROM table;");
}

The test passes, but only coincidentally: the function doesn't actually track string state. If a string literal contains -- followed by real SQL on the same line, the rest of the line would be incorrectly stripped as a comment.
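For illustration, a minimal string-aware sketch follows, written in TypeScript (the actual function is Rust); removeComments here is a hypothetical stand-alone helper, not the windmill implementation, and it omits block comments for brevity.

// Hypothetical sketch: strip -- line comments while tracking string state so that
// "--" inside a literal is left untouched. Honours SQL's doubled-quote escape ('').
// Block comments (/* ... */) are omitted for brevity.
function removeComments(sql: string): string {
    let out = ''
    let inString: string | null = null
    let i = 0
    while (i < sql.length) {
        const c = sql[i]
        if (inString) {
            out += c
            if (c === inString) {
                if (sql[i + 1] === inString) {
                    // Doubled quote ('' or "") is an escaped quote: stay inside the literal
                    out += sql[i + 1]
                    i++
                } else {
                    inString = null
                }
            }
            i++
            continue
        }
        if (c === "'" || c === '"') {
            inString = c
            out += c
            i++
            continue
        }
        if (c === '-' && sql[i + 1] === '-') {
            // Line comment: skip to end of line (the newline itself is kept)
            while (i < sql.length && sql[i] !== '\n') i++
            continue
        }
        out += c
        i++
    }
    return out
}

With this kind of state tracking, the test above passes by construction rather than by coincidence.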


2. SQL Query Statement Count Heuristic - Escape Sequence Bug

File: frontend/src/lib/infer.svelte.ts (lines 110, 126-127)

The getQueryStmtCountHeuristic function uses query[i - 1] !== '\\' to detect escaped quotes, which is incorrect because:

  1. Standard SQL escapes quotes by doubling them (''), not with \'
  2. In PostgreSQL, only the E'string' syntax uses \'; regular strings use ''
  3. The backslash check can misclassify valid SQL like 'path\\data' (see the sketch after this list)
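A hedged sketch of a doubled-quote-aware heuristic is shown below; countStatements is a hypothetical name for illustration, not the code in infer.svelte.ts.

// Hypothetical statement counter: counts top-level semicolons and treats '' as an
// escaped quote inside string literals, instead of checking for a preceding backslash.
function countStatements(query: string): number {
    let count = 0
    let inString = false
    for (let i = 0; i < query.length; i++) {
        const c = query[i]
        if (inString) {
            if (c === "'") {
                if (query[i + 1] === "'") {
                    i++ // doubled quote: still inside the literal
                } else {
                    inString = false
                }
            }
        } else if (c === "'") {
            inString = true
        } else if (c === ';') {
            count++
        }
    }
    return count
}

For example, countStatements("SELECT 'it''s a test'; SELECT 1;") returns 2, and the quoted semicolon in "SELECT 'a;b';" is not counted as a statement boundary.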

3. Worker Cache Memory Leak Risk

File: frontend/src/lib/monaco_workers/sqlTypePlugin.worker.js (lines 50-56)

The SqlAwareTypeScriptWorker class caches transformed code per file but there's no cache eviction mechanism:

this._sqlQueriesByFile = new Map()
this._fileVersions = new Map()
this._transformedCodeCache = new Map()

If users edit many files over a long session, these maps could grow unbounded. Consider implementing an LRU cache or cleaning up when files are closed.
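One possible shape for bounding these caches, sketched under the assumption that an approximate LRU policy is acceptable; the class and the size limit below are hypothetical, not part of this PR.

// Hypothetical size-bounded cache: Map preserves insertion order, so deleting the
// oldest key on overflow gives a cheap LRU-ish eviction policy.
class BoundedCache<K, V> {
    private map = new Map<K, V>()
    constructor(private maxEntries = 200) {}

    get(key: K): V | undefined {
        const value = this.map.get(key)
        if (value !== undefined) {
            // Refresh recency by re-inserting the entry at the end
            this.map.delete(key)
            this.map.set(key, value)
        }
        return value
    }

    set(key: K, value: V): void {
        this.map.delete(key)
        this.map.set(key, value)
        if (this.map.size > this.maxEntries) {
            const oldest = this.map.keys().next().value as K
            this.map.delete(oldest)
        }
    }

    delete(key: K): void {
        this.map.delete(key)
    }
}

// e.g. this._transformedCodeCache = new BoundedCache(100) instead of new Map()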


4. Potential Race Condition in SQL Query Updates

File: frontend/src/lib/components/sqlTypeService.ts (lines 82-119)

The updateSqlQueriesInWorker function doesn't handle the case where the model might be disposed between checking and using it:

const model = editor.getModel(uri)
if (!model) {
    console.warn(`[SqlTypeService] No Monaco model found for URI: ${uriString}`)
    return
}
// ... model could be disposed here before revalidateModel is called
if (typeof worker.updateSqlQueries === 'function') {
    await worker.updateSqlQueries(uriString, queries)
    revalidateModel(model)  // model might be disposed
}

5. Silent Error Swallowing in Asset Parser

File: backend/parsers/windmill-parser-ts/src/asset_parser.rs (lines 281-283)

SQL parsing errors are silently ignored:

match windmill_parser_sql::parse_assets(&sql_with_attach) {
    Ok(mut sql_assets) => { ... }
    _ => {}  // Errors silently swallowed
}

Consider at least logging these errors for debugging purposes.


6. Type Injection XSS-like Vulnerability (Low Risk)

File: frontend/src/lib/monaco_workers/sqlTypePlugin.worker.js (lines 28-32)

Column names from SQL queries are directly interpolated into TypeScript code without escaping:

let middlePart =
    '<{ ' +
    Object.entries(query?.prepared?.columns ?? {})
        .map(([key, type]) => `"${key}": ${type}`)  // key not escaped
        .join('; ') +
    ' }>'

If a column name contains " or other special characters, it could break the generated TypeScript. Consider escaping the key properly with JSON.stringify(key) instead of manual quoting.
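A sketch of the hardened interpolation; the surrounding names come from the snippet above, and the fix itself is a suggestion rather than the PR's code.

// JSON.stringify emits a valid, fully escaped string literal for each column name,
// so quotes or backslashes in a name cannot break the generated TypeScript.
// The `type` side is assumed to come from a trusted, parser-generated mapping.
let middlePart =
    '<{ ' +
    Object.entries(query?.prepared?.columns ?? {})
        .map(([key, type]) => `${JSON.stringify(key)}: ${type}`)
        .join('; ') +
    ' }>'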


7. Hardcoded Worker Initialization Timeout

File: frontend/src/lib/components/sqlTypeService.ts (lines 36-38)

const WORKER_INIT_TIMEOUT = 10000
const MAX_RETRIES = 10
const RETRY_DELAY = 300

These values are hardcoded and might be insufficient on slower machines or under heavy load. Consider making them configurable or increasing the defaults.
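If configurability is wanted, one hedged sketch is an options object with the current values as defaults; all option names below are hypothetical.

// Hypothetical options for the SQL type service; defaults mirror the current constants.
interface SqlTypeServiceOptions {
    workerInitTimeoutMs?: number
    maxRetries?: number
    retryDelayMs?: number
}

const DEFAULT_OPTIONS = { workerInitTimeoutMs: 10000, maxRetries: 10, retryDelayMs: 300 }

function resolveOptions(opts: SqlTypeServiceOptions = {}) {
    return { ...DEFAULT_OPTIONS, ...opts }
}

// e.g. const { workerInitTimeoutMs, maxRetries, retryDelayMs } = resolveOptions(userOptions)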


8. Missing Null Check in MapResource

File: frontend/src/lib/svelte5Utils.svelte.ts (lines 150-151)

for (const key of Object.keys(toFetch)) {
    let value = fetchedData[key]
    obj[key] = value  // value could be undefined if fetcher didn't return all keys
}

If the fetcher fails to return data for some keys, undefined values will be stored. This could cause unexpected behavior downstream.


9. Type Assertion Without Validation

File: frontend/src/lib/infer.svelte.ts (lines 49-56)

let res = (await JobService.runScriptPreviewAndWaitResult({
    ...
})) as { error?: string; columns?: { name: string; type: string }[] }[]

The response is cast without validation. If the API response format changes, this will fail silently or cause runtime errors.


10. Potential Issue with DuckDB Query Type Parameters

File: typescript-client/sqlUtils.ts (line 132)

The ducklake function doesn't support schema specification like datatable does:

export function ducklake(name: string = "main"): SqlTemplateFunction {
    return sqlProviderImpl("ducklake", { name }); // No schema parsing
}

But datatable parses the name:schema syntax (lines 294-308); this inconsistency might confuse users.
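If consistency is desired, a hedged sketch of mirroring the name:schema split in ducklake is shown below; it assumes sqlProviderImpl accepts an optional schema field, which is not verified against the client source.

// Hypothetical: apply the same "name:schema" handling that datatable() uses.
// Assumes sqlProviderImpl understands an optional `schema` property (unverified).
export function ducklake(name: string = "main"): SqlTemplateFunction {
    const [dbName, schema] = name.includes(":") ? name.split(":", 2) : [name, undefined];
    return sqlProviderImpl("ducklake", schema ? { name: dbName, schema } : { name: dbName });
}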


Summary

The most critical issues are:

  1. Issue #6 (XSS-like): Column name escaping in type injection
  2. Issue #4 (Race condition): Model disposal during async operations
  3. Issue #2 (Logic bug): SQL escape sequence handling

The other issues are lower priority but worth addressing for robustness.

@diegoimbert diegoimbert marked this pull request as ready for review December 19, 2025 23:18
@claude
Contributor

claude bot commented Dec 19, 2025

Claude finished @diegoimbert's task in 2m 44s


PR Review: Type-checked Data Tables

  • Analyze backend parser changes (Rust)
  • Analyze frontend changes (Svelte/TypeScript)
  • Review new SQL type plugin worker
  • Review client library changes
  • Check for edge cases and potential bugs
  • Provide comprehensive feedback

Summary

This PR implements type-checked data tables by:

  1. Extending the TypeScript parser (Rust) to extract SQL query details including spans, source names, and schemas
  2. Creating a custom Monaco TypeScript worker that injects type annotations into SQL tagged template literals
  3. Adding a SQL type service to communicate query type information between the editor and worker
  4. Extending the typescript-client with improved SQL template functions and type inference

The implementation is well-structured with good test coverage for the core functionality. I've added inline comments for specific issues found during the review.


Key Issues Found

High Priority:

  1. Memory leak risk in sqlTypePlugin.worker.js: The Map caches have no eviction mechanism
  2. Race condition in sqlTypeService.ts: Model could be disposed during async operations

Medium Priority:

  3. SQL escape sequence handling in sql_utils.rs: Uses backslash escape instead of SQL-standard doubled quotes
  4. Silent error swallowing in asset_parser.rs: SQL parsing errors are silently ignored
  5. Type assertion without validation in infer.svelte.ts: API response cast without validation

Low Priority:

  6. API inconsistency: ducklake() doesn't support schema specification like datatable() does
  7. Hardcoded timeouts in sqlTypeService.ts: May be insufficient on slower machines
  8. Missing undefined check in MapResource: Fetcher could return undefined values


Positive Observations

  1. Good test coverage: The getQueryStmtCountHeuristic function has thorough test cases covering edge cases
  2. Well-structured worker: The SqlAwareTypeScriptWorker properly handles position mapping between original and transformed code (a simplified sketch of this kind of mapping follows after this list)
  3. Clean API design: The datatable() and ducklake() template functions provide a nice developer experience
  4. Proper schema support: The TypeScript parser correctly handles schema prefixes in datatable names
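For readers unfamiliar with the technique, here is a simplified, hypothetical illustration of offset-map position translation; it is not the worker's actual data structure, only the general idea.

// Each entry says: "from this offset in the transformed text onward, subtract `delta`
// to get back to the original offset."
interface OffsetEntry {
    transformedOffset: number
    delta: number // transformedOffset - originalOffset at this point
}

function toOriginalOffset(transformedPos: number, offsetMap: OffsetEntry[]): number {
    let delta = 0
    for (const entry of offsetMap) {
        if (entry.transformedOffset > transformedPos) break
        delta = entry.delta
    }
    return transformedPos - delta
}

// Example: if 20 characters of injected type annotations end at transformed offset 120,
// positions past that point map back by subtracting 20.
const map: OffsetEntry[] = [{ transformedOffset: 120, delta: 20 }]
console.log(toOriginalOffset(150, map)) // -> 130
console.log(toOriginalOffset(100, map)) // -> 100 (before the injection)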

Testing Instructions

To verify the type-checked data tables feature:

  1. Navigate to the Script Editor and create a new TypeScript (Bun) script.
  2. Import windmill and use the wmill.datatable() function to create a SQL template.
  3. Write a SQL query like sql`SELECT id, name FROM users` and call .fetchOne() on it.
  4. After a brief moment, TypeScript type annotations should appear for the query result: hover over the variable to see the inferred column types (e.g., { id: number; name: string }).
  5. If the SQL query is invalid, a TypeScript error should highlight the query.
  6. Modify the query to select different columns and verify the types update accordingly.
  7. Also test with schema prefixes like wmill.datatable('main:public') to ensure schema-qualified queries work correctly.
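A sketch of what those steps look like in code; the package name, datatable() signature, and fetchOne() usage are assumptions inferred from this PR's description rather than verified against the published client.

import * as wmill from "windmill-client"  // assumed package name

export async function main() {
    // Assumed usage: datatable() returns a tagged template function
    const sql = wmill.datatable()
    // const sql = wmill.datatable("main:public")  // schema-qualified variant

    // Hovering `user` in the editor should show the inferred row type,
    // e.g. { id: number; name: string }
    const user = await sql`SELECT id, name FROM users`.fetchOne()
    return user
}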

Comment on lines +50 to +56
this._sqlQueriesByFile = new Map()
// Map of file URI -> version number (incremented when SQL queries change)
this._fileVersions = new Map()
// Cache of transformed code and offset maps per file version
// Structure: fileUri -> {version, originalText, transformed, offsetMap, offsetMapEntries}
this._transformedCodeCache = new Map()
}

Potential memory leak: These three Map instances (_sqlQueriesByFile, _fileVersions, _transformedCodeCache) have no cache eviction mechanism. If users edit many files over a long session, these maps could grow unbounded. Consider implementing an LRU cache or cleaning up entries when files are closed.

Suggested change (proposed replacement for the commented lines):

// Map of file URI -> SQL query details
this._sqlQueriesByFile = new Map()
// Map of file URI -> version number (incremented when SQL queries change)
this._fileVersions = new Map()
// Cache of transformed code and offset maps per file version
// Structure: fileUri -> {version, originalText, transformed, offsetMap, offsetMapEntries}
// TODO: Consider implementing LRU eviction to prevent memory growth
this._transformedCodeCache = new Map()
}


// Handle string literals (single or double quotes)
if !in_line_comment && !in_block_comment {
if (c == '\'' || c == '"') && prev_char != '\\' {

SQL escape sequence handling: The backslash escape check (prev_char != '\\') is not correct for standard SQL. In standard SQL (and PostgreSQL without E'' strings), quotes are escaped by doubling them (''), not with backslash.

While this won't cause issues in most cases since strings containing \' are rare, it could cause incorrect behavior with escape strings like E'path\\data' in PostgreSQL where the backslash has special meaning.

// This method is added by our sqlTypePlugin.worker.js
if (typeof worker.updateSqlQueries === 'function') {
await worker.updateSqlQueries(uriString, queries)
revalidateModel(model)

Potential race condition: The model could be disposed between the null check on line 100 and the call to revalidateModel here. While revalidateModel does check for disposal internally, the model could theoretically be disposed and recreated (with a different URI) in between async operations. Consider adding defensive programming:

Suggested change (proposed replacement):

await worker.updateSqlQueries(uriString, queries)
// Re-fetch the model in case it was recreated during async operation
const currentModel = editor.getModel(uri)
if (currentModel) {
    revalidateModel(currentModel)
}

Comment on lines 281 to 283
}
_ => {}
}

Silent error swallowing: SQL parsing errors are silently ignored here. While this may be intentional to avoid breaking the parser for malformed SQL, it makes debugging harder. Consider at least logging the error:

Suggested change (proposed replacement for the _ => {} arm):

Err(e) => {
    // Log SQL parsing error for debugging, but don't fail the asset parsing
    tracing::debug!("Failed to parse SQL query: {:?}", e);
}

Comment on lines +149 to +151
for (const key of Object.keys(toFetch)) {
let value = fetchedData[key]
obj[key] = value

Missing value check: If the fetcher fails to return data for some keys, undefined values will be stored in the result object. This could cause unexpected behavior downstream. Consider filtering out undefined values or logging a warning:

Suggested change (proposed replacement):

for (const key of Object.keys(toFetch)) {
    let value = fetchedData[key]
    if (value !== undefined) {
        obj[key] = value
    }
}

Comment on lines +51 to +58
let res = (await JobService.runScriptPreviewAndWaitResult({
workspace: getWorkspace()!,
requestBody: {
language: 'postgresql',
content: queryContent,
args: { database: `datatable://${chunk[0][1]?.source_name}` }
}
})) as { error?: string; columns?: { name: string; type: string }[] }[]

Type assertion without validation: The response from JobService.runScriptPreviewAndWaitResult is cast without validation. If the API response format changes or returns an unexpected structure, this will cause runtime errors. Consider adding validation or defensive checks:

Suggested change (proposed replacement):

let rawRes = (await JobService.runScriptPreviewAndWaitResult({
    workspace: getWorkspace()!,
    requestBody: {
        language: 'postgresql',
        content: queryContent,
        args: { database: `datatable://${chunk[0][1]?.source_name}` }
    }
})) as unknown
// Validate response structure
const res = Array.isArray(rawRes)
    ? (rawRes as { error?: string; columns?: { name: string; type: string }[] }[])
    : []

Comment on lines +36 to +38
const WORKER_INIT_TIMEOUT = 10000
const MAX_RETRIES = 10
const RETRY_DELAY = 300

Hardcoded timeout values: These timeout values are hardcoded and might be insufficient on slower machines or under heavy load. Consider making these configurable or documenting the expected initialization time:

Suggested change (proposed replacement):

// Worker initialization constants - adjust if users report timeout issues
const WORKER_INIT_TIMEOUT = 10000
const MAX_RETRIES = 10
const RETRY_DELAY = 300

@rubenfiszel rubenfiszel merged commit 3affbb3 into main Dec 20, 2025
139 checks passed
@rubenfiszel rubenfiszel deleted the di/typechecked-datatables branch December 20, 2025 08:51
@github-actions github-actions bot locked and limited conversation to collaborators Dec 20, 2025