Skip to content

Conversation

@Ris-1kd
Copy link
Collaborator

@Ris-1kd Ris-1kd commented Dec 8, 2025

Summary by Sourcery

Introduce a Python taint analysis checker for Tornado web handlers and supporting utilities to detect and register request-derived data as taint sources and HTTP handler methods as entrypoints.

New Features:

  • Add a Tornado-specific Python taint checker that discovers Tornado routes, registers handler methods as entrypoints, and tracks request data as tainted sources across files.
  • Introduce shared Tornado utilities for resolving imports, parsing route definitions, and extracting handler parameters used by the taint checker.
  • Wire the new Tornado taint checker into the checker configuration so it can be invoked as part of the existing analysis pipeline.

Note

Introduce a Tornado-specific Python taint checker with route/entrypoint discovery and request source modeling, update configs/rules, and harden Python analyzer/abstract checker.

  • Python Taint Analysis:
    • New Tornado checker: Add checker/taint/python/tornado-taint-checker.ts with utilities in checker/taint/python/tornado-util.ts to:
      • Parse Tornado routes from Application/add_handlers calls, resolve imports, and register handler methods as entrypoints.
      • Mark request-derived data (e.g., self.request.body, get_argument, passthroughs like decode) as PYTHON_INPUT taint sources.
      • Tag path parameters as sources at entrypoint execution.
    • Abstract checker tweak: python-taint-abstract-checker.ts now tags identifiers as sources even if preprocess isn’t ready.
  • Analyzer robustness:
    • In engine/analyzer/python/common/python-analyzer.ts, improve entrypoint de-dup key (proper param serialization, include parent class), and continue processing on entrypoint execution errors.
  • Configuration/Rules:
    • Register new checker in resource/checker/checker-config.json and add to Python packs in resource/checker/checker-pack-config.json.
    • Extend resource/example-rule-config/rule_config_python.json to include taint_flow_python_tornado_input.

Written by Cursor Bugbot for commit fbd5978. This will update automatically on new commits. Configure here.

@sourcery-ai
Copy link

sourcery-ai bot commented Dec 8, 2025

Reviewer's Guide

Adds a new Tornado-specific Python taint checker and utilities, wires it into the checker configuration, and implements route/handler discovery plus taint source propagation for Tornado request handlers.

Sequence diagram for Tornado route discovery and entrypoint registration

sequenceDiagram
  participant Analyzer
  participant TornadoTaintChecker
  participant AstUtil

  Analyzer->>TornadoTaintChecker: triggerAtCompileUnit(analyzer, scope, node, state, info)
  TornadoTaintChecker->>AstUtil: visit(node, visitors)
  AstUtil-->>TornadoTaintChecker: AssignmentExpression / VariableDeclaration / ClassDefinition
  TornadoTaintChecker->>TornadoTaintChecker: update fileCache for sourcefile
  Analyzer-->>TornadoTaintChecker: return

  Analyzer->>TornadoTaintChecker: triggerAtFuncCallSyntax(analyzer, scope, node, state, info)
  TornadoTaintChecker->>tornado_util: isTornadoCall(node, 'Application' or 'add_handlers')
  tornado_util-->>TornadoTaintChecker: boolean
  alt Tornado Application or add_handlers
    TornadoTaintChecker->>TornadoTaintChecker: collectTornadoEntrypointAndSource(analyzer, scope, state, routeList, fileName)
    TornadoTaintChecker->>TornadoTaintChecker: normalizeRoutes(routeList, fileName)
    loop each route
      TornadoTaintChecker->>tornado_util: parseRoutePair(routeNode)
      tornado_util-->>TornadoTaintChecker: RoutePair
      TornadoTaintChecker->>TornadoTaintChecker: resolveSymbol(handlerName, file)
      alt handler class found
        TornadoTaintChecker->>Analyzer: processInstruction(scope, classAst, state)
        Analyzer-->>TornadoTaintChecker: handlerSymVal (class)
        TornadoTaintChecker->>TornadoTaintChecker: emitHandlerEntrypoints(analyzer, handlerSymVal, urlPattern, classAst, scope, state)
        loop each http method (get, post, ...)
          TornadoTaintChecker->>Analyzer: processInstruction(scope, fdef, state)
          Analyzer-->>TornadoTaintChecker: fclos
          TornadoTaintChecker->>tornado_util: extractParamsFromAst(fdef)
          tornado_util-->>TornadoTaintChecker: ParamMeta[]
          TornadoTaintChecker->>TornadoTaintChecker: register ParamMeta as PYTHON_INPUT in sourceScope
          TornadoTaintChecker->>completeEntryPoint: completeEntryPoint(fclos)
          completeEntryPoint-->>TornadoTaintChecker: entryPoint
          TornadoTaintChecker->>Analyzer: push entryPoint to entryPoints
        end
      end
    end
  end
Loading

Class diagram for TornadoTaintChecker and Tornado utility types

classDiagram
  class PythonTaintAbstractChecker {
    <<abstract>>
    +checkerRuleConfigContent any
    +sourceScope any
    +getCheckerId() string
    +addSourceTagForSourceScope(kind string, sourceScope any) void
    +addSourceTagForcheckerRuleConfigContent(kind string, checkerRuleConfigContent any) void
    +triggerAtIdentifier(analyzer any, scope any, node any, state any, info any) void
    +triggerAtFunctionCallAfter(analyzer any, scope any, node any, state any, info any) void
    +checkByNameMatch(node any, fclos any, argvalues any) void
  }

  class TornadoTaintChecker {
    -fileCache Map~string, FileCache~
    +TornadoTaintChecker(resultManager any)
    -markAsTainted(value any) void
    +triggerAtStartOfAnalyze(analyzer any, scope any, node any, state any, info any) void
    +triggerAtCompileUnit(analyzer any, scope any, node any, state any, info any) bool
    +triggerAtFuncCallSyntax(analyzer any, scope any, node any, state any, info any) bool
    +triggerAtIdentifier(analyzer any, scope any, node any, state any, info any) void
    +checkByNameMatch(node any, fclos any, argvalues any) void
    +triggerAtFunctionCallAfter(analyzer any, scope any, node any, state any, info any) void
    +triggerAtMemberAccess(analyzer any, scope any, node any, state any, info any) void
    -resolveSymbol(name string, currentFile string) any
    -normalizeRoutes(node any, currentFile string) RoutePair[]
    -collectTornadoEntrypointAndSource(analyzer any, scope any, state any, routeList any, currentFile string) void
    -emitHandlerEntrypoints(analyzer any, handlerSymVal any, urlPattern string, classAst any, scope any, state any) void
    -buildClassSymbol(classNode any) any
  }

  class ImportSymbol {
    +file string
    +originalName string
  }

  class RoutePair {
    +path string
    +handlerName string
    +file string
  }

  class FileCache {
    +vars Map~string, any~
    +classes Map~string, any~
    +importedSymbols Map~string, ImportSymbol~
  }

  class ParamMeta {
    +name string
    +locStart number or 'all'
    +locEnd number or 'all'
  }

  class tornado_util {
    +tornadoSourceAPIs Set~string~
    +passthroughFuncs Set~string~
    +isRequestAttributeAccess(node any) boolean
    +isRequestAttributeExpression(expr any) boolean
    +isTornadoCall(node any, targetName string) boolean
    +parseRoutePair(route any) RoutePair
    +resolveImportPath(modulePath string, currentFile string) string
    +extractImportEntries(stmt any) Array~any~
    +extractParamsFromAst(funcNode any) ParamMeta[]
  }

  TornadoTaintChecker --|> PythonTaintAbstractChecker

  TornadoTaintChecker ..> FileCache : uses
  TornadoTaintChecker ..> RoutePair : uses
  TornadoTaintChecker ..> tornado_util : uses helpers
  TornadoTaintChecker ..> ParamMeta : uses
  FileCache ..> ImportSymbol : contains
Loading

File-Level Changes

Change Details Files
Introduce Tornado-specific Python taint checker implementing Tornado route discovery and entrypoint/source registration.
  • Create TornadoTaintChecker class extending PythonTaintAbstractChecker with checker id taint_flow_python_tornado_input.
  • Implement trigger hooks (start, compile unit, function call, identifier, member access) to reload rule config, collect routes from Application/add_handlers calls, and tag Tornado request data as PYTHON_INPUT sources.
  • Build per-file caches of variables, classes, and imported symbols to resolve handler classes across files and normalize complex route lists into RoutePair structures.
  • Emit entrypoints for HTTP methods on handler classes and register their non-self parameters as PYTHON_INPUT sources, integrating with completeEntryPoint and analyzer.entryPoints.
  • Extend sink matching to support partial function-name matches when applying FuncCallTaintSink rules.
src/checker/taint/python/tornado-taint-checker.ts
Add shared Tornado utility module for route parsing, import resolution, and parameter metadata extraction.
  • Define RoutePair, FileCache, ImportSymbol, and ParamMeta interfaces for Tornado analysis.
  • Implement helpers to recognize Tornado calls, parse route tuples/url() helpers into RoutePair, and detect self.request.* attribute/expressions.
  • Provide resolveImportPath and extractImportEntries helpers to resolve relative module paths and imported symbols.
  • Implement extractParamsFromAst to normalize function parameter metadata with location info.
src/checker/taint/python/tornado-util.ts
Wire Tornado checker into existing checker configuration.
  • Register the new Tornado taint checker in checker-config.json and checker-pack-config.json so it can be enabled via rules.
  • Ensure ruleConfig reloading and checkerIds matching can pick up taint_flow_python_tornado_input rules at analysis start.
resource/checker/checker-config.json
resource/checker/checker-pack-config.json

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @Ris-1kd, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new taint analysis checker specifically designed for Python applications built with the Tornado web framework. The primary goal is to enhance the security analysis capabilities for Tornado projects by accurately identifying application entry points and marking various forms of user-controlled input as potential sources of taint. It also incorporates advanced features like cross-file symbol resolution and intelligent taint propagation through common utility functions, leading to more precise and comprehensive security vulnerability detection.

Highlights

  • New Tornado Taint Checker: A dedicated taint analysis checker has been added for Python applications utilizing the Tornado web framework, enabling specialized security analysis for these projects.
  • Automated Entry Point Detection: The checker automatically identifies Tornado route handlers, such as those defined via Application() and add_handlers(), as entry points for analysis.
  • Comprehensive Source Identification: Various forms of user input in Tornado, including request arguments (get_argument, get_query_argument), request body, query parameters, headers, and cookies (self.request.body, self.request.query), are now marked as PYTHON_INPUT sources.
  • Cross-File Symbol Resolution: A new file caching mechanism is implemented to resolve variables, classes, and imported symbols across different Python files, significantly improving the accuracy of taint flow analysis.
  • Taint Propagation through Passthrough Functions: The checker includes logic to correctly propagate taint through common string manipulation and utility functions (e.g., decode, strip), ensuring that sanitized inputs are not prematurely untainted.
  • Configuration Integration: The new Tornado checker has been integrated into the existing checker configuration and relevant checker packs, making it readily available for use.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • The logic that detects self.request.<attr> as a source is duplicated between triggerAtMemberAccess and isRequestAttributeAccess/isRequestAttributeExpression in tornado-util.ts; consider reusing the utility helpers in the checker to keep behavior consistent and reduce maintenance overhead.
  • In triggerAtStartOfAnalyze you re-parse process.argv and reload the rule config on every analyze-start; if this hook is called frequently, it may be worth caching the resolved ruleConfigFile and loaded rule data to avoid repeated file I/O and argument parsing.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The logic that detects `self.request.<attr>` as a source is duplicated between `triggerAtMemberAccess` and `isRequestAttributeAccess`/`isRequestAttributeExpression` in `tornado-util.ts`; consider reusing the utility helpers in the checker to keep behavior consistent and reduce maintenance overhead.
- In `triggerAtStartOfAnalyze` you re-parse `process.argv` and reload the rule config on every analyze-start; if this hook is called frequently, it may be worth caching the resolved `ruleConfigFile` and loaded rule data to avoid repeated file I/O and argument parsing.

## Individual Comments

### Comment 1
<location> `src/checker/taint/python/tornado-taint-checker.ts:360-361` </location>
<code_context>
+    }
+
+    // 检查是否是 tornado source API 调用(如 get_argument)
+    if (funcName && tornadoSourceAPIs.has(funcName)) {
+      this.markAsTainted(ret)
+    }
+
</code_context>

<issue_to_address>
**issue (bug_risk):** Treating any get_argument-like call as a source ignores receiver context and may create false positives.

The check currently only matches on `funcName`, so calls like `some_other_obj.get_argument()` (or any monkey‑patched function named `get_argument`) will be treated as taint sources even when unrelated to Tornado handlers. To avoid these false positives, also validate the receiver (e.g., ensure it’s `self`, a known handler type, or otherwise consistent with Tornado request handlers) before marking the return value as tainted.
</issue_to_address>

### Comment 2
<location> `src/checker/taint/python/tornado-taint-checker.ts:371-361` </location>
<code_context>
+      // node.callee.object.type = 'MemberAccess' (body)
+      // node.callee.object.object.type = 'MemberAccess' (request)
+      // node.callee.object.object.object.name = 'self'
+      if (node.callee?.type === 'MemberAccess' && node.callee.object) {
+        const bodyNode = node.callee.object
+        if (
+          bodyNode.type === 'MemberAccess' &&
+          bodyNode.property?.name === 'body' &&
+          bodyNode.object?.type === 'MemberAccess' &&
+          bodyNode.object.property?.name === 'request' &&
+          bodyNode.object.object?.name === 'self'
+        ) {
+          // 直接标记返回值为 source(因为 self.request.body 是 source)
+          this.markAsTainted(ret)
+          return // 已经标记,不需要再检查 receiver
+        }
</code_context>

<issue_to_address>
**suggestion:** The special-case for self.request.body duplicates the logic in isRequestAttributeExpression and is brittle.

Since `tornado-util.ts` already provides `isRequestAttributeExpression` for recognizing Tornado request attributes (including call-expression forms), prefer using that helper here instead of duplicating the AST pattern for `self.request.body`. Centralizing this logic will keep it consistent and easier to evolve.
</issue_to_address>

### Comment 3
<location> `src/checker/taint/python/tornado-taint-checker.ts:268-269` </location>
<code_context>
+            valEnd === 'all' &&
+            val.scopeFile !== 'all' &&
+            val.scopeFunc === 'all' &&
+            typeof node.loc?.sourcefile === 'string' &&
+            node.loc.sourcefile.includes(val.scopeFile)
+          ) {
+            shouldMark = true
</code_context>

<issue_to_address>
**issue (bug_risk):** Using substring matching for scopeFile may accidentally match unrelated files.

Because `node.loc.sourcefile.includes(val.scopeFile)` does a raw substring match, a rule scoped to `user.py` will also trigger for `old_user.py`, `user.py.bak`, or any path containing that text. If `scopeFile` represents a file path, consider normalizing both paths and comparing full path segments (or using a suffix match that respects path separators) to avoid unintended matches.
</issue_to_address>

### Comment 4
<location> `src/checker/taint/python/tornado-util.ts:120-121` </location>
<code_context>
+    const { callee } = route
+    const isUrlHelper =
+      (callee.type === 'Identifier' && callee.name === 'url') ||
+      (callee.type === 'MemberAccess' &&
+        AstUtil.prettyPrint(callee).includes('url'))
+    if (isUrlHelper && Array.isArray(route.arguments)) {
+      const [first, second] = route.arguments
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Detecting url-helper calls via prettyPrint string search is fragile and potentially slow.

Using `AstUtil.prettyPrint(callee).includes('url')` risks false positives (any member chain whose printed form contains "url") and adds unnecessary work in large, route-heavy files. Prefer matching specific AST node shapes and names (e.g., `Identifier`/`MemberAccess` paths for known helpers like `some_module.url` or `tornado.web.url`) instead of searching the pretty-printed text.

Suggested implementation:

```typescript
  } else if (route.type === 'CallExpression' && route.callee) {
    const { callee } = route

    const isIdentifierUrlHelper =
      callee.type === 'Identifier' && callee.name === 'url'

    const isMemberAccessUrlHelper =
      callee.type === 'MemberAccess' &&
      (
        // e.g. something.url(...)
        (callee.member &&
          callee.member.type === 'Identifier' &&
          callee.member.name === 'url') ||
        // or if the AST uses a `property` field instead of `member`
        (callee.property &&
          callee.property.type === 'Identifier' &&
          callee.property.name === 'url')
      )

    const isUrlHelper = isIdentifierUrlHelper || isMemberAccessUrlHelper

    if (isUrlHelper && Array.isArray(route.arguments)) {
      const [first, second] = route.arguments
      pathExpr = first
      handlerNode = second
    }
  }

```

If your AST representation for member access uses different field names or nesting (for example, `callee.attr` or nested `MemberAccess` chains for `tornado.web.url`), you should adjust the `isMemberAccessUrlHelper` condition accordingly to:
1. Refer to the correct field(s) holding the final attribute/identifier node.
2. Optionally check the full member chain (e.g., that the object path matches known helpers such as `tornado.web`), if you want stricter matching.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.


// 检查是否是 tornado source API 调用(如 get_argument)
if (funcName && tornadoSourceAPIs.has(funcName)) {
this.markAsTainted(ret)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: The special-case for self.request.body duplicates the logic in isRequestAttributeExpression and is brittle.

Since tornado-util.ts already provides isRequestAttributeExpression for recognizing Tornado request attributes (including call-expression forms), prefer using that helper here instead of duplicating the AST pattern for self.request.body. Centralizing this logic will keep it consistent and easier to evolve.

Comment on lines 268 to 269
typeof node.loc?.sourcefile === 'string' &&
node.loc.sourcefile.includes(val.scopeFile)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): Using substring matching for scopeFile may accidentally match unrelated files.

Because node.loc.sourcefile.includes(val.scopeFile) does a raw substring match, a rule scoped to user.py will also trigger for old_user.py, user.py.bak, or any path containing that text. If scopeFile represents a file path, consider normalizing both paths and comparing full path segments (or using a suffix match that respects path separators) to avoid unintended matches.

Comment on lines 120 to 121
(callee.type === 'MemberAccess' &&
AstUtil.prettyPrint(callee).includes('url'))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (bug_risk): Detecting url-helper calls via prettyPrint string search is fragile and potentially slow.

Using AstUtil.prettyPrint(callee).includes('url') risks false positives (any member chain whose printed form contains "url") and adds unnecessary work in large, route-heavy files. Prefer matching specific AST node shapes and names (e.g., Identifier/MemberAccess paths for known helpers like some_module.url or tornado.web.url) instead of searching the pretty-printed text.

Suggested implementation:

  } else if (route.type === 'CallExpression' && route.callee) {
    const { callee } = route

    const isIdentifierUrlHelper =
      callee.type === 'Identifier' && callee.name === 'url'

    const isMemberAccessUrlHelper =
      callee.type === 'MemberAccess' &&
      (
        // e.g. something.url(...)
        (callee.member &&
          callee.member.type === 'Identifier' &&
          callee.member.name === 'url') ||
        // or if the AST uses a `property` field instead of `member`
        (callee.property &&
          callee.property.type === 'Identifier' &&
          callee.property.name === 'url')
      )

    const isUrlHelper = isIdentifierUrlHelper || isMemberAccessUrlHelper

    if (isUrlHelper && Array.isArray(route.arguments)) {
      const [first, second] = route.arguments
      pathExpr = first
      handlerNode = second
    }
  }

If your AST representation for member access uses different field names or nesting (for example, callee.attr or nested MemberAccess chains for tornado.web.url), you should adjust the isMemberAccessUrlHelper condition accordingly to:

  1. Refer to the correct field(s) holding the final attribute/identifier node.
  2. Optionally check the full member chain (e.g., that the object path matches known helpers such as tornado.web), if you want stricter matching.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new taint analysis checker for the Python Tornado framework. The implementation is comprehensive, covering route discovery, entry point registration, and taint source tracking from request data. My review focuses on improving cross-file symbol resolution, code structure, and configuration handling. I've identified a high-severity issue with how absolute imports are resolved, which could impact the checker's effectiveness. I've also included a couple of medium-severity suggestions to improve maintainability and performance.

Comment on lines 142 to 157
export function resolveImportPath(
modulePath: string,
currentFile: string,
): string | null {
if (!modulePath) return null
const currentDir = path.dirname(currentFile)
const leadingDots = modulePath.match(/^\.+/)?.[0] ?? ''
let baseDir = currentDir
if (leadingDots.length > 0) {
baseDir = path.resolve(currentDir, '../'.repeat(leadingDots.length - 1))
}
const remainder = modulePath.slice(leadingDots.length)
const normalized = remainder ? remainder.split('.').join(path.sep) : ''
const resolved = normalized ? path.resolve(baseDir, normalized) : baseDir
return `${resolved}.py`
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current implementation of resolveImportPath does not correctly handle absolute Python imports. It treats them as relative to the current file's directory, which is incorrect. Absolute imports should be resolved from the project's source root(s). This can lead to failures in resolving symbols and incomplete taint analysis.

I suggest modifying the function to accept an optional mainDir (project root) and use it as the base for absolute imports.

Additionally, the current implementation doesn't handle package imports (directories with __init__.py). While the suggested change doesn't fully solve that, it's a step in the right direction. You might consider enhancing it further to check for __init__.py files.

You will also need to update the call site in tornado-taint-checker.ts to pass Config.maindir.
For example, at src/checker/taint/python/tornado-taint-checker.ts:148:

const resolved = resolveImportPath(modulePath, fileName, Config.maindir);
export function resolveImportPath(
  modulePath: string,
  currentFile: string,
  mainDir?: string,
): string | null {
  if (!modulePath) return null
  const currentDir = path.dirname(currentFile)
  const leadingDots = modulePath.match(/^\.+/)?.[0] ?? ''
  let baseDir: string
  if (leadingDots.length > 0) {
    baseDir = path.resolve(currentDir, '../'.repeat(leadingDots.length - 1))
  } else if (mainDir) {
    baseDir = mainDir
  } else {
    // Fallback for absolute imports when mainDir is not provided.
    // This is the original behavior and is likely incorrect.
    baseDir = currentDir
  }
  const remainder = modulePath.slice(leadingDots.length)
  const normalized = remainder ? remainder.split('.').join(path.sep) : ''
  const resolved = normalized ? path.resolve(baseDir, normalized) : baseDir
  return `${resolved}.py`
}

Comment on lines 66 to 76
if (!ruleConfigFile || ruleConfigFile === '') {
const args = process.argv
const ruleConfigIndex = args.indexOf('--ruleConfigFile')
if (ruleConfigIndex >= 0 && ruleConfigIndex < args.length - 1) {
ruleConfigFile = args[ruleConfigIndex + 1]
const path = require('path')
ruleConfigFile = path.isAbsolute(ruleConfigFile)
? ruleConfigFile
: path.resolve(process.cwd(), ruleConfigFile)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The logic to load the ruleConfigFile by parsing process.argv is fragile and considered a code smell. It indicates a potential issue with configuration management or initialization order, as the comment on line 62 suggests. This logic appears to be duplicated across different checkers and makes them less reliable and harder to test.

It would be better to refactor the configuration handling to ensure that Config.ruleConfigFile is fully resolved and available before the checker is instantiated, removing the need for this manual argument parsing.

}
// 确保 finalEp 有 filePath
if (!finalEp.filePath && finalEp.fdef?.loc?.sourcefile) {
const FileUtil = require('../../../util/file-util')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There are several require calls inside methods (e.g., triggerAtStartOfAnalyze, emitHandlerEntrypoints). This is inefficient as the module will be re-evaluated on every call (or at least checked in the cache), and it's generally better for readability and maintainability to have all imports at the top of the file.

Please move all require calls to the top of the file. This applies to:

  • require('../../common/rules-basic-handler') on line 63
  • require('path') on line 71
  • require('../../../util/file-util') on line 80 and 683
  • require('../../../util/common-util') on line 94

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is being reviewed by Cursor Bugbot

Details

Your team is on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle for each member of your team.

To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.

Comment on lines +30 to +35
if (!value) return
if (!value._tags) {
value._tags = new Set()
}
value._tags.add('PYTHON_INPUT')
value.hasTagRec = true
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!value) return
if (!value._tags) {
value._tags = new Set()
}
value._tags.add('PYTHON_INPUT')
value.hasTagRec = true
IntroduceTaint.setTaint(value, 'PYTHON_INPUT')

This logic already exists in source-util.ts

* @param info
*/
triggerAtStartOfAnalyze(analyzer: any, scope: any, node: any, state: any, info: any): void {
// 重新加载规则配置(因为可能在构造函数时还没有设置 ruleConfigFile)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this assert is not true.
Since TornadoTaintChecker extends Checker, it will load rule config to this.checkerRuleConfigContent in constructor

constructor(resultManager: any, checkerId: any) {
this.checkerId = checkerId
this.resultManager = resultManager
this.checkerRuleConfigContent = {}
initRules()
this.loadRuleConfig(this)
}

At the point where triggerAtStartOfAnalyze is called, the constructor must have been called, so IMO the code loading ruleConfig again in this function is redundant

routeList: any,
currentFile: string
) {
const processed = new Set<string>()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be a member of class instead of a local variable of a function i think

* @param state
* @param _info
*/
triggerAtFuncCallSyntax(analyzer: any, scope: any, node: any, state: any, _info: any): boolean | undefined {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would favor a Symbol-Interpret approach instead of AST-based approach. Here we can use triggerAtFunctionCallBefore and get the symbol value of arguments (especially the routeList) without any performance tradeoff, and with the symbol value of routeList, further processing will be more accurate. For example, when the routeList itself is actually retrieved from certain function call, current normalizeRoutes implementation cannot handle this situation properly as it doesn't take CallExpression into consideration, but with symbol value of routeList, we can get the "real" route object regardless of how it is retrieved from whatever expression type.

* @param _info
*/
triggerAtCompileUnit(analyzer: any, scope: any, node: any, _state: any, _info: any): boolean | undefined {
const fileName = node.loc?.sourcefile
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const fileName = node.loc?.sourcefile
if (config.entryPointMode === 'ONLY_CUSTOM') return
const fileName = node.loc?.sourcefile

Comment on lines 236 to 240
if (!res._tags) {
res._tags = new Set()
}
res._tags.add('PYTHON_INPUT')
res.hasTagRec = true
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!res._tags) {
res._tags = new Set()
}
res._tags.add('PYTHON_INPUT')
res.hasTagRec = true
this.markAsTainted(res)

* @param state
* @param info
*/
triggerAtIdentifier(analyzer: any, scope: any, node: any, state: any, info: any): void {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify what does this hook do?

*/
triggerAtFunctionCallAfter(analyzer: any, scope: any, node: any, state: any, info: any): void {
// 先调用基类方法处理规则配置中的 source
super.triggerAtFunctionCallAfter(analyzer, scope, node, state, info)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
super.triggerAtFunctionCallAfter(analyzer, scope, node, state, info)
super.triggerAtFunctionCallAfter(analyzer, scope, node, state, info)
if (config.entryPointMode === 'ONLY_CUSTOM') return

* @param info
*/
triggerAtMemberAccess(analyzer: any, scope: any, node: any, state: any, info: any): void {
const { res } = info
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const { res } = info
if (config.entryPointMode === 'ONLY_CUSTOM') return
const { res } = info

Comment on lines +28 to +34
'get_argument',
'get_query_argument',
'get_body_argument',
'get_query_arguments',
'get_body_arguments',
'get_cookie',
'get_secure_cookie',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
'get_argument',
'get_query_argument',
'get_body_argument',
'get_query_arguments',
'get_body_arguments',
'get_cookie',
'get_secure_cookie',
'get_argument',
'get_query_argument',
'get_body_argument',
'get_query_arguments',
'get_body_arguments',
'get_cookie',
'get_secure_cookie',
'get_arguments',
'get_json_body'

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not resolved yet

function getRules(ruleConfigPath?: string): any[] {
// 如果传入了 ruleConfigPath,或者 config.ruleConfigFile 已设置但 rules 未加载,则重新加载
const currentRuleConfigFile = ruleConfigPath || config.ruleConfigFile
if (!rules || (currentRuleConfigFile && !rules)) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant condition prevents rule reloading with new path

The condition !rules || (currentRuleConfigFile && !rules) is logically equivalent to just !rules since the second clause (currentRuleConfigFile && !rules) can only be true when !rules is already true. The comment indicates the intent was to reload rules when a new ruleConfigPath is passed, but this logic never reloads if rules are already loaded - even with a different path. The condition likely needs to be !rules || ruleConfigPath to match the stated intent.

Fix in Cursor Fix in Web

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, this is a redundant condition

return elements.flatMap((element: any) =>
this.extractRoutesFromSymbolValue(element, currentFile, analyzer, scope, state)
)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tuple route pairs never parsed due to early return

The condition at line 610 includes 'tuple' in the check (vtype === 'list' || vtype === 'tuple' || vtype === 'array'), causing all tuples to be treated as containers and immediately returning via flatMap. This makes the route pair parsing code at lines 655-671 unreachable for tuples. When processing Tornado routes like [(r"/", MainHandler)], each tuple is incorrectly split into its individual elements (path string and handler class), losing the association between them and causing route discovery to fail completely.

Additional Locations (1)

Fix in Cursor Fix in Web

function getRules(ruleConfigPath?: string): any[] {
// 如果传入了 ruleConfigPath,或者 config.ruleConfigFile 已设置但 rules 未加载,则重新加载
const currentRuleConfigFile = ruleConfigPath || config.ruleConfigFile
if (!rules || (currentRuleConfigFile && !rules)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, this is a redundant condition

Comment on lines +28 to +34
'get_argument',
'get_query_argument',
'get_body_argument',
'get_query_arguments',
'get_body_arguments',
'get_cookie',
'get_secure_cookie',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not resolved yet

entryPoint.entryPointSymVal?.parent
)
} catch (e) {
console.error(`[DEBUG] Error executing entrypoint [${i}]: ${e}`)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should inspect the error log for error details instead of printing the error directly onto the console

* @param _info
*/
triggerAtFuncCallSyntax(analyzer: any, scope: any, node: any, state: any, _info: any): boolean | undefined {
const fileName = node.loc?.sourcefile
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not resolved yet


if (isApp) {
// Application(...) -> first arg is routes
;[routeListArgValue] = argvalues
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redundant ;

;[routeListArgValue] = argvalues
} else if (isAddHandlers) {
// add_handlers(host, routes) -> second arg is routes
;[, routeListArgValue] = argvalues
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redundant ;

* Flatten route lists (handles BinaryExpression +)
* @param node
* @param currentFile
* Extract route pairs from resolved symbol values (from argvalues)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the logic is too complex, I think several member access should be enough...

if (receiver && (receiver.taint || receiver.hasTagRec || receiver._tags?.has('PYTHON_INPUT'))) {
this.markAsTainted(ret)
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing ONLY_CUSTOM check in triggerAtFunctionCallAfter

The triggerAtFunctionCallAfter method is missing the Config.entryPointMode === 'ONLY_CUSTOM' check that exists in both triggerAtCompileUnit (line 164) and triggerAtFunctionCallBefore (line 236). This inconsistency means that when entryPointMode is set to ONLY_CUSTOM, the Tornado-specific source API detection (like get_argument, get_cookie) and passthrough function handling will still execute, which is likely unintended behavior. The check should be added after the super call but before processing the fclos and ret values.

Fix in Cursor Fix in Web

const httpMethods = new Set(['get', 'post', 'put', 'delete', 'patch', 'head', 'options'])
const entrypoints = Object.entries(handlerSymVal.value)
.filter(([key, value]: [string, any]) => httpMethods.has(key) && value.vtype === 'fclos')
.map(([, value]: [string, any]) => value)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential crash when handlerSymVal.value is undefined

The emitHandlerEntrypoints method calls Object.entries(handlerSymVal.value) after only checking that handlerSymVal.vtype === 'class'. When pair.handlerSymVal with vtype === 'class' is used directly (lines 315-317) without going through processHandlerClass or buildClassSymbol, the value property may be undefined, causing Object.entries(undefined) to throw a TypeError. The code ensures handlerSymVal.field exists (line 341-343) but not handlerSymVal.value.

Additional Locations (1)

Fix in Cursor Fix in Web

// 如果有匹配的规则,调用基类方法处理
if (matchedRule) {
super.checkByNameMatch(node, fclos, argvalues)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial matching in checkByNameMatch is ineffective

The checkByNameMatch override is documented to "support partial matching (e.g., os.system matches syslib_from.os.system)" but this feature doesn't work. The method uses string-based partial matching (callFull.endsWith('.os.system')) to find a matching rule, then delegates to super.checkByNameMatch which uses AST-based matchField that requires exact path matching. When a partial match is found, super's AST matching will fail because it doesn't support partial matching - for a call like mymodule.os.system with rule fsig: "os.system", the AST matcher walks the full path and fails when encountering the extra prefix. This results in partial matches never producing findings, defeating the stated purpose of the override.

Fix in Cursor Fix in Web

}
}
// Include parent class name in key to distinguish handlers with same method name
const parentName = entryPoint?.entryPointSymVal?.parent?.id || entryPoint?.entryPointSymVal?.parent?.name || ''
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parentName may serialize to [object Object] in key

The parentName extraction uses parent?.id || parent?.name directly, but in the AST these properties are often objects with a name sub-property (e.g., {type: 'Identifier', name: 'MyClass'}), not strings. When parent.id is an object, it's truthy so used as-is, and when concatenated into entryKey it becomes [object Object]. This causes different handler classes to produce identical keys, making the deduplication logic skip valid entrypoints. The params serialization nearby correctly uses p?.id?.name || p?.id pattern, but parentName doesn't follow this pattern.

Fix in Cursor Fix in Web

this.markAsTainted(ret)
}
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing ONLY_CUSTOM mode check in triggerAtFunctionCallAfter

The triggerAtFunctionCallAfter method lacks the if (Config.entryPointMode === 'ONLY_CUSTOM') return check that exists in triggerAtCompileUnit, triggerAtFunctionCallBefore, and triggerAtMemberAccess. This inconsistency means tornado-specific taint marking (for APIs like get_argument and passthrough functions) will still run in ONLY_CUSTOM mode, while all other tornado-specific behaviors are disabled. The PR discussion explicitly flagged this as "not resolved yet".

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants