Fix false negatives in hardcoded password detection (B105/B106)#1385
Fix false negatives in hardcoded password detection (B105/B106)#1385GhostbyteX-740 wants to merge 2 commits intoPyCQA:mainfrom
Conversation
The B105 and B106 rules missed hardcoded credentials in three
common patterns due to rigid regex matching and incomplete AST
traversal:
- B105 did not handle dictionary subscript access in comparisons
(e.g., `data['password'] == 'secret'`). The Compare branch only
checked ast.Name and ast.Attribute nodes, silently skipping
ast.Subscript. A new branch was added to inspect the subscript
slice and match it against RE_CANDIDATES.
- B105 did not detect credentials assigned to variables using
camelCase or SCREAMING_CASE naming (e.g., `DATABASEPASSWORD`).
RE_CANDIDATES required word boundaries via underscores or start/
end anchors, so names without underscores never matched. A plain
substring alternative `|{0}` was added to RE_CANDIDATES so any
name containing a credential word is caught regardless of casing
convention. The re.IGNORECASE flag handles case differences.
- B106 did not flag keyword arguments using `api_key` or similar
names because `key` was absent from RE_WORDS. Added `key` to the
alternation group so `api_key`, `API_KEY`, and similar identifiers
are now recognised as credential-related argument names.
Resolves: PyCQA#1383
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Pull request overview
Updates Bandit’s hardcoded credential detectors (B105/B106) to reduce false negatives in common identifier and AST patterns (e.g., dict-subscript comparisons and api_key-style names).
Changes:
- Expanded the credential-word regex to include
key, and broadened candidate matching. - Added normalization for
ast.Assigntargets to better catch camelCase variable names. - Extended B105 Compare handling to detect
ast.Subscripton the left-hand side (e.g.,data["password"] == "secret").
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| RE_WORDS = "(pas+wo?r?d|pass(phrase)?|pwd|token|secrete?|key)" | ||
| RE_CANDIDATES = re.compile( | ||
| "(^{0}$|_{0}_|^{0}_|_{0}$)".format(RE_WORDS), re.IGNORECASE | ||
| "(^{0}$|_{0}_|^{0}_|_{0}$|{0})".format(RE_WORDS), re.IGNORECASE |
There was a problem hiding this comment.
RE_CANDIDATES is now an unbounded substring match (|{0}) and RE_WORDS now includes key. This combination will dramatically increase false positives because it matches credential words inside unrelated identifiers and dict keys (e.g., a dict literal with key 'key' or variables like keyfile, monkey, compass, etc.). Since B105/B106/B107 all share this regex, it will also affect unrelated examples and likely break existing functional expectations.
Suggestion: keep boundary-based matching for common/short terms (especially pass and key), and if you need to catch patterns like DATABASEPASSWORD, add a separate “strong word” substring check (e.g., only for password|passphrase|token|secret) or normalize identifiers (camelCase -> snake_case) and apply the existing boundary regex. Also consider scoping key to B106/B107 only (keyword args/defaults) rather than B105 dict/variable names to reduce noise.
| "(^{0}$|_{0}_|^{0}_|_{0}$|{0})".format(RE_WORDS), re.IGNORECASE | |
| "(^{0}$|_{0}_|^{0}_|_{0}$)".format(RE_WORDS), re.IGNORECASE |
| if isinstance(targ, ast.Name): | ||
| normalized = re.sub( | ||
| r"(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])", |
There was a problem hiding this comment.
This normalized = re.sub(...).lower() line exceeds the repo’s Black limit (pre-commit runs Black with --line-length=79), so it will fail formatting checks. Please reformat it (and ideally extract the normalization regex/pattern into a module-level constant or small helper) to keep the Assign-target logic readable.
| if isinstance( | ||
| comp.comparators[0], ast.Constant | ||
| ) and isinstance(comp.comparators[0].value, str): | ||
| return _report(comp.comparators[0].value) | ||
| elif isinstance(comp.left, ast.Subscript): | ||
| slc = comp.left.slice | ||
| if isinstance(slc, ast.Constant) and isinstance(slc.value, str): | ||
| if RE_CANDIDATES.search(slc.value): |
There was a problem hiding this comment.
The new ast.Subscript Compare handling introduces new detection behavior but there’s no corresponding update to the repo’s existing functional coverage for hardcoded-password detection (tests currently assert fixed issue counts). Please add/adjust a test case (e.g., in examples/hardcoded-passwords.py plus expected counts, or a targeted unit test) that exercises data['password'] == 'secret' so regressions don’t reappear.
| and isinstance(assign.value.value, str) | ||
| ): | ||
| return _report(assign.value.value) | ||
|
|
||
| elif isinstance(node._bandit_parent, ast.Compare): | ||
| # looks for "candidate == 'some_string'" | ||
| comp = node._bandit_parent | ||
| if isinstance(comp.left, ast.Name): | ||
| if RE_CANDIDATES.search(comp.left.id): | ||
| if isinstance( | ||
| comp.comparators[0], ast.Constant | ||
| ) and isinstance(comp.comparators[0].value, str): | ||
| return _report(comp.comparators[0].value) | ||
| elif isinstance(comp.left, ast.Attribute): | ||
| if RE_CANDIDATES.search(comp.left.attr): | ||
| if isinstance( | ||
| comp.comparators[0], ast.Constant | ||
| ) and isinstance(comp.comparators[0].value, str): | ||
| return _report(comp.comparators[0].value) | ||
| elif isinstance(comp.left, ast.Subscript): | ||
| slc = comp.left.slice | ||
| if isinstance(slc, ast.Constant) and isinstance(slc.value, str): | ||
| if RE_CANDIDATES.search(slc.value): |
There was a problem hiding this comment.
The ast.Subscript Compare branch duplicates the same “candidate name check + first comparator constant string” pattern used in the ast.Name and ast.Attribute branches. Consider factoring this repeated logic into a small helper (e.g., something that extracts the candidate identifier from comp.left and returns the constant comparator value) to reduce future drift between branches and make it easier to add other left-hand-side node types.
The B105 and B106 rules missed hardcoded credentials in three common patterns due to rigid regex matching and incomplete AST traversal:
B105 did not handle dictionary subscript access in comparisons (e.g.,
data['password'] == 'secret'). The Compare branch only checked ast.Name and ast.Attribute nodes, silently skipping ast.Subscript. A new branch was added to inspect the subscript slice and match it against RE_CANDIDATES.B105 did not detect credentials assigned to variables using camelCase or SCREAMING_CASE naming (e.g.,
DATABASEPASSWORD). RE_CANDIDATES required word boundaries via underscores or start/ end anchors, so names without underscores never matched. A plain substring alternative|{0}was added to RE_CANDIDATES so any name containing a credential word is caught regardless of casing convention. The re.IGNORECASE flag handles case differences.B106 did not flag keyword arguments using
api_keyor similar names becausekeywas absent from RE_WORDS. Addedkeyto the alternation group soapi_key,API_KEY, and similar identifiers are now recognised as credential-related argument names.Resolves: #1383