Skip to content

Tighten registry string heuristics#64

Open
Pppp1116 wants to merge 1 commit intomainfrom
codex/improve-taint-tracking-algorithm-njyeed
Open

Tighten registry string heuristics#64
Pppp1116 wants to merge 1 commit intomainfrom
codex/improve-taint-tracking-algorithm-njyeed

Conversation

@Pppp1116
Copy link
Copy Markdown
Owner

@Pppp1116 Pppp1116 commented Dec 11, 2025

User description

Summary

  • broaden registry hive aliases to include kernel-style prefixes and single-backslash variants
  • restrict registry string parsing to canonical prefixes and paths, inferring hives from Registry\Machine/User patterns
  • drop noise-only strings without registry-like paths to reduce synthetic root clutter

Testing

  • not run (not requested)

Codex Task


PR Type

Enhancement


Description

  • Broaden registry hive aliases with kernel-style and single-backslash variants

  • Add strict registry path validation regex to filter noise strings

  • Infer hives from Registry\Machine/User patterns when prefix missing

  • Require backslash in paths to eliminate synthetic root entries


Diagram Walkthrough

flowchart LR
  A["Registry String Input"] --> B["Match Prefix Pattern"]
  B --> C["Validate with REGISTRY_LIKELY_PATH_RE"]
  C --> D["Infer Hive from Registry Path"]
  D --> E["Extract Path Components"]
  E --> F["Return Parsed Registry Entry"]
  C -->|No Match| G["Return None"]
Loading

File Walkthrough

Relevant files
Enhancement
RegistryKeyBitfieldReport.py
Stricter registry string parsing with heuristic validation

RegistryKeyBitfieldReport.py

  • Expanded REGISTRY_HIVE_ALIASES to include single-backslash variants
    (Registry\Machine, Registry\User, Registry\Users)
  • Updated REGISTRY_STRING_PREFIX_RE regex to match optional leading
    backslash with \\\\? pattern
  • Added new REGISTRY_LIKELY_PATH_RE regex to validate registry paths
    against canonical prefixes and known subkeys
  • Enhanced parse_registry_string() to infer hives from Registry\
    patterns, strip kernel-style prefixes, and require backslashes in
    paths
  • Tightened validation logic to reject noise-only strings without
    registry-like structure
+32/-12 

@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

@qodo-code-review
Copy link
Copy Markdown

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
No action logs: The new parsing and validation logic adds decision branches (e.g., early returns on
invalid paths) but does not log critical outcomes or reasons, making it hard to audit
parsing failures or hive inference results.

Referred Code
if hive_key is None:
    if lowered_path.startswith("registry\\"):
        hive_key = "HKLM" if "\\machine\\" in lowered_path else None
        hive_key = hive_key or ("HKU" if "\\user" in lowered_path or "\\users" in lowered_path else None)
    if not REGISTRY_LIKELY_PATH_RE.match(path):
        return None
elif lowered_path.startswith("registry\\"):
    path = re.sub(r"(?i)^registry\\(?:machine|user|users)\\?", "", path)
if "\\" not in path:
    return None

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Silent early returns: Multiple early returns on invalid inputs (e.g., when path lacks backslashes or regex
fails) provide no contextual error handling or logging, risking silent failures in
upstream workflows.

Referred Code
    if not REGISTRY_LIKELY_PATH_RE.match(path):
        return None
elif lowered_path.startswith("registry\\"):
    path = re.sub(r"(?i)^registry\\(?:machine|user|users)\\?", "", path)
if "\\" not in path:
    return None
parts = [p for p in path.split("\\") if p]
if not parts:
    return None

Learn more about managing compliance generic rules or creating your own custom rules

Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Copy link
Copy Markdown

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
High-level
The new logic incorrectly rejects valid registry paths

The new check if "\" not in path: return None incorrectly rejects valid
registry paths that are directly under a hive root, such as "HKCU\MyValue". This
should be corrected to avoid missing legitimate registry accesses.

Examples:

RegistryKeyBitfieldReport.py [1042-1043]
    if "\\" not in path:
        return None

Solution Walkthrough:

Before:

def parse_registry_string(raw: str):
    # ... prefix and hive parsing ...
    path = ... # e.g., "MyValue" from "HKCU\MyValue"
    
    # ... other logic ...

    if "\\" not in path:
        return None # Rejects paths like "HKCU\MyValue"

    parts = path.split("\\")
    # ... logic to determine key_path and value_name ...
    return {"hive": hive_key, "path": path, ...}

After:

def parse_registry_string(raw: str):
    # ... prefix and hive parsing ...
    path = ... # e.g., "MyValue" from "HKCU\MyValue"
    
    # ... other logic ...

    # The check `if "\\" not in path: return None` is removed.

    parts = path.split("\\")
    if not parts:
        return None
    
    # ... logic to determine key_path and value_name ...
    # This now correctly handles paths with 0 or more backslashes.
    return {"hive": hive_key, "path": path, ...}
Suggestion importance[1-10]: 9

__

Why: The suggestion correctly identifies a significant flaw in the new validation logic that causes valid registry paths to be rejected, leading to potential false negatives in the analysis.

High
Possible issue
Refactor hive inference for correctness

Refactor the registry hive inference logic to use startswith for more accurate
detection, and combine hive inference with prefix stripping to fix a bug where
prefixes were not removed.

RegistryKeyBitfieldReport.py [1034-1041]

 if hive_key is None:
-    if lowered_path.startswith("registry\\"):
-        hive_key = "HKLM" if "\\machine\\" in lowered_path else None
-        hive_key = hive_key or ("HKU" if "\\user" in lowered_path or "\\users" in lowered_path else None)
+    if lowered_path.startswith("registry\\machine\\"):
+        hive_key = "HKLM"
+        path = re.sub(r"(?i)^registry\\machine\\?", "", path)
+    elif lowered_path.startswith("registry\\user\\") or lowered_path.startswith("registry\\users\\"):
+        hive_key = "HKU"
+        path = re.sub(r"(?i)^registry\\(?:user|users)\\?", "", path)
+
     if not REGISTRY_LIKELY_PATH_RE.match(path):
         return None
-elif lowered_path.startswith("registry\\"):
-    path = re.sub(r"(?i)^registry\\(?:machine|user|users)\\?", "", path)
  • Apply / Chat
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a bug where using in could lead to incorrect hive detection, and another bug where kernel-style prefixes are not stripped when a hive is inferred, resulting in an incorrect path.

Medium
  • More

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant