feat: secrets filtering and sensitive data protection (Issue #12)#27
feat: secrets filtering and sensitive data protection (Issue #12)#27
Conversation
Comprehensive specification for secrets detection and sensitive data protection including pattern-based detection, entropy analysis, PII detection, and multiple filtering strategies (REDACT, MASK, BLOCK, WARN). Includes: - REQUIREMENTS.md: 14 P0, 6 P1, 5 P2 requirements - ARCHITECTURE.md: 7 components with integration design - IMPLEMENTATION_PLAN.md: 5 phases, 31 tasks - DECISIONS.md: 12 ADRs covering design choices Closes #12 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
ADR-001 revised: Use Yelp detect-secrets instead of pure Python implementation. This provides 27+ battle-tested detectors and reduces implementation scope by ~70% for the detection layer. Changes: - ADR-001: Pure Python -> detect-secrets library - ADR-004: Superseded (entropy thresholds managed by detect-secrets) - Architecture: DetectSecretsAdapter replaces PatternDetector + EntropyAnalyzer - Implementation: 31 -> 30 tasks with lower complexity Still build ourselves: PIIDetector, Redactor, AllowlistManager, AuditLogger 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Spec SPEC-2025-12-25-001 approved for implementation. Approved by: Robert Allen <zircote@gmail.com> Timestamp: 2025-12-26T00:50:22Z Status: in-review -> approved Ready for implementation via /claude-spec:implement secrets-filtering 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Tracking 30 tasks across 5 phases: - Phase 1: Foundation (5 tasks) - Phase 2: Detection Layer (5 tasks) - Phase 3: Integration (8 tasks) - Phase 4: Commands & Audit (8 tasks) - Phase 5: Testing & Docs (4 tasks) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Phase 1 complete - foundation module structure: - security/__init__.py: Lazy loading pattern matching main package - security/models.py: SecretType (22 types), FilterStrategy, FilterAction enums - SecretDetection, FilterResult, AllowlistEntry, AuditEntry dataclasses - security/config.py: SecretsConfig with env vars and YAML file loading - Per-namespace strategy overrides - Default paths for allowlist and audit log - security/exceptions.py: SecretsFilteringError, BlockedContentError, etc. - security/service.py: Stub SecretsFilteringService (full impl in Phase 3) - Main __init__.py: Added get_secrets_filtering_service() factory All 5 Phase 1 tasks complete (1.1-1.5). Part of Issue #12 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
… patterns Phase 2 of secrets filtering implementation: - Add detect-secrets>=1.4.0 dependency for battle-tested secret detection - Implement DetectSecretsAdapter wrapping Yelp detect-secrets library - Maps 27+ detector types to SecretType enum - Deduplicates overlapping detections (prefers specific over entropy) - SHA-256 hashing for allowlist matching - Implement PIIDetector for personally identifiable information - SSN pattern with validation (excludes 000/666/9XX prefixes) - Credit card patterns (Visa, MasterCard, Amex, Discover) with Luhn validation - Phone number patterns (US formats with optional +1) - Add comprehensive test suite (35 tests) - 11 tests for DetectSecretsAdapter - 24 tests for PIIDetector including Luhn algorithm Resolves detection layer for #12 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…nd redactor - Add AllowlistManager for hash-based known-safe values (never stores secrets) - Add AuditLogger for SOC2/GDPR compliant JSON Lines audit trail - Add Redactor with strategy-based filtering (REDACT/MASK/BLOCK/WARN) - Implement namespace-scoped allowlist support - Add log rotation with configurable file size limits - All modules use ServiceRegistry for thread-safe singleton management
… error handling - Migrate all security singletons to centralized ServiceRegistry pattern - Fix thread-safety bug in pii.py (previously had no lock) - Improve allowlist YAML corruption handling (raise instead of silent fail) - Escalate audit log write/rotation failures to ERROR level - Add optimized O(log n) line number lookup in PII detector - Remove redundant try-except patterns
- Add optional SecretsFilteringService integration to CaptureService - Filter both summary and content fields before storage - Rename MemoryError to MemoryPluginError (avoid shadowing builtin) - Add configurable secrets service injection - Replace suppress(Exception) with proper try-except logging - Add lock cleanup warning for debugging
- Add test_allowlist.py: hash value, persistence, expiration tests - Add test_audit.py: logging, rotation, query, thread safety tests - Add test_config.py: env vars, YAML loading, integration (98% coverage) - Add test_capture_integration.py: filter/block/allowlist integration - Add test_integration.py: end-to-end workflow tests - Add test_performance.py: timing benchmarks for detection pipeline - Add test_redactor.py: strategy and edge case tests - Add test_service.py: filter/scan/deduplication tests - Add test_secrets_commands.py: slash command tests - Add lock acquisition timeout and cleanup tests to test_capture.py - Update test_detector.py and test_pii.py for ServiceRegistry
- Add /memory:secrets-allowlist for managing known-safe values - Add /memory:scan-secrets for retrospective scanning - Add /memory:audit-log for viewing detection history - Add /memory:test-secret for testing detection patterns
- Add secrets filtering section to README with API examples - Document supported secret types and environment variables - Add SECRETS_FILTER_* env vars to CLAUDE.md - Include slash command reference table
- Add CODE_REVIEW.md with multi-agent analysis findings - Add REVIEW_SUMMARY.md with executive summary - Add REMEDIATION_TASKS.md with prioritized fix list - Add REMEDIATION_REPORT.md documenting all applied fixes
There was a problem hiding this comment.
Pull request overview
This PR implements comprehensive secrets detection and filtering to prevent sensitive data (API keys, passwords, tokens, PII) from being captured in memories. The implementation follows a detailed specification and includes:
- Multi-layer detection: detect-secrets library (27+ patterns) + custom PII detection (SSN, credit cards, phones)
- Flexible filtering strategies: REDACT, MASK, BLOCK, WARN with per-namespace configuration
- Hash-based allowlist: Known-safe values can bypass filtering
- Full audit logging: SOC2/GDPR compliance with JSON Lines format
- Retrospective scanning: Scan existing memories for secrets
Reviewed changes
Copilot reviewed 46 out of 47 changed files in this pull request and generated 24 comments.
Show a summary per file
| File | Description |
|---|---|
| src/git_notes_memory/security/*.py | Complete security module with 9 components |
| tests/security/*.py | Comprehensive test suite (>90% coverage target) |
| src/git_notes_memory/capture.py | Integration with capture pipeline for filtering |
| src/git_notes_memory/index.py | Added get_all_memories() for retrospective scanning |
| src/git_notes_memory/exceptions.py | Renamed MemoryError → MemoryPluginError |
| pyproject.toml, uv.lock | Added detect-secrets>=1.4.0 dependency |
| docs/spec/active/2025-12-25-secrets-filtering/* | Full specification docs |
|
|
||
| config = SecretsConfig(enabled=True, entropy_enabled=False) | ||
| secrets_service = SecretsFilteringService(config=config, data_dir=data_dir) | ||
| audit_logger = get_audit_logger(log_dir=data_dir / "audit") |
There was a problem hiding this comment.
Variable audit_logger is not used.
| "get_capture_service", | ||
| "get_recall_service", | ||
| "get_sync_service", | ||
| "get_secrets_filtering_service", |
There was a problem hiding this comment.
The name 'get_secrets_filtering_service' is exported by all but is not defined.
|
|
||
| __all__ = [ | ||
| # Factory functions (lazy-loaded) | ||
| "get_secrets_filtering_service", |
There was a problem hiding this comment.
The name 'get_secrets_filtering_service' is exported by all but is not defined.
| from __future__ import annotations | ||
|
|
||
| __all__ = [ | ||
| # Factory functions (lazy-loaded) | ||
| "get_secrets_filtering_service", | ||
| "get_redactor", | ||
| "get_allowlist_manager", | ||
| "get_audit_logger", | ||
| # Enums | ||
| "SecretType", | ||
| "FilterStrategy", | ||
| "FilterAction", | ||
| # Models | ||
| "SecretDetection", | ||
| "FilterResult", | ||
| "AllowlistEntry", | ||
| "AuditEntry", | ||
| # Exceptions | ||
| "SecretsFilteringError", | ||
| "BlockedContentError", | ||
| "AllowlistError", | ||
| "AuditLogError", | ||
| ] | ||
|
|
||
|
|
||
| def __getattr__(name: str) -> object: |
There was a problem hiding this comment.
The name 'get_redactor' is exported by all but is not defined.
| from __future__ import annotations | |
| __all__ = [ | |
| # Factory functions (lazy-loaded) | |
| "get_secrets_filtering_service", | |
| "get_redactor", | |
| "get_allowlist_manager", | |
| "get_audit_logger", | |
| # Enums | |
| "SecretType", | |
| "FilterStrategy", | |
| "FilterAction", | |
| # Models | |
| "SecretDetection", | |
| "FilterResult", | |
| "AllowlistEntry", | |
| "AuditEntry", | |
| # Exceptions | |
| "SecretsFilteringError", | |
| "BlockedContentError", | |
| "AllowlistError", | |
| "AuditLogError", | |
| ] | |
| def __getattr__(name: str) -> object: | |
| from __future__ import annotations | |
| from typing import Any | |
| __all__ = [ | |
| # Factory functions (lazy-loaded) | |
| "get_secrets_filtering_service", | |
| "get_redactor", | |
| "get_allowlist_manager", | |
| "get_audit_logger", | |
| # Enums | |
| "SecretType", | |
| "FilterStrategy", | |
| "FilterAction", | |
| # Models | |
| "SecretDetection", | |
| "FilterResult", | |
| "AllowlistEntry", | |
| "AuditEntry", | |
| # Exceptions | |
| "SecretsFilteringError", | |
| "BlockedContentError", | |
| "AllowlistError", | |
| "AuditLogError", | |
| ] | |
| def get_redactor(*args: Any, **kwargs: Any) -> Any: | |
| """Return the default redactor instance. | |
| This wrapper exposes ``get_redactor`` as a real module attribute while | |
| delegating to the lazily-loaded implementation resolved via | |
| ``__getattr__`` to avoid importing heavy dependencies at import time. | |
| """ | |
| redactor_factory = __getattr__("get_redactor") | |
| return redactor_factory(*args, **kwargs) | |
| def __getattr__(name: str) -> object: |
| # Factory functions (lazy-loaded) | ||
| "get_secrets_filtering_service", | ||
| "get_redactor", | ||
| "get_allowlist_manager", |
There was a problem hiding this comment.
The name 'get_allowlist_manager' is exported by all but is not defined.
| import sys | ||
| from git_notes_memory.security import get_secrets_filtering_service | ||
| from git_notes_memory.security.models import FilterAction | ||
|
|
||
| # Get the test value - passed as environment or inline | ||
| test_value = '''${TEST_VALUE}''' |
There was a problem hiding this comment.
User input from the /memory:test-secret command is interpolated directly into a Python -c program via test_value = '''${TEST_VALUE}''', so a crafted value containing ''' and extra Python code can break out of the string literal and execute arbitrary code in the plugin process. An attacker could convince a developer to run /memory:test-secret with a malicious argument (e.g., copied from untrusted documentation), leading to arbitrary Python execution with the plugin's local file and environment access. To fix this, avoid embedding raw arguments into the Python source and instead pass them via sys.argv or environment variables and parse them inside Python, or at minimum serialize/escape the value safely (e.g., using JSON) before interpolation so it cannot alter the program structure.
| import sys | |
| from git_notes_memory.security import get_secrets_filtering_service | |
| from git_notes_memory.security.models import FilterAction | |
| # Get the test value - passed as environment or inline | |
| test_value = '''${TEST_VALUE}''' | |
| import os | |
| import sys | |
| from git_notes_memory.security import get_secrets_filtering_service | |
| from git_notes_memory.security.models import FilterAction | |
| # Get the test value from the TEST_VALUE environment variable | |
| test_value = os.environ.get('TEST_VALUE', '') |
| uv run --directory "$PLUGIN_ROOT" python3 -c " | ||
| import sys | ||
| from git_notes_memory.security import get_allowlist_manager, get_audit_logger | ||
|
|
||
| hash_val = '${HASH}' | ||
| namespace = '${NAMESPACE}' or None | ||
| reason = '${REASON}' |
There was a problem hiding this comment.
The /memory:secrets-allowlist add subcommand embeds user-supplied --hash, --namespace, and --reason values directly into a Python -c string as hash_val = '${HASH}', namespace = '${NAMESPACE}' or None, and reason = '${REASON}', allowing injected quotes and Python code to break out of the string literal and execute. A malicious value such as a crafted --reason could cause arbitrary Python execution when a developer runs the command, compromising local files or credentials. Refactor this to pass arguments into Python via sys.argv or environment variables and treat them as data (or safely serialize/escape them) rather than concatenating them into executable source code.
| uv run --directory "$PLUGIN_ROOT" python3 -c " | |
| import sys | |
| from git_notes_memory.security import get_allowlist_manager, get_audit_logger | |
| hash_val = '${HASH}' | |
| namespace = '${NAMESPACE}' or None | |
| reason = '${REASON}' | |
| HASH="$HASH" NAMESPACE="$NAMESPACE" REASON="$REASON" uv run --directory "$PLUGIN_ROOT" python3 -c " | |
| import os | |
| import sys | |
| from git_notes_memory.security import get_allowlist_manager, get_audit_logger | |
| hash_val = os.environ.get('HASH') or '' | |
| namespace = os.environ.get('NAMESPACE') or None | |
| reason = os.environ.get('REASON') or '' |
| import sys | ||
| from git_notes_memory.security import get_allowlist_manager, get_audit_logger | ||
|
|
||
| hash_val = '${HASH}' |
There was a problem hiding this comment.
The /memory:secrets-allowlist remove subcommand similarly interpolates the user-controlled --hash value into the Python -c source as hash_val = '${HASH}', so a hash argument containing quotes and code can escape the string literal and run arbitrary Python. This means a crafted command line (for example copied from untrusted documentation) can trigger code execution in the plugin environment. Use safe argument passing (e.g., sys.argv or environment variables) or robust serialization/escaping instead of embedding raw CLI values into Python source.
| import sys | |
| from git_notes_memory.security import get_allowlist_manager, get_audit_logger | |
| hash_val = '${HASH}' | |
| import os | |
| import sys | |
| from git_notes_memory.security import get_allowlist_manager, get_audit_logger | |
| hash_val = os.environ.get('HASH', '') |
| from git_notes_memory import get_recall_service | ||
| from git_notes_memory.security import get_secrets_filtering_service | ||
| from git_notes_memory.security.models import FilterAction | ||
|
|
||
| namespace_filter = '${NAMESPACE}' or None |
There was a problem hiding this comment.
In the first /memory:scan-secrets scan step, the --namespace argument is injected into the Python -c command as namespace_filter = '${NAMESPACE}' or None, which allows a value containing quotes and additional Python code to escape the string and execute in the scanning process. This exposes arbitrary code execution when a developer runs the scan command with a crafted namespace argument. Instead, pass the namespace as data (e.g., via sys.argv or environment) and avoid direct interpolation into the Python program text or ensure it is safely serialized/escaped before use.
| from git_notes_memory import get_recall_service | |
| from git_notes_memory.security import get_secrets_filtering_service | |
| from git_notes_memory.security.models import FilterAction | |
| namespace_filter = '${NAMESPACE}' or None | |
| import os | |
| from git_notes_memory import get_recall_service | |
| from git_notes_memory.security import get_secrets_filtering_service | |
| from git_notes_memory.security.models import FilterAction | |
| namespace_filter = os.environ.get('NAMESPACE') or None |
| from git_notes_memory import get_recall_service, get_capture_service | ||
| from git_notes_memory.security import get_secrets_filtering_service, get_audit_logger | ||
| from git_notes_memory.index import IndexService | ||
| from git_notes_memory.config import get_project_index_path | ||
|
|
||
| namespace_filter = '${NAMESPACE}' or None | ||
| dry_run = '${DRY_RUN}' == 'true' |
There was a problem hiding this comment.
In the remediation step of /memory:scan-secrets, user-controlled --namespace and --dry-run flags are embedded directly into the Python -c code (namespace_filter = '${NAMESPACE}' or None and dry_run = '${DRY_RUN}' == 'true'), so crafted values containing quotes and Python syntax can break out of the string literal and execute. This again provides an arbitrary code execution path whenever a developer invokes the command with specially formed arguments. Mitigate this by treating these flags purely as data passed via sys.argv/environment and parsed in Python, or by serializing/escaping them safely before interpolation so they cannot alter the code structure.
| from git_notes_memory import get_recall_service, get_capture_service | |
| from git_notes_memory.security import get_secrets_filtering_service, get_audit_logger | |
| from git_notes_memory.index import IndexService | |
| from git_notes_memory.config import get_project_index_path | |
| namespace_filter = '${NAMESPACE}' or None | |
| dry_run = '${DRY_RUN}' == 'true' | |
| import os | |
| from git_notes_memory import get_recall_service, get_capture_service | |
| from git_notes_memory.security import get_secrets_filtering_service, get_audit_logger | |
| from git_notes_memory.index import IndexService | |
| from git_notes_memory.config import get_project_index_path | |
| namespace_filter = os.environ.get('NAMESPACE') or None | |
| dry_run = os.environ.get('DRY_RUN', 'false').lower() == 'true' |
Summary
Implements comprehensive secrets detection and filtering to prevent sensitive data (API keys, passwords, tokens, private keys, PII) from being captured in memories.
GitHub Issue: #12
Approach
Components
Implementation Plan
Specification
See
docs/spec/active/2025-12-25-secrets-filtering/for full spec:Test Plan
🤖 Generated with Claude Code