Skip to content
This repository was archived by the owner on Jan 2, 2026. It is now read-only.

feat: secrets filtering and sensitive data protection (Issue #12)#27

Merged
zircote merged 13 commits intomainfrom
issue-12-secrets-filtering
Dec 26, 2025
Merged

feat: secrets filtering and sensitive data protection (Issue #12)#27
zircote merged 13 commits intomainfrom
issue-12-secrets-filtering

Conversation

@zircote
Copy link
Owner

@zircote zircote commented Dec 26, 2025

Summary

Implements comprehensive secrets detection and filtering to prevent sensitive data (API keys, passwords, tokens, private keys, PII) from being captured in memories.

GitHub Issue: #12

Approach

  • Uses detect-secrets (Yelp) library for battle-tested secret detection (27+ patterns)
  • Custom PII detection for SSN, credit cards, phone numbers
  • Default REDACT strategy (configurable: MASK, BLOCK, WARN)
  • Hash-based allowlist for known-safe values
  • Full audit logging for SOC2/GDPR compliance

Components

  1. SecretsFilteringService - Main orchestrator
  2. DetectSecretsAdapter - Wraps detect-secrets library
  3. PIIDetector - Custom PII patterns
  4. AllowlistManager - Hash-based known-safe values
  5. Redactor - Applies filtering strategies
  6. AuditLogger - Compliance logging

Implementation Plan

  • Phase 1: Foundation (module structure, models, config)
  • Phase 2: Detection Layer (detect-secrets, PII)
  • Phase 3: Integration (redactor, allowlist, capture pipeline)
  • Phase 4: Commands & Audit (CLI, logging)
  • Phase 5: Testing & Documentation

Specification

See docs/spec/active/2025-12-25-secrets-filtering/ for full spec:

  • REQUIREMENTS.md (14 P0, 6 P1, 5 P2 requirements)
  • ARCHITECTURE.md (6 components)
  • IMPLEMENTATION_PLAN.md (30 tasks, 5 phases)
  • DECISIONS.md (12 ADRs)

Test Plan

  • Unit tests for all components (>90% coverage)
  • Integration tests for full pipeline
  • Performance tests (<10ms latency target)
  • Retrospective scanning tests

🤖 Generated with Claude Code

zircote and others added 13 commits December 25, 2025 19:39
Comprehensive specification for secrets detection and sensitive data
protection including pattern-based detection, entropy analysis, PII
detection, and multiple filtering strategies (REDACT, MASK, BLOCK, WARN).

Includes:
- REQUIREMENTS.md: 14 P0, 6 P1, 5 P2 requirements
- ARCHITECTURE.md: 7 components with integration design
- IMPLEMENTATION_PLAN.md: 5 phases, 31 tasks
- DECISIONS.md: 12 ADRs covering design choices

Closes #12

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
ADR-001 revised: Use Yelp detect-secrets instead of pure Python
implementation. This provides 27+ battle-tested detectors and reduces
implementation scope by ~70% for the detection layer.

Changes:
- ADR-001: Pure Python -> detect-secrets library
- ADR-004: Superseded (entropy thresholds managed by detect-secrets)
- Architecture: DetectSecretsAdapter replaces PatternDetector + EntropyAnalyzer
- Implementation: 31 -> 30 tasks with lower complexity

Still build ourselves: PIIDetector, Redactor, AllowlistManager, AuditLogger

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Spec SPEC-2025-12-25-001 approved for implementation.

Approved by: Robert Allen <zircote@gmail.com>
Timestamp: 2025-12-26T00:50:22Z
Status: in-review -> approved

Ready for implementation via /claude-spec:implement secrets-filtering

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Tracking 30 tasks across 5 phases:
- Phase 1: Foundation (5 tasks)
- Phase 2: Detection Layer (5 tasks)
- Phase 3: Integration (8 tasks)
- Phase 4: Commands & Audit (8 tasks)
- Phase 5: Testing & Docs (4 tasks)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Phase 1 complete - foundation module structure:

- security/__init__.py: Lazy loading pattern matching main package
- security/models.py: SecretType (22 types), FilterStrategy, FilterAction enums
  - SecretDetection, FilterResult, AllowlistEntry, AuditEntry dataclasses
- security/config.py: SecretsConfig with env vars and YAML file loading
  - Per-namespace strategy overrides
  - Default paths for allowlist and audit log
- security/exceptions.py: SecretsFilteringError, BlockedContentError, etc.
- security/service.py: Stub SecretsFilteringService (full impl in Phase 3)
- Main __init__.py: Added get_secrets_filtering_service() factory

All 5 Phase 1 tasks complete (1.1-1.5).

Part of Issue #12

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
… patterns

Phase 2 of secrets filtering implementation:

- Add detect-secrets>=1.4.0 dependency for battle-tested secret detection
- Implement DetectSecretsAdapter wrapping Yelp detect-secrets library
  - Maps 27+ detector types to SecretType enum
  - Deduplicates overlapping detections (prefers specific over entropy)
  - SHA-256 hashing for allowlist matching
- Implement PIIDetector for personally identifiable information
  - SSN pattern with validation (excludes 000/666/9XX prefixes)
  - Credit card patterns (Visa, MasterCard, Amex, Discover) with Luhn validation
  - Phone number patterns (US formats with optional +1)
- Add comprehensive test suite (35 tests)
  - 11 tests for DetectSecretsAdapter
  - 24 tests for PIIDetector including Luhn algorithm

Resolves detection layer for #12

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…nd redactor

- Add AllowlistManager for hash-based known-safe values (never stores secrets)
- Add AuditLogger for SOC2/GDPR compliant JSON Lines audit trail
- Add Redactor with strategy-based filtering (REDACT/MASK/BLOCK/WARN)
- Implement namespace-scoped allowlist support
- Add log rotation with configurable file size limits
- All modules use ServiceRegistry for thread-safe singleton management
… error handling

- Migrate all security singletons to centralized ServiceRegistry pattern
- Fix thread-safety bug in pii.py (previously had no lock)
- Improve allowlist YAML corruption handling (raise instead of silent fail)
- Escalate audit log write/rotation failures to ERROR level
- Add optimized O(log n) line number lookup in PII detector
- Remove redundant try-except patterns
- Add optional SecretsFilteringService integration to CaptureService
- Filter both summary and content fields before storage
- Rename MemoryError to MemoryPluginError (avoid shadowing builtin)
- Add configurable secrets service injection
- Replace suppress(Exception) with proper try-except logging
- Add lock cleanup warning for debugging
- Add test_allowlist.py: hash value, persistence, expiration tests
- Add test_audit.py: logging, rotation, query, thread safety tests
- Add test_config.py: env vars, YAML loading, integration (98% coverage)
- Add test_capture_integration.py: filter/block/allowlist integration
- Add test_integration.py: end-to-end workflow tests
- Add test_performance.py: timing benchmarks for detection pipeline
- Add test_redactor.py: strategy and edge case tests
- Add test_service.py: filter/scan/deduplication tests
- Add test_secrets_commands.py: slash command tests
- Add lock acquisition timeout and cleanup tests to test_capture.py
- Update test_detector.py and test_pii.py for ServiceRegistry
- Add /memory:secrets-allowlist for managing known-safe values
- Add /memory:scan-secrets for retrospective scanning
- Add /memory:audit-log for viewing detection history
- Add /memory:test-secret for testing detection patterns
- Add secrets filtering section to README with API examples
- Document supported secret types and environment variables
- Add SECRETS_FILTER_* env vars to CLAUDE.md
- Include slash command reference table
- Add CODE_REVIEW.md with multi-agent analysis findings
- Add REVIEW_SUMMARY.md with executive summary
- Add REMEDIATION_TASKS.md with prioritized fix list
- Add REMEDIATION_REPORT.md documenting all applied fixes
@zircote zircote marked this pull request as ready for review December 26, 2025 03:17
Copilot AI review requested due to automatic review settings December 26, 2025 03:17
@zircote zircote merged commit c9f358e into main Dec 26, 2025
11 checks passed
@zircote zircote deleted the issue-12-secrets-filtering branch December 26, 2025 03:20
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements comprehensive secrets detection and filtering to prevent sensitive data (API keys, passwords, tokens, PII) from being captured in memories. The implementation follows a detailed specification and includes:

  • Multi-layer detection: detect-secrets library (27+ patterns) + custom PII detection (SSN, credit cards, phones)
  • Flexible filtering strategies: REDACT, MASK, BLOCK, WARN with per-namespace configuration
  • Hash-based allowlist: Known-safe values can bypass filtering
  • Full audit logging: SOC2/GDPR compliance with JSON Lines format
  • Retrospective scanning: Scan existing memories for secrets

Reviewed changes

Copilot reviewed 46 out of 47 changed files in this pull request and generated 24 comments.

Show a summary per file
File Description
src/git_notes_memory/security/*.py Complete security module with 9 components
tests/security/*.py Comprehensive test suite (>90% coverage target)
src/git_notes_memory/capture.py Integration with capture pipeline for filtering
src/git_notes_memory/index.py Added get_all_memories() for retrospective scanning
src/git_notes_memory/exceptions.py Renamed MemoryError → MemoryPluginError
pyproject.toml, uv.lock Added detect-secrets>=1.4.0 dependency
docs/spec/active/2025-12-25-secrets-filtering/* Full specification docs


config = SecretsConfig(enabled=True, entropy_enabled=False)
secrets_service = SecretsFilteringService(config=config, data_dir=data_dir)
audit_logger = get_audit_logger(log_dir=data_dir / "audit")
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable audit_logger is not used.

Copilot uses AI. Check for mistakes.
"get_capture_service",
"get_recall_service",
"get_sync_service",
"get_secrets_filtering_service",
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name 'get_secrets_filtering_service' is exported by all but is not defined.

Copilot uses AI. Check for mistakes.

__all__ = [
# Factory functions (lazy-loaded)
"get_secrets_filtering_service",
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name 'get_secrets_filtering_service' is exported by all but is not defined.

Copilot uses AI. Check for mistakes.
Comment on lines +21 to +46
from __future__ import annotations

__all__ = [
# Factory functions (lazy-loaded)
"get_secrets_filtering_service",
"get_redactor",
"get_allowlist_manager",
"get_audit_logger",
# Enums
"SecretType",
"FilterStrategy",
"FilterAction",
# Models
"SecretDetection",
"FilterResult",
"AllowlistEntry",
"AuditEntry",
# Exceptions
"SecretsFilteringError",
"BlockedContentError",
"AllowlistError",
"AuditLogError",
]


def __getattr__(name: str) -> object:
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name 'get_redactor' is exported by all but is not defined.

Suggested change
from __future__ import annotations
__all__ = [
# Factory functions (lazy-loaded)
"get_secrets_filtering_service",
"get_redactor",
"get_allowlist_manager",
"get_audit_logger",
# Enums
"SecretType",
"FilterStrategy",
"FilterAction",
# Models
"SecretDetection",
"FilterResult",
"AllowlistEntry",
"AuditEntry",
# Exceptions
"SecretsFilteringError",
"BlockedContentError",
"AllowlistError",
"AuditLogError",
]
def __getattr__(name: str) -> object:
from __future__ import annotations
from typing import Any
__all__ = [
# Factory functions (lazy-loaded)
"get_secrets_filtering_service",
"get_redactor",
"get_allowlist_manager",
"get_audit_logger",
# Enums
"SecretType",
"FilterStrategy",
"FilterAction",
# Models
"SecretDetection",
"FilterResult",
"AllowlistEntry",
"AuditEntry",
# Exceptions
"SecretsFilteringError",
"BlockedContentError",
"AllowlistError",
"AuditLogError",
]
def get_redactor(*args: Any, **kwargs: Any) -> Any:
"""Return the default redactor instance.
This wrapper exposes ``get_redactor`` as a real module attribute while
delegating to the lazily-loaded implementation resolved via
``__getattr__`` to avoid importing heavy dependencies at import time.
"""
redactor_factory = __getattr__("get_redactor")
return redactor_factory(*args, **kwargs)
def __getattr__(name: str) -> object:

Copilot uses AI. Check for mistakes.
# Factory functions (lazy-loaded)
"get_secrets_filtering_service",
"get_redactor",
"get_allowlist_manager",
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name 'get_allowlist_manager' is exported by all but is not defined.

Copilot uses AI. Check for mistakes.
Comment on lines +91 to +96
import sys
from git_notes_memory.security import get_secrets_filtering_service
from git_notes_memory.security.models import FilterAction

# Get the test value - passed as environment or inline
test_value = '''${TEST_VALUE}'''
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

User input from the /memory:test-secret command is interpolated directly into a Python -c program via test_value = '''${TEST_VALUE}''', so a crafted value containing ''' and extra Python code can break out of the string literal and execute arbitrary code in the plugin process. An attacker could convince a developer to run /memory:test-secret with a malicious argument (e.g., copied from untrusted documentation), leading to arbitrary Python execution with the plugin's local file and environment access. To fix this, avoid embedding raw arguments into the Python source and instead pass them via sys.argv or environment variables and parse them inside Python, or at minimum serialize/escape the value safely (e.g., using JSON) before interpolation so it cannot alter the program structure.

Suggested change
import sys
from git_notes_memory.security import get_secrets_filtering_service
from git_notes_memory.security.models import FilterAction
# Get the test value - passed as environment or inline
test_value = '''${TEST_VALUE}'''
import os
import sys
from git_notes_memory.security import get_secrets_filtering_service
from git_notes_memory.security.models import FilterAction
# Get the test value from the TEST_VALUE environment variable
test_value = os.environ.get('TEST_VALUE', '')

Copilot uses AI. Check for mistakes.
Comment on lines +130 to +136
uv run --directory "$PLUGIN_ROOT" python3 -c "
import sys
from git_notes_memory.security import get_allowlist_manager, get_audit_logger

hash_val = '${HASH}'
namespace = '${NAMESPACE}' or None
reason = '${REASON}'
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The /memory:secrets-allowlist add subcommand embeds user-supplied --hash, --namespace, and --reason values directly into a Python -c string as hash_val = '${HASH}', namespace = '${NAMESPACE}' or None, and reason = '${REASON}', allowing injected quotes and Python code to break out of the string literal and execute. A malicious value such as a crafted --reason could cause arbitrary Python execution when a developer runs the command, compromising local files or credentials. Refactor this to pass arguments into Python via sys.argv or environment variables and treat them as data (or safely serialize/escape them) rather than concatenating them into executable source code.

Suggested change
uv run --directory "$PLUGIN_ROOT" python3 -c "
import sys
from git_notes_memory.security import get_allowlist_manager, get_audit_logger
hash_val = '${HASH}'
namespace = '${NAMESPACE}' or None
reason = '${REASON}'
HASH="$HASH" NAMESPACE="$NAMESPACE" REASON="$REASON" uv run --directory "$PLUGIN_ROOT" python3 -c "
import os
import sys
from git_notes_memory.security import get_allowlist_manager, get_audit_logger
hash_val = os.environ.get('HASH') or ''
namespace = os.environ.get('NAMESPACE') or None
reason = os.environ.get('REASON') or ''

Copilot uses AI. Check for mistakes.
Comment on lines +191 to +194
import sys
from git_notes_memory.security import get_allowlist_manager, get_audit_logger

hash_val = '${HASH}'
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The /memory:secrets-allowlist remove subcommand similarly interpolates the user-controlled --hash value into the Python -c source as hash_val = '${HASH}', so a hash argument containing quotes and code can escape the string literal and run arbitrary Python. This means a crafted command line (for example copied from untrusted documentation) can trigger code execution in the plugin environment. Use safe argument passing (e.g., sys.argv or environment variables) or robust serialization/escaping instead of embedding raw CLI values into Python source.

Suggested change
import sys
from git_notes_memory.security import get_allowlist_manager, get_audit_logger
hash_val = '${HASH}'
import os
import sys
from git_notes_memory.security import get_allowlist_manager, get_audit_logger
hash_val = os.environ.get('HASH', '')

Copilot uses AI. Check for mistakes.
Comment on lines +90 to +94
from git_notes_memory import get_recall_service
from git_notes_memory.security import get_secrets_filtering_service
from git_notes_memory.security.models import FilterAction

namespace_filter = '${NAMESPACE}' or None
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the first /memory:scan-secrets scan step, the --namespace argument is injected into the Python -c command as namespace_filter = '${NAMESPACE}' or None, which allows a value containing quotes and additional Python code to escape the string and execute in the scanning process. This exposes arbitrary code execution when a developer runs the scan command with a crafted namespace argument. Instead, pass the namespace as data (e.g., via sys.argv or environment) and avoid direct interpolation into the Python program text or ensure it is safely serialized/escaped before use.

Suggested change
from git_notes_memory import get_recall_service
from git_notes_memory.security import get_secrets_filtering_service
from git_notes_memory.security.models import FilterAction
namespace_filter = '${NAMESPACE}' or None
import os
from git_notes_memory import get_recall_service
from git_notes_memory.security import get_secrets_filtering_service
from git_notes_memory.security.models import FilterAction
namespace_filter = os.environ.get('NAMESPACE') or None

Copilot uses AI. Check for mistakes.
Comment on lines +166 to +172
from git_notes_memory import get_recall_service, get_capture_service
from git_notes_memory.security import get_secrets_filtering_service, get_audit_logger
from git_notes_memory.index import IndexService
from git_notes_memory.config import get_project_index_path

namespace_filter = '${NAMESPACE}' or None
dry_run = '${DRY_RUN}' == 'true'
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the remediation step of /memory:scan-secrets, user-controlled --namespace and --dry-run flags are embedded directly into the Python -c code (namespace_filter = '${NAMESPACE}' or None and dry_run = '${DRY_RUN}' == 'true'), so crafted values containing quotes and Python syntax can break out of the string literal and execute. This again provides an arbitrary code execution path whenever a developer invokes the command with specially formed arguments. Mitigate this by treating these flags purely as data passed via sys.argv/environment and parsed in Python, or by serializing/escaping them safely before interpolation so they cannot alter the code structure.

Suggested change
from git_notes_memory import get_recall_service, get_capture_service
from git_notes_memory.security import get_secrets_filtering_service, get_audit_logger
from git_notes_memory.index import IndexService
from git_notes_memory.config import get_project_index_path
namespace_filter = '${NAMESPACE}' or None
dry_run = '${DRY_RUN}' == 'true'
import os
from git_notes_memory import get_recall_service, get_capture_service
from git_notes_memory.security import get_secrets_filtering_service, get_audit_logger
from git_notes_memory.index import IndexService
from git_notes_memory.config import get_project_index_path
namespace_filter = os.environ.get('NAMESPACE') or None
dry_run = os.environ.get('DRY_RUN', 'false').lower() == 'true'

Copilot uses AI. Check for mistakes.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants