feat: secrets filtering and sensitive data protection (Issue #12) by zircote · Pull Request #27 · zircote/git-notes-memory

zircote · 2025-12-26T00:58:43Z

Summary

Implements comprehensive secrets detection and filtering to prevent sensitive data (API keys, passwords, tokens, private keys, PII) from being captured in memories.

GitHub Issue: #12

Approach

Uses detect-secrets (Yelp) library for battle-tested secret detection (27+ patterns)
Custom PII detection for SSN, credit cards, phone numbers
Default REDACT strategy (configurable: MASK, BLOCK, WARN)
Hash-based allowlist for known-safe values
Full audit logging for SOC2/GDPR compliance

Components

SecretsFilteringService - Main orchestrator
DetectSecretsAdapter - Wraps detect-secrets library
PIIDetector - Custom PII patterns
AllowlistManager - Hash-based known-safe values
Redactor - Applies filtering strategies
AuditLogger - Compliance logging

Implementation Plan

Phase 1: Foundation (module structure, models, config)
Phase 2: Detection Layer (detect-secrets, PII)
Phase 3: Integration (redactor, allowlist, capture pipeline)
Phase 4: Commands & Audit (CLI, logging)
Phase 5: Testing & Documentation

Specification

See docs/spec/active/2025-12-25-secrets-filtering/ for full spec:

REQUIREMENTS.md (14 P0, 6 P1, 5 P2 requirements)
ARCHITECTURE.md (6 components)
IMPLEMENTATION_PLAN.md (30 tasks, 5 phases)
DECISIONS.md (12 ADRs)

Test Plan

Unit tests for all components (>90% coverage)
Integration tests for full pipeline
Performance tests (<10ms latency target)
Retrospective scanning tests

🤖 Generated with Claude Code

Comprehensive specification for secrets detection and sensitive data protection including pattern-based detection, entropy analysis, PII detection, and multiple filtering strategies (REDACT, MASK, BLOCK, WARN). Includes: - REQUIREMENTS.md: 14 P0, 6 P1, 5 P2 requirements - ARCHITECTURE.md: 7 components with integration design - IMPLEMENTATION_PLAN.md: 5 phases, 31 tasks - DECISIONS.md: 12 ADRs covering design choices Closes #12 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

ADR-001 revised: Use Yelp detect-secrets instead of pure Python implementation. This provides 27+ battle-tested detectors and reduces implementation scope by ~70% for the detection layer. Changes: - ADR-001: Pure Python -> detect-secrets library - ADR-004: Superseded (entropy thresholds managed by detect-secrets) - Architecture: DetectSecretsAdapter replaces PatternDetector + EntropyAnalyzer - Implementation: 31 -> 30 tasks with lower complexity Still build ourselves: PIIDetector, Redactor, AllowlistManager, AuditLogger 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Spec SPEC-2025-12-25-001 approved for implementation. Approved by: Robert Allen <zircote@gmail.com> Timestamp: 2025-12-26T00:50:22Z Status: in-review -> approved Ready for implementation via /claude-spec:implement secrets-filtering 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Tracking 30 tasks across 5 phases: - Phase 1: Foundation (5 tasks) - Phase 2: Detection Layer (5 tasks) - Phase 3: Integration (8 tasks) - Phase 4: Commands & Audit (8 tasks) - Phase 5: Testing & Docs (4 tasks) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Phase 1 complete - foundation module structure: - security/__init__.py: Lazy loading pattern matching main package - security/models.py: SecretType (22 types), FilterStrategy, FilterAction enums - SecretDetection, FilterResult, AllowlistEntry, AuditEntry dataclasses - security/config.py: SecretsConfig with env vars and YAML file loading - Per-namespace strategy overrides - Default paths for allowlist and audit log - security/exceptions.py: SecretsFilteringError, BlockedContentError, etc. - security/service.py: Stub SecretsFilteringService (full impl in Phase 3) - Main __init__.py: Added get_secrets_filtering_service() factory All 5 Phase 1 tasks complete (1.1-1.5). Part of Issue #12 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

… patterns Phase 2 of secrets filtering implementation: - Add detect-secrets>=1.4.0 dependency for battle-tested secret detection - Implement DetectSecretsAdapter wrapping Yelp detect-secrets library - Maps 27+ detector types to SecretType enum - Deduplicates overlapping detections (prefers specific over entropy) - SHA-256 hashing for allowlist matching - Implement PIIDetector for personally identifiable information - SSN pattern with validation (excludes 000/666/9XX prefixes) - Credit card patterns (Visa, MasterCard, Amex, Discover) with Luhn validation - Phone number patterns (US formats with optional +1) - Add comprehensive test suite (35 tests) - 11 tests for DetectSecretsAdapter - 24 tests for PIIDetector including Luhn algorithm Resolves detection layer for #12 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…nd redactor - Add AllowlistManager for hash-based known-safe values (never stores secrets) - Add AuditLogger for SOC2/GDPR compliant JSON Lines audit trail - Add Redactor with strategy-based filtering (REDACT/MASK/BLOCK/WARN) - Implement namespace-scoped allowlist support - Add log rotation with configurable file size limits - All modules use ServiceRegistry for thread-safe singleton management

… error handling - Migrate all security singletons to centralized ServiceRegistry pattern - Fix thread-safety bug in pii.py (previously had no lock) - Improve allowlist YAML corruption handling (raise instead of silent fail) - Escalate audit log write/rotation failures to ERROR level - Add optimized O(log n) line number lookup in PII detector - Remove redundant try-except patterns

- Add optional SecretsFilteringService integration to CaptureService - Filter both summary and content fields before storage - Rename MemoryError to MemoryPluginError (avoid shadowing builtin) - Add configurable secrets service injection - Replace suppress(Exception) with proper try-except logging - Add lock cleanup warning for debugging

- Add test_allowlist.py: hash value, persistence, expiration tests - Add test_audit.py: logging, rotation, query, thread safety tests - Add test_config.py: env vars, YAML loading, integration (98% coverage) - Add test_capture_integration.py: filter/block/allowlist integration - Add test_integration.py: end-to-end workflow tests - Add test_performance.py: timing benchmarks for detection pipeline - Add test_redactor.py: strategy and edge case tests - Add test_service.py: filter/scan/deduplication tests - Add test_secrets_commands.py: slash command tests - Add lock acquisition timeout and cleanup tests to test_capture.py - Update test_detector.py and test_pii.py for ServiceRegistry

- Add /memory:secrets-allowlist for managing known-safe values - Add /memory:scan-secrets for retrospective scanning - Add /memory:audit-log for viewing detection history - Add /memory:test-secret for testing detection patterns

- Add secrets filtering section to README with API examples - Document supported secret types and environment variables - Add SECRETS_FILTER_* env vars to CLAUDE.md - Include slash command reference table

- Add CODE_REVIEW.md with multi-agent analysis findings - Add REVIEW_SUMMARY.md with executive summary - Add REMEDIATION_TASKS.md with prioritized fix list - Add REMEDIATION_REPORT.md documenting all applied fixes

Copilot

Pull request overview

This PR implements comprehensive secrets detection and filtering to prevent sensitive data (API keys, passwords, tokens, PII) from being captured in memories. The implementation follows a detailed specification and includes:

Multi-layer detection: detect-secrets library (27+ patterns) + custom PII detection (SSN, credit cards, phones)
Flexible filtering strategies: REDACT, MASK, BLOCK, WARN with per-namespace configuration
Hash-based allowlist: Known-safe values can bypass filtering
Full audit logging: SOC2/GDPR compliance with JSON Lines format
Retrospective scanning: Scan existing memories for secrets

Reviewed changes

Copilot reviewed 46 out of 47 changed files in this pull request and generated 24 comments.

Show a summary per file

File	Description
src/git_notes_memory/security/*.py	Complete security module with 9 components
tests/security/*.py	Comprehensive test suite (>90% coverage target)
src/git_notes_memory/capture.py	Integration with capture pipeline for filtering
src/git_notes_memory/index.py	Added get_all_memories() for retrospective scanning
src/git_notes_memory/exceptions.py	Renamed MemoryError → MemoryPluginError
pyproject.toml, uv.lock	Added detect-secrets>=1.4.0 dependency
docs/spec/active/2025-12-25-secrets-filtering/*	Full specification docs

Copilot · 2025-12-26T03:23:02Z

tests/security/test_integration.py

+
+        config = SecretsConfig(enabled=True, entropy_enabled=False)
+        secrets_service = SecretsFilteringService(config=config, data_dir=data_dir)
+        audit_logger = get_audit_logger(log_dir=data_dir / "audit")


Variable audit_logger is not used.

Copilot · 2025-12-26T03:23:02Z

src/git_notes_memory/__init__.py

    "get_capture_service",
    "get_recall_service",
    "get_sync_service",
+    "get_secrets_filtering_service",


The name 'get_secrets_filtering_service' is exported by all but is not defined.

Copilot · 2025-12-26T03:23:03Z

src/git_notes_memory/security/__init__.py

+
+__all__ = [
+    # Factory functions (lazy-loaded)
+    "get_secrets_filtering_service",


The name 'get_secrets_filtering_service' is exported by all but is not defined.

Copilot · 2025-12-26T03:23:03Z

src/git_notes_memory/security/__init__.py

+from __future__ import annotations
+
+__all__ = [
+    # Factory functions (lazy-loaded)
+    "get_secrets_filtering_service",
+    "get_redactor",
+    "get_allowlist_manager",
+    "get_audit_logger",
+    # Enums
+    "SecretType",
+    "FilterStrategy",
+    "FilterAction",
+    # Models
+    "SecretDetection",
+    "FilterResult",
+    "AllowlistEntry",
+    "AuditEntry",
+    # Exceptions
+    "SecretsFilteringError",
+    "BlockedContentError",
+    "AllowlistError",
+    "AuditLogError",
+]
+
+
+def __getattr__(name: str) -> object:


The name 'get_redactor' is exported by all but is not defined.

Suggested change

from __future__ import annotations

__all__ = [

# Factory functions (lazy-loaded)

"get_secrets_filtering_service",

"get_redactor",

"get_allowlist_manager",

"get_audit_logger",

# Enums

"SecretType",

"FilterStrategy",

"FilterAction",

# Models

"SecretDetection",

"FilterResult",

"AllowlistEntry",

"AuditEntry",

# Exceptions

"SecretsFilteringError",

"BlockedContentError",

"AllowlistError",

"AuditLogError",

]

def __getattr__(name: str) -> object:

from __future__ import annotations

from typing import Any

__all__ = [

# Factory functions (lazy-loaded)

"get_secrets_filtering_service",

"get_redactor",

"get_allowlist_manager",

"get_audit_logger",

# Enums

"SecretType",

"FilterStrategy",

"FilterAction",

# Models

"SecretDetection",

"FilterResult",

"AllowlistEntry",

"AuditEntry",

# Exceptions

"SecretsFilteringError",

"BlockedContentError",

"AllowlistError",

"AuditLogError",

]

def get_redactor(*args: Any, **kwargs: Any) -> Any:

"""Return the default redactor instance.

This wrapper exposes ``get_redactor`` as a real module attribute while

delegating to the lazily-loaded implementation resolved via

``__getattr__`` to avoid importing heavy dependencies at import time.

"""

redactor_factory = __getattr__("get_redactor")

return redactor_factory(*args, **kwargs)

def __getattr__(name: str) -> object:

Copilot · 2025-12-26T03:23:03Z

src/git_notes_memory/security/__init__.py

+    # Factory functions (lazy-loaded)
+    "get_secrets_filtering_service",
+    "get_redactor",
+    "get_allowlist_manager",


The name 'get_allowlist_manager' is exported by all but is not defined.

Copilot · 2025-12-26T03:23:06Z

commands/test-secret.md

+import sys
+from git_notes_memory.security import get_secrets_filtering_service
+from git_notes_memory.security.models import FilterAction
+
+# Get the test value - passed as environment or inline
+test_value = '''${TEST_VALUE}'''


User input from the /memory:test-secret command is interpolated directly into a Python -c program via test_value = '''${TEST_VALUE}''', so a crafted value containing ''' and extra Python code can break out of the string literal and execute arbitrary code in the plugin process. An attacker could convince a developer to run /memory:test-secret with a malicious argument (e.g., copied from untrusted documentation), leading to arbitrary Python execution with the plugin's local file and environment access. To fix this, avoid embedding raw arguments into the Python source and instead pass them via sys.argv or environment variables and parse them inside Python, or at minimum serialize/escape the value safely (e.g., using JSON) before interpolation so it cannot alter the program structure.

Suggested change

import sys

from git_notes_memory.security import get_secrets_filtering_service

from git_notes_memory.security.models import FilterAction

# Get the test value - passed as environment or inline

test_value = '''${TEST_VALUE}'''

import os

import sys

from git_notes_memory.security import get_secrets_filtering_service

from git_notes_memory.security.models import FilterAction

# Get the test value from the TEST_VALUE environment variable

test_value = os.environ.get('TEST_VALUE', '')

Copilot · 2025-12-26T03:23:07Z

commands/secrets-allowlist.md

+uv run --directory "$PLUGIN_ROOT" python3 -c "
+import sys
+from git_notes_memory.security import get_allowlist_manager, get_audit_logger
+
+hash_val = '${HASH}'
+namespace = '${NAMESPACE}' or None
+reason = '${REASON}'


The /memory:secrets-allowlist add subcommand embeds user-supplied --hash, --namespace, and --reason values directly into a Python -c string as hash_val = '${HASH}', namespace = '${NAMESPACE}' or None, and reason = '${REASON}', allowing injected quotes and Python code to break out of the string literal and execute. A malicious value such as a crafted --reason could cause arbitrary Python execution when a developer runs the command, compromising local files or credentials. Refactor this to pass arguments into Python via sys.argv or environment variables and treat them as data (or safely serialize/escape them) rather than concatenating them into executable source code.

Suggested change

uv run --directory "$PLUGIN_ROOT" python3 -c "

import sys

from git_notes_memory.security import get_allowlist_manager, get_audit_logger

hash_val = '${HASH}'

namespace = '${NAMESPACE}' or None

reason = '${REASON}'

HASH="$HASH" NAMESPACE="$NAMESPACE" REASON="$REASON" uv run --directory "$PLUGIN_ROOT" python3 -c "

import os

import sys

from git_notes_memory.security import get_allowlist_manager, get_audit_logger

hash_val = os.environ.get('HASH') or ''

namespace = os.environ.get('NAMESPACE') or None

reason = os.environ.get('REASON') or ''

Copilot · 2025-12-26T03:23:07Z

commands/secrets-allowlist.md

+import sys
+from git_notes_memory.security import get_allowlist_manager, get_audit_logger
+
+hash_val = '${HASH}'


The /memory:secrets-allowlist remove subcommand similarly interpolates the user-controlled --hash value into the Python -c source as hash_val = '${HASH}', so a hash argument containing quotes and code can escape the string literal and run arbitrary Python. This means a crafted command line (for example copied from untrusted documentation) can trigger code execution in the plugin environment. Use safe argument passing (e.g., sys.argv or environment variables) or robust serialization/escaping instead of embedding raw CLI values into Python source.

Suggested change

import sys

from git_notes_memory.security import get_allowlist_manager, get_audit_logger

hash_val = '${HASH}'

import os

import sys

from git_notes_memory.security import get_allowlist_manager, get_audit_logger

hash_val = os.environ.get('HASH', '')

Copilot · 2025-12-26T03:23:07Z

commands/scan-secrets.md

+from git_notes_memory import get_recall_service
+from git_notes_memory.security import get_secrets_filtering_service
+from git_notes_memory.security.models import FilterAction
+
+namespace_filter = '${NAMESPACE}' or None


In the first /memory:scan-secrets scan step, the --namespace argument is injected into the Python -c command as namespace_filter = '${NAMESPACE}' or None, which allows a value containing quotes and additional Python code to escape the string and execute in the scanning process. This exposes arbitrary code execution when a developer runs the scan command with a crafted namespace argument. Instead, pass the namespace as data (e.g., via sys.argv or environment) and avoid direct interpolation into the Python program text or ensure it is safely serialized/escaped before use.

Suggested change

from git_notes_memory import get_recall_service

from git_notes_memory.security import get_secrets_filtering_service

from git_notes_memory.security.models import FilterAction

namespace_filter = '${NAMESPACE}' or None

import os

from git_notes_memory import get_recall_service

from git_notes_memory.security import get_secrets_filtering_service

from git_notes_memory.security.models import FilterAction

namespace_filter = os.environ.get('NAMESPACE') or None

Copilot · 2025-12-26T03:23:08Z

commands/scan-secrets.md

+from git_notes_memory import get_recall_service, get_capture_service
+from git_notes_memory.security import get_secrets_filtering_service, get_audit_logger
+from git_notes_memory.index import IndexService
+from git_notes_memory.config import get_project_index_path
+
+namespace_filter = '${NAMESPACE}' or None
+dry_run = '${DRY_RUN}' == 'true'


In the remediation step of /memory:scan-secrets, user-controlled --namespace and --dry-run flags are embedded directly into the Python -c code (namespace_filter = '${NAMESPACE}' or None and dry_run = '${DRY_RUN}' == 'true'), so crafted values containing quotes and Python syntax can break out of the string literal and execute. This again provides an arbitrary code execution path whenever a developer invokes the command with specially formed arguments. Mitigate this by treating these flags purely as data passed via sys.argv/environment and parsed in Python, or by serializing/escaping them safely before interpolation so they cannot alter the code structure.

Suggested change

from git_notes_memory import get_recall_service, get_capture_service

from git_notes_memory.security import get_secrets_filtering_service, get_audit_logger

from git_notes_memory.index import IndexService

from git_notes_memory.config import get_project_index_path

namespace_filter = '${NAMESPACE}' or None

dry_run = '${DRY_RUN}' == 'true'

import os

from git_notes_memory import get_recall_service, get_capture_service

from git_notes_memory.security import get_secrets_filtering_service, get_audit_logger

from git_notes_memory.index import IndexService

from git_notes_memory.config import get_project_index_path

namespace_filter = os.environ.get('NAMESPACE') or None

dry_run = os.environ.get('DRY_RUN', 'false').lower() == 'true'

zircote and others added 13 commits December 25, 2025 19:39

feat(commands): add secrets filtering slash commands

00ec195

- Add /memory:secrets-allowlist for managing known-safe values - Add /memory:scan-secrets for retrospective scanning - Add /memory:audit-log for viewing detection history - Add /memory:test-secret for testing detection patterns

docs: update project documentation with secrets filtering info

5ba99f1

- Add secrets filtering section to README with API examples - Document supported secret types and environment variables - Add SECRETS_FILTER_* env vars to CLAUDE.md - Include slash command reference table

docs: add deep-clean code review artifacts

3baa41e

- Add CODE_REVIEW.md with multi-agent analysis findings - Add REVIEW_SUMMARY.md with executive summary - Add REMEDIATION_TASKS.md with prioritized fix list - Add REMEDIATION_REPORT.md documenting all applied fixes

zircote marked this pull request as ready for review December 26, 2025 03:17

Copilot AI review requested due to automatic review settings December 26, 2025 03:17

Copilot started reviewing on behalf of zircote December 26, 2025 03:18 View session

zircote merged commit c9f358e into main Dec 26, 2025
11 checks passed

zircote deleted the issue-12-secrets-filtering branch December 26, 2025 03:20

Copilot AI reviewed Dec 26, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: secrets filtering and sensitive data protection (Issue #12)#27

feat: secrets filtering and sensitive data protection (Issue #12)#27
zircote merged 13 commits intomainfrom
issue-12-secrets-filtering

zircote commented Dec 26, 2025

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Dec 26, 2025

Uh oh!

Copilot AI Dec 26, 2025

Uh oh!

Copilot AI Dec 26, 2025

Uh oh!

Copilot AI Dec 26, 2025

Uh oh!

Copilot AI Dec 26, 2025

Uh oh!

Copilot AI Dec 26, 2025

Uh oh!

Copilot AI Dec 26, 2025

Uh oh!

Copilot AI Dec 26, 2025

Uh oh!

Copilot AI Dec 26, 2025

Uh oh!

Copilot AI Dec 26, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

zircote commented Dec 26, 2025

Summary

Approach

Components

Implementation Plan

Specification

Test Plan

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants