
fix: harden prompt injection defenses (M-08, M-09)#5

Open
riaworks wants to merge 1 commit into main from fix/prompt-injection-defenses


@riaworks riaworks commented Mar 1, 2026

Summary

PR 4 of the Security Remediation Plan — hardens prompt injection defenses in mega-brain's hook system.

M-08: Personality File Integrity Verification (session_start.py)

  • SHA-256 hash verification for personality files injected into LLM context
  • Creates baseline integrity manifest on first run (.claude/jarvis/INTEGRITY-MANIFEST.json)
  • On subsequent runs, compares current file hashes against stored baseline
  • Warns but does NOT block on hash mismatch (graceful degradation)
  • Does NOT auto-update manifest when changes detected (preserves security purpose)
  • Files monitored: JARVIS-DNA-PERSONALITY.md, JARVIS-SOUL.md, JARVIS-BOOT-SEQUENCE.md, JARVIS-MEMORY.md
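
The M-08 flow above (baseline on first run, compare afterwards, warn without updating) can be sketched roughly as follows. This is a minimal illustration, not the code in session_start.py: the helper names `sha256_of` and `verify_integrity` and the flat name-to-digest manifest layout are assumptions; only the monitored file names come from this PR.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path

# File names are taken from the PR description.
MONITORED_FILES = [
    "JARVIS-DNA-PERSONALITY.md",
    "JARVIS-SOUL.md",
    "JARVIS-BOOT-SEQUENCE.md",
    "JARVIS-MEMORY.md",
]


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's bytes (hypothetical helper)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def verify_integrity(base_dir: Path, manifest_path: Path) -> list[str]:
    """Return warnings for changed files; write the baseline on first run.

    Assumed manifest layout: {"<file name>": "<hex digest>", ...}.
    """
    current = {
        name: sha256_of(base_dir / name)
        for name in MONITORED_FILES
        if (base_dir / name).is_file()
    }

    if not manifest_path.exists():
        # First run: record the baseline and report nothing.
        manifest_path.parent.mkdir(parents=True, exist_ok=True)
        manifest_path.write_text(json.dumps(current, indent=2), encoding="utf-8")
        return []

    baseline = json.loads(manifest_path.read_text(encoding="utf-8"))
    warnings = []
    for name, digest in current.items():
        if name in baseline and baseline[name] != digest:
            # Warn only: the manifest is deliberately left untouched.
            warnings.append(f"integrity mismatch: {name}")
    return warnings
```

In this sketch the caller decides how to surface the returned warnings; the function itself never blocks and never rewrites an existing manifest.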

M-09: Skill/Sub-Agent Whitelist for Auto-Injection (skill_router.py)

  • Path traversal prevention via os.path.normpath() + allowed prefix validation
  • Explicit whitelist (.claude/SKILL-WHITELIST.json) controls which skills can be auto-injected
  • Blocked skills/sub-agents logged to logs/skill-security.jsonl
  • Graceful degradation: if no whitelist file exists, all skills in valid paths are trusted (backward compatible)
  • Whitelist supports: trusted_skills, trusted_subagents, blocked lists, and wildcard (*)
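
As a rough illustration of the two M-09 checks described above, the sketch below normalizes a candidate path and tests it against an allowed prefix, then consults the whitelist. The function names, the `.claude/skills` prefix, and the `blocked` key name are assumptions made for the example; `trusted_skills`, `trusted_subagents`, and the `*` wildcard come from this PR.

```python
import json
import os

# Assumed allowed prefix; the real list in skill_router.py may differ.
ALLOWED_PREFIXES = (os.path.join(".claude", "skills"),)


def is_safe_path(candidate, allowed_prefixes=ALLOWED_PREFIXES):
    """Reject absolute paths and paths that escape the allowed prefixes."""
    normalized = os.path.normpath(candidate)  # collapses "a/../b" segments
    if os.path.isabs(normalized):
        return False
    return any(
        normalized == prefix or normalized.startswith(prefix + os.sep)
        for prefix in allowed_prefixes
    )


def load_whitelist(path):
    """Return the parsed whitelist, or None when the file does not exist."""
    if not os.path.isfile(path):
        return None
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def is_trusted(name, whitelist, kind="trusted_skills"):
    """Apply the whitelist; with no whitelist, everything is trusted."""
    if whitelist is None:
        return True  # backward-compatible fallback described in the PR
    if name in whitelist.get("blocked", []):  # "blocked" key name is assumed
        return False
    trusted = whitelist.get(kind, [])
    return "*" in trusted or name in trusted
```

Checking `prefix + os.sep` rather than the bare prefix keeps a sibling directory such as `.claude/skills-evil` from passing the prefix test.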

Files Changed

| File | Change |
| --- | --- |
| .claude/hooks/session_start.py | +139 lines: integrity verification functions + main() integration |
| .claude/hooks/skill_router.py | +92 lines: whitelist/path security functions + main() integration |
| .claude/SKILL-WHITELIST.json | NEW: whitelist with all 40 current trusted skills |

Security Properties

  • Warn-only: Both defenses warn but don't block functionality
  • No new dependencies: Uses only Python stdlib (hashlib, json, os, pathlib)
  • No exec/eval/os.system: Zero dynamic code execution
  • Backward compatible: Existing installations work without whitelist file

OWASP/MITRE Mapping

| Finding | OWASP LLM | MITRE ATLAS | CVSS |
| --- | --- | --- | --- |
| M-08 | LLM02 (Insecure Output Handling) | AML.T0051 (Prompt Injection) | 5.3 |
| M-09 | LLM02 (Insecure Output Handling) | AML.T0051 (Prompt Injection) | 5.3 |

Test Plan

  • Verify session_start.py compiles without errors
  • Verify skill_router.py compiles without errors
  • Verify SKILL-WHITELIST.json is valid JSON
  • Test first-run manifest creation (delete INTEGRITY-MANIFEST.json, run session)
  • Test whitelist blocking (remove a skill from trusted list, verify warning)
  • Test path traversal prevention (craft path with ../, verify blocked)
  • Verify no exec(), eval(), os.system() in changed files
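
The compile check and the last item in the plan could be automated with a small AST scan along these lines. This is one possible approach, not something included in this PR; `find_forbidden_calls` is a hypothetical helper, and it only catches direct calls, so aliased imports such as `from os import system` would slip through.

```python
from __future__ import annotations

import ast

FORBIDDEN_NAMES = {"exec", "eval"}
FORBIDDEN_ATTRS = {("os", "system")}


def find_forbidden_calls(source: str) -> list[tuple[int, str]]:
    """Return (line, name) pairs for direct exec/eval/os.system calls.

    ast.parse raises SyntaxError on invalid code, so a clean return also
    confirms that the source parses.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id in FORBIDDEN_NAMES:
            hits.append((node.lineno, func.id))
        elif (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and (func.value.id, func.attr) in FORBIDDEN_ATTRS
        ):
            hits.append((node.lineno, f"{func.value.id}.{func.attr}"))
    return sorted(hits)
```

Running it over the text of each changed hook file and asserting an empty result would cover both the "compiles without errors" and the "no exec(), eval(), os.system()" items for direct calls.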

🤖 Generated with Claude Code

M-08: Add SHA-256 integrity verification for personality files
injected into LLM context via session_start.py. Creates baseline
manifest on first run and warns on hash mismatch (no auto-update).

M-09: Add whitelist-based skill/sub-agent injection control in
skill_router.py. Prevents unauthorized SKILL.md files from being
auto-injected via path traversal prevention (normpath + prefix check)
and explicit trusted skills whitelist. Logs blocked attempts.

Security: warn-only mode (graceful degradation, no blocking).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
