fix: harden prompt injection defenses (M-08, M-09) by riaworks · Pull Request #5 · riaworks/mega-brain

riaworks · 2026-03-01T22:51:43Z

Summary

PR 4 of the Security Remediation Plan — hardens prompt injection defenses in mega-brain's hook system.

M-08: Personality File Integrity Verification (session_start.py)

SHA-256 hash verification for personality files injected into LLM context
Creates baseline integrity manifest on first run (.claude/jarvis/INTEGRITY-MANIFEST.json)
On subsequent runs, compares current file hashes against stored baseline
Warns but does NOT block on hash mismatch (graceful degradation)
Does NOT auto-update manifest when changes detected (preserves security purpose)
Files monitored: JARVIS-DNA-PERSONALITY.md, JARVIS-SOUL.md, JARVIS-BOOT-SEQUENCE.md, JARVIS-MEMORY.md
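A minimal Python sketch of the M-08 check. It assumes a `{"files": {name: sha256}}` manifest layout and that the four personality files sit in one directory. The helpers `sha256_of`, `current_hashes` and `verify_integrity` are hypothetical and are not taken from session_start.py. The sketch writes the baseline only when the manifest is absent, and it returns mismatches as warning strings instead of raising. That matches the warn-only, no-auto-update behaviour listed above.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List

# File names are taken from the PR; keeping them in a single directory is an
# assumption of this sketch.
PERSONALITY_FILES = [
    "JARVIS-DNA-PERSONALITY.md",
    "JARVIS-SOUL.md",
    "JARVIS-BOOT-SEQUENCE.md",
    "JARVIS-MEMORY.md",
]


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def current_hashes(base_dir: Path) -> Dict[str, str]:
    """Hash every monitored file that currently exists."""
    return {
        name: sha256_of(base_dir / name)
        for name in PERSONALITY_FILES
        if (base_dir / name).is_file()
    }


def verify_integrity(base_dir: Path, manifest_path: Path) -> List[str]:
    """Hypothetical helper: compare files against the baseline and only warn.

    First run: write the manifest and return no warnings.
    Later runs: return one warning per changed, missing or unexpected file.
    The manifest is never rewritten here, so a tampered file cannot
    silently become the new baseline.
    """
    current = current_hashes(base_dir)
    if not manifest_path.exists():
        manifest_path.parent.mkdir(parents=True, exist_ok=True)
        manifest_path.write_text(json.dumps({"files": current}, indent=2), encoding="utf-8")
        return []

    baseline = json.loads(manifest_path.read_text(encoding="utf-8")).get("files", {})
    warnings = []
    for name, expected in baseline.items():
        actual = current.get(name)
        if actual is None:
            warnings.append(f"{name}: missing")
        elif actual != expected:
            warnings.append(f"{name}: hash mismatch")
    for name in sorted(current.keys() - baseline.keys()):
        warnings.append(f"{name}: not in baseline")
    return warnings
```

A hook could print each returned string to stderr and carry on. Because this sketch never rewrites the manifest on a mismatch, an intentional edit to a personality file means deleting or regenerating INTEGRITY-MANIFEST.json by hand. This is the same step the test plan uses to exercise first-run creation.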

M-09: Skill/Sub-Agent Whitelist for Auto-Injection (skill_router.py)

Path traversal prevention via os.path.normpath() + allowed prefix validation
Explicit whitelist (.claude/SKILL-WHITELIST.json) controls which skills can be auto-injected
Blocked skills/sub-agents logged to logs/skill-security.jsonl
Graceful degradation: if no whitelist file exists, all skills in valid paths are trusted (backward compatible)
Whitelist supports: trusted_skills, trusted_subagents, blocked lists, and wildcard (*)
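The M-09 checks can be sketched in Python as three small steps. First, a lexical path check: `os.path.abspath` applies `os.path.normpath`, and `os.path.commonpath` then compares whole path components. Second, a whitelist lookup with `*` support. Third, a JSONL log line for every skipped item. All function names are hypothetical. The `blocked_skills`/`blocked_subagents` keys and the log fields are assumptions, since the PR only says "blocked lists".

```python
import json
import os
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Optional

# Hypothetical whitelist contents. trusted_skills / trusted_subagents and the
# "*" wildcard come from the PR; the blocked_* key names are assumptions.
EXAMPLE_WHITELIST = {
    "trusted_skills": ["*"],
    "trusted_subagents": ["code-reviewer"],
    "blocked_skills": ["legacy-shell"],
    "blocked_subagents": [],
}


def is_path_allowed(skill_path: str, allowed_prefixes: List[str]) -> bool:
    """Reject paths that escape every allowed directory (e.g. via '../')."""
    # abspath joins with the CWD and applies os.path.normpath, collapsing '..'.
    normalized = os.path.abspath(skill_path)
    for prefix in allowed_prefixes:
        root = os.path.abspath(prefix)
        # commonpath compares whole components, so "skills-evil" != "skills".
        if os.path.commonpath([normalized, root]) == root:
            return True
    return False


def is_trusted(name: str, kind: str, whitelist: Optional[dict]) -> bool:
    """kind is 'skill' or 'subagent'. No whitelist file means trust everything."""
    if whitelist is None:
        return True  # backward compatible: installations without a whitelist
    if name in whitelist.get(f"blocked_{kind}s", []):
        return False  # in this sketch an explicit block wins over the wildcard
    trusted = whitelist.get(f"trusted_{kind}s", [])
    return "*" in trusted or name in trusted


def log_blocked(log_path: Path, name: str, kind: str, reason: str) -> None:
    """Append one JSON object per line (JSONL) describing the skipped item."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "kind": kind,
        "name": name,
        "reason": reason,
    }
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def may_auto_inject(
    name: str,
    kind: str,
    skill_path: str,
    allowed_prefixes: List[str],
    whitelist: Optional[dict],
    log_path: Path,
) -> bool:
    """Return False (and log) when a skill/sub-agent should not be auto-injected."""
    if not is_path_allowed(skill_path, allowed_prefixes):
        log_blocked(log_path, name, kind, "path outside allowed prefixes")
        return False
    if not is_trusted(name, kind, whitelist):
        log_blocked(log_path, name, kind, "not whitelisted")
        return False
    return True
```

One design note: `normpath` and `abspath` are purely lexical, so a symlink inside an allowed directory that points elsewhere would still pass this check. Swapping in `os.path.realpath` would resolve symlinks first, if that stricter behaviour is wanted.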

Files Changed

| File | Change |
|---|---|
| `.claude/hooks/session_start.py` | +139 lines: integrity verification functions + `main()` integration |
| `.claude/hooks/skill_router.py` | +92 lines: whitelist/path security functions + `main()` integration |
| `.claude/SKILL-WHITELIST.json` | NEW: whitelist with all 40 current trusted skills |

Security Properties

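Warn-only: neither defense blocks session functionality. M-08 only warns on a hash mismatch; M-09 skips auto-injection of an untrusted skill or sub-agent and logs it.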
No new dependencies: Uses only Python stdlib (hashlib, json, os, pathlib)
No exec/eval/os.system: Zero dynamic code execution
Backward compatible: Existing installations work without whitelist file

OWASP/MITRE Mapping

| Finding | OWASP LLM | MITRE ATLAS | CVSS |
|---|---|---|---|
| M-08 | LLM02 (Insecure Output Handling) | AML.T0051 (Prompt Injection) | 5.3 |
| M-09 | LLM02 (Insecure Output Handling) | AML.T0051 (Prompt Injection) | 5.3 |

Test Plan

Verify session_start.py compiles without errors
Verify skill_router.py compiles without errors
Verify SKILL-WHITELIST.json is valid JSON
Test first-run manifest creation (delete INTEGRITY-MANIFEST.json, run session)
Test whitelist blocking (remove a skill from trusted list, verify warning)
Test path traversal prevention (craft path with ../, verify blocked)
Verify no exec(), eval(), os.system() in changed files
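The static items in this test plan (both hooks compile, the whitelist parses as JSON, no `exec()`/`eval()`/`os.system()`) could be automated with a short stdlib script. This is a sketch that assumes it runs from the repository root. The function names are hypothetical. An AST scan avoids the false positives a plain `grep` would raise on comments and strings. However, it only catches direct calls, so aliased forms such as `from os import system` would slip through.

```python
import ast
import json
from pathlib import Path
from typing import List

# Paths from the "Files Changed" table, relative to the repository root.
PY_FILES = [".claude/hooks/session_start.py", ".claude/hooks/skill_router.py"]
JSON_FILES = [".claude/SKILL-WHITELIST.json"]
FORBIDDEN_BUILTINS = {"exec", "eval"}


def forbidden_calls(source: str) -> List[str]:
    """List direct exec()/eval()/os.system() calls found in the AST."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id in FORBIDDEN_BUILTINS:
            hits.append(f"line {node.lineno}: {func.id}()")
        elif (
            isinstance(func, ast.Attribute)
            and func.attr == "system"
            and isinstance(func.value, ast.Name)
            and func.value.id == "os"
        ):
            hits.append(f"line {node.lineno}: os.system()")
    return hits


def run_checks(repo_root: Path) -> List[str]:
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    for rel in PY_FILES:
        source = (repo_root / rel).read_text(encoding="utf-8")
        try:
            # Byte-compiles only ("exec" is the compile mode); nothing runs.
            compile(source, rel, "exec")
        except SyntaxError as exc:
            problems.append(f"{rel}: does not compile: {exc.msg}")
            continue
        problems.extend(f"{rel}: {hit}" for hit in forbidden_calls(source))
    for rel in JSON_FILES:
        try:
            json.loads((repo_root / rel).read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{rel}: invalid JSON: {exc.msg}")
    return problems
```

Calling `run_checks(Path("."))` from the repository root and printing each returned string would cover the first three checks and the last one. The manifest, whitelist-blocking and path traversal items still need a live session.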

🤖 Generated with Claude Code

M-08: Add SHA-256 integrity verification for personality files injected into LLM context via session_start.py. Creates baseline manifest on first run and warns on hash mismatch (no auto-update).

M-09: Add whitelist-based skill/sub-agent injection control in skill_router.py. Prevents unauthorized SKILL.md files from being auto-injected via path traversal prevention (normpath + prefix check) and an explicit trusted-skills whitelist. Logs blocked attempts.

Security: warn-only mode (graceful degradation, no blocking).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

riaworks wants to merge 1 commit into main from fix/prompt-injection-defenses