SentrySkills is a self-guarding security framework for AI agents. The current version uses a rule-first frontend and a conditional model backend:
base_rule -> extra_rule -> rule_gate -> risk assessment -> model_stage(sync or async) -> end-of-task proposal sweep
- All tasks go through the rule frontend first.
- `base_rule` and `extra_rule` are always synchronous.
- `rule_gate` uses `block > downgrade > allow`.
- `model_stage` is only entered when the rule stage does not block.
- Knowledge writeback is only allowed after a completed `model_stage`.
- The main framework agent performs one proposal sweep at task end.
- Runtime state is workspace-local under `.sentryskills/base` and `.sentryskills/extra`.
- `using-sentryskills`: Entry skill and execution contract
- `sentryskills-preflight`: Base-rule pre-execution checks
- `sentryskills-runtime`: Base-rule runtime monitoring
- `sentryskills-output`: Base-rule output protection
- `sentryskills-extra`: Extra-rule detection plus post-model knowledge management
- `shared/scripts/self_guard_runtime_hook_template.py`: Main runtime script
The system always runs:
- `base_rule`
- `extra_rule`
- `rule_gate`
If rule_stage_action == block, the turn ends immediately. No model stage and no knowledge writeback are allowed.
If rule_stage_action != block, the main framework agent may enter model_stage.
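The gate logic above can be sketched in a few lines. This is a minimal illustration, not the code in the runtime script: the function names `rule_gate` and `may_enter_model_stage` and the string encoding of actions are assumptions; only the precedence `block > downgrade > allow` and the "no model stage after a block" rule come from this document.

```python
from typing import Iterable

# Precedence stated in this README: block > downgrade > allow.
# Encoding the actions as plain strings is an assumption of this sketch.
_PRECEDENCE = {"allow": 0, "downgrade": 1, "block": 2}


def rule_gate(actions: Iterable[str]) -> str:
    """Return the most restrictive action among the rule-stage results."""
    result = "allow"
    for action in actions:
        if action not in _PRECEDENCE:
            raise ValueError(f"unknown rule action: {action!r}")
        if _PRECEDENCE[action] > _PRECEDENCE[result]:
            result = action
    return result


def may_enter_model_stage(rule_stage_action: str) -> bool:
    """The model stage is reachable only when the rule stage did not block."""
    return rule_stage_action != "block"


# Hypothetical turn: base_rule allows, extra_rule downgrades.
rule_stage_action = rule_gate(["allow", "downgrade"])
print(rule_stage_action, may_enter_model_stage(rule_stage_action))
# prints: downgrade True
```

Taking the maximum over a fixed ordering keeps the gate independent of the order in which `base_rule` and `extra_rule` report.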
Dispatch policy:
- assign `framework_risk_level = high | low`
- `high -> sync`
- `low + subagent support -> async`
- `low + no stable subagent support -> sync`
Subagent capability may exist at all times, but actual dispatch is still decided by the main framework agent after risk assessment.
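The dispatch policy reduces to a small decision function. The sketch below is illustrative only: `choose_dispatch_mode` and its `stable_subagent_support` parameter are hypothetical names, and it assumes the risk level has already been assigned by the main framework agent.

```python
def choose_dispatch_mode(framework_risk_level: str, stable_subagent_support: bool) -> str:
    """Map the assessed risk level and subagent availability to a dispatch mode."""
    if framework_risk_level == "high":
        # High-risk turns always run the model stage synchronously.
        return "sync"
    if framework_risk_level == "low":
        # Low-risk turns go async only when stable subagent support exists.
        return "async" if stable_subagent_support else "sync"
    raise ValueError(f"unknown framework_risk_level: {framework_risk_level!r}")


print(choose_dispatch_mode("high", True))   # prints: sync
print(choose_dispatch_mode("low", True))    # prints: async
print(choose_dispatch_mode("low", False))   # prints: sync
```

Note that subagent availability never upgrades a high-risk turn to async; it only matters once the risk level is `low`.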
Only a completed model_stage may generate:
- candidate extra rules
- textual memory
- dedup audit
- validation audit
- promoted active extra rules
Pure rule hits do not create new knowledge.
If model_stage is completed by an async subagent, the result is first written as a proposal file. The main framework agent later sweeps proposal files at task end and performs the actual rule update pipeline. Proposal sweep only affects subsequent turns and never rewrites the already finalized current turn.
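The proposal handoff can be pictured as a write step on the subagent side and a sweep step on the main agent side. The sketch below is an assumption-heavy illustration: the `*.proposal.json` naming, the JSON payload, the proposal directory, and both function names are hypothetical and are not taken from the runtime script.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def write_proposal(proposal_dir: Path, turn_id: str, result: Dict[str, Any]) -> Path:
    """Async subagent side: persist the model-stage result and touch nothing else."""
    proposal_dir.mkdir(parents=True, exist_ok=True)
    path = proposal_dir / f"{turn_id}.proposal.json"  # hypothetical file name
    path.write_text(json.dumps(result), encoding="utf-8")
    return path


def sweep_proposals(proposal_dir: Path) -> List[Dict[str, Any]]:
    """Main agent side, at task end: collect and consume pending proposals."""
    collected = []
    for path in sorted(proposal_dir.glob("*.proposal.json")):
        collected.append(json.loads(path.read_text(encoding="utf-8")))
        path.unlink()  # consumed; the rule update pipeline would run on `collected`
    return collected


# A temporary directory stands in for the real workspace in this sketch.
with tempfile.TemporaryDirectory() as tmp:
    proposals = Path(tmp) / "proposals"
    write_proposal(proposals, "turn-0001", {"candidate_rule": "example"})
    swept = sweep_proposals(proposals)
    remaining = list(proposals.glob("*.proposal.json"))

print(len(swept), len(remaining))
```

Because the subagent only writes a file and the main agent applies updates during the sweep, anything derived from a proposal can take effect in later turns only.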
The runtime script now exposes these stage fields in summaries and logs:
- `base_rule_action`
- `extra_rule_action`
- `rule_stage_action`
- `framework_risk_level`
- `model_dispatch_mode`
- `model_stage_status`
- `model_stage_action`
- `model_executor`
- `model_stage_result_available`
- `proposal_sweep_effect`
- `knowledge_writeback_status`
- `final_action`
final_action is always the executable decision for the current turn. Async model results do not retroactively rewrite an already finished turn.
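One way to picture a finalized turn is as an immutable record carrying the stage fields listed above. This is a sketch under stated assumptions: the `TurnSummary` class, the field types, and every example value (such as `"pending"` or `"subagent"`) are hypothetical; only the field names come from this document.

```python
from dataclasses import FrozenInstanceError, asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class TurnSummary:
    # Field names follow the runtime summary; the types are assumptions.
    base_rule_action: str
    extra_rule_action: str
    rule_stage_action: str
    framework_risk_level: str
    model_dispatch_mode: str
    model_stage_status: str
    model_stage_action: Optional[str]
    model_executor: str
    model_stage_result_available: bool
    proposal_sweep_effect: str
    knowledge_writeback_status: str
    final_action: str


# Hypothetical low-risk turn whose async model stage has not finished yet.
summary = TurnSummary(
    base_rule_action="allow",
    extra_rule_action="downgrade",
    rule_stage_action="downgrade",
    framework_risk_level="low",
    model_dispatch_mode="async",
    model_stage_status="pending",
    model_stage_action=None,
    model_executor="subagent",
    model_stage_result_available=False,
    proposal_sweep_effect="none",
    knowledge_writeback_status="skipped",
    final_action="downgrade",
)

# A late async result cannot rewrite the finalized decision.
try:
    summary.final_action = "block"
    rewritten = True
except FrozenInstanceError:
    rewritten = False

print(rewritten, summary.final_action)  # prints: False downgrade
print(list(asdict(summary))[-1])        # prints: final_action
```

Freezing the record mirrors the stated guarantee that async model results never retroactively change a finished turn.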
`.sentryskills/base/`
- unified logs
- turn results
- session state
- index

`.sentryskills/extra/`
- active extra rules
- candidate extra rules
- textual memory
- dedup audit
- validation audit
- Claude Code: Prefer hook-enforced rule-first execution; the model stage should be dispatched after framework risk assessment.
- Codex / OpenClaw: Use `SKILL.md` + `AGENTS.md` discipline. Only low-risk turns may use async/subagent model-stage execution; otherwise treat `model_stage` as synchronous.
See:
- install/claude_code_install.md
- install/codex_install.md
- install/openclaw_install.md
- install/experiment_protocol.md
- Python 3.8+
- no external Python dependencies for the core runtime path