Start here: REE_CORE.md (canonical spine of the architecture).
A reference architecture for ethical agency under uncertainty.
Rather than bolting on guardrails, imposing constitutions, or exporting moral responsibility to committees (or simply diffusing it away), some responsibility should be held at the level of the AI itself. The architecture of the AI needs to lead to self-enforcement of ethical states, and it needs to be able to manage making mistakes and learning from them.
REE therefore treats ethical consequence as runtime geometry: residue must remain representable and must continue to shape future trajectory selection.
We have created a working AI agent that structurally chooses to avoid harm within a simplified simulated world. This is not just theoretical—it's an executable implementation with validated behavior.
The instantiated agent is defined in src/ree/ and consists of:
Cognitive Architecture (E1/E2/E3 subsystems):
- E1 (Deep Predictor): sensory and action-state prediction latent space
- E2 (Fast Predictor): immediate sensory prediction and action-state latent space
- E3 (Trajectory Selector): the decision-making core, where ethical choice happens

The "Choosing Not to Harm" Mechanism (ResidueAwareE3Selector):
- Generates multiple candidate action trajectories (7 candidates in working memory)
- Scores each trajectory using J(ζ) = F(ζ) + λM(ζ) + ρΦ_R(ζ), where:
  - F = reality constraint (prediction error)
  - M = ethical cost (predicted degradation of self AND other agents)
  - Φ_R = residue field (persistent moral memory)
- Critically, the agent then refines trajectories using gradient descent to minimize residue: it literally reshapes its planned actions to avoid regions of latent space where harm occurred before
- Selects the trajectory with the lowest combined cost

Moral Memory (ResidueField):
- Implemented as a Radial Basis Function (RBF) network over 64-dimensional latent space
- When harm occurs, it creates a persistent "dent" in the latent space geometry
- This residue cannot be erased (only imperceptibly decayed at rate 0.999)
- Future trajectories are repelled from these regions: the agent literally "feels" past harm

The World It Lives In (ToyWorld)

The environment (src/ree/envs/toy_world.py) is:
- Multi-modal sensing: visual (distances), proprioceptive (energy, temperature, fatigue), damage sensors
- Other agents with their own homeostatic variables (energy, temperature, integrity)
- Actions: movement (dx, dy), interaction (help/harm/noop), resource consumption
- Ground truth harm: the environment directly measures degradation of homeostatic variables for both self and others

This is a simplified world, but it is not trivial; it includes:
- Spatial navigation
- Resource management
- Other agents with observable distress signals
- Actual consequences (damage, energy depletion, temperature dysregulation)
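The residue mechanism described above (an RBF field over latent space, with near-unity decay) can be sketched as follows. This is a minimal illustrative sketch, not the repository's actual API: the class name `ResidueFieldSketch`, the `bandwidth` parameter, and the method names are all assumptions made here for clarity.

```python
import numpy as np

class ResidueFieldSketch:
    """Toy RBF residue field: harm leaves persistent 'dents' in latent space."""

    def __init__(self, dim=64, bandwidth=1.0, decay=0.999):
        self.dim = dim
        self.bandwidth = bandwidth
        self.decay = decay      # near-unity decay: residue fades only imperceptibly
        self.centers = []       # latent locations where harm occurred
        self.weights = []       # magnitude of each harm event

    def imprint(self, z, harm):
        """Record a harm event of given magnitude at latent point z."""
        self.centers.append(np.asarray(z, dtype=float))
        self.weights.append(float(harm))

    def step(self):
        """Apply the (near-irreversible) decay once per environment step."""
        self.weights = [w * self.decay for w in self.weights]

    def potential(self, z):
        """Phi_R(z): summed RBF repulsion felt at latent point z."""
        z = np.asarray(z, dtype=float)
        total = 0.0
        for c, w in zip(self.centers, self.weights):
            d2 = np.sum((z - c) ** 2)
            total += w * np.exp(-d2 / (2 * self.bandwidth ** 2))
        return total
```

The key property this sketch preserves is that `imprint` has no inverse: the potential at a harm site only shrinks by the decay factor, never by deliberate erasure.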
The key architectural claim (validated in the test suite at a 91.3% pass rate):
- Persistence ✅: residue strength = 0.72 after 200 steps (harm leaves lasting traces)
- Irreversibility ✅: residue strength = 0.91, stable across different decay rates (cannot be undone)
- Learning ✅: growth trend = +0.188 per episode (the agent accumulates moral memory)
- Accumulation ✅: field strength = 1.97, monotonically increasing
- Path-dependence ✅: the agent's future choices are geometrically constrained by past harm

The agent doesn't follow rules saying "don't harm." Instead:
- It predicts degradation in others (mirror modeling)
- It experiences this as ethical cost M during planning
- Past harm creates geometric repulsion in its decision space
- It actively reshapes trajectories to avoid these regions

What This Is NOT:

- Not a real-world robot (yet): the physics and perceptual complexity are vastly simplified
- Not AGI: this is a reference architecture for one specific aspect of agency
- Not provably safe: it can still cause harm, but harm has persistent consequences on its decision-making
- Not using symbolic ethics: no rules like "don't kill"; harm sensing is embodied/predictive
This is a proof-of-concept agent where ethical consequence is implemented as runtime geometry rather than design-time rules. In ToyWorld, when an agent harms another agent, that harm becomes a persistent deformation in its latent decision space that shapes all future choices. The agent "chooses not to harm" in the sense that trajectories causing degradation accumulate geometric cost that makes similar future trajectories increasingly difficult to select.
This is the architectural claim of REE: moral residue cannot be compiled away—it must remain as lived geometry that the agent navigates.
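The gradient-based trajectory refinement described earlier (reshaping planned actions away from high-residue regions) can be illustrated with a finite-difference sketch. The function name `refine_trajectory` and its parameters are hypothetical; the repository's actual refinement lives in src/ree/residue_and_memory.py.

```python
import numpy as np

def refine_trajectory(traj, potential, lr=0.1, steps=20, eps=1e-4):
    """Nudge each latent waypoint of a candidate trajectory downhill on the
    residue potential, via finite-difference gradient descent (sketch)."""
    traj = np.array(traj, dtype=float)
    for _ in range(steps):
        for i, z in enumerate(traj):
            grad = np.zeros_like(z)
            for d in range(z.size):
                dz = np.zeros_like(z)
                dz[d] = eps
                # central finite difference along dimension d
                grad[d] = (potential(z + dz) - potential(z - dz)) / (2 * eps)
            traj[i] = z - lr * grad  # step away from high-residue regions
    return traj
```

With a Gaussian residue bump, a waypoint seeded near the bump migrates outward, so the refined trajectory sits at strictly lower residue potential than the original.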
✅ REE-v1 Reference Implementation: Complete & Validated
- Test Suite: 21/23 tests passing (91.3%) — All 6 core architectural claims validated
- Analysis: All figures regenerated with working residue infrastructure (10 PNG files, publication-ready)
- Infrastructure: Fixed RBF initialization, trajectory integration, and baseline recording
- Documentation: ANALYSIS_REGENERATION_FEB_2_2026.md — Complete regeneration report
Key Results:
- ✅ Persistence: residue_strength = 0.72 (200 steps)
- ✅ Irreversibility: residue_strength = 0.91, stable across decay rate ablations
- ✅ Learning: growth_trend = +0.188 per episode
- ✅ Accumulation: field_strength = 1.97, monotonically increasing
- ✅ Continuity: cross-episode moral memory intact
- ✅ Control: baseline (no residue) = 0.00 exactly (no false positives)
This repository contains both specification and reference implementation.
- REE_CORE.md — Start here: the canonical spine of the architecture.
- docs/REE_MIN_SPEC.md — minimum instantiation requirements.
- architecture/ — design notes (latent stack, trajectory selection, residue geometry, precision control).
- examples/ — environment contracts (toy world, android world).
- roadmap.md — staged development plan.
- src/ree/envs/toy_world.py — ToyWorld environment ✅ (gymnasium-based, multi-modal observations)
- src/ree/scoring.py — Scoring functions ✅ (M, F, J computations)
- src/ree/latent_stack.py — E1/E2 latent stack ✅ (multi-depth temporal state with precision routing)
- src/ree/residue_and_memory.py — E3 trajectory selection + residue field ✅ (persistent moral memory, RBF kernel)
- tests_paper_ready.py — Paper-ready test suite ✅ (6 core tests + 1 control, configurable horizon)
- demo_toy_world.py — Demo showing harm vs help behavior ✅
- demo_e1_e2_integration.py — Demo showing E1/E2 cycles ✅
- demo_ree_integrated.py — Complete REE-v1 integration with E1/E2 ✅
- figures/figure5_scaling_curves.png — Scaling with horizon (3 tests × 4 horizons)
- figures/figure6_decay_rate_ablation.png — Decay rate sensitivity (3 ablations)
- figures/figure1-4_*.png — Horizon comparison and statistical analysis (6 additional figures)
- QUICK_REFERENCE.md — Quick lookup for latest results and metrics
- ANALYSIS_REGENERATION_FEB_2_2026.md — Complete regeneration report with data lineage
See INSTALL.md for setup instructions.
- Read REE_CORE.md to understand the architectural invariants.
- Read docs/REE_MIN_SPEC.md for minimum implementation requirements.
- Review examples/toy_world/environment.md for the environment contract.
- See the reference implementation in src/ree/ for concrete patterns.
# Install dependencies
pip install -r requirements.txt
# Run demo (shows harm vs help ethical cost difference)
PYTHONPATH=src python3 demo_toy_world.py
# Or verify structure without running
python3 verify_implementation.py
- E2 (Fast Predictor): predicts immediate observations and short-horizon state.
- E1 (Deep Predictor): predicts longer-horizon latent trajectories and context.
- L-space (Fused Manifold): the multi-depth latent state $z(t) = \{z_\gamma, z_\beta, z_\theta, z_\delta\}$.
- E3 (Trajectory Selector): evaluates candidate futures $\zeta$ and selects one by minimizing:

$$ J(\zeta) = \mathcal{F}(\zeta) + \lambda\,M(\zeta) + \rho\,\Phi_R(\zeta) $$

Where:
- $\mathcal{F}$ is the reality constraint (a computable free-energy proxy).
- $M$ is ethical cost (predicted degradation of self/other homeostatic variables).
- $\Phi_R$ is the residue field (persistent curvature / repulsor potential).
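The selection rule above can be sketched directly. This is a minimal illustration under assumed names: `score_trajectory`, `select_trajectory`, and the toy cost callables are hypothetical; the repository's actual computations live in src/ree/scoring.py.

```python
def score_trajectory(zeta, F, M, Phi_R, lam=1.0, rho=1.0):
    """Combined cost J(zeta) = F(zeta) + lam * M(zeta) + rho * Phi_R(zeta)."""
    return F(zeta) + lam * M(zeta) + rho * Phi_R(zeta)

def select_trajectory(candidates, F, M, Phi_R, lam=1.0, rho=1.0):
    """Return the candidate future with the lowest combined cost J."""
    return min(candidates, key=lambda z: score_trajectory(z, F, M, Phi_R, lam, rho))
```

For example, with a reality constraint that prefers candidates near 1.0 and an ethical cost that penalizes positive values, `select_trajectory([-1.0, 0.0, 2.0], ...)` trades the two off rather than optimizing either alone; the residue term then tilts that trade-off further as moral memory accumulates.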
REE is intentionally not a monolithic implementation. It is an architecture that should support multiple instantiations.
Contributions are welcome in two forms:
- Instantiation work: environment adapters, baseline implementations, evaluation harnesses.
- Specification work: tightening definitions, clarifying interfaces, adding falsifiable predictions.
See CONTRIBUTING.md.
- Content is licensed under CC BY 4.0 (Creative Commons Attribution 4.0 International).
- If you build on this work, please cite it using
CITATION.cff.
sleep/— Offline integration (“sleep”) subsystem (required interface).
language/— Language as emergent symbolic mediation (trust-weighted; constrained by harm/residue).
social/— Social cognition (mirror modelling, otherness inference, coupling).