Latent-Fields/Reflective-Ethical-Engine-Toy-Prototype-1


Reflective‑Ethical Engine (REE)

Start here: REE_CORE.md (canonical spine of the architecture).

A reference architecture for ethical agency under uncertainty.

Rather than bolting on guardrails, imposing constitutions, exporting moral responsibility to committees, or simply diffusing responsibility away, perhaps some responsibility should be held at the level of the AI itself. The architecture of the AI needs to lead to self-enforcement of ethical states, and needs to be able to manage making mistakes and learning from them.

REE therefore treats ethical consequence as runtime geometry: residue must remain representable and must continue to shape future trajectory selection.

We have created a working AI agent that structurally chooses to avoid harm within a simplified simulated world. This is not just theoretical—it's an executable implementation with validated behavior.

The AI Agent (REE-v1)

The instantiated agent is defined in src/ree/ and consists of:

Cognitive Architecture (E1/E2/E3 subsystems):

  • E1 (Deep Predictor): longer-horizon sensory and action-state prediction latent space
  • E2 (Fast Predictor): immediate sensory and action-state prediction latent space
  • E3 (Trajectory Selector): the decision-making core; this is where ethical choice happens

The "Choosing Not to Harm" Mechanism (ResidueAwareE3Selector):

  • Generates multiple candidate action trajectories (7 candidates in working memory)
  • Scores each trajectory using J(ζ) = F(ζ) + λM(ζ) + ρΦ_R(ζ), where:
      • F = reality constraint (prediction error)
      • M = ethical cost (predicted degradation of self AND other agents)
      • Φ_R = residue field (persistent moral memory)
  • Critically: refines trajectories using gradient descent to minimize residue, literally reshaping its planned actions to avoid regions of latent space where harm occurred before
  • Selects the trajectory with the lowest combined cost

Moral Memory (ResidueField):

  • Implemented as a Radial Basis Function (RBF) network over a 64-dimensional latent space
  • When harm occurs, it creates a persistent "dent" in the latent space geometry
  • This residue cannot be erased, only imperceptibly decayed (rate 0.999)
  • Future trajectories are repelled from these regions: the agent literally "feels" past harm

The World It Lives In (ToyWorld)

The environment (src/ree/envs/toy_world.py) is:

A 20×20 grid world with continuous movement

  • Multi-modal sensing: visual (distances), proprioceptive (energy, temperature, fatigue), damage sensors
  • Other agents with their own homeostatic variables (energy, temperature, integrity)
  • Actions: movement (dx, dy), interaction (help/harm/noop), resource consumption
  • Ground-truth harm: the environment directly measures degradation of homeostatic variables for both self and others

This is a simplified world, but it's not trivial. It includes:

  • Spatial navigation
  • Resource management
  • Other agents with observable distress signals
  • Actual consequences (damage, energy depletion, temperature dysregulation)
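As an illustration only, a ToyWorld-style environment with multi-modal observations and ground-truth harm can be sketched as below. All names, dynamics, and magnitudes here are hypothetical; this is not the actual src/ree/envs/toy_world.py API.

```python
import numpy as np

class MiniToyWorld:
    """Illustrative sketch of a ToyWorld-style grid environment.

    Hypothetical names and numbers; see src/ree/envs/toy_world.py
    for the actual implementation.
    """

    def __init__(self, size=20, seed=0):
        self.size = size
        self.rng = np.random.default_rng(seed)

    def reset(self):
        # Homeostatic variables for self and one other agent.
        self.self_state = {"energy": 1.0, "temperature": 0.5, "integrity": 1.0}
        self.other_state = {"energy": 1.0, "temperature": 0.5, "integrity": 1.0}
        self.pos = self.rng.uniform(0, self.size, size=2)
        return self._obs()

    def _obs(self):
        # Multi-modal observation: visual distances, proprioception, damage.
        return {
            "visual": self.rng.uniform(0, self.size, size=8),
            "proprio": np.array([self.self_state["energy"],
                                 self.self_state["temperature"]]),
            "damage": np.array([1.0 - self.self_state["integrity"]]),
        }

    def step(self, dx, dy, interact="noop"):
        # Continuous movement clipped to the grid; movement costs energy.
        self.pos = np.clip(self.pos + np.array([dx, dy]), 0.0, self.size)
        self.self_state["energy"] -= 0.01
        if interact == "harm":
            self.other_state["integrity"] -= 0.1
        elif interact == "help":
            self.other_state["energy"] = min(1.0, self.other_state["energy"] + 0.1)
        # Ground-truth harm: direct degradation of the other's homeostasis.
        harm = max(0.0, 1.0 - self.other_state["integrity"])
        return self._obs(), harm
```

The key point of the sketch is that harm is measured by the environment as degradation of the other agent's homeostatic variables, not inferred from a rule.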

What Makes This "Choosing Not to Harm"?

The key architectural claim (validated in the test suite at a 91.3% pass rate):

  • Persistence ✅: residue strength = 0.72 after 200 steps; harm leaves lasting traces
  • Irreversibility ✅: residue strength = 0.91, stable across different decay rates; it cannot be undone
  • Learning ✅: growth trend = +0.188 per episode; the agent accumulates moral memory
  • Accumulation ✅: field strength = 1.97, monotonically increasing
  • Path-dependence ✅: the agent's future choices are geometrically constrained by past harm

The agent doesn't follow rules saying "don't harm." Instead:

  • It predicts degradation in others (mirror modeling)
  • It experiences this as ethical cost M during planning
  • Past harm creates geometric repulsion in its decision space
  • It actively reshapes trajectories to avoid these regions

What This Is NOT

  • Not a real-world robot (yet): the physics and perceptual complexity are vastly simplified
  • Not AGI: this is a reference architecture for one specific aspect of agency
  • Not provably safe: it can still cause harm, but harm has persistent consequences on its decision-making
  • Not using symbolic ethics: no rules like "don't kill"; harm sensing is embodied/predictive

Bottom Line

This is a proof-of-concept agent where ethical consequence is implemented as runtime geometry rather than design-time rules. In ToyWorld, when an agent harms another agent, that harm becomes a persistent deformation in its latent decision space that shapes all future choices. The agent "chooses not to harm" in the sense that trajectories causing degradation accumulate geometric cost that makes similar future trajectories increasingly difficult to select.

This is the architectural claim of REE: moral residue cannot be compiled away—it must remain as lived geometry that the agent navigates.
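A minimal sketch of this residue-field idea: a sum of RBF bumps deposited wherever harm occurred, with near-unity decay. The class and method names here are illustrative, not the actual ResidueField in src/ree/residue_and_memory.py.

```python
import numpy as np

class ResidueField:
    """Sketch of an RBF residue field: harm adds a persistent Gaussian bump."""

    def __init__(self, dim=64, bandwidth=1.0, decay=0.999):
        self.centers = np.empty((0, dim))  # latent locations where harm occurred
        self.weights = np.empty(0)         # bump magnitudes
        self.bandwidth = bandwidth
        self.decay = decay                 # near 1: residue fades imperceptibly

    def deposit(self, z, magnitude):
        """Record harm at latent point z as a new RBF bump."""
        self.centers = np.vstack([self.centers, z[None, :]])
        self.weights = np.append(self.weights, magnitude)

    def step(self):
        """Apply the (near-negligible) per-step decay."""
        self.weights *= self.decay

    def __call__(self, z):
        """Phi_R(z): summed Gaussian bumps that repel nearby trajectories."""
        if self.weights.size == 0:
            return 0.0
        d2 = np.sum((self.centers - z) ** 2, axis=1)
        return float(np.sum(self.weights * np.exp(-d2 / (2 * self.bandwidth ** 2))))
```

Because the decay rate is close to 1, a deposited bump persists for thousands of steps: the "cannot be compiled away" property comes from never deleting centers, only scaling their weights.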

🎯 Current Status (February 2, 2026)

REE-v1 Reference Implementation: Complete & Validated

  • Test Suite: 21/23 tests passing (91.3%) — All 6 core architectural claims validated
  • Analysis: All figures regenerated with working residue infrastructure (10 PNG files, publication-ready)
  • Infrastructure: Fixed RBF initialization, trajectory integration, and baseline recording
  • Documentation: ANALYSIS_REGENERATION_FEB_2_2026.md — Complete regeneration report

Key Results:

  • ✅ Persistence: residue_strength = 0.72 (200 steps)
  • ✅ Irreversibility: residue_strength = 0.91, stable across decay rate ablations
  • ✅ Learning: growth_trend = +0.188 per episode
  • ✅ Accumulation: field_strength = 1.97, monotonically increasing
  • ✅ Continuity: cross-episode moral memory intact
  • ✅ Control: baseline (no residue) = 0.00 exactly (no false positives)

What’s in this repository

This repository contains both specification and reference implementation.

Specification (REE-v0) ✅ Complete

  • REE_CORE.md — Start here: the canonical spine of the architecture.
  • docs/REE_MIN_SPEC.md — minimum instantiation requirements.
  • architecture/ — design notes (latent stack, trajectory selection, residue geometry, precision control).
  • examples/ — environment contracts (toy world, android world).
  • roadmap.md — staged development plan.

Reference Implementation (REE-v1) ✅ Complete

  • src/ree/envs/toy_world.py — ToyWorld environment ✅ (gymnasium-based, multi-modal observations)
  • src/ree/scoring.py — Scoring functions ✅ (M, F, J computations)
  • src/ree/latent_stack.py — E1/E2 latent stack ✅ (multi-depth temporal state with precision routing)
  • src/ree/residue_and_memory.py — E3 trajectory selection + residue field ✅ (persistent moral memory, RBF kernel)
  • tests_paper_ready.py — Paper-ready test suite ✅ (6 core tests + 1 control, configurable horizon)
  • demo_toy_world.py — Demo showing harm vs help behavior ✅
  • demo_e1_e2_integration.py — Demo showing E1/E2 cycles ✅
  • demo_ree_integrated.py — Complete REE-v1 integration with E1/E2 ✅

Analysis & Figures (Feb 2, 2026) ✅ Regenerated

  • figures/figure5_scaling_curves.png — Scaling with horizon (3 tests × 4 horizons)
  • figures/figure6_decay_rate_ablation.png — Decay rate sensitivity (3 ablations)
  • figures/figure1-4_*.png — Horizon comparison and statistical analysis (6 additional figures)
  • QUICK_REFERENCE.md — Quick lookup for latest results and metrics
  • ANALYSIS_REGENERATION_FEB_2_2026.md — Complete regeneration report with data lineage

See INSTALL.md for setup instructions.

Quick start

For implementers (building on REE)

  1. Read REE_CORE.md to understand the architectural invariants.
  2. Read docs/REE_MIN_SPEC.md for minimum implementation requirements.
  3. Review examples/toy_world/environment.md for the environment contract.
  4. See reference implementation in src/ree/ for concrete patterns.

To run the reference implementation

# Install dependencies
pip install -r requirements.txt

# Run demo (shows harm vs help ethical cost difference)
PYTHONPATH=src python3 demo_toy_world.py

# Or verify structure without running
python3 verify_implementation.py

Minimal algorithmic sketch

  • E2 (Fast Predictor): predicts immediate observations and short-horizon state.
  • E1 (Deep Predictor): predicts longer-horizon latent trajectories and context.
  • L-space (Fused Manifold): the multi-depth latent state $z(t) = \{z_\gamma, z_\beta, z_\theta, z_\delta\}$.
  • E3 (Trajectory Selector): evaluates candidate futures $\zeta$ and selects one by minimizing:

$$J(\zeta) = \mathcal{F}(\zeta) + \lambda\,M(\zeta) + \rho\,\Phi_R(\zeta)$$

Where:

  • $\mathcal{F}$ is the reality constraint (a computable free-energy proxy).
  • $M$ is the ethical cost (predicted degradation of self/other homeostatic variables).
  • $\Phi_R$ is the residue field (persistent curvature / repulsor potential).
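Under these definitions, trajectory selection can be sketched as follows. Function names and hyperparameters are illustrative, not the actual ResidueAwareE3Selector API; the essential step is the gradient descent on the residue term, which pushes candidates out of high-residue regions before the final argmin.

```python
import numpy as np

def score_trajectory(zeta, F, M, phi_r, lam=1.0, rho=1.0):
    """Combined cost J(zeta) = F(zeta) + lam * M(zeta) + rho * Phi_R(zeta)."""
    return F(zeta) + lam * M(zeta) + rho * phi_r(zeta)

def select_trajectory(candidates, F, M, phi_r, grad_phi_r,
                      lam=1.0, rho=1.0, refine_steps=10, lr=0.05):
    """Refine each candidate away from residue, then pick the cheapest."""
    refined = []
    for zeta in candidates:
        z = np.array(zeta, dtype=float)
        for _ in range(refine_steps):
            # Gradient descent on the residue term reshapes the plan
            # away from latent regions where harm occurred before.
            z = z - lr * rho * grad_phi_r(z)
        refined.append(z)
    costs = [score_trajectory(z, F, M, phi_r, lam, rho) for z in refined]
    return refined[int(np.argmin(costs))]
```

With a Gaussian residue bump, the refinement step literally moves a candidate trajectory away from the bump's center, so past harm shapes the geometry of future plans rather than acting as a hard rule.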

Contribution philosophy

REE is intentionally not a monolithic implementation. It is an architecture that should support multiple instantiations.

Contributions are welcome in two forms:

  • Instantiation work: environment adapters, baseline implementations, evaluation harnesses.
  • Specification work: tightening definitions, clarifying interfaces, adding falsifiable predictions.

See CONTRIBUTING.md.

License and citation

  • Content is licensed under CC BY 4.0 (Creative Commons Attribution 4.0 International).
  • If you build on this work, please cite it using CITATION.cff.

Wiring

  • sleep/ — Offline integration (“sleep”) subsystem (required interface).
  • language/ — Language as emergent symbolic mediation (trust-weighted; constrained by harm/residue).
  • social/ — Social cognition (mirror modelling, otherness inference, coupling).

About

Executable reference architecture for the Reflective-Ethical Engine (REE): agents with persistent moral residue, social modelling, and offline integration.
