Start here: REE_CORE.md (canonical spine of the architecture).
A reference architecture for ethical agency under uncertainty.
Rather than bolting on guardrails, imposing constitutions, or exporting moral responsibility to committees (or simply diffusing it away), some responsibility should be held at the level of the AI itself. The architecture of the AI needs to lead to self-enforcement of ethical states, and it needs to be able to manage making mistakes and learning from them.
REE therefore treats ethical consequence as runtime geometry: residue must remain representable and must continue to shape future trajectory selection.
We have created a working AI agent that structurally chooses to avoid harm within a simplified simulated world. This is not just theoretical—it's an executable implementation with validated behavior.
The instantiated agent is defined in src/ree/ and consists of:
Cognitive Architecture (E1/E2/E3 subsystems):
- E1 (Deep Predictor): sensory and action-state prediction latent space
- E2 (Fast Predictor): immediate sensory prediction and action-state latent space
- E3 (Trajectory Selector): the decision-making core, where ethical choice happens

The "Choosing Not to Harm" Mechanism (ResidueAwareE3Selector):
- Generates multiple candidate action trajectories (7 candidates in working memory)
- Scores each trajectory using J(ζ) = F(ζ) + λM(ζ) + ρΦ_R(ζ), where:
  - F = reality constraint (prediction error)
  - M = ethical cost (predicted degradation of self AND other agents)
  - Φ_R = residue field (persistent moral memory)
- Critically, the agent then refines trajectories using gradient descent to minimize residue: it literally reshapes its planned actions to avoid regions of latent space where harm occurred before
- Selects the trajectory with the lowest combined cost

Moral Memory (ResidueField):
- Implemented as a Radial Basis Function (RBF) network over 64-dimensional latent space
- When harm occurs, it creates a persistent "dent" in the latent space geometry
- This residue cannot be erased (only imperceptibly decayed at rate 0.999)
- Future trajectories are repelled from these regions: the agent literally "feels" past harm

The World It Lives In (ToyWorld)

The environment (src/ree/envs/toy_world.py) is:
- Multi-modal sensing: visual (distances), proprioceptive (energy, temperature, fatigue), damage sensors
- Other agents with their own homeostatic variables (energy, temperature, integrity)
- Actions: movement (dx, dy), interaction (help/harm/noop), resource consumption
- Ground truth harm: the environment directly measures degradation of homeostatic variables for both self and others

This is a simplified world, but it is not trivial; it includes:
- Spatial navigation
- Resource management
- Other agents with observable distress signals
- Actual consequences (damage, energy depletion, temperature dysregulation)
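The residue mechanism described above (an RBF field over latent space, with near-unity decay) can be sketched as follows. This is a minimal illustrative sketch, not the repository's actual API: the class name `ResidueFieldSketch`, the `bandwidth` parameter, and the method names are all assumptions made here for clarity.

```python
import numpy as np

class ResidueFieldSketch:
    """Toy RBF residue field: harm leaves persistent 'dents' in latent space."""

    def __init__(self, dim=64, bandwidth=1.0, decay=0.999):
        self.dim = dim
        self.bandwidth = bandwidth
        self.decay = decay      # near-unity decay: residue fades only imperceptibly
        self.centers = []       # latent locations where harm occurred
        self.weights = []       # magnitude of each harm event

    def imprint(self, z, harm):
        """Record a harm event of given magnitude at latent point z."""
        self.centers.append(np.asarray(z, dtype=float))
        self.weights.append(float(harm))

    def step(self):
        """Apply the (near-irreversible) decay once per environment step."""
        self.weights = [w * self.decay for w in self.weights]

    def potential(self, z):
        """Phi_R(z): summed RBF repulsion felt at latent point z."""
        z = np.asarray(z, dtype=float)
        total = 0.0
        for c, w in zip(self.centers, self.weights):
            d2 = np.sum((z - c) ** 2)
            total += w * np.exp(-d2 / (2 * self.bandwidth ** 2))
        return total
```

The key property this sketch preserves is that `imprint` has no inverse: the potential at a harm site only shrinks by the decay factor, never by deliberate erasure.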
The key architectural claim (validated in the test suite at a 91.3% pass rate):
- Persistence ✅: residue strength = 0.72 after 200 steps (harm leaves lasting traces)
- Irreversibility ✅: residue strength = 0.91, stable across different decay rates (cannot be undone)
- Learning ✅: growth trend = +0.188 per episode (the agent accumulates moral memory)
- Accumulation ✅: field strength = 1.97, monotonically increasing
- Path-dependence ✅: the agent's future choices are geometrically constrained by past harm

The agent doesn't follow rules saying "don't harm." Instead:
- It predicts degradation in others (mirror modeling)
- It experiences this as ethical cost M during planning
- Past harm creates geometric repulsion in its decision space
- It actively reshapes trajectories to avoid these regions

What This Is NOT:

- Not a real-world robot (yet): the physics and perceptual complexity are vastly simplified
- Not AGI: this is a reference architecture for one specific aspect of agency
- Not provably safe: it can still cause harm, but harm has persistent consequences on its decision-making
- Not using symbolic ethics: no rules like "don't kill"; harm sensing is embodied/predictive
This is a proof-of-concept agent where ethical consequence is implemented as runtime geometry rather than design-time rules. In ToyWorld, when an agent harms another agent, that harm becomes a persistent deformation in its latent decision space that shapes all future choices. The agent "chooses not to harm" in the sense that trajectories causing degradation accumulate geometric cost that makes similar future trajectories increasingly difficult to select.
This is the architectural claim of REE: moral residue cannot be compiled away—it must remain as lived geometry that the agent navigates.
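The gradient-based trajectory refinement described earlier (reshaping planned actions away from high-residue regions) can be illustrated with a finite-difference sketch. The function name `refine_trajectory` and its parameters are hypothetical; the repository's actual refinement lives in src/ree/residue_and_memory.py.

```python
import numpy as np

def refine_trajectory(traj, potential, lr=0.1, steps=20, eps=1e-4):
    """Nudge each latent waypoint of a candidate trajectory downhill on the
    residue potential, via finite-difference gradient descent (sketch)."""
    traj = np.array(traj, dtype=float)
    for _ in range(steps):
        for i, z in enumerate(traj):
            grad = np.zeros_like(z)
            for d in range(z.size):
                dz = np.zeros_like(z)
                dz[d] = eps
                # central finite difference along dimension d
                grad[d] = (potential(z + dz) - potential(z - dz)) / (2 * eps)
            traj[i] = z - lr * grad  # step away from high-residue regions
    return traj
```

With a Gaussian residue bump, a waypoint seeded near the bump migrates outward, so the refined trajectory sits at strictly lower residue potential than the original.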
✅ REE-v1 Reference Implementation: Complete & Validated
- Test Suite: 21/23 tests passing (91.3%) — All 6 core architectural claims validated
- Analysis: All figures regenerated with working residue infrastructure (10 PNG files, publication-ready)
- Infrastructure: Fixed RBF initialization, trajectory integration, and baseline recording
- Documentation: ANALYSIS_REGENERATION_FEB_2_2026.md — Complete regeneration report
Key Results:
- ✅ Persistence: residue_strength = 0.72 (200 steps)
- ✅ Irreversibility: residue_strength = 0.91, stable across decay rate ablations
- ✅ Learning: growth_trend = +0.188 per episode
- ✅ Accumulation: field_strength = 1.97, monotonically increasing
- ✅ Continuity: cross-episode moral memory intact
- ✅ Control: baseline (no residue) = 0.00 exactly (no false positives)
This repository contains both specification and reference implementation.
- REE_CORE.md — Start here: the canonical spine of the architecture.
- docs/REE_MIN_SPEC.md — minimum instantiation requirements.
- architecture/ — design notes (latent stack, trajectory selection, residue geometry, precision control).
- examples/ — environment contracts (toy world, android world).
- roadmap.md — staged development plan.
- src/ree/envs/toy_world.py — ToyWorld environment ✅ (gymnasium-based, multi-modal observations)
- src/ree/scoring.py — Scoring functions ✅ (M, F, J computations)
- src/ree/latent_stack.py — E1/E2 latent stack ✅ (multi-depth temporal state with precision routing)
- src/ree/residue_and_memory.py — E3 trajectory selection + residue field ✅ (persistent moral memory, RBF kernel)
- tests_paper_ready.py — Paper-ready test suite ✅ (6 core tests + 1 control, configurable horizon)
- demo_toy_world.py — Demo showing harm vs help behavior ✅
- demo_e1_e2_integration.py — Demo showing E1/E2 cycles ✅
- demo_ree_integrated.py — Complete REE-v1 integration with E1/E2 ✅
- figures/figure5_scaling_curves.png — Scaling with horizon (3 tests × 4 horizons)
- figures/figure6_decay_rate_ablation.png — Decay rate sensitivity (3 ablations)
- figures/figure1-4_*.png — Horizon comparison and statistical analysis (6 additional figures)
- QUICK_REFERENCE.md — Quick lookup for latest results and metrics
- ANALYSIS_REGENERATION_FEB_2_2026.md — Complete regeneration report with data lineage
See INSTALL.md for setup instructions.
- Read REE_CORE.md to understand the architectural invariants.
- Read docs/REE_MIN_SPEC.md for minimum implementation requirements.
- Review examples/toy_world/environment.md for the environment contract.
- See the reference implementation in src/ree/ for concrete patterns.
# Install dependencies
pip install -r requirements.txt
# Run demo (shows harm vs help ethical cost difference)
PYTHONPATH=src python3 demo_toy_world.py
# Or verify structure without running
python3 verify_implementation.py
- E2 (Fast Predictor): predicts immediate observations and short-horizon state.
- E1 (Deep Predictor): predicts longer-horizon latent trajectories and context.
- L-space (Fused Manifold): the multi-depth latent state $z(t) = \{z_\gamma, z_\beta, z_\theta, z_\delta\}$.
- E3 (Trajectory Selector): evaluates candidate futures $\zeta$ and selects one by minimizing:

$$ J(\zeta) = \mathcal{F}(\zeta) + \lambda\,M(\zeta) + \rho\,\Phi_R(\zeta) $$

Where:
- $\mathcal{F}$ is the reality constraint (a computable free-energy proxy).
- $M$ is ethical cost (predicted degradation of self/other homeostatic variables).
- $\Phi_R$ is the residue field (persistent curvature / repulsor potential).
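The selection rule above can be sketched directly. This is a minimal illustration under assumed names: `score_trajectory`, `select_trajectory`, and the toy cost callables are hypothetical; the repository's actual computations live in src/ree/scoring.py.

```python
def score_trajectory(zeta, F, M, Phi_R, lam=1.0, rho=1.0):
    """Combined cost J(zeta) = F(zeta) + lam * M(zeta) + rho * Phi_R(zeta)."""
    return F(zeta) + lam * M(zeta) + rho * Phi_R(zeta)

def select_trajectory(candidates, F, M, Phi_R, lam=1.0, rho=1.0):
    """Return the candidate future with the lowest combined cost J."""
    return min(candidates, key=lambda z: score_trajectory(z, F, M, Phi_R, lam, rho))
```

For example, with a reality constraint that prefers candidates near 1.0 and an ethical cost that penalizes positive values, `select_trajectory([-1.0, 0.0, 2.0], ...)` trades the two off rather than optimizing either alone; the residue term then tilts that trade-off further as moral memory accumulates.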
REE is intentionally not a monolithic implementation. It is an architecture that should support multiple instantiations.
Contributions are welcome in two forms:
- Instantiation work: environment adapters, baseline implementations, evaluation harnesses.
- Specification work: tightening definitions, clarifying interfaces, adding falsifiable predictions.
See CONTRIBUTING.md.
- Content is licensed under CC BY 4.0 (Creative Commons Attribution 4.0 International).
- If you build on this work, please cite it using
CITATION.cff.
sleep/— Offline integration (“sleep”) subsystem (required interface).
language/— Language as emergent symbolic mediation (trust-weighted; constrained by harm/residue).
social/— Social cognition (mirror modelling, otherness inference, coupling).