Can evolutionary model merging, applied to LLMs (Akiba et al., 2025), be transferred to physical World Models?
This project tests that hypothesis using two 1D physical phenomena — heat diffusion and the Burgers equation — as elementary World Models, with advection-diffusion as the compound target phenomenon that neither source model can represent alone.
Akiba et al. (2025) demonstrated that merging LLMs specialized in different domains (Japanese language + math reasoning) via evolutionary search over weight combinations produces a model with emergent cross-domain capability, without any additional training. The key enabler was a shared base model (Mistral-7B-v0.1) guaranteeing latent space compatibility.
This project asks: can the same principle apply when the "domains" are physical phenomena governed by different PDEs?
The structural analogy:
| LLM merging (original) | Physics WM merging (this project) |
|---|---|
| Mistral-7B-v0.1 base | JEPA physics encoder (pretrained, frozen) |
| Japanese LLM fine-tune | WM_heat (∂T/∂t = α∂²T/∂x²) |
| Math LLM fine-tune | WM_burgers (∂u/∂t + u∂u/∂x = ν∂²u/∂x²) |
| Japanese Math LLM (merged) | WM_merged (advection-diffusion, compound) |
| MGSM benchmark | Péclet sweep (Pe ∈ {0.1, 1, 10, 100}) |
The JEPA pretraining strategy (no reconstruction loss) is used as a substitute for the shared base: by removing the decoder, the encoder learns latent representations abstracted from phenomenon-specific textures, enabling geometric compatibility across WM_heat and WM_burgers.
Heat equation (parabolic, smooth, diffusion-dominated)
∂T/∂t = α ∂²T/∂x²
x ∈ [0,1], α ∈ [0.01, 0.5], 100 time steps per trajectory
Burgers equation (hyperbolic, shock-forming, advection-dominated)
∂u/∂t + u ∂u/∂x = ν ∂²u/∂x²
x ∈ [0,1], ν ∈ [0.001, 0.1], 100 time steps per trajectory
Advection-diffusion equation
∂u/∂t + U ∂u/∂x = α ∂²u/∂x²
Péclet number Pe = UL/α ∈ {0.1, 1, 10, 100}
- Pe → 0: diffusion dominates; the dynamics approach the heat equation.
- Pe → ∞: advection dominates, the regime WM_burgers specializes in (the linear equation tends to pure advection rather than literally to the nonlinear Burgers equation).
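For reference, a minimal NumPy sketch of an explicit solver for this equation (first-order upwind advection plus central diffusion on a periodic grid; grid spacing, time step, and coefficient values are illustrative, not the project's configuration):

```python
import numpy as np

def advect_diffuse(u0, U=1.0, alpha=0.01, dx=1/128, dt=1e-4, steps=100):
    """Explicit upwind advection + central diffusion on a periodic 1D grid.

    Stability requires dt <= dx**2 / (2*alpha) and dt <= dx / |U| (CFL).
    """
    u = u0.copy()
    for _ in range(steps):
        adv = -U * (u - np.roll(u, 1)) / dx                        # upwind, U > 0
        diff = alpha * (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx**2
        u = u + dt * (adv + diff)
    return u

def peclet(U, alpha, L=1.0):
    """Péclet number Pe = U L / α for a domain of length L."""
    return U * L / alpha
```

With these illustrative values, `peclet(1.0, 0.01)` gives Pe = 100, the most advection-dominated point of the sweep.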
The Péclet sweep is the primary evaluation axis. A successful merge should achieve lower rollout error than either source WM across the full Pe range, with the CMA-ES-recovered mixing coefficients varying monotonically with Pe.
```
Input field u(x, t)   [batch × T_context × N_grid]
          │
          ▼
┌─────────────────────┐
│  Encoder (frozen)   │  1D spatial transformer
│  θ_enc              │  shared, pretrained on mixed data
└────────┬────────────┘
         │  z_t  [batch × d_latent]
         ▼
┌─────────────────────┐
│  Predictor          │  GRU or Transformer
│  θ_pred_heat        │  phenomenon-specific, fine-tuned
│  θ_pred_burgers     │  ← merge target (PS + DFS)
└────────┬────────────┘
         │  ẑ_{t+k}
         ▼
Rollout loss (latent-space MSE only, no decoder)
```
The encoder is pretrained with JEPA temporal causal masking: given context frames u(x, t-n)...u(x, t), predict the latent representation of u(x, t+k) without reconstructing the pixel field. The EMA target encoder prevents representation collapse.
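Schematically, the EMA update and a single JEPA training step look like the following (a minimal PyTorch sketch with hypothetical module and argument names; the actual training loop lives in src/train/):

```python
import torch

@torch.no_grad()
def ema_update(target_enc, online_enc, tau=0.996):
    """Polyak-average the online encoder weights into the EMA target encoder."""
    for p_t, p_o in zip(target_enc.parameters(), online_enc.parameters()):
        p_t.mul_(tau).add_(p_o, alpha=1 - tau)

def jepa_step(online_enc, target_enc, predictor, ctx, future):
    """One JEPA step: predict the *latent* of a future frame, no pixel decoder.

    ctx:    [batch, T_context, N_grid] context frames
    future: [batch, N_grid] frame at t+k
    """
    z_ctx = online_enc(ctx)                       # [batch, d_latent]
    with torch.no_grad():
        z_tgt = target_enc(future.unsqueeze(1))   # stop-gradient target
    z_hat = predictor(z_ctx)
    return torch.nn.functional.mse_loss(z_hat, z_tgt)
```

The stop-gradient target plus the slow EMA update are the two ingredients that prevent the latent representation from collapsing to a constant.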
Three merge configurations are evaluated:
PS (Parameter Space) merge
- DARE-TIES sparsification applied to each Predictor's task vector
- CMA-ES (via Optuna) optimizes per-layer mixing coefficients λ_i
- Fitness: rollout MSE on a small advection-diffusion validation split
- Search space: 2 × n_layers scalar parameters
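A minimal sketch of the PS merge step, assuming plain DARE drop-and-rescale on the task vectors (the TIES sign-election step is omitted for brevity) and illustrative state-dict and coefficient names:

```python
import torch

def dare_sparsify(task_vec, drop_p=0.9):
    """DARE: randomly drop task-vector entries, rescale survivors by 1/(1-p)."""
    mask = torch.rand(task_vec.shape) >= drop_p
    return task_vec * mask / (1.0 - drop_p)

def ps_merge(base, heat, burgers, lam, drop_p=0.9):
    """Per-layer weighted combination of sparsified task vectors.

    base/heat/burgers: dicts of layer name -> weight tensor (Predictor state dicts)
    lam: dict of layer name -> (lam_heat, lam_burgers), the CMA-ES search space
    """
    merged = {}
    for name, w0 in base.items():
        tv_h = dare_sparsify(heat[name] - w0, drop_p)
        tv_b = dare_sparsify(burgers[name] - w0, drop_p)
        lh, lb = lam[name]
        merged[name] = w0 + lh * tv_h + lb * tv_b
    return merged
```

CMA-ES proposes the flat vector of (lam_heat, lam_burgers) pairs, the merged Predictor is rolled out on the advection-diffusion validation split, and the rollout MSE is returned as fitness.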
DFS (Data Flow Space) merge
- Predictor layer weights kept intact
- CMA-ES searches for optimal layer sequence across both Predictors
- Indicator array I ∈ {0,1}^T, with T = M × r (M = total layers across both Predictors, r = 3 repetitions)
- Scaling matrix W_ij corrects distribution shift between consecutive layers
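The DFS routing can be sketched as a forward pass through the concatenated layer pool, gated by the indicator array (hypothetical function and argument names; the scaling matrix is simplified here to a per-slot scalar):

```python
def dfs_forward(z, layer_pool, indicator, scale):
    """Route a latent through the layer pool, visiting up to T = M * r slots.

    layer_pool: list of M layers (WM_heat Predictor layers, then WM_burgers)
    indicator:  0/1 sequence of length T; slot i is used iff indicator[i] == 1
    scale:      per-slot scalar correcting inter-layer distribution shift
    """
    M = len(layer_pool)
    for i, use in enumerate(indicator):
        if use:
            z = scale[i] * layer_pool[i % M](z)
    return z
```

CMA-ES then searches jointly over the indicator bits and the scaling values, with rollout MSE on advection-diffusion data as fitness, exactly as in the PS case.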
PS + DFS (combined)
- PS merge first → intermediate merged Predictor
- DFS applied with intermediate + WM_burgers Predictor
This project uses a two-agent structure under Claude Code:
Main Agent (orchestration, training, merge, evaluation)
Handles all model code, training loops, CMA-ES optimization, and evaluation. Reads simulation data produced by the sub-agent from data/. See CLAUDE.md for full implementation instructions.
Simulation Sub-agent (numerical data generation)
Invoked by the main agent via:

```shell
claude -p "$(cat prompts/subagent_sim.txt)" --output-format json
```

Responsible exclusively for generating .npy trajectory files and writing data/sim_manifest.json with physical validation results. Has no access to model code or checkpoints.
The separation is deliberate: numerical simulation (finite difference schemes, physical validation) is a self-contained, stateless task that benefits from isolated execution and deterministic outputs. The main agent reads only validated, checksummed data from the sub-agent.
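A sketch of how the main agent might consume the manifest, assuming illustrative entry fields ("file", "sha256"); the actual schema is defined by prompts/subagent_sim.txt, not here:

```python
import hashlib
import json
from pathlib import Path

def load_validated(manifest_path="data/sim_manifest.json"):
    """Load trajectory paths from the sub-agent manifest, verifying checksums.

    Raises ValueError if any file does not match its recorded SHA-256 digest,
    so the main agent never trains on partially written or stale data.
    """
    manifest = json.loads(Path(manifest_path).read_text())
    paths = []
    for entry in manifest["trajectories"]:
        path = Path(entry["file"])
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest != entry["sha256"]:
            raise ValueError(f"checksum mismatch: {path}")
        paths.append(path)
    return paths
```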
| Metric | Formula | Significance |
|---|---|---|
| Short-horizon rollout MSE | ‖û_{t+5} − u_{t+5}‖² | Basic predictive accuracy |
| Long-horizon rollout MSE | ‖û_{t+50} − u_{t+50}‖² | Stability and error accumulation |
| Energy spectrum error | ‖Ê(k) − E(k)‖₂ / ‖E(k)‖₂ | Correct representation of diffusion and advection scales |
| Conservation residual | \|∫û dx − ∫u dx\| | Preservation of the conserved quantity |
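The spectral and conservation metrics can be computed directly with NumPy; a sketch assuming a real-valued field on a uniform periodic grid (function names are illustrative, the project's versions live in src/eval/):

```python
import numpy as np

def energy_spectrum(u):
    """1D energy spectrum E(k) = |FFT(u)|² over non-negative wavenumbers."""
    return np.abs(np.fft.rfft(u)) ** 2

def spectrum_error(u_hat, u_true):
    """Relative L2 error between predicted and true energy spectra."""
    e_hat, e_true = energy_spectrum(u_hat), energy_spectrum(u_true)
    return np.linalg.norm(e_hat - e_true) / np.linalg.norm(e_true)

def conservation_residual(u_hat, u_true, dx):
    """|∫û dx − ∫u dx|, with the integrals taken as simple Riemann sums."""
    return abs(np.sum(u_hat) * dx - np.sum(u_true) * dx)
```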
The Péclet sweep (Pe ∈ {0.1, 1, 10, 100}) provides the primary analysis axis. The secondary analysis examines whether the CMA-ES-recovered mixing coefficients λ_heat(Pe) increase monotonically as Pe → 0 — evidence that evolutionary search implicitly estimates the physical regime.
Primary (merge works):
MSE_merged(Pe) < min(MSE_heat(Pe), MSE_burgers(Pe)) for all Pe tested
Secondary (physical interpretation):
corr(λ_heat, 1/Pe) > 0.9
i.e., the heat Predictor's weight grows as the problem becomes more diffusion-dominated.
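The secondary criterion reduces to a plain Pearson correlation across the sweep; a small NumPy sketch (illustrative function name):

```python
import numpy as np

def regime_correlation(lam_heat, peclet_values):
    """Pearson correlation between the recovered heat-mixing weights and 1/Pe."""
    return np.corrcoef(lam_heat, 1.0 / np.asarray(peclet_values))[0, 1]
```

A merge passes the secondary criterion when this value exceeds 0.9 over Pe ∈ {0.1, 1, 10, 100}.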
Tertiary (DFS structural hypothesis): The DFS-discovered layer sequence should begin with WM_heat layers (smooth early diffusion) and transition to WM_burgers layers (shock formation) — a data-driven recovery of the operator-splitting structure of advection-diffusion solvers.
```shell
git clone https://github.com/yourorg/evophyswm
cd evophyswm
pip install -e ".[dev]"
```

Requirements: Python 3.11+, PyTorch 2.x, NumPy, SciPy, Optuna, tqdm.
```shell
# Step 1: Generate simulation data (sub-agent)
bash scripts/run_subagent_sim.sh

# Steps 2–4: Full training + merge + evaluation pipeline
bash scripts/run_full_pipeline.sh

# Results
cat outputs/results.json
```

Expected runtime on a single A100 (or equivalent): ~6 hours for the full pipeline.
```
evophyswm/
├── CLAUDE.md          ← agent instructions and architecture spec
├── README.md          ← this file
├── pyproject.toml
├── configs/
│   ├── base.yaml      ← JEPA pretraining config
│   ├── finetune.yaml  ← phenomenon fine-tuning config
│   ├── merge.yaml     ← CMA-ES merge config
│   └── eval.yaml      ← Péclet sweep config
├── prompts/
│   └── subagent_sim.txt  ← sub-agent dispatch prompt template
├── data/              ← populated by sub-agent (not committed)
├── src/
│   ├── models/        ← encoder, predictor, world_model
│   ├── train/         ← pretrain_base, finetune
│   ├── merge/         ← ps_merge, dfs_merge, fitness
│   └── eval/          ← metrics, peclet_sweep
├── checkpoints/       ← model checkpoints (not committed)
├── outputs/           ← results and figures
└── scripts/
    ├── run_full_pipeline.sh
    ├── run_subagent_sim.sh
    └── run_eval.sh
```
- Akiba T. et al. "Evolutionary optimization of model merging recipes." Nature Machine Intelligence 7, 195–204 (2025). https://doi.org/10.1038/s42256-024-00975-8
- Assran M. et al. "Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture." CVPR 2023. (I-JEPA)
- Hafner D. et al. "Dream to Control: Learning Behaviors by Latent Imagination." ICLR 2020. (RSSM)