EvoPhysWM — Evolutionary Physics World Model Merging

Can evolutionary model merging, applied to LLMs (Akiba et al., 2025), be transferred to physical World Models?

This project tests that hypothesis using two 1D physical phenomena — heat diffusion and Burgers equation — as elementary World Models, and advection-diffusion as the compound target phenomenon that neither model can represent alone.


Motivation

Akiba et al. (2025) demonstrated that merging LLMs specialized in different domains (Japanese language + math reasoning) via evolutionary search over weight combinations produces a model with emergent cross-domain capability, without any additional training. The key enabler was a shared base model (Mistral-7B-v0.1) guaranteeing latent space compatibility.

This project asks: can the same principle apply when the "domains" are physical phenomena governed by different PDEs?

The structural analogy:

| LLM merging (original) | Physics WM merging (this project) |
| --- | --- |
| Mistral-7B-v0.1 base | JEPA physics encoder (pretrained, frozen) |
| Japanese LLM fine-tune | WM_heat (∂T/∂t = α ∂²T/∂x²) |
| Math LLM fine-tune | WM_burgers (∂u/∂t + u ∂u/∂x = ν ∂²u/∂x²) |
| Japanese Math LLM (merged) | WM_merged (advection-diffusion, compound) |
| MGSM benchmark | Péclet sweep (Pe ∈ {0.1, 1, 10, 100}) |

The JEPA pretraining strategy (no reconstruction loss) is used as a substitute for the shared base: by removing the decoder, the encoder learns latent representations abstracted from phenomenon-specific textures, enabling geometric compatibility across WM_heat and WM_burgers.


Physical Setup

Source phenomena

Heat equation (parabolic, smooth, diffusion-dominated)

∂T/∂t = α ∂²T/∂x²
x ∈ [0,1],  α ∈ [0.01, 0.5],  T=100 time steps

Burgers equation (hyperbolic, shock-forming, advection-dominated)

∂u/∂t + u ∂u/∂x = ν ∂²u/∂x²
x ∈ [0,1],  ν ∈ [0.001, 0.1],  T=100 time steps

Target phenomenon (evaluation only — no training)

Advection-diffusion equation

∂u/∂t + U ∂u/∂x = α ∂²u/∂x²
Péclet number  Pe = UL/α  ∈  {0.1, 1, 10, 100}

Pe → 0: reduces to the heat equation (diffusion-dominated)
Pe → ∞: reduces to the Burgers equation (advection-dominated)

The Péclet sweep is the primary evaluation axis. A successful merge should achieve lower rollout error than either source WM across the full Pe range, with the mixing coefficients recovered by CMA-ES correlating monotonically with Pe.
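For reference, the target dynamics can be sketched with a minimal explicit finite-difference step (first-order upwind advection, central diffusion, periodic boundaries). The function name, grid, and step sizes below are illustrative assumptions, not the repository's actual scheme:

```python
import numpy as np

def advect_diffuse(u0, U=1.0, alpha=0.01, dx=1/128, dt=1e-4, steps=100):
    """Explicit step for du/dt + U du/dx = alpha d2u/dx2 on a periodic
    1D grid. Illustrative sketch only; the sub-agent's validated schemes
    may differ."""
    u = u0.copy()
    for _ in range(steps):
        # first-order upwind gradient (assumes U >= 0)
        adv = (u - np.roll(u, 1)) / dx
        # second-order central Laplacian
        diff = (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx**2
        u = u + dt * (alpha * diff - U * adv)
    return u
```

Setting α large relative to U·L reproduces the Pe → 0 diffusive limit; shrinking α approaches the advective limit.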


Architecture

Input field u(x, t)   [batch × T_context × N_grid]
        │
        ▼
┌─────────────────────┐
│  Encoder  (frozen)  │  1D spatial transformer
│  θ_enc              │  shared, pretrained on mixed data
└────────┬────────────┘
         │  z_t  [batch × d_latent]
         ▼
┌─────────────────────┐
│  Predictor          │  GRU or Transformer
│  θ_pred_heat        │  phenomenon-specific, fine-tuned
│  θ_pred_burgers     │  ← merge target (PS + DFS)
└────────┬────────────┘
         │  ẑ_{t+k}
         ▼
    Rollout loss (latent-space MSE only, no decoder)

The encoder is pretrained with JEPA temporal causal masking: given context frames u(x, t-n)...u(x, t), predict the latent representation of u(x, t+k) without reconstructing the pixel field. The EMA target encoder prevents representation collapse.
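The JEPA objective and EMA update can be sketched as follows. The module interfaces (`encoder`, `target_encoder`, `predictor`) are hypothetical stand-ins for the repository's actual classes:

```python
import torch

def jepa_rollout_loss(encoder, target_encoder, predictor, u_ctx, u_future):
    """Latent-space prediction loss: encode context with the online encoder,
    encode the future frame with the EMA target encoder (no gradient),
    and compare the predictor's output in latent space only -- no decoder."""
    z_ctx = encoder(u_ctx)
    with torch.no_grad():
        z_tgt = target_encoder(u_future)   # stop-gradient target
    z_pred = predictor(z_ctx)
    return torch.nn.functional.mse_loss(z_pred, z_tgt)

@torch.no_grad()
def ema_update(target_encoder, encoder, tau=0.996):
    """Exponential-moving-average target update, the standard JEPA
    mechanism for preventing representation collapse."""
    for p_t, p in zip(target_encoder.parameters(), encoder.parameters()):
        p_t.mul_(tau).add_(p, alpha=1 - tau)
```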


Merge Strategy

Three merge configurations are evaluated:

PS (Parameter Space) merge

  • DARE-TIES sparsification applied to each Predictor's task vector
  • CMA-ES (via Optuna) optimizes per-layer mixing coefficients λ_i
  • Fitness: rollout MSE on a small advection-diffusion validation split
  • Search space: 2 × n_layers scalar parameters
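A simplified view of the PS merge: DARE-style random sparsification and rescaling of each Predictor's task vector, followed by a λ-weighted sum onto the shared base. This sketch omits TIES sign-consensus and per-layer λ granularity, and the function name is an assumption:

```python
import numpy as np

def dare_ps_merge(theta_base, task_vecs, lambdas, drop_p=0.9, seed=0):
    """Parameter-space merge sketch. For each parameter tensor, each task
    vector (fine-tuned weights minus base) is randomly dropped with
    probability drop_p and rescaled by 1/(1 - drop_p), then combined
    with mixing coefficients lambdas. CMA-ES would optimize lambdas."""
    rng = np.random.default_rng(seed)
    merged = {}
    for name, base in theta_base.items():
        acc = np.zeros_like(base)
        for tv, lam in zip(task_vecs, lambdas):
            delta = tv[name]
            mask = rng.random(delta.shape) >= drop_p  # DARE: drop entries...
            acc += lam * delta * mask / (1 - drop_p)  # ...rescale survivors
        merged[name] = base + acc
    return merged
```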

DFS (Data Flow Space) merge

  • Predictor layer weights kept intact
  • CMA-ES searches for optimal layer sequence across both Predictors
  • Indicator array I ∈ {0,1}^T, T = M × r (M=total layers, r=3)
  • Scaling matrix W_ij corrects distribution shift between layers
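The DFS forward pass can be illustrated as below: candidate layers from both Predictors are repeated r times into T slots, the binary indicator gates which slots execute, and an optional per-slot scaling corrects distribution shift. The interface is a hypothetical simplification (real layers are modules, and W is a matrix, not a vector):

```python
import numpy as np

def dfs_forward(z, layers_heat, layers_burgers, indicator, scale=None):
    """Data-Flow-Space sketch: cycle through the pooled candidate layers,
    executing slot i only when indicator[i] == 1. CMA-ES would search
    over the indicator (and scale) rather than layer weights."""
    candidates = list(layers_heat) + list(layers_burgers)
    for slot, use in enumerate(indicator):
        if use:
            z = candidates[slot % len(candidates)](z)
            if scale is not None:
                z = z * scale[slot]   # per-slot distribution-shift correction
    return z
```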

PS + DFS (combined)

  • PS merge first → intermediate merged Predictor
  • DFS applied with intermediate + WM_burgers Predictor

Agent Architecture

This project uses a two-agent structure under Claude Code:

Main Agent (orchestration, training, merge, evaluation)
Handles all model code, training loops, CMA-ES optimization, and evaluation. Reads simulation data produced by the sub-agent from data/. See CLAUDE.md for full implementation instructions.

Simulation Sub-agent (numerical data generation)
Invoked by the main agent via:

claude -p "$(cat prompts/subagent_sim.txt)" --output-format json

Responsible exclusively for generating .npy trajectory files and writing data/sim_manifest.json with physical validation results. Has no access to model code or checkpoints.

The separation is deliberate: numerical simulation (finite difference schemes, physical validation) is a self-contained, stateless task that benefits from isolated execution and deterministic outputs. The main agent reads only validated, checksummed data from the sub-agent.
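The checksum handoff could look like the following. The manifest schema shown (`"files"` entries with `"path"` and `"sha256"` keys) is an assumption, as the repository does not specify it here:

```python
import hashlib
import json
from pathlib import Path

def verify_manifest(manifest_path="data/sim_manifest.json"):
    """Verify each trajectory file listed in the sub-agent's manifest
    against its recorded SHA-256 digest before the main agent loads it.
    Hypothetical schema: {"files": [{"path": ..., "sha256": ...}, ...]}."""
    manifest = json.loads(Path(manifest_path).read_text())
    for entry in manifest["files"]:
        digest = hashlib.sha256(Path(entry["path"]).read_bytes()).hexdigest()
        if digest != entry["sha256"]:
            raise ValueError(f"checksum mismatch: {entry['path']}")
```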


Evaluation Metrics

| Metric | Formula | Significance |
| --- | --- | --- |
| Short-horizon rollout MSE | ‖û_{t+5} − u_{t+5}‖² | Basic predictive accuracy |
| Long-horizon rollout MSE | ‖û_{t+50} − u_{t+50}‖² | Stability and error accumulation |
| Energy spectrum error | ‖Ê(k) − E(k)‖₂ / ‖E(k)‖₂ | Correct representation of diffusion and advection scales |
| Conservation residual | \|∫û dx − ∫u dx\| | Conservation of the integrated field (mass) |
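The spectrum metric can be computed as a relative L2 error between predicted and true 1D energy spectra. Taking E(k) = |FFT(u)|² over non-negative wavenumbers is a common convention assumed here; the repository may normalize differently:

```python
import numpy as np

def energy_spectrum_error(u_hat, u):
    """Relative L2 error between predicted and true energy spectra,
    E(k) = |rfft(u)|^2. Illustrative convention; see src/eval/metrics
    for the project's actual definition."""
    E_pred = np.abs(np.fft.rfft(u_hat)) ** 2
    E_true = np.abs(np.fft.rfft(u)) ** 2
    return np.linalg.norm(E_pred - E_true) / np.linalg.norm(E_true)
```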

The Péclet sweep (Pe ∈ {0.1, 1, 10, 100}) provides the primary analysis axis. The secondary analysis examines whether the CMA-ES-recovered mixing coefficients λ_heat(Pe) increase monotonically as Pe → 0 — evidence that evolutionary search implicitly estimates the physical regime.


Success Criteria

Primary (merge works):

MSE_merged(Pe) < min(MSE_heat(Pe), MSE_burgers(Pe))   for all Pe tested

Secondary (physical interpretation):

corr(λ_heat, 1/Pe) > 0.9

i.e., the heat Predictor's weight grows as the problem becomes more diffusion-dominated.
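Checking this criterion is a one-liner; the helper name below is an assumption:

```python
import numpy as np

def regime_correlation(lambdas_heat, peclets):
    """Pearson correlation between the CMA-ES-recovered heat mixing
    coefficient and 1/Pe. Values above 0.9 would satisfy the
    secondary success criterion."""
    return np.corrcoef(lambdas_heat, 1.0 / np.asarray(peclets))[0, 1]
```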

Tertiary (DFS structural hypothesis): The DFS-discovered layer sequence should begin with WM_heat layers (smooth early diffusion) and transition to WM_burgers layers (shock formation) — a data-driven recovery of the operator-splitting structure of advection-diffusion solvers.


Installation

git clone https://github.com/srt-tkyk/EvoPhysWM
cd EvoPhysWM
pip install -e ".[dev]"

Requirements: Python 3.11+, PyTorch 2.x, NumPy, SciPy, Optuna, tqdm.


Quick Start

# Step 1: Generate simulation data (sub-agent)
bash scripts/run_subagent_sim.sh

# Step 2–4: Full training + merge + evaluation pipeline
bash scripts/run_full_pipeline.sh

# Results
cat outputs/results.json

Expected runtime on a single A100 (or equivalent): ~6 hours for the full pipeline.


Repository Structure

evophyswm/
├── CLAUDE.md               ← agent instructions and architecture spec
├── README.md               ← this file
├── pyproject.toml
├── configs/
│   ├── base.yaml           ← JEPA pretraining config
│   ├── finetune.yaml       ← phenomenon fine-tuning config
│   ├── merge.yaml          ← CMA-ES merge config
│   └── eval.yaml           ← Péclet sweep config
├── prompts/
│   └── subagent_sim.txt    ← sub-agent dispatch prompt template
├── data/                   ← populated by sub-agent (not committed)
├── src/
│   ├── models/             ← encoder, predictor, world_model
│   ├── train/              ← pretrain_base, finetune
│   ├── merge/              ← ps_merge, dfs_merge, fitness
│   └── eval/               ← metrics, peclet_sweep
├── checkpoints/            ← model checkpoints (not committed)
├── outputs/                ← results and figures
└── scripts/
    ├── run_full_pipeline.sh
    ├── run_subagent_sim.sh
    └── run_eval.sh

References

  • Akiba T. et al. "Evolutionary optimization of model merging recipes." Nature Machine Intelligence 7, 195–204 (2025). https://doi.org/10.1038/s42256-024-00975-8
  • Assran M. et al. "Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture." CVPR 2023. (I-JEPA)
  • Hafner D. et al. "Dream to Control: Learning Behaviors by Latent Imagination." ICLR 2020. (RSSM)
