MDBench

Official implementation of the paper MDBench: Benchmarking Data-Driven Methods for Model Discovery (AAAI 2026 Oral)


Overview

MDBench is an open-source benchmarking framework for evaluating model discovery methods on dynamical systems. MDBench assesses 12 algorithms on 14 partial differential equations (PDEs) and 63 ordinary differential equations (ODEs) under varying levels of noise.

Usage

First, run `./install.sh` to create a separate conda environment for each algorithm and install its dependencies.

Run `./run.sh --algorithm <algorithm> --data_type <data_type> --result_dir <result_dir> [--with_noise]` to run an algorithm on all datasets of type `<data_type>`. For example, `./run.sh --algorithm pysr --data_type ode --result_dir results --with_noise` saves its results in the `results` directory. The discovered equations, along with their performance on the test set, are stored in a file named `<algorithm>-<data_type>.jsonl` under `<result_dir>`.
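To inspect a single result file programmatically, a minimal sketch is shown below. It assumes the file follows the JSON Lines convention implied by the `.jsonl` extension (one JSON object per line). The field names inside each record are not documented here, so the hypothetical helper `load_results` returns the records as plain dictionaries without interpreting them.

```python
import json
from pathlib import Path


def load_results(result_dir, algorithm, data_type):
    """Read <result_dir>/<algorithm>-<data_type>.jsonl into a list of dicts.

    Hypothetical helper: assumes one JSON object per non-empty line. Record
    keys are returned untouched because their names are not documented here.
    """
    path = Path(result_dir) / f"{algorithm}-{data_type}.jsonl"
    records = []
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                records.append(json.loads(line))
    return records
```

For the example command, `load_results("results", "pysr", "ode")` would read `results/pysr-ode.jsonl`.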

To combine the resulting JSONL files, run `python scripts/combine_results.py --result_dir result`, which saves the aggregated results to `combined.csv`. The command `python scripts/visualize_results.py --results_dir result_temp/ --output_dir result_temp/figs` reads the aggregated results file and visualizes the results per data type, dataset, or method.
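The aggregation step can be illustrated with the sketch below. It is a hypothetical re-implementation, not `scripts/combine_results.py`, whose output columns may differ. It collects every `<algorithm>-<data_type>.jsonl` file in a directory, tags each record with the algorithm and data type parsed from the file name, and writes one CSV whose header is the union of all record keys.

```python
import csv
import json
from pathlib import Path


def combine_jsonl(result_dir, out_csv):
    """Aggregate every <algorithm>-<data_type>.jsonl in result_dir into one CSV.

    Illustrative only; not the repository's combine_results.py.
    Returns the number of records written.
    """
    rows = []
    for path in sorted(Path(result_dir).glob("*-*.jsonl")):
        # Split on the LAST hyphen so algorithm names containing '-' survive.
        algorithm, data_type = path.stem.rsplit("-", 1)
        with path.open(encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    record = json.loads(line)
                    rows.append({"algorithm": algorithm, "data_type": data_type, **record})
    # Union of keys across all records, in first-seen order.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)  # keys missing from a record become empty cells
    return len(rows)
```

Splitting on the last hyphen is the design choice that matters here: the data type suffix has no hyphen, while an algorithm name might.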

Datasets

The PDE and ODE datasets are hosted at https://doi.org/10.5281/zenodo.17611099. Datasets are stored in NPZ format, each including the following items:

t: Time points.
u: Observed trajectory, where n_dim is the number of state variables in the system. Its shape depends on the system type:

System Type	Shape
ODE	`(n_time, n_dim)`
PDE	1D spatial: `(n_x, n_time, n_dim)`; 2D spatial: `(n_x, n_y, n_time, n_dim)`; 3D spatial: `(n_x, n_y, n_z, n_time, n_dim)`

du: Time derivatives of the clean (noise-free) data, with the same shape as u. For ODEs, the true derivatives are computed from the true equations; for PDEs, they are computed via finite differences on the clean observed trajectory.
x (1D, 2D, and 3D spatial PDEs only): x coordinates of the grid.
y (2D and 3D spatial PDEs only): y coordinates of the grid.
z (3D spatial PDEs only): z coordinates of the grid.
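A minimal sketch for loading a dataset file and checking it against the layout described above follows. It assumes `t` is a 1D array and that time is the second-to-last axis of `u` and `du`, which holds for every shape listed in the table. The helper name `load_mdbench_npz` is hypothetical, and it only checks that `x`, `y`, `z` are present, since this README does not say whether they are 1D axes or full meshgrids.

```python
import numpy as np


def load_mdbench_npz(path):
    """Load an NPZ dataset and sanity-check the documented array layout.

    Hypothetical helper. Assumes t is 1D and time is axis -2 of u and du.
    Returns (dict of arrays, number of spatial dimensions).
    """
    with np.load(path) as data:
        arrays = {key: data[key] for key in data.files}
    u, du, t = arrays["u"], arrays["du"], arrays["t"]
    n_spatial = u.ndim - 2  # 0 for ODEs, 1 to 3 for PDEs
    if not 0 <= n_spatial <= 3:
        raise ValueError(f"unexpected u.ndim={u.ndim}")
    if du.shape != u.shape:
        raise ValueError(f"du shape {du.shape} != u shape {u.shape}")
    if t.shape[0] != u.shape[-2]:
        raise ValueError("length of t does not match the time axis of u")
    # Grid coordinates required by the spatial dimension (presence only).
    for axis_name in "xyz"[:n_spatial]:
        if axis_name not in arrays:
            raise KeyError(f"missing grid coordinate '{axis_name}'")
    return arrays, n_spatial
```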

ODE

To generate the ODE datasets, run `python scripts/generate_ode.py --save_dir data/ode/`. This script generates the trajectories and true time derivatives for all 63 datasets and stores them in the `data/ode` folder. It also generates noisy versions of the datasets at SNRs of 40, 30, 20, and 10.
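The exact SNR definition used by `generate_ode.py` is not stated here. The sketch below assumes the common convention: SNR in decibels, signal power equal to the mean square of the clean array, and i.i.d. zero-mean Gaussian noise. Under those assumptions, the noise variance for a target SNR is the signal power divided by $10^{\mathrm{SNR}/10}$. The function `add_noise_snr` and the toy trajectory are hypothetical.

```python
import numpy as np


def add_noise_snr(u, snr_db, rng=None):
    """Add white Gaussian noise so that 10*log10(P_signal / P_noise) ~= snr_db.

    Assumptions (not taken from generate_ode.py): SNR in dB, signal power is
    the mean square of the clean array, noise is i.i.d. Gaussian.
    """
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(u ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=u.shape)
    return u + noise


# Noisy copies of a toy ODE trajectory at the SNR levels listed above.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 1000)
u_clean = np.stack([np.sin(t), np.cos(t)], axis=-1)  # shape (n_time, n_dim)
noisy = {snr: add_noise_snr(u_clean, snr, rng) for snr in (40, 30, 20, 10)}
```

Passing a seeded `Generator` makes the noisy datasets reproducible across runs.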

PDE

FEniCS scripts generate the PDE datasets in HDF5 and XDMF formats. The command `python -m scripts.convert_h5_to_npz --h5_dir H5_DIR --output_dir OUTPUT_DIR` converts the HDF5 files to the NPZ format read by MDBench. To preprocess the raw datasets, including derivative estimation, subsampling, and noise addition, run `python -m scripts.unify_data_format --input_dir RAW_PDE_DATA_DIR --output_dir PDE_DATA_DIR`.
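The derivative-estimation and subsampling steps can be sketched with `numpy.gradient`, which uses central differences in the interior and one-sided differences at the boundaries. This is a stand-in: the stencil that `unify_data_format` actually uses is not specified here, and `time_derivative_fd` and `subsample_time` are hypothetical names. The check uses the Advection entry from the table below: $u = \sin(x - \beta t)$ solves $u_t = -\beta u_x$, so the exact derivative is $-\beta\cos(x - \beta t)$.

```python
import numpy as np


def time_derivative_fd(u, t, edge_order=2):
    """Second-order finite-difference estimate of du/dt along axis -2 (time)."""
    return np.gradient(u, t, axis=-2, edge_order=edge_order)


def subsample_time(u, t, stride):
    """Keep every `stride`-th time step (hypothetical helper)."""
    return u[..., ::stride, :], t[::stride]


beta = 0.1
x = np.linspace(0.0, 2.0 * np.pi, 64)
t = np.linspace(0.0, 1.0, 201)
X, T = np.meshgrid(x, t, indexing="ij")        # shapes (n_x, n_time)
u = np.sin(X - beta * T)[..., None]            # (n_x, n_time, n_dim=1)
du = time_derivative_fd(u, t)                  # estimate on the fine grid
du_exact = -beta * np.cos(X - beta * T)[..., None]
u_sub, t_sub = subsample_time(u, t, stride=4)  # subsample after differentiating
du_sub, _ = subsample_time(du, t, stride=4)
print(np.max(np.abs(du - du_exact)))
```

Differentiating before subsampling keeps the finite-difference step small, which is usually the more accurate order of operations.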

Dataset Name	Source	Equation
Advection	PDEBench	$u_t = -\beta u_x, \beta=0.1$
Burgers	PDEBench	$u_t = -uu_x + \nu u_{xx}, \nu=0.1$
Korteweg-de Vries (KdV)	PDE-Find	$u_t = -6uu_x - u_{xxx}$
Kuramoto-Sivashinsky (KS)	PDE-Find	$u_t = -uu_x-u_{xx} - u_{xxxx}$
Diffusion-Reaction (DR)	PDE-Find	$u_t = 0.1 \nabla^2 u + \lambda(A) u - \omega(A) v$, $v_t = 0.1 \nabla^2 v + \omega(A) u + \lambda(A) v$, where $A^2 = u^2 + v^2$, $\omega(A) = -\beta A^2$, $\lambda(A) = 1 - A^2$
Nonlinear Schrödinger (NLS)	PDE-Find	$u_t = 0.5 v_{xx} + u^2v + v^3$ $v_t = -0.5 u_{xx} - u v^2 - u^3$
Advection-Diffusion (AD)	DeepMod	$u_t = 0.25 u_{x} + 0.5 u_y + 0.5 u_{yy} + 0.5 u_{xx}$
Heat (Laser)	Abali	$\rho c u_t = \kappa \nabla^2 u + \rho\,\mathrm{Laser}(t)$
Heat (Solar) 1D, 2D, 3D	FEniCS '12	$\rho c u_t = \kappa \nabla^2 u + f$, uniform $\kappa$
Navier-Stokes Channel	FEniCS '16	$\rho(u_t + u \cdot \nabla u) - \nabla\cdot \sigma(u, p) = f$ $\nabla\cdot u = 0$
Navier-Stokes Cylinder	FEniCS '16	Same as above
Reaction-Diffusion Cylinder	FEniCS '16	$\rho(w_t + w \cdot \nabla w) - \nabla\cdot \sigma(w, p) = f$ $\nabla\cdot w = 0$ ${u_1}_t + w \cdot \nabla u_1 - \nabla\cdot\epsilon\nabla u_1 = f_1 - K u_1 u_2$ ${u_2}_t + w \cdot \nabla u_2 - \nabla\cdot\epsilon\nabla u_2 = f_2 - K u_1 u_2$ ${u_3}_t + w \cdot \nabla u_3 - \nabla\cdot\epsilon\nabla u_3 = f_3 + K u_1 u_2 - K u_3$

Sources

Abali: Computational Reality, by Abali.
ERL 2002: The analysis of the generalized-alpha method for non-linear dynamic problems, by Erlicher et al. (with code).
FEniCS '12: Automated Solution of Differential Equations by the Finite Element Method: The FEniCS Book, by Logg et al.
FEniCS '16: Solving PDEs in Python: The FEniCS Tutorial I, by Langtangen and Logg.

Methods

The table below summarizes the algorithms, the system types they handle, and their sources. The algorithms are implemented in the `mdbench/algorithms` directory.

Method Name	System Type	Source
PDE-FIND	PDE	Paper GitHub 1 GitHub 2
SINDy	ODE	Paper GitHub
WSINDy	PDE	Paper GitHub 1 GitHub 2
EWSINDy	ODE/PDE	Paper GitHub
Bayesian	PDE	Paper GitHub
DeepMoD	PDE	Paper GitHub
EQL	ODE/PDE	Paper GitHub
uDSR	ODE/PDE	Paper GitHub
PySR	ODE/PDE	Paper GitHub
Operon	ODE/PDE	Paper GitHub
ODEformer	ODE	Paper GitHub
End2End	ODE/PDE	Paper GitHub

Adding a New Method

To add a new algorithm to the pipeline, create a new directory under `mdbench/algorithms/ode/`, `mdbench/algorithms/pde/`, or `mdbench/algorithms/sr/`, depending on the type of system the method targets. The new directory must contain two files:

`environment.yml`: Conda environment listing the dependencies that are not part of the base packages (`requirements.txt` in the repository's root directory).
`regressor.py`: Contains a class named `Regressor` whose constructor takes the hyperparameters and the number of parallel jobs (`n_jobs`) as keyword arguments.
- The hyperparameter space is defined in a dictionary named `hyper_params` at module level, outside the class. The pipeline trains a separate model for every possible hyperparameter combination, selects the best setting based on validation performance, and then trains the final model on the combined training and validation data using that setting.
- Parallel execution works in one of two ways: 1) for GP-based methods and for methods on which MDBench does not perform hyperparameter optimization, the `n_jobs` keyword argument is passed to the algorithm's constructor; 2) for all other methods, parallelism is used only during hyperparameter tuning, not during training.
- Methods for discovering PDEs should also implement `set_spatial_grid(self, s)`, which fixes the spatial grid on which the functions are evaluated.
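The selection procedure described above (exhaustive search over `hyper_params`, pick the best on validation data, refit on training plus validation data) can be sketched as follows. This is not MDBench's pipeline code: `iter_grid`, `select_and_refit`, `make_model`, and `score` are hypothetical names, and the sketch assumes `hyper_params` maps each name to a list of candidate values.

```python
import itertools

import numpy as np


def iter_grid(hyper_params):
    """Yield every combination from a dict of {name: list of candidate values}."""
    names = list(hyper_params)
    for values in itertools.product(*(hyper_params[name] for name in names)):
        yield dict(zip(names, values))


def select_and_refit(make_model, hyper_params, train, val, score):
    """Exhaustive grid search on validation data, then refit on train + val.

    Hypothetical helper. `train` and `val` are (t, u, u_dot) tuples with time
    on axis -2 of u and u_dot; `make_model(**hp)` returns an unfitted object
    with fit(t, u, u_dot); `score(model, t, u, u_dot)` is higher-is-better.
    """
    best_hp, best_score = None, -np.inf
    for hp in iter_grid(hyper_params):
        model = make_model(**hp)
        model.fit(*train)
        current = score(model, *val)
        if current > best_score:
            best_hp, best_score = hp, current
    # Final model: the chosen setting, refit on training + validation data.
    t = np.concatenate([train[0], val[0]])
    u = np.concatenate([train[1], val[1]], axis=-2)
    u_dot = np.concatenate([train[2], val[2]], axis=-2)
    final = make_model(**best_hp)
    final.fit(t, u, u_dot)
    return final, best_hp
```

In this sketch the loop over combinations is the natural place for the tuning-phase parallelism mentioned above; it is kept sequential for clarity.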

The Regressor class should implement the following methods:

`fit(t_train, u_train, u_dot_train)`: trains the model given the observed trajectory and approximated derivatives.
`predict(t_test, u_test)`: predicts the time derivatives given the observed trajectory.
`complexity()`: returns the complexity of the learned equation, defined as the total number of variables, operations, and constants in the equation.
`to_str()`: returns the discovered equations as a human-readable string.
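A minimal `regressor.py` skeleton matching this interface is sketched below. The model itself is a toy: it fits a linear ODE $\dot{u} \approx u W$ by least squares and zeroes coefficients below a threshold, so it is not one of the benchmarked methods. The `threshold` hyperparameter, the `x{j}` variable names, and counting complexity by walking Python's `ast` (variables, constants, binary and unary operations) are illustrative assumptions. As an ODE-only example, it omits `set_spatial_grid`.

```python
import ast

import numpy as np

# Hypothetical search space; the pipeline would try every combination.
hyper_params = {"threshold": [1e-3, 1e-2, 1e-1]}


class Regressor:
    """Toy ODE regressor: u_dot ~= u @ W via least squares plus thresholding."""

    def __init__(self, threshold=1e-2, n_jobs=1):
        self.threshold = threshold
        self.n_jobs = n_jobs  # unused: this toy fit is single-threaded
        self.W = None

    def fit(self, t_train, u_train, u_dot_train):
        W, *_ = np.linalg.lstsq(u_train, u_dot_train, rcond=None)
        W[np.abs(W) < self.threshold] = 0.0  # drop negligible terms
        self.W = W
        return self

    def predict(self, t_test, u_test):
        return u_test @ self.W

    def _rhs(self):
        """Right-hand side of each equation as a Python expression string."""
        rhs = []
        for i in range(self.W.shape[1]):
            terms = [f"{self.W[j, i]:.4g}*x{j}" for j in range(self.W.shape[0])
                     if self.W[j, i] != 0.0]
            rhs.append(" + ".join(terms) if terms else "0")
        return rhs

    def to_str(self):
        return "\n".join(f"dx{i}/dt = {r}" for i, r in enumerate(self._rhs()))

    def complexity(self):
        """Count variables, constants, and operations across all equations."""
        count = 0
        for r in self._rhs():
            for node in ast.walk(ast.parse(r, mode="eval")):
                if isinstance(node, (ast.Name, ast.Constant, ast.BinOp, ast.UnaryOp)):
                    count += 1
        return count
```

Note that a negative coefficient such as `-2*x0` counts the unary minus as an operation under this convention.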

Citation

If you find our benchmark or dataset useful for your work, please consider giving us a ⭐️ and citing us with:

@article{bideh2025mdbench,
  title={MDBench: Benchmarking Data-Driven Methods for Model Discovery},
  author={Bideh, Amirmohammad Ziaei and Georgievska, Aleksandra and Gryak, Jonathan},
  journal={arXiv preprint arXiv:2509.20529},
  year={2025}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.
