Merged
30 commits
3435410
feat(docs/joss_paper) initiate joss paper
casenave May 30, 2025
1ff1b41
fix(.gitignore) ignore file fenerated by testing suite
casenave May 30, 2025
1d8551b
fix(docs) add matplotlib to readthedocs env (was in muscat but replac…
casenave May 30, 2025
d571c05
docs(docs/joss_paper/*) first version of the joss
casenave May 30, 2025
0d60b0f
feat(actions) add joss pdf compiler
casenave May 30, 2025
b352378
feat(actions) option for not running actions if 'no ci' is in commit …
casenave May 30, 2025
37b7403
fix(joss_paper) correct authors entry in paper.md
casenave May 30, 2025
d457ab0
(fix) joss_paper, update, no ci
casenave May 30, 2025
81a4c7a
feat(docs/joss_paper) improve paper, no ci
casenave May 30, 2025
2c35903
[skip ci] fix(action) remove 'no ci' condition on action
casenave May 31, 2025
48da060
Merge branch 'main' into joss_paper
casenave May 31, 2025
49ca191
feat(joss_paper) reduce paper size (by mainly removing repetitions)
casenave May 31, 2025
47029dd
feat(joss_paper) improve references
casenave May 31, 2025
527011e
Merge branch 'main' into joss_paper
casenave May 31, 2025
5c29d5a
Merge branch 'main' into joss_paper
casenave May 31, 2025
db52259
Merge branch 'main' into joss_paper
casenave May 31, 2025
97555e4
Merge branch 'main' into joss_paper
casenave Jun 1, 2025
b8e57aa
fix(docs/requirements.yml) revert added matplotlib: not needed
casenave Jun 1, 2025
923d2de
(joss paper) add link to rtd doc
xroynard Jun 3, 2025
69cb566
fix(joss_paper/paper.md): small name and orcid updates
TopAgrume Jun 7, 2025
d720d6b
Merge branch 'main' into joss_paper
casenave Jun 7, 2025
ed9f299
fix(joss_paper/paper.bib) correct Raphael Carpintero Perez last name …
casenave Jun 7, 2025
55dda97
fix(joss_paper/paper.md, joss_paper/plaid_architecture.md): small tex…
TopAgrume Jun 7, 2025
7256564
fix(joss_paper/paper.md, joss_paper/plaid_architecture.md): remove du…
TopAgrume Jun 7, 2025
7372d1f
update(joss_paper/plaid_architecture.md): trying new size (80%)
TopAgrume Jun 7, 2025
10a4c20
fix(joss_paper/plaid_architecture.md): force the 'Usage and Applicati…
TopAgrume Jun 7, 2025
377be8f
fix(joss_paper/plaid_architecture.md): remove HTML line break tags an…
TopAgrume Jun 7, 2025
a350c62
fix(joss_paper/plaid_architecture.md): missing period and wrong ref f…
TopAgrume Jun 7, 2025
a7fb48b
feat(joss_paper) minor modifications
casenave Jun 7, 2025
482e7df
fix(joss_paper) solve conflict
casenave Jun 7, 2025
24 changes: 24 additions & 0 deletions .github/workflows/draft-pdf.yml
@@ -0,0 +1,24 @@
name: Draft JOSS PDF
on:
  push:
    paths:
      - docs/joss_paper/**
      - .github/workflows/draft-pdf.yml

jobs:
  paper:
    runs-on: ubuntu-latest
    name: Paper Draft
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Build joss draft PDF
        uses: openjournals/openjournals-draft-action@master
        with:
          journal: joss
          paper-path: docs/joss_paper/paper.md
      - name: Upload
        uses: actions/upload-artifact@v4
        with:
          name: paper
          path: docs/joss_paper/paper.pdf
1 change: 1 addition & 0 deletions .gitignore
@@ -4,6 +4,7 @@ tests/post/*.png
tests/post/*.yaml
tests/problem_definition/*.csv
examples/**/*.png
tests/problem_definition/split.csv

# Byte-compiled / optimized / DLL files
__pycache__/
54 changes: 54 additions & 0 deletions docs/joss_paper/paper.bib
@@ -0,0 +1,54 @@
@article{casenave2025physics,
title={Physics-Learning AI Datamodel (PLAID) datasets: a collection of physics simulations for machine learning},
author={Casenave, Fabien and Roynard, Xavier and Staber, Brian and Akkari, Nissrine and Piat, William and Bucci, Michele Alessandro and Kabalan, Abbas and Nguyen, Xuan Minh Vuong and Saverio, Luca and Perez, Rapha{\"e}l Carpintero and others},
journal={arXiv preprint arXiv:2505.02974},
year={2025}
}

@inproceedings{poinot2018seven,
title={Seven keys for practical understanding and use of CGNS},
author={Poinot, Marc and Rumsey, Christopher L},
booktitle={2018 AIAA Aerospace Sciences Meeting},
pages={1503},
year={2018}
}

@article{casenave2024mmgp,
title={{MMGP}: a {M}esh {M}orphing {G}aussian {P}rocess-based machine learning method for regression of physical problems under nonparametrized geometrical variability},
author={Casenave, Fabien and Staber, Brian and Roynard, Xavier},
journal={Advances in Neural Information Processing Systems},
volume={36},
year={2024}
}

@article{kabalan2025elasticity,
title={Elasticity-based morphing technique and application to reduced-order modeling},
author={Kabalan, Abbas and Casenave, Fabien and Bordeu, Felipe and Ehrlacher, Virginie and Ern, Alexandre},
journal={Applied Mathematical Modelling},
volume={141},
pages={115929},
year={2025},
publisher={Elsevier}
}

@article{kabalan2025ommgp,
title={{O-MMGP}: {O}ptimal {M}esh {M}orphing {G}aussian {P}rocess Regression for Solving {PDEs} with non-Parametric Geometric Variations},
author={Kabalan, Abbas and Casenave, Fabien and Bordeu, Felipe and Ehrlacher, Virginie},
journal={arXiv preprint arXiv:2502.11632},
year={2025}
}

@inproceedings{perez2024gaussian,
title={{Gaussian process regression with Sliced Wasserstein Weisfeiler-Lehman graph kernels}},
author={Carpintero Perez, Rapha{\"e}l and Da Veiga, S{\'e}bastien and Garnier, Josselin and Staber, Brian},
booktitle={International Conference on Artificial Intelligence and Statistics},
pages={1297--1305},
year={2024},
organization={PMLR}
}

@article{perez2024learning,
title={{Learning signals defined on graphs with optimal transport and Gaussian process regression}},
author={Carpintero Perez, Rapha{\"e}l and Da Veiga, S{\'e}bastien and Garnier, Josselin and Staber, Brian},
journal={arXiv preprint arXiv:2410.15721},
year={2024}
}
59 changes: 59 additions & 0 deletions docs/joss_paper/paper.md
@@ -0,0 +1,59 @@
---
title: "PLAID: Physics-Learning AI Datamodel"
tags:
  - python
  - scientific machine learning
  - data model
  - physics simulation
date: "07 June 2025"

authors:
  - name: Fabien Casenave
    orcid: 0000-0002-8810-9128
    affiliation: 1
  - name: Xavier Roynard
    orcid: 0000-0001-7840-2120
    affiliation: 1
  - name: Alexandre Devaux--Rivière
    orcid: 0009-0001-7474-944X
    affiliation: 1,2
affiliations:
  - name: SafranTech, Safran Tech, Digital Sciences & Technologies, 78114 Magny-Les-Hameaux, France
    index: 1
  - name: EPITA, 14-16 Rue Voltaire, 94270 Le Kremlin-Bicêtre, France
    index: 2
bibliography: paper.bib
---

# Summary

PLAID (Physics-Learning AI Datamodel) is a Python library and data format for representing, storing, and sharing physics simulation datasets for machine learning. Unlike domain-specific formats, PLAID accommodates time-dependent, multi-resolution simulations and heterogeneous meshes. The library provides a high-level API to easily load, inspect, and save data. Beyond basic I/O, PLAID includes utilities for machine-learning workflows. It provides converters to build PLAID datasets from generic tabular data, and a “Hugging Face bridge” to push/pull datasets via the Hugging Face hub. In short, PLAID couples a flexible on-disk standard with a software toolkit to manipulate physics data, addressing the needs of ML researchers in fluid dynamics, structural mechanics, and related fields in a generic fashion. Full documentation, examples and tutorials are available at [plaid-lib.readthedocs.io](https://plaid-lib.readthedocs.io/en/latest/).


# Statement of Need

Machine learning for physical systems often suffers from inconsistent data representations across different domains and simulators. Existing initiatives typically target narrow problems, e.g., separate formats for computational fluid dynamics (CFD) or for finite-element data, and dedicated scripts to process each new dataset. This fragmentation hinders reproducibility and reuse of high-fidelity data.

PLAID addresses this gap by providing a generic, unified datamodel that can describe a wide range of physics simulation data. It leverages the CGNS standard [@poinot2018seven] to capture complex geometry and time evolution: for example, CGNS supports multi-block topologies and evolving meshes, with a data model that separates abstract topology (element families, etc.) from concrete mesh coordinates. On top of CGNS, PLAID layers a lightweight organizational structure.

By promoting a common standard, PLAID makes physics data interoperable across projects. It has already been used to package and publish multiple datasets covering structural mechanics and computational fluid dynamics. These PLAID-formatted datasets (hosted on Zenodo and Hugging Face) have supported ML benchmarks, democratizing access to simulation data.

# Functionality

* **Data Model and Formats:** A PLAID dataset is organized within a root folder (or archive), distinctly separating simulation data from machine learning task definitions, as illustrated in \autoref{fig:plaid_dataset_architecture}. The `dataset/` directory contains numbered sample subfolders (`sample_000...`), each holding one or more `.cgns` files under `meshes/` and a `scalars.csv` file. The `dataset/infos.yaml` file contains human-readable descriptions and metadata. The `problem_definition/` folder provides machine learning context. It includes `problem_infos.yaml` (specifying the ML task inputs/outputs) and `split.csv` (defining train/test splits). This design supports time evolution and multi-block/multi-geometry problems out of the box.

![Overview of the PLAID dataset architecture.\label{fig:plaid_dataset_architecture}](plaid_architecture.png){ width=80% }
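
To make the folder layout described above concrete, the following minimal sketch creates an empty skeleton with the same file and folder names. It does not use the PLAID library. The function name `make_skeleton` is hypothetical, and the three-digit sample index is a placeholder, since the text only shows `sample_000...`.

```python
import tempfile
from pathlib import Path


def make_skeleton(root: Path, n_samples: int) -> list[str]:
    """Create empty placeholder files following the layout described in the text.

    This is an illustration only: it writes empty files and does not
    produce a dataset that the PLAID library could load.
    """
    dataset_dir = root / "dataset"
    dataset_dir.mkdir(parents=True, exist_ok=True)
    for i in range(n_samples):
        # Hypothetical naming: the 3-digit width is an assumption.
        sample_dir = dataset_dir / f"sample_{i:03d}"
        (sample_dir / "meshes").mkdir(parents=True)  # would hold the .cgns files
        (sample_dir / "scalars.csv").touch()
    (dataset_dir / "infos.yaml").touch()  # human-readable descriptions and metadata

    problem_dir = root / "problem_definition"
    problem_dir.mkdir()
    (problem_dir / "problem_infos.yaml").touch()  # ML task inputs/outputs
    (problem_dir / "split.csv").touch()  # train/test splits

    # Sorted relative POSIX paths of every folder and file created.
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in make_skeleton(Path(tmp), n_samples=2):
            print(path)
```

Keeping `problem_definition/` next to `dataset/` rather than inside it reflects the separation between simulation data and learning task definitions stated above.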

* **Supported Data Types:** PLAID handles scalar, time-series and vector field data on meshes, as well as sample-specific metadata. The `get_mesh(time)` method reconstructs the full CGNS tree for a given timestep, with links resolved if requested (thereby returning the complete mesh). Thus PLAID naturally supports mesh-based simulation outputs with arbitrary element types and remeshing between time steps. Heterogeneity is allowed: missing data is supported, and outputs on testing sets may be missing on purpose to facilitate benchmark initiatives.

* **High-Level API:** The top-level `Dataset` class manages multiple `Sample` objects. Users can create an empty `Dataset()` and add samples via `add_sample()`, or load an existing PLAID data archive by calling `Dataset("path_to_plaid_dataset")`. The `Dataset` object summarizes itself (e.g. printing “Dataset(3 samples, 2 scalars, 5 fields)”) and provides access to samples by ID. Batch operations are supported: one can call `dataset.add_samples(...)` to append many samples, or use the classmethods `Dataset.load_from_dir()` and `Dataset.load_from_file()` to load data from disk, with optional parallel workers. This high-level interface abstracts away low-level I/O, letting users focus on ML pipelines.

* **Utilities:** PLAID includes helper modules for common tasks in data science workflows. The `plaid.utils.split` module provides a `split_dataset` function to partition data into training/validation/testing subsets according to user-defined ratios. The `plaid.utils.interpolation` module implements piecewise linear interpolation routines to resample time series fields or align datasets with different timesteps. The `plaid.utils.stats` module offers an `OnlineStatistics` class to compute running statistics (min, mean, variance, etc.) on arrays, which can be used to analyze dataset distributions. Moreover, a “Hugging Face bridge” (`plaid.bridges.huggingface_bridge`) enables converting PLAID datasets to/from Hugging Face Dataset objects.
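
Two of the utilities listed above, ratio-based splitting and running statistics, can be illustrated with a standalone sketch. It is not the implementation of `split_dataset` or `OnlineStatistics`, and it does not import PLAID. The names `split_ids` and `RunningStats` are hypothetical, and the running mean and variance use Welford's algorithm as an assumed technique.

```python
import math
import random


def split_ids(ids, ratios, seed=0):
    """Shuffle sample ids and partition them according to named ratios."""
    if not math.isclose(sum(ratios.values()), 1.0):
        raise ValueError("ratios must sum to 1")
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    names = list(ratios)
    subsets, start = {}, 0
    for k, name in enumerate(names):
        if k == len(names) - 1:
            stop = len(shuffled)  # last subset takes the remainder
        else:
            stop = min(start + round(ratios[name] * len(shuffled)), len(shuffled))
        subsets[name] = shuffled[start:stop]
        start = stop
    return subsets


class RunningStats:
    """Min, max, mean and variance updated batch by batch (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean
        self.min = math.inf
        self.max = -math.inf

    def update(self, values):
        for x in values:
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self._m2 += delta * (x - self.mean)
            self.min = min(self.min, x)
            self.max = max(self.max, x)

    @property
    def variance(self):
        """Population variance of all values seen so far."""
        return self._m2 / self.n if self.n else float("nan")


if __name__ == "__main__":
    parts = split_ids(range(10), {"train": 0.7, "val": 0.2, "test": 0.1})
    print({name: len(subset) for name, subset in parts.items()})

    stats = RunningStats()
    stats.update([1.0, 2.0])  # first batch
    stats.update([3.0, 4.0])  # second batch, no need to keep the first in memory
    print(stats.min, stats.max, stats.mean, stats.variance)
```

Updating statistics incrementally avoids holding every sample in memory at once, which matters when each sample carries large mesh fields.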

# Usage and Applications

PLAID is designed for AI/ML researchers and practitioners working with simulation data. Various datasets, including 2D/3D fluid and structural simulations, are provided in PLAID format on [Hugging Face](https://huggingface.co/PLAID-datasets) and [Zenodo](https://zenodo.org/communities/plaid_datasets). Interactive benchmarks on these datasets are hosted in a [Hugging Face community](https://huggingface.co/PLAIDcompetitions), which provides detailed instructions and PLAID commands for data retrieval and manipulation; see [@casenave2025physics]. These datasets are also used in recent publications to illustrate the performance of the proposed scientific ML methods. In [@casenave2024mmgp; @kabalan2025elasticity; @kabalan2025ommgp], Gaussian-process regression methods with mesh morphing are applied to these datasets. In [@perez2024gaussian; @perez2024learning], the datasets are used with graph-kernel regression methods applied to fluid and solid mechanics.

In summary, PLAID provides a comprehensive framework for physics-based ML data. By combining a unified data model, support for advanced mesh features, and helpful utilities, it addresses the need for interoperable, high-fidelity simulation datasets. Future enhancements involve developing general-purpose PyTorch dataloaders compatible with PLAID, along with establishing standardized evaluation metrics and unified pipelines for training and inference using the PLAID framework.

# References
Binary file added docs/joss_paper/plaid_architecture.png