From e32a2a55838ee61da302b47321392e95ed2c319f Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Wed, 15 Oct 2025 21:39:05 +0000 Subject: [PATCH 1/2] Initial plan From 0fc6e5d21422c8e7bae61582e1db54f94c365813 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Wed, 15 Oct 2025 21:47:40 +0000 Subject: [PATCH 2/2] Add comprehensive README files for all directories Co-authored-by: KYANJO <56065060+KYANJO@users.noreply.github.com> --- applications/README.md | 38 +++++ applications/flowline_model/README.md | 45 +++++ applications/icepack_model/README.md | 45 +++++ applications/icepack_model/examples/README.md | 40 +++++ .../icepack_model/icepack_utils/README.md | 35 ++++ applications/issm_model/README.md | 47 ++++++ applications/issm_model/examples/README.md | 40 +++++ applications/issm_model/issm_utils/README.md | 40 +++++ applications/lorenz_model/README.md | 60 +++++++ applications/lorenz_model/examples/README.md | 48 ++++++ .../lorenz_model/lorenz_utils/README.md | 42 +++++ config/spack_env/README.md | 83 ++++++++++ config/spack_env/h5py-env/README.md | 120 ++++++++++++++ scripts/README.md | 31 ++++ scripts/data_management/README.md | 69 ++++++++ scripts/matlab/README.md | 42 +++++ scripts/plotting/README.md | 67 ++++++++ scripts/slurm/README.md | 76 +++++++++ src/EnKF/README.md | 85 ++++++++++ src/EnKF/cython_enkf/README.md | 108 ++++++++++++ src/EnKF/python_enkf/README.md | 85 ++++++++++ src/README.md | 41 +++++ src/parallelization/README.md | 117 +++++++++++++ src/parallelization/parallel_mpi/README.md | 67 ++++++++ src/run_model_da/README.md | 154 ++++++++++++++++++ src/tests/flowline_enkf_py/README.md | 74 +++++++++ src/tests/flowline_enkf_py_jl/README.md | 79 +++++++++ src/tests/parallel_mpi/README.md | 124 ++++++++++++++ src/tests/zarr_setup/README.md | 143 ++++++++++++++++ src/utils/README.md | 111 +++++++++++++ 30 files changed, 2156 insertions(+) create mode 100644 applications/README.md create mode 100644 applications/flowline_model/README.md create mode 100644 applications/icepack_model/README.md create mode 100644 applications/icepack_model/examples/README.md create mode 100644 applications/icepack_model/icepack_utils/README.md create mode 100644 applications/issm_model/README.md create mode 100644 applications/issm_model/examples/README.md create mode 100644 applications/issm_model/issm_utils/README.md create mode 100644 applications/lorenz_model/README.md create mode 100644 applications/lorenz_model/examples/README.md create mode 100644 applications/lorenz_model/lorenz_utils/README.md create mode 100644 config/spack_env/README.md create mode 100644 config/spack_env/h5py-env/README.md create mode 100644 scripts/README.md create mode 100644 scripts/data_management/README.md create mode 100644 scripts/matlab/README.md create mode 100644 scripts/plotting/README.md create mode 100644 scripts/slurm/README.md create mode 100644 src/EnKF/README.md create mode 100644 src/EnKF/cython_enkf/README.md create mode 100644 src/EnKF/python_enkf/README.md create mode 100644 src/README.md create mode 100644 src/parallelization/README.md create mode 100644 src/parallelization/parallel_mpi/README.md create mode 100644 src/run_model_da/README.md create mode 100644 src/tests/flowline_enkf_py/README.md create mode 100644 src/tests/flowline_enkf_py_jl/README.md create mode 100644 src/tests/parallel_mpi/README.md create mode 100644 src/tests/zarr_setup/README.md create mode 100644 src/utils/README.md diff --git a/applications/README.md b/applications/README.md new file mode 100644 index 0000000..4918cbe --- /dev/null +++ b/applications/README.md @@ -0,0 +1,38 @@ +# Applications + +This directory contains model-specific implementations and examples for the ICESEE data assimilation framework. + +## Supported Models + +### Flowline Model +A simple 1D ice flow simulation model used for testing and benchmarking data assimilation workflows. +- **Location**: `flowline_model/` +- **Status**: Integration underway + +### Icepack Model +PDE-based ice sheet modeling using the Firedrake finite element framework. +- **Location**: `icepack_model/` +- **Status**: Fully supported +- **Dependencies**: Firedrake + +### ISSM (Ice Sheet System Model) +Finite-element ice sheet modeling using MATLAB interface. +- **Location**: `issm_model/` +- **Status**: Development underway +- **Dependencies**: ISSM, MATLAB + +### Lorenz96 Model +Idealized nonlinear dynamical system for data assimilation benchmarking and testing. +- **Location**: `lorenz_model/` +- **Status**: Fully supported + +## Model Registration + +Models are registered in `supported_models.py`, which provides a centralized interface for model discovery and initialization within the ICESEE framework. + +## Structure + +Each model directory typically contains: +- Example configurations and run scripts +- Model-specific utilities and helper functions +- Integration code for connecting to the ICESEE EnKF framework diff --git a/applications/flowline_model/README.md b/applications/flowline_model/README.md new file mode 100644 index 0000000..ca6e354 --- /dev/null +++ b/applications/flowline_model/README.md @@ -0,0 +1,45 @@ +# Flowline Model + +A simple 1D ice flow simulation model for testing and benchmarking ICESEE data assimilation workflows. + +## Overview + +The flowline model provides a lightweight ice sheet representation that is computationally efficient for rapid prototyping and testing of data assimilation algorithms. It simulates ice flow along a single flowline using simplified physics. + +## Files + +- `flowline_enkf.py` - Main EnKF implementation for the flowline model +- `config_loader.py` - Configuration file loader +- `params.yaml` - Model parameters and DA configuration +- `EnKF.ipynb` - Jupyter notebook demonstrating EnKF usage +- `run_flowline_enkf.ipynb` - Complete workflow notebook + +## Usage + +### Running in Jupyter + +Open `run_flowline_enkf.ipynb` to see a complete example of: +1. Model initialization +2. Ensemble generation +3. Data assimilation cycles +4. Results visualization + +### Command Line + +```bash +python flowline_enkf.py --config params.yaml +``` + +## Status + +Integration with ICESEE framework is currently underway. The model is functional for standalone testing and demonstration purposes. + +## Configuration + +Model parameters are specified in `params.yaml`, including: +- Ensemble size +- Observation frequency +- Model physics parameters +- Data assimilation settings + +Refer to the configuration file for detailed parameter descriptions. diff --git a/applications/icepack_model/README.md b/applications/icepack_model/README.md new file mode 100644 index 0000000..0ed7089 --- /dev/null +++ b/applications/icepack_model/README.md @@ -0,0 +1,45 @@ +# Icepack Model + +Integration of the Icepack ice sheet model with the ICESEE data assimilation framework. + +## Overview + +Icepack is a Python library for modeling ice sheets using finite element methods via the Firedrake framework. This directory provides the interface between Icepack and ICESEE's Ensemble Kalman Filter implementation. + +## Structure + +- `examples/` - Example configurations and test cases for different ice geometries and scenarios +- `icepack_utils/` - Utilities specific to Icepack model integration + +## Dependencies + +- Firedrake finite element library +- Icepack ice sheet modeling library +- ICESEE core framework + +## Examples + +The `examples/` directory contains various test cases including: +- Idealized geometries +- Synthetic ice streams +- Realistic glacier scenarios + +Each example typically includes: +- Configuration files +- Mesh generation scripts +- Initial condition setup +- Observation generation + +## Status + +Fully supported and actively used for ice sheet state and parameter estimation. + +## Getting Started + +Refer to the examples in the `examples/` directory for complete workflows. Each example contains its own README with specific instructions. + +## Resources + +- [Icepack Documentation](https://icepack.github.io/) +- [Firedrake Documentation](https://www.firedrakeproject.org/) +- [ICESEE Wiki](https://github.com/ICESEE-project/ICESEE/wiki) diff --git a/applications/icepack_model/examples/README.md b/applications/icepack_model/examples/README.md new file mode 100644 index 0000000..0f75eaa --- /dev/null +++ b/applications/icepack_model/examples/README.md @@ -0,0 +1,40 @@ +# Icepack Examples + +This directory contains example configurations and test cases for the Icepack model integration with ICESEE. + +## Available Examples + +Each subdirectory contains a complete example workflow with: +- Model configuration +- Mesh/geometry setup +- Initial conditions +- Observation specifications +- Data assimilation configuration + +## Example Types + +Examples range from idealized test cases to realistic glacier scenarios, providing templates for various data assimilation applications with ice sheet models. + +## Running Examples + +Each example directory typically contains: +1. Configuration files (YAML) +2. Python scripts or Jupyter notebooks +3. Documentation (README.md) +4. Results/output directories + +Refer to individual example READMEs for specific instructions. + +## Requirements + +- Firedrake and Icepack installed +- ICESEE framework +- Sufficient computational resources (varies by example) + +## Contributing + +When adding new examples: +- Include a README describing the scenario +- Provide clear configuration files +- Document expected results +- Keep examples self-contained diff --git a/applications/icepack_model/icepack_utils/README.md b/applications/icepack_model/icepack_utils/README.md new file mode 100644 index 0000000..4a343ee --- /dev/null +++ b/applications/icepack_model/icepack_utils/README.md @@ -0,0 +1,35 @@ +# Icepack Utilities + +This directory contains utilities specific to the Icepack model integration with ICESEE. + +## Purpose + +These utilities provide helper functions and tools for: +- Model initialization and setup +- Data conversion between Icepack and ICESEE formats +- Mesh handling and geometry operations +- Firedrake-specific operations +- Result post-processing + +## Structure + +Utilities are organized to support: +- Model preprocessing +- Runtime operations +- Post-processing and analysis + +## Usage + +Import these utilities in your Icepack-ICESEE workflows: + +```python +from applications.icepack_model.icepack_utils import ... +``` + +## Dependencies + +- Firedrake +- Icepack +- ICESEE core framework + +Refer to individual utility modules for specific dependencies and usage examples. diff --git a/applications/issm_model/README.md b/applications/issm_model/README.md new file mode 100644 index 0000000..c60bf72 --- /dev/null +++ b/applications/issm_model/README.md @@ -0,0 +1,47 @@ +# ISSM Model + +Integration of the Ice Sheet System Model (ISSM) with the ICESEE data assimilation framework. + +## Overview + +ISSM is a comprehensive finite-element ice sheet model that includes a wide range of physics and capabilities. This directory provides the interface between ISSM and ICESEE's ensemble-based data assimilation methods. + +## Structure + +- `examples/` - Example configurations and test cases using ISSM +- `issm_utils/` - Utilities for ISSM model integration and MATLAB-Python interfacing + +## Dependencies + +- ISSM (Ice Sheet System Model) +- MATLAB with ISSM installed +- ICESEE core framework +- Python-MATLAB bridge + +## Status + +Development underway. The integration is being actively developed and tested. + +## Examples + +The `examples/` directory contains test cases including: +- ISMIP benchmark experiments +- Realistic ice sheet configurations +- Parameter estimation scenarios + +## MATLAB Interface + +Since ISSM primarily operates through MATLAB, special utilities are provided in `issm_utils/` to facilitate: +- MATLAB-Python communication +- Data conversion between MATLAB and Python formats +- Process management for MATLAB instances + +## Getting Started + +Refer to the examples in the `examples/` directory. Note that ISSM must be properly installed and configured in your MATLAB environment. + +## Resources + +- [ISSM Website](https://issm.jpl.nasa.gov/) +- [ISSM Documentation](https://issm.jpl.nasa.gov/documentation/) +- [ICESEE Wiki](https://github.com/ICESEE-project/ICESEE/wiki) diff --git a/applications/issm_model/examples/README.md b/applications/issm_model/examples/README.md new file mode 100644 index 0000000..00fd993 --- /dev/null +++ b/applications/issm_model/examples/README.md @@ -0,0 +1,40 @@ +# ISSM Examples + +This directory contains example configurations and test cases for the ISSM model integration with ICESEE. + +## Available Examples + +Examples demonstrate various ISSM capabilities with data assimilation, including: +- Benchmark experiments (ISMIP) +- Realistic ice sheet scenarios +- State and parameter estimation workflows + +## Example Structure + +Each example typically includes: +- MATLAB scripts for ISSM model setup +- Python scripts for ICESEE integration +- Configuration files +- Documentation + +## Requirements + +- ISSM installed and configured in MATLAB +- MATLAB with Python interface +- ICESEE framework + +## Running Examples + +Since ISSM operates primarily through MATLAB: +1. Ensure ISSM is properly installed in your MATLAB environment +2. Configure Python-MATLAB interface +3. Follow example-specific instructions in individual READMEs + +## Notes + +The ISSM integration is under active development. Some examples may require specific ISSM versions or configurations. + +## Resources + +- [ISSM Documentation](https://issm.jpl.nasa.gov/documentation/) +- [ISMIP Protocols](http://www.climate-cryosphere.org/wiki/index.php?title=ISMIP6) diff --git a/applications/issm_model/issm_utils/README.md b/applications/issm_model/issm_utils/README.md new file mode 100644 index 0000000..39fb5bf --- /dev/null +++ b/applications/issm_model/issm_utils/README.md @@ -0,0 +1,40 @@ +# ISSM Utilities + +This directory contains utilities for ISSM model integration with ICESEE. + +## Purpose + +These utilities facilitate: +- MATLAB-Python communication +- ISSM data conversion to ICESEE formats +- Process management for MATLAB instances +- Model state and parameter handling +- Results extraction and post-processing + +## Key Components + +### matlab2python/ +Tools for converting MATLAB data structures to Python formats and vice versa. + +### containers/ +Container configurations for running ISSM with ICESEE in containerized environments. + +## Usage + +These utilities bridge the gap between ISSM's MATLAB environment and ICESEE's Python framework, enabling seamless data exchange during data assimilation cycles. + +## MATLAB Interface + +Special consideration is given to: +- Efficient data transfer between MATLAB and Python +- Proper MATLAB process lifecycle management +- Error handling across the MATLAB-Python boundary + +## Dependencies + +- MATLAB +- ISSM (properly configured in MATLAB) +- Python MATLAB Engine API +- ICESEE core framework + +Refer to individual utility modules for specific usage patterns and requirements. diff --git a/applications/lorenz_model/README.md b/applications/lorenz_model/README.md new file mode 100644 index 0000000..1f8439f --- /dev/null +++ b/applications/lorenz_model/README.md @@ -0,0 +1,60 @@ +# Lorenz Model + +Implementation of the Lorenz96 model for data assimilation testing and benchmarking. + +## Overview + +The Lorenz96 model is a simplified dynamical system originally designed to study atmospheric predictability. It serves as an excellent testbed for data assimilation algorithms due to its chaotic behavior and computational efficiency. + +## Structure + +- `examples/` - Example configurations and test cases +- `lorenz_utils/` - Utilities specific to Lorenz96 model integration + +## Model Description + +The Lorenz96 model is a system of ordinary differential equations with the form: + +``` +dX_i/dt = (X_{i+1} - X_{i-2}) * X_{i-1} - X_i + F +``` + +where: +- `X_i` represents the state variable at location i +- `F` is a forcing parameter (typically 8) +- Cyclic boundary conditions are applied + +## Use Cases + +This model is ideal for: +- Testing new data assimilation algorithms +- Benchmarking EnKF performance +- Understanding nonlinear DA behavior +- Educational demonstrations +- Algorithm parameter tuning + +## Status + +Fully supported and extensively tested. + +## Examples + +The `examples/` directory contains various scenarios including: +- Different system sizes (number of variables) +- Various observation configurations +- Perfect and imperfect model experiments +- Parameter estimation tests + +## Getting Started + +See the examples in `examples/lorenz96/` for complete workflows demonstrating: +- Model initialization +- Ensemble generation +- Observation simulation +- EnKF application +- Results analysis + +## Resources + +- Original Lorenz96 paper: Lorenz, E. N. (1996). "Predictability: A problem partly solved" +- [ICESEE Wiki](https://github.com/ICESEE-project/ICESEE/wiki) diff --git a/applications/lorenz_model/examples/README.md b/applications/lorenz_model/examples/README.md new file mode 100644 index 0000000..2d9893b --- /dev/null +++ b/applications/lorenz_model/examples/README.md @@ -0,0 +1,48 @@ +# Lorenz Model Examples + +This directory contains example configurations and test cases for the Lorenz96 model with ICESEE. + +## Overview + +These examples demonstrate various data assimilation scenarios using the Lorenz96 model, from basic EnKF applications to advanced parameter estimation. + +## Example Structure + +Each example typically includes: +- Configuration files (YAML) +- Python scripts or Jupyter notebooks +- Expected results and validation data +- Documentation + +## Types of Examples + +Examples cover: +- **Basic State Estimation**: Standard EnKF with perfect model +- **Parameter Estimation**: Joint state-parameter estimation +- **Imperfect Model**: Testing robustness to model error +- **Observation Strategies**: Different observation networks +- **Ensemble Sizes**: Performance with varying ensemble sizes +- **Localization**: Covariance localization techniques + +## Running Examples + +Lorenz96 examples are computationally lightweight and can typically run on a laptop: + +```bash +cd lorenz96/ +python run_example.py --config params.yaml +``` + +Or use the provided Jupyter notebooks for interactive exploration. + +## Educational Use + +These examples are excellent for: +- Learning data assimilation concepts +- Teaching EnKF principles +- Algorithm development and testing +- Benchmarking new methods + +## Quick Start + +Start with the basic example in `lorenz96/` to understand the fundamental workflow, then explore more advanced scenarios. diff --git a/applications/lorenz_model/lorenz_utils/README.md b/applications/lorenz_model/lorenz_utils/README.md new file mode 100644 index 0000000..91995fb --- /dev/null +++ b/applications/lorenz_model/lorenz_utils/README.md @@ -0,0 +1,42 @@ +# Lorenz Model Utilities + +This directory contains utilities specific to the Lorenz96 model integration with ICESEE. + +## Purpose + +These utilities provide: +- Lorenz96 model implementation and integration +- Initial condition generation +- Observation operator implementation +- Model error simulation +- Truth run generation +- Result analysis and visualization tools + +## Model Implementation + +Core functions for the Lorenz96 dynamical system, including: +- Time integration schemes +- Forcing parameter handling +- Cyclic boundary conditions + +## Data Assimilation Support + +Tools for: +- Ensemble initialization +- Observation simulation from truth runs +- Error covariance specification +- Diagnostic computations + +## Usage + +Import these utilities in your Lorenz96-ICESEE workflows: + +```python +from applications.lorenz_model.lorenz_utils import ... +``` + +## Performance + +The Lorenz96 model is computationally efficient, making it ideal for rapid prototyping and testing. Utilities are optimized for both single-threaded and parallel execution. + +Refer to individual utility modules for detailed documentation and examples. diff --git a/config/spack_env/README.md b/config/spack_env/README.md new file mode 100644 index 0000000..d3bc38e --- /dev/null +++ b/config/spack_env/README.md @@ -0,0 +1,83 @@ +# Spack Environment Configurations + +This directory contains Spack environment specifications for managing ICESEE dependencies. + +## Purpose + +Spack is a package manager designed for HPC systems that helps manage complex software dependencies. These environment files define reproducible software environments for ICESEE. + +## Structure + +Each subdirectory contains a Spack environment definition for specific dependency sets: +- `h5py-env/` - Environment for HDF5 and h5py with parallel support + +## Using Spack Environments + +### Install Spack +```bash +git clone https://github.com/spack/spack.git +source spack/share/spack/setup-env.sh +``` + +### Activate an Environment +```bash +cd config/spack_env/h5py-env +spack env activate . +``` + +### Install Dependencies +```bash +spack install +``` + +### Deactivate Environment +```bash +spack env deactivate +``` + +## Benefits + +Using Spack environments provides: +- **Reproducibility**: Exact versions and configurations +- **Portability**: Works across different HPC systems +- **Customization**: Optimize builds for specific hardware +- **Dependency Management**: Automatic handling of complex dependencies + +## Modifying Environments + +To modify an environment: +1. Edit `spack.yaml` in the environment directory +2. Update package versions or add new packages +3. Reinstall: `spack install` + +## Creating New Environments + +For new dependency sets: +```bash +cd config/spack_env +spack env create new-env +cd new-env +# Edit spack.yaml +spack install +``` + +## System-Specific Notes + +Some HPC systems have Spack already installed. Check with: +```bash +which spack +``` + +If available, use the system Spack instead of installing your own. + +## Requirements + +- Spack package manager +- C/C++/Fortran compilers +- Sufficient disk space for builds + +## Resources + +- [Spack Documentation](https://spack.readthedocs.io/) +- [Spack Environments Guide](https://spack.readthedocs.io/en/latest/environments.html) +- [ICESEE Wiki](https://github.com/ICESEE-project/ICESEE/wiki) diff --git a/config/spack_env/h5py-env/README.md b/config/spack_env/h5py-env/README.md new file mode 100644 index 0000000..4cb7edc --- /dev/null +++ b/config/spack_env/h5py-env/README.md @@ -0,0 +1,120 @@ +# H5py Spack Environment + +Spack environment for HDF5 and h5py with parallel I/O support. + +## Purpose + +This environment provides HDF5 with MPI support and the h5py Python interface, which are critical for ICESEE's parallel I/O operations. + +## Contents + +The `spack.yaml` file specifies: +- HDF5 with MPI support +- h5py Python package +- Compatible MPI implementation +- Required dependencies + +## Features + +### Parallel HDF5 +- MPI-enabled for parallel I/O +- Optimized for HPC filesystems +- Thread-safe configuration + +### h5py +- Python interface to HDF5 +- Parallel I/O capabilities +- NumPy integration + +## Installation + +### Activate and Install +```bash +cd config/spack_env/h5py-env +spack env activate . +spack install +``` + +### Verify Installation +```bash +python -c "import h5py; print(h5py.version.info)" +python -c "import h5py; print('Parallel:', h5py.get_config().mpi)" +``` + +## Usage + +After installation, the environment provides: +- HDF5 libraries with MPI support +- h5py Python module +- Proper library paths and environment variables + +### In Python +```python +import h5py + +# Create parallel HDF5 file +with h5py.File('data.h5', 'w', driver='mpio', comm=MPI.COMM_WORLD) as f: + dset = f.create_dataset('data', shape=(1000, 1000), dtype='f') + dset[rank*chunk:(rank+1)*chunk, :] = local_data +``` + +## Configuration Options + +The Spack spec may include: +- `+mpi`: Enable MPI support (required) +- `+fortran`: Fortran interface (optional) +- `+hl`: High-level API (recommended) + +## Troubleshooting + +### MPI Not Found +Ensure MPI is available: +```bash +spack find mpi +``` + +If not installed: +```bash +spack install mpi +``` + +### h5py Import Error +Check Python can find h5py: +```bash +python -c "import h5py" +``` + +If it fails, ensure the environment is activated: +```bash +spack env activate . +``` + +## Performance Tips + +For optimal parallel I/O: +- Use collective operations when possible +- Align chunk sizes with filesystem blocks +- Tune MPI-IO hints +- Use parallel HDF5 filters + +## System-Specific Notes + +Some HPC systems have optimized HDF5 builds. Consider using system modules: +```bash +module avail hdf5 +``` + +If suitable modules exist, you may not need this Spack environment. + +## Requirements + +- Spack +- MPI implementation +- C compiler +- Python development headers + +## References + +- [HDF5 Documentation](https://www.hdfgroup.org/solutions/hdf5/) +- [h5py Documentation](https://docs.h5py.org/) +- [Parallel HDF5 Guide](https://docs.h5py.org/en/stable/mpi.html) diff --git a/scripts/README.md b/scripts/README.md new file mode 100644 index 0000000..794b7a0 --- /dev/null +++ b/scripts/README.md @@ -0,0 +1,31 @@ +# Scripts + +This directory contains utility scripts for data management, visualization, HPC job submission, and MATLAB process management. + +## Subdirectories + +### data_management/ +Scripts for handling ICESEE data files: +- `get_data.py` - Data retrieval utilities +- `inspect_files.py` - File inspection tools +- `post_processing.py` - Post-processing workflows +- `stack_icesee_data.py` - Data stacking and aggregation + +### matlab/ +MATLAB-related utilities: +- `kill_matlab_processes.py` - Clean up MATLAB processes + +### plotting/ +Visualization and plotting utilities: +- `scaling_plots.py` - Performance scaling visualizations +- `scaling_plots_csv_details.py` - Detailed CSV-based scaling plots + +### slurm/ +SLURM job submission scripts for HPC environments: +- `run_da_issm.py` - Run ISSM data assimilation jobs +- `run_job*.sbatch` - Various SLURM batch scripts +- `run_model.m` - MATLAB model execution script + +## Usage + +These scripts are typically invoked from the command line or imported as utilities in other parts of the ICESEE framework. Refer to individual script documentation for specific usage patterns. diff --git a/scripts/data_management/README.md b/scripts/data_management/README.md new file mode 100644 index 0000000..f82b01b --- /dev/null +++ b/scripts/data_management/README.md @@ -0,0 +1,69 @@ +# Data Management Scripts + +This directory contains scripts for managing ICESEE data files, including retrieval, inspection, post-processing, and aggregation. + +## Scripts + +### get_data.py +Utilities for retrieving and loading ICESEE data from various storage locations. + +**Usage**: +```python +from scripts.data_management.get_data import ... +``` + +### inspect_files.py +Tools for inspecting ICESEE data files, including metadata examination and data structure validation. + +**Features**: +- File format verification +- Metadata display +- Data structure inspection + +### post_processing.py +Post-processing workflows for ICESEE output data. + +**Capabilities**: +- Data cleaning and validation +- Statistical computations +- Output formatting +- Result extraction + +### stack_icesee_data.py +Tools for stacking and aggregating ICESEE data across multiple runs or ensemble members. + +**Use Cases**: +- Combining ensemble output +- Multi-run aggregation +- Time series concatenation +- Spatial data stacking + +## File Formats + +ICESEE primarily uses: +- HDF5 (.h5) for efficient array storage +- Zarr for cloud-optimized storage +- NetCDF for compatibility with geoscience tools + +## Usage Examples + +```python +# Load data +from scripts.data_management import get_data +data = get_data.load_ensemble_data('path/to/file.h5') + +# Inspect file +from scripts.data_management import inspect_files +inspect_files.show_structure('path/to/file.h5') + +# Stack data +from scripts.data_management import stack_icesee_data +stacked = stack_icesee_data.stack_runs(['run1.h5', 'run2.h5']) +``` + +## Dependencies + +- h5py (HDF5 support) +- zarr (Zarr support) +- numpy +- ICESEE core utilities diff --git a/scripts/matlab/README.md b/scripts/matlab/README.md new file mode 100644 index 0000000..9d77b6c --- /dev/null +++ b/scripts/matlab/README.md @@ -0,0 +1,42 @@ +# MATLAB Scripts + +This directory contains scripts for managing MATLAB processes and interfacing with MATLAB-based models like ISSM. + +## Scripts + +### kill_matlab_processes.py +Utility for safely terminating MATLAB processes that may be left running after data assimilation runs. + +**Purpose**: +- Clean up orphaned MATLAB processes +- Prevent resource leaks +- Manage MATLAB instance lifecycle + +**Usage**: +```bash +python kill_matlab_processes.py +``` + +**Features**: +- Identifies MATLAB processes +- Safe termination procedures +- Process filtering options + +## Background + +When running MATLAB-based models (like ISSM) with ICESEE, MATLAB processes may occasionally persist after the main workflow completes. These scripts ensure proper cleanup and resource management. + +## Safety + +These utilities are designed to safely terminate MATLAB processes without corrupting data or affecting other applications. Always ensure important work is saved before running cleanup scripts. + +## Related Components + +For MATLAB-Python integration, see: +- `applications/issm_model/issm_utils/` - MATLAB interface utilities +- MATLAB Engine API documentation + +## Dependencies + +- psutil (process management) +- Python 3.x diff --git a/scripts/plotting/README.md b/scripts/plotting/README.md new file mode 100644 index 0000000..2d7c58d --- /dev/null +++ b/scripts/plotting/README.md @@ -0,0 +1,67 @@ +# Plotting Scripts + +This directory contains visualization and plotting utilities for ICESEE results, with a focus on performance analysis and scaling studies. + +## Scripts + +### scaling_plots.py +Generate performance scaling plots for ICESEE parallel execution. + +**Features**: +- Strong scaling analysis +- Weak scaling analysis +- Speedup calculations +- Efficiency plots +- Comparison across configurations + +**Usage**: +```bash +python scaling_plots.py --data +``` + +### scaling_plots_csv_details.py +Detailed scaling analysis using CSV data files. + +**Features**: +- Read performance data from CSV +- Multiple metric visualization +- Detailed performance breakdowns +- Custom plot configurations +- Export publication-quality figures + +**Usage**: +```bash +python scaling_plots_csv_details.py --csv --output +``` + +## Data Format + +Scaling scripts expect performance data in specific formats: +- Execution times +- Number of processors +- Problem sizes +- Memory usage +- Communication overhead + +## Output + +Plots are generated in common formats: +- PNG for quick viewing +- PDF for publications +- SVG for editing + +## Use Cases + +These scripts are particularly useful for: +- HPC performance analysis +- Parallel efficiency studies +- Configuration optimization +- Publication figures +- Technical reports + +## Dependencies + +- matplotlib (plotting) +- numpy (data processing) +- pandas (CSV handling) +- seaborn (enhanced visualizations, optional) diff --git a/scripts/slurm/README.md b/scripts/slurm/README.md new file mode 100644 index 0000000..44f8424 --- /dev/null +++ b/scripts/slurm/README.md @@ -0,0 +1,76 @@ +# SLURM Scripts + +This directory contains scripts and batch files for running ICESEE workflows on HPC systems using the SLURM workload manager. + +## Batch Scripts + +### run_job.sbatch +Basic SLURM job submission script for ICESEE data assimilation runs. + +**Usage**: +```bash +sbatch run_job.sbatch +``` + +### run_job_in.sbatch +Interactive job submission with custom parameters. + +### run_job_strong.sbatch +Configuration for strong scaling studies (fixed problem size, increasing processors). + +### run_job_weak.sbatch +Configuration for weak scaling studies (problem size scales with processors). + +## Python Scripts + +### run_da_issm.py +Python script for orchestrating ISSM data assimilation runs on SLURM clusters. + +**Features**: +- Job submission automation +- Parameter sweeps +- Dependency management +- Output organization + +**Usage**: +```bash +python run_da_issm.py --config +``` + +## MATLAB Scripts + +### run_model.m +MATLAB script for executing ISSM model runs within SLURM jobs. + +## Customization + +To adapt these scripts for your HPC system: +1. Update partition/queue names +2. Adjust time limits +3. Modify memory requirements +4. Configure environment modules +5. Set correct paths + +## Typical Workflow + +1. Prepare configuration files +2. Customize batch script for your system +3. Submit job: `sbatch run_job.sbatch` +4. Monitor: `squeue -u $USER` +5. Check results in output directory + +## Environment + +These scripts typically require: +- SLURM workload manager +- Environment modules for ICESEE dependencies +- Proper MPI configuration +- Adequate scratch space + +## Resources + +For cluster-specific information, consult your HPC center's documentation on: +- SLURM configuration +- Queue policies +- File systems +- Module environment diff --git a/src/EnKF/README.md b/src/EnKF/README.md new file mode 100644 index 0000000..5c14c64 --- /dev/null +++ b/src/EnKF/README.md @@ -0,0 +1,85 @@ +# Ensemble Kalman Filter (EnKF) + +This directory contains the core Ensemble Kalman Filter implementations for ICESEE. + +## Implementations + +### python_enkf/ +Pure Python implementation of the EnKF algorithm. + +**Files**: +- `EnKF.py` - Main EnKF implementation with various algorithms +- `enkf_class_python.py` - Object-oriented EnKF class interface + +**Features**: +- Standard EnKF +- Ensemble Transform Kalman Filter (ETKF) +- Ensemble Square Root Filter (EnSRF) +- Localization support +- Inflation techniques + +**Advantages**: +- Easy to understand and modify +- Pure Python (no compilation needed) +- Good for prototyping + +### cython_enkf/ +Optimized Cython implementation for performance-critical applications. + +**Files**: +- `enkf.pyx` - Cython-optimized EnKF implementation +- `setup.py` - Build configuration + +**Features**: +- Highly optimized matrix operations +- C-level performance +- Reduced memory footprint +- Same algorithms as Python version + +**Advantages**: +- Faster execution +- Better scaling for large ensembles +- Efficient for production runs + +## Usage + +### Python EnKF +```python +from src.EnKF.python_enkf import EnKF + +enkf = EnKF(ensemble_size=50) +analysis = enkf.update(forecast, observations, obs_operator) +``` + +### Cython EnKF +First compile: +```bash +cd src/EnKF/cython_enkf +python setup.py build_ext --inplace +``` + +Then use similarly to Python version. + +## Algorithm Variants + +Both implementations support: +- **Deterministic EnKF**: Uses ensemble mean for analysis +- **Stochastic EnKF**: Perturbed observations +- **ETKF**: Transform-based, deterministic +- **EnSRF**: Square root filter, deterministic + +## Localization + +Covariance localization is supported through: +- Gaspari-Cohn correlation function +- Distance-based cutoff +- Custom localization matrices + +## Performance + +For small to medium problems (ensemble size < 100), Python implementation is sufficient. For larger problems or production runs, use the Cython version. + +## References + +- Evensen, G. (2003). "The Ensemble Kalman Filter: theoretical formulation and practical implementation" +- Hunt et al. (2007). "Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter" diff --git a/src/EnKF/cython_enkf/README.md b/src/EnKF/cython_enkf/README.md new file mode 100644 index 0000000..e0a70d9 --- /dev/null +++ b/src/EnKF/cython_enkf/README.md @@ -0,0 +1,108 @@ +# Cython EnKF + +High-performance Cython implementation of the Ensemble Kalman Filter for ICESEE. + +## Files + +### enkf.pyx +Cython source code with optimized EnKF implementations. + +**Features**: +- C-optimized matrix operations +- Memory-efficient algorithms +- Parallel operation support +- Same functionality as Python version + +### setup.py +Build configuration for compiling the Cython extension. + +**Dependencies**: +- Cython +- NumPy +- C compiler (gcc, clang, MSVC) + +## Building + +### Compile the Extension +```bash +cd src/EnKF/cython_enkf +python setup.py build_ext --inplace +``` + +This creates a compiled module (.so on Linux/Mac, .pyd on Windows) that can be imported like a regular Python module. + +### Development Build +For development with debugging symbols: +```bash +python setup.py build_ext --inplace --debug +``` + +## Usage + +After compilation, use like the Python version: + +```python +from src.EnKF.cython_enkf import enkf_update + +X_analysis = enkf_update( + X_forecast, + y_obs, + H, + R, + localization=None +) +``` + +## Performance + +### Speed Improvements +Typical speedups compared to Python version: +- 5-10x for medium ensembles (50-100 members) +- 10-20x for large ensembles (>100 members) +- Better scaling with problem size + +### Memory Efficiency +- Reduced memory allocations +- In-place operations where possible +- Efficient array handling + +## When to Use + +Use the Cython EnKF when: +- Running production data assimilation +- Working with large ensembles (>100 members) +- Performance is critical +- Running many DA cycles + +Use the Python EnKF when: +- Prototyping new algorithms +- Debugging +- Working with small problems +- Ease of modification is important + +## Compilation Requirements + +- Python development headers +- Cython (pip install cython) +- NumPy development headers +- C compiler toolchain + +## Troubleshooting + +If compilation fails: +1. Ensure Cython is installed: `pip install cython` +2. Check C compiler availability: `gcc --version` +3. Verify NumPy installation: `python -c "import numpy; print(numpy.get_include())"` +4. Check the `out` file for compilation logs + +## Testing + +Performance benchmarks and correctness tests are in `src/tests/`. + +## Maintenance + +When updating the algorithm: +1. Modify enkf.pyx +2. Rebuild: `python setup.py build_ext --inplace` +3. Test: Run test suite +4. Update this README if interface changes diff --git a/src/EnKF/python_enkf/README.md b/src/EnKF/python_enkf/README.md new file mode 100644 index 0000000..ed40109 --- /dev/null +++ b/src/EnKF/python_enkf/README.md @@ -0,0 +1,85 @@ +# Python EnKF + +Pure Python implementation of the Ensemble Kalman Filter for ICESEE. + +## Files + +### EnKF.py +Main module containing EnKF algorithms and supporting functions. + +**Key Functions**: +- Ensemble analysis updates +- Covariance calculations +- Localization functions +- Inflation algorithms +- Observation operator handling + +### enkf_class_python.py +Object-oriented interface for the EnKF. + +**Features**: +- Clean class-based API +- State management +- Configuration handling +- History tracking + +## Usage + +### Functional Interface +```python +from src.EnKF.python_enkf.EnKF import enkf_update + +X_analysis = enkf_update( + X_forecast, + y_obs, + H, + R, + localization=None +) +``` + +### Class Interface +```python +from src.EnKF.python_enkf.enkf_class_python import EnKFPython + +enkf = EnKFPython(config) +X_analysis = enkf.analysis_step(X_forecast, y_obs) +``` + +## Algorithm Details + +### Standard EnKF +The ensemble mean and covariance are used to compute the Kalman gain and update each ensemble member. + +### Localization +Covariance localization reduces spurious long-range correlations: +- Gaspari-Cohn function +- Schur product with localization matrix +- Distance-based tapering + +### Inflation +Ensemble inflation counters variance underestimation: +- Multiplicative inflation +- Additive inflation +- Adaptive inflation + +## Advantages + +- **Readable**: Easy to understand algorithm implementation +- **Flexible**: Simple to modify and extend +- **No Compilation**: Works immediately without build step +- **Debugging**: Easier to debug than compiled code + +## Performance Considerations + +- Suitable for small to medium ensemble sizes (< 100 members) +- For larger problems, consider the Cython implementation +- Numpy vectorization provides reasonable performance + +## Testing + +Unit tests for this module are located in `src/tests/`. + +## References + +See the main EnKF README for algorithm references. diff --git a/src/README.md b/src/README.md new file mode 100644 index 0000000..1024f37 --- /dev/null +++ b/src/README.md @@ -0,0 +1,41 @@ +# Source (src) + +This directory contains the core implementation of the ICESEE data assimilation framework. + +## Subdirectories + +### EnKF/ +Ensemble Kalman Filter implementations: +- `python_enkf/` - Pure Python EnKF implementation +- `cython_enkf/` - Optimized Cython EnKF for performance + +### parallelization/ +Parallelization infrastructure for distributed computing: +- MPI-based parallel I/O +- Parallel ensemble initialization +- Parallel forecast and analysis functions + +### run_model_da/ +Main entry points for running data assimilation workflows: +- Serial and parallel execution modes +- Full and partial parallelization strategies +- Localization and error generation functions + +### tests/ +Test suites and example implementations: +- Model-specific test cases +- Parallel MPI tests +- Zarr storage setup tests + +### utils/ +Common utility functions and tools used across the framework. + +## Architecture + +The ICESEE framework follows a modular design: +1. **EnKF Core**: Implements the mathematical operations for ensemble data assimilation +2. **Parallelization Layer**: Handles distributed computing with MPI +3. **Model Interface**: Connects external models to the DA framework +4. **I/O Layer**: Manages efficient data storage and retrieval + +For detailed usage, refer to the [ICESEE Wiki](https://github.com/ICESEE-project/ICESEE/wiki). diff --git a/src/parallelization/README.md b/src/parallelization/README.md new file mode 100644 index 0000000..d09a22a --- /dev/null +++ b/src/parallelization/README.md @@ -0,0 +1,117 @@ +# Parallelization + +This directory contains the parallelization infrastructure for distributed data assimilation with ICESEE. + +## Overview + +ICESEE uses MPI (Message Passing Interface) for parallel execution on HPC systems. This directory provides utilities for parallel ensemble operations, I/O, and communication. + +## Structure + +- `parallel_mpi/` - MPI-specific implementations +- Core parallel I/O functions +- MPI communication wrappers +- Parallel analysis and forecast functions + +## Key Modules + +### EnKF_parallel_io.py +High-level parallel I/O operations for ensemble data. + +**Features**: +- Parallel HDF5/Zarr reading and writing +- Distributed ensemble storage +- Checkpoint management +- Restart capabilities + +### _parallel_i_o.py +Low-level parallel I/O primitives. + +### MPI Analysis Functions +- `_mpi_analysis_functions.py` - Parallel EnKF analysis step +- Distributed covariance calculations +- Parallel localization + +### MPI Forecast Functions +- `_mpi_forecast_functions.py` - Parallel ensemble forecasting +- Load balancing across processors +- Task distribution + +### MPI Initialization +- `_mpi_ensemble_intialization.py` - Parallel ensemble generation +- Distributed initial conditions + +### MPI Observations +- `_mpi_generate_synthetic_observations.py` - Parallel observation generation +- `_mpi_generate_true_wrong_state.py` - Truth and background states + +## Parallelization Strategy + +### Domain Decomposition +Ensemble members are distributed across MPI processes: +- Each process handles a subset of ensemble members +- Communication occurs during analysis step +- Efficient for large ensembles + +### Data Parallelism +Model runs are embarrassingly parallel: +- Minimal communication during forecast +- Gather operations for analysis +- Scalable to many processors + +## Usage + +### MPI Execution +```bash +mpirun -np 48 python run_parallel_da.py --config params.yaml +``` + +### Configuration +```yaml +parallel_flag: true +n_modeltasks: 48 +model_nprocs: 1 # processors per model instance +``` + +## Performance + +### Scaling +- **Strong scaling**: Fixed problem size, increasing processors +- **Weak scaling**: Problem size scales with processors +- Near-ideal scaling for forecast step +- Communication overhead in analysis step + +### Optimization Tips +- Use parallel I/O for large datasets +- Balance ensemble size with number of processors +- Consider communication costs +- Use localization to reduce communication + +## Requirements + +- MPI implementation (OpenMPI, MPICH, Intel MPI) +- mpi4py Python package +- Parallel HDF5 (optional but recommended) +- Sufficient network bandwidth for large ensembles + +## Debugging + +### Common Issues +- MPI initialization failures +- Deadlocks in communication +- Load imbalance +- I/O bottlenecks + +### Tools +```bash +# Check MPI setup +mpirun -np 2 python -c "from mpi4py import MPI; print(MPI.COMM_WORLD.rank)" + +# Profile execution +mpirun -np 48 python -m cProfile -o profile.out run_parallel_da.py +``` + +## References + +- [mpi4py Documentation](https://mpi4py.readthedocs.io/) +- [Parallel HDF5](https://docs.h5py.org/en/stable/mpi.html) diff --git a/src/parallelization/parallel_mpi/README.md b/src/parallelization/parallel_mpi/README.md new file mode 100644 index 0000000..136e312 --- /dev/null +++ b/src/parallelization/parallel_mpi/README.md @@ -0,0 +1,67 @@ +# Parallel MPI + +MPI-specific implementations for parallel data assimilation operations. + +## Purpose + +This directory contains specialized MPI implementations for computationally intensive operations that benefit from parallelization. + +## Contents + +The parallel_mpi directory provides optimized parallel implementations for: +- Ensemble member distribution +- Parallel model execution +- Distributed data operations +- Communication patterns + +## MPI Communication Patterns + +### Scatter/Gather +Used for distributing work and collecting results: +```python +# Distribute ensemble members to processors +local_ensemble = scatter_ensemble(global_ensemble, comm) + +# Gather results after forecast +global_results = gather_results(local_results, comm) +``` + +### All-to-All +Used during analysis step for covariance calculations. + +### Collective Operations +- Reductions (sum, max, min) +- Broadcasts +- Barriers for synchronization + +## Load Balancing + +Strategies for balanced workload distribution: +- **Static**: Pre-determined distribution +- **Dynamic**: Work-stealing for heterogeneous tasks +- **Cyclic**: Round-robin assignment + +## Performance Monitoring + +Track parallel performance: +- Communication time +- Computation time +- Load imbalance metrics +- Scalability indicators + +## Usage + +These modules are typically called from higher-level parallelization functions and are not used directly by users. + +## Requirements + +- MPI-compatible implementation +- mpi4py +- Proper MPI environment setup + +## Best Practices + +- Minimize communication frequency +- Use collective operations when possible +- Overlap communication and computation +- Profile to identify bottlenecks diff --git a/src/run_model_da/README.md b/src/run_model_da/README.md new file mode 100644 index 0000000..329fdc9 --- /dev/null +++ b/src/run_model_da/README.md @@ -0,0 +1,154 @@ +# Run Model Data Assimilation + +This directory contains the main entry points and orchestration logic for running data assimilation workflows with ICESEE. + +## Main Scripts + +### run_models_da.py +Primary entry point for running data assimilation experiments. + +**Features**: +- Configuration loading +- Workflow orchestration +- Model-EnKF integration +- Results management + +**Usage**: +```bash +python run_models_da.py -F params.yaml --Nens 50 +``` + +### icesee_da_serial.py +Serial (single-processor) data assimilation implementation. + +**When to use**: +- Small problems +- Debugging +- Development +- Testing + +### icesee_da_partial_parallel.py +Partially parallelized implementation. + +**Features**: +- Parallel forecast step +- Serial analysis step +- Suitable for medium-scale problems + +### icesee_da_full_parallel.py +Fully parallelized data assimilation implementation. + +**Features**: +- Parallel forecast and analysis +- Distributed I/O +- Maximum scalability +- For production HPC runs + +## Supporting Modules + +### _error_generation.py +Functions for generating model and observation errors. + +**Capabilities**: +- Gaussian random fields +- Spatially correlated errors +- Temporal correlation +- Observation error simulation + +### _localization_functions.py +Covariance localization implementations. + +**Methods**: +- Gaspari-Cohn correlation function +- Distance-based localization +- Custom localization patterns +- Adaptive localization + +## Workflow Structure + +A typical ICESEE workflow: + +1. **Initialization** + - Load configuration + - Initialize model + - Generate initial ensemble + +2. **Forecast Step** + - Run ensemble members forward in time + - (Parallel execution) + +3. **Analysis Step** + - Compute ensemble statistics + - Apply EnKF update + - (Parallel or serial) + +4. **Output** + - Save ensemble state + - Checkpoint data + - Diagnostic output + +5. **Repeat** for next observation time + +## Execution Modes + +### Serial +```bash +python icesee_da_serial.py --config params.yaml +``` + +### Partial Parallel +```bash +mpirun -np 24 python icesee_da_partial_parallel.py --config params.yaml +``` + +### Full Parallel +```bash +mpirun -np 48 python icesee_da_full_parallel.py --config params.yaml +``` + +## Configuration + +Key configuration parameters (see `config/README.md` for full details): +- `Nens`: Ensemble size +- `execution_mode`: 0=serial, 1=partial parallel, 2=full parallel +- `model_name`: Which model to use +- `parallel_flag`: Enable/disable parallelization + +## Error Handling + +The framework includes robust error handling: +- Configuration validation +- Model execution errors +- I/O failures +- MPI communication errors + +## Restart Capability + +Workflows can be restarted from checkpoints: +- Automatic checkpoint creation +- Configurable checkpoint frequency +- Resume from any saved state + +## Logging + +Comprehensive logging for debugging and monitoring: +- Configuration summary +- Timing information +- Performance metrics +- Error messages + +## Testing + +Test these modules with the examples in `applications/` directories. + +## Performance Tips + +- Use full parallel mode for production runs +- Profile to identify bottlenecks +- Balance ensemble size with available processors +- Use localization for large spatial domains +- Enable checkpointing for long runs + +## References + +See the [ICESEE Wiki](https://github.com/ICESEE-project/ICESEE/wiki) for detailed workflow documentation. diff --git a/src/tests/flowline_enkf_py/README.md b/src/tests/flowline_enkf_py/README.md new file mode 100644 index 0000000..191d3de --- /dev/null +++ b/src/tests/flowline_enkf_py/README.md @@ -0,0 +1,74 @@ +# Flowline EnKF Python Tests + +Test suite for the Python flowline model with EnKF data assimilation. + +## Purpose + +This directory contains tests specifically for validating the flowline model integration with the Python EnKF implementation. + +## Test Coverage + +Tests include: +- Flowline model initialization +- Ensemble generation +- Forward model execution +- EnKF analysis step +- Complete DA cycle +- Configuration handling + +## Running Tests + +### All Tests +```bash +cd src/tests/flowline_enkf_py +python -m pytest +``` + +### Specific Test +```bash +python -m pytest test_flowline_enkf.py::test_ensemble_init +``` + +## Test Structure + +Typical test structure: +1. Setup: Initialize model and configuration +2. Execute: Run DA cycle or component +3. Validate: Check results against expected values +4. Cleanup: Reset state for next test + +## Expected Results + +Tests validate: +- Ensemble dimensions are correct +- Analysis reduces ensemble spread +- RMSE improves with assimilation +- Configuration parameters are respected + +## Dependencies + +- pytest +- numpy +- ICESEE core modules +- Flowline model implementation + +## Adding Tests + +When adding new tests: +1. Follow existing naming conventions +2. Use descriptive test names +3. Include docstrings +4. Ensure tests are independent +5. Clean up resources after tests + +## Debugging + +For verbose output: +```bash +python -m pytest -v -s +``` + +For specific test debugging: +```bash +python -m pytest --pdb test_file.py::test_name +``` diff --git a/src/tests/flowline_enkf_py_jl/README.md b/src/tests/flowline_enkf_py_jl/README.md new file mode 100644 index 0000000..6b9ec6b --- /dev/null +++ b/src/tests/flowline_enkf_py_jl/README.md @@ -0,0 +1,79 @@ +# Flowline EnKF Python-Julia Tests + +Test suite for flowline model with Python-Julia interoperability. + +## Purpose + +This directory contains tests for the flowline model that utilize both Python and Julia implementations, testing the interface between the two languages. + +## Overview + +These tests validate: +- Python-Julia data exchange +- Julia model execution from Python +- Performance comparisons +- Numerical consistency + +## Background + +Julia can provide performance benefits for certain numerical operations. These tests ensure that: +- Julia implementations produce correct results +- Data transfers between Python and Julia work correctly +- Performance improvements are realized + +## Requirements + +- Python with PyJulia installed +- Julia runtime +- Required Julia packages +- ICESEE framework + +## Running Tests + +### Setup Julia Environment +```bash +# First time setup +julia setup_julia_env.jl +``` + +### Run Tests +```bash +cd src/tests/flowline_enkf_py_jl +python -m pytest +``` + +## Performance Testing + +Compare Python vs Julia performance: +```bash +python benchmark_python_vs_julia.py +``` + +## Test Categories + +1. **Correctness Tests**: Verify Julia implementation matches Python +2. **Performance Tests**: Measure speedup from Julia +3. **Interface Tests**: Validate data exchange +4. **Integration Tests**: Complete DA workflows + +## Troubleshooting + +### Julia Not Found +Ensure Julia is in PATH: +```bash +julia --version +``` + +### PyJulia Issues +Reinstall PyJulia: +```bash +pip install julia +python -c "import julia; julia.install()" +``` + +### Numerical Differences +Small floating-point differences between Python and Julia are expected. Tests use appropriate tolerances. + +## Status + +This is an experimental feature exploring Julia integration for performance-critical operations. diff --git a/src/tests/parallel_mpi/README.md b/src/tests/parallel_mpi/README.md new file mode 100644 index 0000000..718857e --- /dev/null +++ b/src/tests/parallel_mpi/README.md @@ -0,0 +1,124 @@ +# Parallel MPI Tests + +Test suite for MPI parallelization functionality in ICESEE. + +## Purpose + +Validate parallel execution, communication patterns, and scalability of the MPI implementation. + +## Test Coverage + +Tests include: +- MPI initialization and finalization +- Ensemble distribution across processors +- Parallel I/O operations +- Communication patterns (scatter, gather, broadcast) +- Parallel forecast step +- Parallel analysis step +- Load balancing +- Scaling performance + +## Running Tests + +### Serial Test (for debugging) +```bash +python test_mpi_basic.py +``` + +### Parallel Tests +```bash +# 4 processes +mpirun -np 4 python -m pytest test_parallel_*.py + +# 8 processes +mpirun -np 8 python -m pytest test_parallel_*.py +``` + +## Test Structure + +MPI tests have special requirements: +- Must be run with mpirun/mpiexec +- Each process executes the test +- Assertions must be coordinated +- Cleanup is critical + +## Performance Tests + +### Strong Scaling +```bash +# Run with increasing processor counts +for np in 2 4 8 16; do + mpirun -np $np python test_scaling.py --mode strong +done +``` + +### Weak Scaling +```bash +# Run with problem size scaling with processors +for np in 2 4 8 16; do + mpirun -np $np python test_scaling.py --mode weak --size $((np*100)) +done +``` + +## Common Issues + +### Deadlocks +If tests hang, check for: +- Unmatched send/receive pairs +- Missing barriers +- Incorrect collective operations + +### Load Imbalance +Monitor processor utilization: +```bash +mpirun -np 8 python test_load_balance.py --profile +``` + +## Requirements + +- MPI implementation (OpenMPI, MPICH, Intel MPI) +- mpi4py +- pytest +- ICESEE core modules + +## Expected Results + +- All processors should complete successfully +- Communication overhead should be reasonable +- Near-linear speedup for embarrassingly parallel operations +- Acceptable efficiency for analysis step + +## Debugging MPI Tests + +### Verbose Output +```bash +mpirun -np 4 python -m pytest -v -s +``` + +### Single Process Debug +```bash +# Debug rank 0 only +mpirun -np 4 xterm -e gdb python test_parallel.py +``` + +### MPI Profiling +```bash +mpirun -np 8 --profile python test_performance.py +``` + +## Adding Tests + +When adding MPI tests: +1. Ensure all processes participate +2. Use collective operations correctly +3. Validate results on all ranks +4. Clean up MPI resources +5. Handle edge cases (1 processor, many processors) + +## Best Practices + +- Test with different processor counts +- Verify correctness before performance +- Use timeouts to catch deadlocks +- Profile to identify bottlenecks +- Test both small and large problems diff --git a/src/tests/zarr_setup/README.md b/src/tests/zarr_setup/README.md new file mode 100644 index 0000000..3d2a944 --- /dev/null +++ b/src/tests/zarr_setup/README.md @@ -0,0 +1,143 @@ +# Zarr Setup Tests + +Test suite for Zarr storage configuration and functionality in ICESEE. + +## Purpose + +Validate Zarr-based storage for efficient, cloud-optimized ensemble data management. + +## Overview + +Zarr provides: +- Chunked array storage +- Compression options +- Cloud storage compatibility +- Parallel I/O support +- Efficient partial reads/writes + +These tests ensure ICESEE's Zarr integration works correctly. + +## Test Coverage + +Tests include: +- Zarr array creation +- Data writing and reading +- Chunk size optimization +- Compression settings +- Metadata handling +- Parallel I/O with Zarr +- Cloud storage integration (if configured) + +## Running Tests + +### Local Tests +```bash +cd src/tests/zarr_setup +python -m pytest +``` + +### With Specific Storage Backend +```bash +# Test with local filesystem +python -m pytest --storage local + +# Test with S3 (if configured) +python -m pytest --storage s3 +``` + +## Configuration + +Zarr settings can be configured in test fixtures: +```python +zarr_config = { + 'chunks': (10, 100, 100), + 'compressor': 'blosc', + 'compression_level': 5, +} +``` + +## Performance Tests + +### Chunk Size Optimization +```bash +python benchmark_chunk_sizes.py +``` + +### Compression Comparison +```bash +python test_compression_methods.py --output results.csv +``` + +## Storage Backends + +Tests support multiple backends: +- **Local filesystem**: Default for development +- **S3**: AWS S3 or compatible (MinIO, etc.) +- **Network filesystems**: Lustre, GPFS + +## Common Operations + +### Creating Zarr Store +```python +import zarr +store = zarr.DirectoryStore('data.zarr') +root = zarr.group(store=store) +ensemble = root.create_dataset('ensemble', shape=(50, 1000), chunks=(5, 100)) +``` + +### Reading Data +```python +ensemble = zarr.open('data.zarr/ensemble', mode='r') +member_0 = ensemble[0, :] +``` + +## Best Practices + +- Choose chunk sizes based on access patterns +- Use compression for I/O-bound workloads +- Test with realistic data sizes +- Profile I/O performance +- Consider cloud storage for large datasets + +## Troubleshooting + +### Slow I/O +- Check chunk sizes +- Verify compression settings +- Monitor network/disk bandwidth + +### Compatibility Issues +- Ensure Zarr version compatibility +- Check storage backend support + +## Requirements + +- zarr (Python package) +- numcodecs (compression) +- fsspec (filesystem interfaces) +- Optional: s3fs (for S3 support) + +## Expected Results + +Tests validate: +- Data integrity after write/read cycles +- Correct handling of array shapes +- Proper compression/decompression +- Efficient chunked access +- Metadata preservation + +## Adding Tests + +When adding Zarr tests: +1. Test both read and write operations +2. Verify data integrity +3. Include performance benchmarks +4. Test edge cases (empty arrays, large arrays) +5. Clean up test data files + +## Future Enhancements + +- Distributed Zarr with Dask +- Cloud storage optimization +- Advanced compression codecs +- Automatic chunk size selection diff --git a/src/utils/README.md b/src/utils/README.md new file mode 100644 index 0000000..a450c45 --- /dev/null +++ b/src/utils/README.md @@ -0,0 +1,111 @@ +# Utilities + +Common utility functions and tools used throughout the ICESEE framework. + +## Modules + +### tools.py +General-purpose tools and helper functions. + +**Categories**: +- File I/O utilities +- Data conversion functions +- Configuration helpers +- Path management +- Error handling utilities + +**Common Functions**: +- File existence checks +- Directory creation +- Data type conversions +- Configuration parsing +- Logging setup + +### utils.py +Core utility functions for data assimilation workflows. + +**Categories**: +- Ensemble manipulation +- Statistical computations +- Array operations +- Diagnostic functions +- Validation utilities + +**Common Functions**: +- Ensemble mean and spread calculations +- RMSE and other metrics +- Data validation +- Array reshaping and indexing +- Time handling + +## Usage + +Import utilities as needed: + +```python +from src.utils import tools, utils + +# Use file utilities +tools.ensure_directory_exists(output_path) + +# Use ensemble utilities +mean, spread = utils.ensemble_statistics(ensemble) +rmse = utils.compute_rmse(forecast, truth) +``` + +## Design Philosophy + +These utilities follow these principles: +- **Reusable**: Functions are general-purpose +- **Well-tested**: Covered by unit tests +- **Documented**: Clear docstrings +- **Efficient**: Optimized for common operations + +## Common Operations + +### Ensemble Statistics +```python +ensemble_mean = utils.compute_ensemble_mean(ensemble) +ensemble_spread = utils.compute_ensemble_spread(ensemble) +``` + +### File Operations +```python +tools.safe_create_directory(path) +tools.archive_old_results(directory) +``` + +### Data Validation +```python +utils.validate_ensemble_shape(ensemble, expected_shape) +utils.check_configuration(config_dict) +``` + +## Testing + +Unit tests for these utilities are located in `src/tests/`. + +## Adding New Utilities + +When adding new utility functions: +1. Choose the appropriate module (tools.py or utils.py) +2. Add comprehensive docstrings +3. Include type hints +4. Add unit tests +5. Update this README + +## Dependencies + +These modules have minimal dependencies: +- numpy (array operations) +- Python standard library + +This keeps them lightweight and universally usable across the framework. + +## Best Practices + +- Check for existing utilities before implementing new ones +- Keep functions focused and single-purpose +- Use descriptive function names +- Handle edge cases gracefully +- Return meaningful error messages