Pull request overview
This pull request simplifies the EnKF analysis workflow by removing the computation and return of analysis error covariance from all ensemble analysis methods. It also adds infrastructure for strong and weak scaling performance tests via new SLURM batch scripts.
Changes:
- Modified all four EnKF analysis methods (`EnKF_Analysis`, `DEnKF_Analysis`, `EnRSKF_Analysis`, `EnTKF_Analysis`) to return only the updated ensemble instead of both the ensemble and the error covariance
- Updated the serial data assimilation workflow to match the new single-return signatures
- Added SLURM batch scripts for conducting strong and weak scaling tests with Spack environment configuration
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| src/EnKF/python_enkf/EnKF.py | Commented out covariance computation and simplified return signatures for all four analysis methods |
| src/run_model_da/icesee_da_serial.py | Updated method calls to match new single-return signatures (no longer unpacking covariance) |
| applications/issm_model/examples/ISMIP_Choi/run_job_weak_spack.sbatch | New script for weak scaling tests with varying ensemble sizes and fixed model processes |
| applications/issm_model/examples/ISMIP_Choi/run_job_strong_spack.sbatch | New script for strong scaling tests with fixed ensemble size and varying process counts |
```shell
#!/bin/bash
#SBATCH -t250:00:00
#SBATCH -JISMIP
#SBATCH -N 14          # 1080 / 24 = 45 nodes
```
The comment on line 4 is inconsistent with the actual SLURM directive: the comment says "1080 / 24 = 45 nodes", but the directive requests 14 nodes. Either update the comment to "336 / 24 = 14 nodes" (based on the `-n 336` directive on line 5), or, if 45 nodes are actually needed, correct the `-N` directive.
```diff
-#SBATCH -N 14          # 1080 / 24 = 45 nodes
+#SBATCH -N 14          # 336 / 24 = 14 nodes
```
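As a sanity check on the suggested comment, the node count follows from ceiling division of total ranks by tasks per node. A quick sketch; the values 336 and 24 come from the review's reading of the script, not from this snippet:

```shell
#!/bin/bash
# Ceiling division: nodes = ceil(total_ranks / tasks_per_node).
# 336 matches the `-n 336` directive cited in the review comment;
# 24 tasks per node is the assumption used in the script's own comment.
total_ranks=336
tasks_per_node=24
nodes=$(( (total_ranks + tasks_per_node - 1) / tasks_per_node ))
echo "nodes required: $nodes"   # 336 / 24 = 14
```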
```shell
# Compute total ranks used = np * (MODEL_NPROCS + 1)
local rank_size=$(( np * (MODEL_NPROCS + 1) ))
local data_path="_modelrun_datasets"
```
The variable `data_path` is defined on line 43 but never used. Either remove the unused variable or add the logic that references this path.
```diff
-local data_path="_modelrun_datasets"
```
```python
# ensemble[indx_map[key],ens] = value + q0[:len(value)]
# ensemble[indx_map[key],ens] = value + np.random.multivariate_normal(np.zeros(len(value)), 0.0*np.eye(len(value)))
```
The indentation for these commented-out lines appears inconsistent. Line 129 has correct indentation (matching the comment above it), but line 130 has extra indentation that doesn't align with the surrounding code structure. This should be corrected for consistency, even though the code is commented out.
```python
# return ensemble_analysis, analysis_error_cov
return ensemble_analysis
```
The docstring for this method (lines 380-389, outside the changed region) still indicates it returns both ensemble_analysis and analysis_error_cov, but the implementation now only returns ensemble_analysis. While the docstring itself cannot be edited in this PR since it's outside the changed regions, this creates an inconsistency between documentation and implementation that should be addressed.
```shell
# Compute total ranks used = np * (MODEL_NPROCS + 1)
local rank_size=$(( np * (MODEL_NPROCS + 1) ))
```
The variable `rank_size` is computed on line 42 but never used. The calculation appears intended for resource allocation or validation but is not utilized. Either remove the unused variable or use it for validation or configuration.
```diff
-# Compute total ranks used = np * (MODEL_NPROCS + 1)
-local rank_size=$(( np * (MODEL_NPROCS + 1) ))
```
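If the computation is kept instead, one way to make it earn its place, following the reviewer's suggestion, is to validate the requested ranks against the job allocation before launching. A sketch, assuming `np` and `MODEL_NPROCS` are set earlier in the script and `SLURM_NTASKS` is exported by SLURM; the concrete values here are illustrative:

```shell
#!/bin/bash
# Illustrative values; in the real script these come from the scaling sweep.
np=8
MODEL_NPROCS=4

# Total ranks used = np * (MODEL_NPROCS + 1).
rank_size=$(( np * (MODEL_NPROCS + 1) ))

# Fail fast if the SLURM allocation cannot cover the requested ranks.
if [ -n "${SLURM_NTASKS:-}" ] && [ "$rank_size" -gt "${SLURM_NTASKS}" ]; then
    echo "Error: need $rank_size ranks but only $SLURM_NTASKS allocated" >&2
    exit 1
fi
echo "launching with $rank_size total ranks"
```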
```shell
#SBATCH --mail-user=bkyanjo3@gatech.edu

# set -euo pipefail
cd $SLURM_SUBMIT_DIR
```
Inconsistent quoting: `cd $SLURM_SUBMIT_DIR` should use quotes, i.e. `cd "$SLURM_SUBMIT_DIR"` (as done in the weak scaling script on line 15), to handle paths with spaces safely. This is a shell scripting best practice.
```diff
-cd $SLURM_SUBMIT_DIR
+cd "$SLURM_SUBMIT_DIR"
```
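Combining the quoting fix with the strict mode the script currently comments out gives a more defensive preamble. This is a sketch, not the script's actual contents; the `$PWD` fallback is added here only so the snippet runs outside a SLURM job:

```shell
#!/bin/bash
# Abort on command errors, unset variables, and failed pipeline stages.
set -euo pipefail

# Quote the expansion so paths containing spaces are not word-split.
# SLURM sets SLURM_SUBMIT_DIR inside a job; the fallback is only for
# running this sketch interactively.
submit_dir="${SLURM_SUBMIT_DIR:-$PWD}"
cd "$submit_dir"
echo "running from: $submit_dir"
```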
```shell
# "$MPI_DIR/bin/mpirun" -np "$np" \
# srun --mpi=pmix -n "$np" \
```
These commented-out lines (36-37) appear to be leftover debug or alternative implementation code. Since this is production code for scaling tests, consider removing these comments to improve readability, or if they serve as documentation for alternative MPI launch methods, add a clearer explanation of why they're preserved.
```diff
-# "$MPI_DIR/bin/mpirun" -np "$np" \
-# srun --mpi=pmix -n "$np" \
```
```python
# return ensemble_analysis, analysis_error_cov
return ensemble_analysis
```
The docstring for this method (lines 417-426, outside the changed region) still indicates it returns both ensemble_analysis and analysis_error_cov, but the implementation now only returns ensemble_analysis. This creates an inconsistency between documentation and implementation that should be addressed.
```diff
 # analysis_error_cov = ensemble_analysis + self.Cov_model@(U@np.diag(np.sqrt(S))@U.T)

-return ensemble_analysis, analysis_error_cov
+return ensemble_analysis
```
The docstring for this method (lines 457-466, outside the changed region) still indicates it returns both ensemble_analysis and analysis_error_cov, but the implementation now only returns ensemble_analysis. This creates an inconsistency between documentation and implementation that should be addressed.
```python
# return ensemble_analysis, analysis_error_cov
return ensemble_analysis
```
The docstring for this method (lines 490-499, outside the changed region) still indicates it returns both ensemble_analysis and analysis_error_cov, but the implementation now only returns ensemble_analysis. This creates an inconsistency between documentation and implementation that should be addressed.
This pull request introduces two new SLURM batch scripts for strong and weak scaling tests, and updates the ensemble analysis methods in the EnKF Python code to simplify their outputs. The most significant change is that all analysis methods now return only the updated ensemble, removing the computation and return of the analysis error covariance. This change is reflected in both the analysis method implementations and their usage in the serial data assimilation workflow.
Scaling batch scripts:
- Added `run_job_strong_spack.sbatch` and `run_job_weak_spack.sbatch` for launching strong and weak scaling tests, respectively, with Spack environment setup and SLURM resource configuration. [1] [2]

Ensemble analysis output simplification:
- Updated the `EnKF_Analysis`, `DEnKF_Analysis`, `EnRSKF_Analysis`, and `EnTKF_Analysis` methods in `EnKF.py` to return only the updated ensemble (`ensemble_analysis`), removing the computation and return of the analysis error covariance. [1] [2] [3] [4]
- Updated `icesee_model_data_assimilation_serial` in `icesee_da_serial.py` to match the new method signatures, assigning only the updated ensemble and no longer expecting covariance output.

Debug and experimental code changes:
- Commented out covariance computations and other debug and experimental code in `EnKF.py`. [1] [2] [3] [4] [5]

These changes streamline the ensemble analysis workflow and add infrastructure for performance testing.