@cailmdaley
Collaborator

Summary

This PR addresses catalog configuration maintenance issues by standardizing all catalog paths to absolute references and adding automated validation. The changes eliminate catalog path drift between related variants and make the configuration explicit about filesystem locations.

Problem

Dual maintenance burden: Each catalog had both a base entry and a _leak_corr variant, causing configuration drift when one was updated but not the other. This drift was invisible until regression tests failed.

Implicit path resolution: Catalogs used relative subdir paths combined with a data_dir fallback, making it difficult to determine actual filesystem locations as data spread across multiple mounts.

Changes

Catalog configuration normalization (notebooks/cosmo_val/cat_config.yaml):

  • Converted all subdir entries to explicit absolute paths (/n17data/...)
  • Removed reliance on data_dir fallback logic
  • Pointed shared resources (star/PSF files) to canonical mount locations

Pseudo-C_ℓ code path updates (src/sp_validation/cosmo_val.py):

  • Updated to read redshift files directly from catalog configuration
  • Removed data_base_dir assumption

New validation test (src/sp_validation/tests/test_catalog_paths.py):

  • Validates that all catalog paths (shear/star/PSF) resolve to existing files
  • Maintains allow-list for catalogs with currently unavailable source data
  • Prevents future configuration drift
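A minimal sketch of the kind of check the new path test performs. It assumes the YAML has already been loaded into a dict keyed by version, where each entry has hypothetical `shear`/`star`/`psf` sections carrying a `path` key; the real key names in cat_config.yaml may differ, and `catalog_path_problems` is an illustrative name, not the test's actual helper. It flags relative paths (the old `subdir` + `data_dir` style) as well as files missing from disk, with the allow-list exempting only the "missing" case:

```python
import os


def catalog_path_problems(cat_config, allow_missing=frozenset()):
    """Return (version, section, path, reason) tuples for bad catalog paths.

    Hypothetical helper: assumes each version maps to a dict whose "shear",
    "star" and "psf" sections (assumed names) may contain a "path" entry.
    """
    problems = []
    for version, entry in cat_config.items():
        for section in ("shear", "star", "psf"):
            path = entry.get(section, {}).get("path")
            if path is None:
                continue  # section not configured for this catalog
            if not os.path.isabs(path):
                # Relative paths would need the removed data_dir fallback.
                problems.append((version, section, path, "relative"))
            elif version not in allow_missing and not os.path.exists(path):
                # Allow-listed versions may point at unavailable source data.
                problems.append((version, section, path, "missing"))
    return problems
```

In a pytest module this could back a single assertion such as `assert not catalog_path_problems(cfg, ALLOW_MISSING)` (with `ALLOW_MISSING` a hypothetical set of version names); keeping relative paths out of the allow-list means no catalog can silently fall back to implicit resolution.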

Regression test updates:

  • Limited additive-bias tests to catalogs with data currently on disk
  • Excluded missing releases to keep CI green

Known Limitations

The following catalogs are missing source files and remain in configuration for provenance only:

  1. Early releases: SP_v1.0, SP_v1.1, SP_matched_MP_v1.0
  2. v1.4 family: SP_v1.4, SP_v1.4_conv, SP_v1.4_noalpha
  3. v1.4 patch derivatives: All SP_v1.4-P1+3* variants

These are listed in the validator allow-list and excluded from regression tests.

Follow-up Work

  1. Resolve legacy catalog status: Either restore missing data or formally deprecate affected versions
  2. Re-point v1.4 variants: Update paths to correct /n17data/UNIONS/WL/v1.4.x/<variant>/ directories when data becomes available

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

@cailmdaley
Collaborator Author

hi all, adding some thoughts to the Claude-generated summary above:

  • i think this will make things much more flexible going forward, but will be a big pain to merge other changes to cat_config that have been made in parallel branches, especially because i reordered the versions a little. i volunteer to do any such merging (looking at @sachaguer since i saw you've made lots of changes to your cat_config)
  • my big question for you all: what to do with these older versions with missing files on candide? should we just add a # Deprecated comment above each one? or full-on delete them?
  • maybe i haven't considered all the ramifications of this change, please do raise any problems i may have missed!

@sachaguer
Contributor

I think old versions that are not used can safely be deleted. The logic is that if someone wants to rerun something, they can just add back the catalog they want.

I believe the only modifications I made to cat_config were adding catalogs and, in the most recent update, adding a path to a mask file. My handling of paths is also very ugly and could be improved if you have something better.

@LisaGoh
Member

LisaGoh commented Oct 14, 2025

Thank you for this! Indeed the cat_config file was getting rather long and cumbersome... I'm in favour of deleting legacy stuff and perhaps keeping only the catalogue versions that everyone is sharing. This info could also be reflected in the wiki (with consistent values) for completeness!

cailmdaley and others added 8 commits October 15, 2025 00:57
Eliminate redundant redshift_file parameter and load n(z) directly from
catalog configuration. Added get_redshift() method as single source of truth.

- Renamed shear.redshift_distr → shear.redshift_path in cat_config.yaml
- Added get_redshift(version) method for catalog-aware n(z) loading
- Updated calculate_pure_eb(), plot_pure_eb(), calculate_pseudo_cl_eb_cov()
  to use get_redshift()
- Removed redshift_file parameter from __init__, calculate_pure_eb(),
  plot_pure_eb() signatures and docstrings

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
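To illustrate the "single source of truth" idea in this commit, here is a stand-alone sketch of catalog-aware n(z) loading. It is not the actual get_redshift(version) method: it takes the loaded config as an argument and assumes the file at `shear.redshift_path` is whitespace-separated ASCII with z and n(z) in the first two columns and `#` comment lines; the real file format may differ.

```python
from pathlib import Path


def get_redshift(cat_config, version):
    """Load (z, n(z)) for a catalog version from its shear.redshift_path.

    Hypothetical stand-alone version of the get_redshift(version) method;
    assumes a two-column ASCII file with "#" comment lines.
    """
    path = Path(cat_config[version]["shear"]["redshift_path"])
    z, nz = [], []
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.split()
        z.append(float(cols[0]))
        nz.append(float(cols[1]))
    return z, nz
```

Because calculate_pure_eb(), plot_pure_eb() and calculate_pseudo_cl_eb_cov() would all ask the config for n(z), they cannot disagree about which file a given catalog version uses.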
Update CosmologyValidation call to match current API. The data_base_dir
parameter was removed in the catalog config refactor; all paths are now
resolved from cat_config.yaml.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add support for seed-specific mock catalog variants (e.g., SP_v1.4.5_glass_mock_seed1)
by extracting and substituting seed tokens in shear paths. Enables exploring multiple
random realizations of the same mock survey.

- Add SP_v1.4.6_glass_mock catalog entry with v1.4.6 survey specs
- Refactor version processing in __init__ to use recursive ensure_version_exists()
- Support _seed<N> variants that deep-copy base config and substitute seed token
- Handle _seed<N>_leak_corr combinations by materializing seed config first
- Add explicit error checking for missing seed tokens in paths
- Add regression tests for seed variant creation and error cases

Seed variant examples:
  - SP_v1.4.5_glass_mock_seed1 → unions_glass_sim_00001_4096.fits
  - SP_v1.4.6_glass_mock_seed12 → unions_glass_sim_00012_4096.fits
  - SP_v1.4.5_glass_mock_seed1_leak_corr → combines both transforms

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
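The recursive materialization described in this commit can be sketched as follows. This is an illustration under assumptions, not the code in cosmo_val.py: the `leak_corr` flag, the `shear.path` key and the `00001` default seed token (inferred from the file names in the examples above) are all assumed. Each derived version deep-copies its parent, and `_leak_corr` is peeled off first so a seed variant is materialized before the leakage correction is layered on:

```python
import copy

DEFAULT_SEED_TOKEN = "00001"  # assumed zero-padded seed in the base path


def _substitute_seed(path, seed):
    """Swap the default seed token in `path` for the requested seed."""
    if DEFAULT_SEED_TOKEN not in path:
        raise ValueError(f"no seed token {DEFAULT_SEED_TOKEN!r} in {path!r}")
    return path.replace(DEFAULT_SEED_TOKEN, f"{seed:05d}")


def ensure_version_exists(catalogs, version):
    """Return the config for `version`, creating derived variants on demand."""
    if version in catalogs:
        return catalogs[version]
    if version.endswith("_leak_corr"):
        # Materialize the parent (possibly itself a seed variant) first.
        parent = ensure_version_exists(catalogs, version[: -len("_leak_corr")])
        cfg = copy.deepcopy(parent)
        cfg["leak_corr"] = True  # hypothetical flag
    else:
        base, sep, seed = version.rpartition("_seed")
        if not sep or not seed.isdecimal():
            raise KeyError(f"unknown catalog version {version!r}")
        # Deep copy so editing the variant never mutates the base entry.
        cfg = copy.deepcopy(ensure_version_exists(catalogs, base))
        cfg["shear"]["path"] = _substitute_seed(cfg["shear"]["path"], int(seed))
    catalogs[version] = cfg
    return cfg
```

For example, requesting `SP_v1.4.5_glass_mock_seed12_leak_corr` first creates `SP_v1.4.5_glass_mock_seed12` (path `..._00012_4096.fits`) and then the leakage-corrected copy of it.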
- Add SP_v1.4.6_glass_mock to parametrized additive bias tests
- Update SP_v1.4.6_glass_mock shear path to glass_mock_v1.4.6 directory
- Add test_v1_4_6_glass_mock_seed_variant to verify seed9 variant loads correctly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Replace brittle regex-based seed token detection with config-driven
path templates using Python string formatting. Each catalog specifies
a path_template with {seed:05d} or {seed} placeholders.

- Add path_template fields to SP_v1.4.5_glass_mock and SP_v1.4.6_glass_mock
- Simplify _split_seed_variant to pure string operations (no regex)
- Add _materialize_seed_path using .format() for clean templating
- Fallback to legacy seed-token extraction if no template provided
- All 15 tests pass including new v1.4.6 glass mock seed9 test

Benefits:
- Per-catalog control of path formatting without code changes
- No complex regex fragility
- Clear, self-documenting config entries

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Verify that SP_v1.4.6_glass_mock without a seed suffix correctly
uses the default seed 00001 from the configured path field.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@cailmdaley
Collaborator Author

thanks for the comments! i removed old catalog versions. some more changes:

  • added Lisa's new masks to cat_config for 1.4.6-8, and updated survey properties in cat_config based on the masks.
  • compute_survey_stats uses the mask to calculate n_e, sigma_e and optionally writes them directly to the cat_config.
  • added docstring to CosmologyValidation.__init__
  • n(z) now loaded from catalog config via get_redshift() method
  • glass mock seeds can now be specified in the version, e.g. SP_v1.4.6_glass_mock_seed9 to allow us to make fits inference files for Sacha's new glass mocks
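For reference, one common weighted convention for these survey properties is n_eff = (Σw)² / (A Σw²) and σ_e² = Σ w²(e1² + e2²) / (2 Σw²), where A is the unmasked area. The stdlib-only sketch below implements that convention; it is not necessarily what compute_survey_stats does. It assumes a HEALPix mask passed as a flat sequence of booleans at resolution `nside` (so each pixel covers 4π/(12·nside²) sr), and the function name and signature are hypothetical:

```python
import math


def survey_stats(weights, e1, e2, mask, nside):
    """Return (n_eff in arcmin^-2, sigma_e per component).

    Hypothetical helper: `mask` is a flat HEALPix map (truthy = observed).
    """
    # Unmasked area: pixel count times HEALPix pixel area, in arcmin^2.
    n_pix = 12 * nside**2
    pix_area_sr = 4 * math.pi / n_pix
    sr_to_arcmin2 = (180 * 60 / math.pi) ** 2
    area = sum(1 for m in mask if m) * pix_area_sr * sr_to_arcmin2

    # Effective number density from the weight sums.
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    n_eff = sum_w**2 / (area * sum_w2)

    # Weighted shape noise per ellipticity component.
    sum_w2e2 = sum(w * w * (a * a + b * b) for w, a, b in zip(weights, e1, e2))
    sigma_e = math.sqrt(sum_w2e2 / (2 * sum_w2))
    return n_eff, sigma_e
```

Whatever definition the code uses, writing the values back to cat_config from the same mask keeps the survey properties and the mask in step.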

should be ready to merge

@cailmdaley
Collaborator Author

also updated the wiki with these values as Lisa suggested

@cailmdaley
Collaborator Author

just to tabulate the versions being removed/retained:

Removed (28 versions):
  - LF_matched_SP_v1.0
  - LF_v1.0
  - LF_v2.0
  - SP_matched_LF_fp_v1.0
  - SP_matched_LF_v1.0
  - SP_matched_MP_v1.0
  - SP_v1.0
  - SP_v1.0_LFmask_4k
  - SP_v1.0_LFmask_8k
  - SP_v1.1
  - SP_v1.3_LFmask_4k
  - SP_v1.3_LFmask_8k
  - SP_v1.3_LFmask_8k_F2
  - SP_v1.3_LFmask_8k_SN7
  - SP_v1.3_LFmask_8k_SN8
  - SP_v1.3_LFmask_8k_li_2024
  - SP_v1.3_LFmask_8k_no_alpha
  - SP_v1.4
  - SP_v1.4-P1+3
  - SP_v1.4-P1+3+4
  - SP_v1.4-P1+3+4_no_alpha
  - SP_v1.4-P1+3+4_wcs
  - SP_v1.4-P1+3_li_2024
  - SP_v1.4-P1+3_no_alpha
  - SP_v1.4-P1+3_wcs
  - SP_v1.4-P3_LFmask
  - SP_v1.4_conv
  - SP_v1.4_noalpha

Retained (21 working versions):
  - DES
  - SP_axel_v0.0
  - SP_test
  - SP_v0.1.1
  - SP_v1.3
  - SP_v1.4-P3
  - SP_v1.4.1_noleakage
  - SP_v1.4.2
  - SP_v1.4.5
  - SP_v1.4.5.A
  - SP_v1.4.5_bright
  - SP_v1.4.5_faint
  - SP_v1.4.5_glass_mock
  - SP_v1.4.5_intermediate
  - SP_v1.4.6
  - SP_v1.4.6_glass_mock
  - SP_v1.4.7
  - SP_v1.4.8
  - SP_v1.4_LFmask_8k
  - SP_v1.4_LFmask_8k_noalpha
  - SP_v1.5.4 

cailmdaley and others added 3 commits October 24, 2025 17:24
Remove 14 non-functional catalog versions from cat_config.yaml that have missing or misconfigured file paths:
- LF_matched_SP_v1.0, LF_v1.0, LF_v2.0
- SP_matched_LF_v1.0, SP_v1.0_LFmask_4k, SP_v1.0_LFmask_8k
- SP_v1.3_LFmask variants (4k, 8k, F2, SN7, SN8, li_2024, no_alpha)
- SP_v1.4-P3_LFmask

Retain 2 working LFmask versions: SP_v1.4_LFmask_8k and SP_v1.4_LFmask_8k_noalpha

Consolidate test_catalog_paths.py functionality into test_cosmo_val.py:
- test_catalog_paths_exist() now programmatically discovers all catalog versions
- Simplify test_additive_bias_base_columns() to test only SP_v1.4.5
- Update test_additive_bias_leak_corrected_columns() to test SP_v1.4.6_leak_corr
- Result: ~25x faster test suite (614s → 25s)

All tests pass. Remaining: 21 working catalog versions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

[pipeline]
values = cosmosis_config/values_psf.ini
priors = cosmosis_config/priors_psf.ini
Contributor

I will have to modify the prior file with another template.

Collaborator Author

i copied priors_psf.ini to priors_mock.ini and committed it, if you want to modify that

@sachaguer sachaguer self-requested a review October 28, 2025 12:56
@cailmdaley
Collaborator Author

  • wired pseudo-Cl inputs directly into Snakemake and cosmosis_fitting, adding the matching covariance HDU and keeping BB data only in the FITS file
  • refreshed READMEs/configs, added mock priors, and ensured harmonic configs list only CELL_EE

@cailmdaley
Collaborator Author

also confirmed with the data fits file what keys to use for QUANT (not P+P...)

@sachaguer
Contributor

It should be G+R for XI_PLUS and G-R for XI_MINUS
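A small sketch of how the QUANT keywords could be attached to each two-point extension. The XI_PLUS/XI_MINUS codes are the ones given above; the CELL_EE and CELL_BB codes (GEF/GBF, E- and B-mode shear in Fourier space) are our reading of the CosmoSIS twopoint format's type codes and should be checked against an existing data file. `two_point_header` is a hypothetical helper, and a real writer would set these cards on a FITS header rather than return a plain dict:

```python
QUANT_CODES = {
    "XI_PLUS": "G+R",   # shear, real space, xi_+
    "XI_MINUS": "G-R",  # shear, real space, xi_-
    "CELL_EE": "GEF",   # assumed: E-mode shear, Fourier space
    "CELL_BB": "GBF",   # assumed: B-mode shear, Fourier space
}


def two_point_header(extname):
    """Return QUANT header cards for a shear-shear two-point extension."""
    try:
        code = QUANT_CODES[extname]
    except KeyError:
        raise ValueError(f"no QUANT code known for {extname!r}") from None
    # Shear-shear statistics: both tracers carry the same quantity code.
    return {"EXTNAME": extname, "QUANT1": code, "QUANT2": code}
```

Failing loudly on an unknown extension name avoids silently writing a placeholder such as P+P.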

return cl_ee_hdu, cl_bb_hdu


def cov_cl_to_fits(cov_file, nbins):
Contributor

This will save only the Gaussian part of the iNKA covariance. It should use the output of calculate_pseudo_cl_g_ng_cov, which contains the HDUs named:

  • COVAR_GAUSSIAN
  • COVAR_NON_GAUSSIAN
  • COVAR_FULL

The HDU COVAR_FULL should be used.

@sachaguer
Contributor

It seems ready to merge to me, @cailmdaley

…rence run (Cosmosis). Added a prior file for harmonic space mocks.

4 participants