
Conversation

@rhassaine
Contributor

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs?
  • If necessary, also make a PR on the nf-core/oncoanalyser branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

Add automatic download of ~25GB HMF reference data from Hartwig's
R2 CDN before running nf-tests. This ensures tests have access to
the required GRCh38_hmf genome and WGS resource files.

Changes:
- Add Nextflow setup and reference download steps to nf-test workflow
- Use prepare_reference mode to download from R2 CDN
- Configure tests/nextflow.config to detect and use local reference data, falling back to remote URLs for local development

The download adds ~10-15 minutes to each CI run, but it guarantees tests can
access all required reference files without relying on the GitHub Actions
cache (which is limited to 10GB, well below the ~25GB of reference data).
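A minimal sketch of what the added workflow steps could look like, assuming the prepare_reference mode is exposed as a `--prepare_reference_only` flag; the output path and profiles are illustrative, not a verbatim copy of the workflow:

```yaml
# Hypothetical excerpt of the nf-test workflow steps (flag names and paths are assumptions)
- name: Check out the pipeline
  uses: actions/checkout@v4

- name: Set up Nextflow
  uses: nf-core/setup-nextflow@v2

- name: Download HMF reference data (~25GB) from the R2 CDN
  run: |
    nextflow run . \
      -profile test,docker \
      --prepare_reference_only true \
      --outdir hmf_reference
```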
Refactor the nf-test workflow to download the ~25GB HMF reference data once
and share it across all matrix jobs using runs-on Magic Cache (S3-backed).

Changes:
- Enable Magic Cache (extras=s3-cache) on all jobs
- Add a dedicated download-reference job that runs once before the matrix (see the sketch after this list)
- Use actions/cache with runs-on Magic Cache for S3-backed storage
- Matrix jobs now restore from cache instead of downloading individually
- Add runs-on/action@v2 to all jobs for Magic Cache support
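A minimal sketch of the dedicated download job under these assumptions: the runner label syntax, cache key, and download flags are illustrative, while runs-on/action@v2, the extras=s3-cache label, and actions/cache come from the list above:

```yaml
# Hypothetical download-reference job; runner size, cache key, and flags are assumptions.
jobs:
  download-reference:
    # exact runs-on label syntax may differ; extras=s3-cache enables Magic Cache
    runs-on: runs-on=${{ github.run_id }}/runner=2cpu-linux-x64/extras=s3-cache
    steps:
      - uses: runs-on/action@v2          # Magic Cache (S3-backed) support
      - uses: actions/checkout@v4
      - uses: nf-core/setup-nextflow@v2
      - name: Restore HMF reference data if already cached
        id: hmf-cache
        uses: actions/cache@v4           # backed by S3 via Magic Cache
        with:
          path: hmf_reference
          key: hmf-reference-grch38
      - name: Download reference data on cache miss
        if: steps.hmf-cache.outputs.cache-hit != 'true'
        run: |
          nextflow run . \
            -profile test,docker \
            --prepare_reference_only true \
            --outdir hmf_reference
```

With actions/cache, the cached path is saved automatically at the end of the job whenever the key missed, so the ~25GB set is uploaded to the S3 backend once per cache key rather than once per matrix job.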

Performance impact:
- Before: 42 jobs × 15 min = 630 minutes of download time
- After: 15 min download + (42 × 3 min restore) = 141 minutes
- Saves ~489 minutes (~8 hours) per workflow run

Benefits:
- No GitHub 10GB cache limit (uses S3 backend)
- Fast cache restore across all matrix jobs
- Cache persists across workflow runs
- Significant CI time savings
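On the matrix side, each job would then restore rather than download; again a sketch, with assumed labels and a cache key matching the download job above:

```yaml
# Hypothetical matrix job (sits under the same jobs: block as download-reference above)
nf-test:
  needs: download-reference
  runs-on: runs-on=${{ github.run_id }}/runner=4cpu-linux-x64/extras=s3-cache
  strategy:
    matrix:
      filter: [process, workflow]        # placeholder shards; the real matrix is larger
  steps:
    - uses: runs-on/action@v2
    - uses: actions/checkout@v4
    - name: Restore HMF reference data
      uses: actions/cache/restore@v4     # restore-only; the download job owns the save
      with:
        path: hmf_reference
        key: hmf-reference-grch38
        fail-on-cache-miss: true
    - name: Run nf-test
      run: echo "nf-test invocation goes here"   # placeholder for the existing test step
```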
@nf-core-bot
Member

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.3.2.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

rhassaine force-pushed the feat/hmf-reference-download branch 2 times, most recently from a0d15ae to e6fb586 on November 12, 2025 at 14:52
rhassaine force-pushed the feat/hmf-reference-download branch from 9002c20 to 70c3b7c on November 14, 2025 at 12:58
@scwatts
Member

scwatts commented Nov 20, 2025

Hi @rhassaine, I've reworked the nf-test GitHub Actions workflow over in #265, which also has the test profile fully working. I left the reference data caching untouched so you can continue working on that here.
