
Normalize version fields to strings in import scripts #39

Merged
mihai-sysbio merged 6 commits into research-software-ecosystem:main from arash77:normalize-version-fields
Mar 9, 2026

Conversation

Collaborator

@arash77 arash77 commented Jan 16, 2026

Introduce a normalization function to convert version fields to strings across various import scripts, ensuring consistent data formatting. This change enhances data integrity when processing tool and package metadata.
Closes research-software-ecosystem/content#1190

@mihai-sysbio
Contributor

Thanks @arash77, this is a neat contribution. It's mixed together with formatting changes though, which, albeit a great idea, muddles what is the fix and what is purely formatting. Is there a way you could split the two aspects? And if adoption of PEP 8 is desired in this repo, how about a GH Action that applies it automatically?

Collaborator Author

arash77 commented Jan 19, 2026

I will exclude the formatting from this PR. I can create a separate PR to discuss how automated formatting could be applied.

arash77 force-pushed the normalize-version-fields branch 2 times, most recently from 74ba363 to c1bb215, January 19, 2026 16:27
Contributor

Copilot AI left a comment


Pull request overview

This pull request introduces a new common utility module for normalizing version fields from numeric types to strings across various metadata import scripts, addressing data integrity issues when processing tool and package metadata.

Changes:

  • Added common/metadata.py module with normalize_version_to_string and normalize_version_fields functions
  • Updated four import scripts (galaxytool-import, biotools-import, bioconductor-import, bioconda-import) to use the new normalization functions
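The two functions named above could look roughly like the following. This is a minimal sketch, not the code from common/metadata.py: the function names come from the overview, but the signatures, the in-place update, and the sample record are assumptions made for illustration.

```python
from typing import Any


def normalize_version_to_string(value: Any) -> Any:
    """Return int/float values as strings; leave everything else untouched."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return str(value)
    return value


def normalize_version_fields(record: dict, fields: list[str]) -> dict:
    """Normalize the named top-level fields of `record` in place (assumed signature)."""
    for field in fields:
        if field in record:
            record[field] = normalize_version_to_string(record[field])
    return record


# Hypothetical record; the field names are taken from the file summary in this PR.
tool = {"name": "example-tool", "Version": 2, "Suite_version": 1.5}
normalize_version_fields(tool, ["Version", "Suite_version"])
print(tool)  # {'name': 'example-tool', 'Version': '2', 'Suite_version': '1.5'}
```

One limitation of converting after parsing: a version written as `1.10` that a YAML or JSON loader already read as a float becomes `'1.1'`, because the trailing zero is lost before this function ever sees the value.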

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| common/metadata.py | New utility module providing functions to normalize version fields (integers/floats) to strings with support for nested paths and list structures |
| galaxytool-import/galaxytool-import.py | Integrated version normalization for Suite_version, Latest_suite_conda_package_version, and Related_Workflows latest_version fields |
| biotools-import/import.py | Added version field normalization for both top-level version field and nested version fields within version arrays |
| bioconductor-import/import.py | Applied normalization to the Version field in package metadata |
| bioconda-import/bioconda_importer.py | Normalized package.version field in conda package data |


@hmenager hmenager requested a review from mihai-sysbio January 22, 2026 15:07
Contributor

@hmenager hmenager left a comment


LGTM! Thanks @arash77!

arash77 force-pushed the normalize-version-fields branch from 14e7448 to 7fd2d0a, March 3, 2026 10:52
arash77 added 6 commits March 4, 2026 10:12
Add normalize_version_fields function to convert version fields
(which can be int, float, or str) to string type for consistency.

Integrate version normalization into all import scripts:
- bioconda: normalize package.version
- bioconductor: normalize Version
- biotools: normalize version and nested version fields
- galaxytool: normalize Suite_version, conda package version, and workflow versions
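
The per-script integration listed in this commit message involves nested fields (package.version) and lists (workflow versions, version arrays). The sketch below shows one way such paths could be handled. The helper names `_to_str` and `normalize_path` are hypothetical, as are the record shapes; only the field names are taken from this PR.

```python
from typing import Any


def _to_str(value: Any) -> Any:
    """Convert int/float to str; bool and all other types pass through."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return str(value)
    return value


def normalize_path(node: Any, path: list[str]) -> None:
    """Normalize the value found at `path` in place, fanning out over lists."""
    if isinstance(node, list):
        # Apply the same remaining path to every element of the list.
        for item in node:
            normalize_path(item, path)
        return
    if not isinstance(node, dict) or not path or path[0] not in node:
        return  # Missing or mismatched structure is silently skipped.
    key, rest = path[0], path[1:]
    if rest:
        normalize_path(node[key], rest)
    elif isinstance(node[key], list):
        node[key] = [_to_str(v) for v in node[key]]
    else:
        node[key] = _to_str(node[key])


# Hypothetical records shaped after the fields named in the commit message.
conda = {"package": {"name": "example-package", "version": 1.2}}
galaxy = {"Suite_version": 3,
          "Related_Workflows": [{"latest_version": 4}, {"latest_version": "5"}]}
biotools = {"version": [1.0, "2.0", 3]}

normalize_path(conda, ["package", "version"])
normalize_path(galaxy, ["Suite_version"])
normalize_path(galaxy, ["Related_Workflows", "latest_version"])
normalize_path(biotools, ["version"])
```

Skipping missing keys rather than raising is a design choice of this sketch, made on the assumption that records from different sources do not all carry every field.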
arash77 force-pushed the normalize-version-fields branch from affe90f to 9f0c526, March 4, 2026 09:12
@mihai-sysbio
Contributor

To test this PR, I forked the content repo and modified the import data workflow to refer to @arash77's branch normalize-version-fields, see https://github.com/mihai-sysbio/rsec-content/commit/590f10d8e069b1a4e29a3164eb817e455a2c44ee
With that in place, I was able to run the workflow manually (https://github.com/mihai-sysbio/rsec-content/actions/runs/22862087561). There was a merge error reported at the end, but I believe that is only due to a mismatch of hardcoded branches somewhere.
I did confirm that the update normalised, e.g., the Bioconda version of MultiQC into a string, so all good.
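
A spot check like the one described here could also be automated. The following is a rough sketch only: the function name `non_string_versions` is hypothetical, and matching keys by the substring "version" is an assumption of this example, not something the import scripts are known to do.

```python
from typing import Any, Iterator


def non_string_versions(node: Any, path: str = "",
                        in_version: bool = False) -> Iterator[tuple[str, Any]]:
    """Yield (path, value) for numeric values stored under version-like keys."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else str(key)
            # Assumption: any key containing "version" holds a version value.
            yield from non_string_versions(value, child, "version" in str(key).lower())
    elif isinstance(node, list):
        # List items inherit the version flag of the key that holds the list.
        for index, item in enumerate(node):
            yield from non_string_versions(item, f"{path}[{index}]", in_version)
    elif in_version and isinstance(node, (int, float)) and not isinstance(node, bool):
        yield path, node


# Hypothetical metadata before and after normalization.
before = {"package": {"name": "example-package", "version": 1.2}, "versions": [1, "2"]}
after = {"package": {"name": "example-package", "version": "1.2"}, "versions": ["1", "2"]}

print(list(non_string_versions(before)))  # [('package.version', 1.2), ('versions[0]', 1)]
print(list(non_string_versions(after)))   # []
```

An empty result for every imported record would indicate that no numeric version values remain under the keys this heuristic recognises.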

My action plan is to merge this now, trigger the import data workflow manually in the content repo and see that the merge goes through. Finally, I will see that the changes get nicely propagated to the Atlas.

@mihai-sysbio mihai-sysbio merged commit 5c06692 into research-software-ecosystem:main Mar 9, 2026
1 check passed
@mihai-sysbio
Contributor

Confirming everything seemed to work in production, great job @arash77 🙌🏻

@arash77 arash77 deleted the normalize-version-fields branch March 10, 2026 08:49

Development

Successfully merging this pull request may close these issues.

Consistent field types in metadata formats

4 participants