Consistency check script #114
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,31 @@ | ||
| # OPL YAML utils | ||
|
|
||
| This folder contains utility scripts for working with the YAML format used to describe problems in the context of OPL. They are mainly intended to be run automatically via GitHub Actions to make collaboration easier. | ||
|
|
||
| The intended way of adding a new problem to the repository is thus as follows: | ||
|
|
||
| * Change the [new_problem.yaml](new_problem.yaml) template file to fit the new problem. | ||
| * Create a PR with the changes (for example from a fork). | ||
|
|
||
| What happens in the background then is: | ||
|
|
||
| * On PR creation and on each commit to the PR, the [validate_yaml.py](validate_yaml.py) script is run to check that the YAML file is valid and consistent. It expects the changes to be in the [new_problem.yaml](new_problem.yaml) file. | ||
| * Then the PR should be reviewed manually. | ||
| * When the PR is merged into the main branch, a second script (which doesn't exist yet) runs. It adds the content of [new_problem.yaml](new_problem.yaml) to the [problems.yaml](../problems.yaml) file and restores [new_problem.yaml](new_problem.yaml) to its previous version. | ||
|
|
||
| :alert: Note that the GitHub Actions do not exist yet either; this is a WIP. | ||
|
|
||
| ## validate_yaml.py | ||
|
|
||
| This script checks the new content for the following: | ||
|
|
||
| * The YAML syntax is valid and in the expected format. | ||
| * The required fields are present. | ||
| * Specific fields are unique across the existing and new problems (e.g. name). | ||
|
|
||
| :alert: Execute from the root of the repository. Tested with Python 3.12. | ||
|
|
||
| ```bash | ||
| pip install -r utils/requirements.txt | ||
| python utils/validate_yaml.py utils/new_problem.yaml | ||
| ``` |
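
The merge step described in the README above does not exist yet. A minimal sketch of what it could do, assuming both files parse to a top-level list of mappings (as `new_problem.yaml` does). The names `append_new_problem` and `merge_files` are hypothetical and not part of this PR:

```python
import copy


def append_new_problem(existing, new_entries):
    """Return the existing problems followed by the new entries.

    Both arguments are expected to be lists of dicts, i.e. what
    yaml.safe_load returns for problems.yaml and new_problem.yaml.
    """
    if existing is None:  # an empty problems.yaml parses to None
        existing = []
    if not isinstance(existing, list) or not isinstance(new_entries, list):
        raise ValueError("Both files should contain a top-level list.")
    # Copy so the caller's data is not modified in place.
    return copy.deepcopy(existing) + copy.deepcopy(new_entries)


def merge_files(problems_path, new_problem_path):
    """Hypothetical file-level wrapper around append_new_problem."""
    import yaml  # PyYAML, imported lazily so the function above has no dependencies

    with open(problems_path, "r") as f:
        existing = yaml.safe_load(f)
    with open(new_problem_path, "r") as f:
        new_entries = yaml.safe_load(f)
    merged = append_new_problem(existing, new_entries)
    with open(problems_path, "w") as f:
        # sort_keys=False keeps the field order of the template
        yaml.safe_dump(merged, f, sort_keys=False)
```

Restoring `new_problem.yaml` to its previous version is left out of this sketch.
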
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| - name: template | ||
|   suite/generator/single: suite | ||
|   objectives: '1' | ||
|   dimensionality: scalable | ||
|   variable type: continuous | ||
|   constraints: 'no' | ||
|   dynamic: 'no' | ||
|   noise: 'no' | ||
|   multimodal: 'yes' | ||
|   multi-fidelity: 'no' | ||
|   reference: '' | ||
|   implementation: '' | ||
|   source (real-world/artificial): '' | ||
|   textual description: 'This is a dummy template' |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| pyyaml |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,107 @@ | ||
| import yaml | ||
| import sys | ||
|
|
||
| # Define the required fields your YAML must have | ||
| REQUIRED_FIELDS = [ | ||
|     "name", | ||
|     "suite/generator/single", | ||
|     "objectives", | ||
|     "dimensionality", | ||
|     "variable type", | ||
|     "constraints", | ||
|     "dynamic", | ||
|     "noise", | ||
|     "multimodal", | ||
|     "multi-fidelity", | ||
|     "reference", | ||
|     "implementation", | ||
|     "source (real-world/artificial)", | ||
|     "textual description", | ||
| ] | ||
|
|
||
| UNIQUE_FIELDS = ["name", "reference", "implementation"] | ||
|
Collaborator
This triggers an error when:
|
||
| PROBLEMS_FILE = "problems.yaml" | ||
|
|
||
|
|
||
| def read_data(filepath): | ||
|     try: | ||
|         with open(filepath, "r") as f: | ||
|             data = yaml.safe_load(f) | ||
|         return 0, data | ||
|     except FileNotFoundError: | ||
|         print(f"File not found: {filepath}") | ||
|         return 1, None | ||
|     except yaml.YAMLError as e: | ||
|         print(f"YAML syntax error: {e}") | ||
|         return 1, None | ||
|
|
||
|
|
||
| def check_format(data): | ||
|     if len(data) != 1: | ||
|         print("YAML file should contain exactly one top-level document.") | ||
|         return False | ||
|     if not isinstance(data[0], dict): | ||
|         print("Top-level document should be a dictionary.") | ||
|         return False | ||
|     return True | ||
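
`yaml.safe_load` returns `None` for an empty file, in which case `len(data)` in `check_format` raises a `TypeError` instead of printing a message. A possible more defensive variant, as a sketch only (the name `check_format_strict` and its messages are hypothetical):

```python
def check_format_strict(data):
    """Like check_format, but also rejects non-list top-level values."""
    if not isinstance(data, list):
        # Covers empty files (None), plain mappings, and scalars.
        print("YAML file should contain a top-level list.")
        return False
    if len(data) != 1:
        print("YAML file should contain exactly one entry.")
        return False
    if not isinstance(data[0], dict):
        print("The entry should be a dictionary.")
        return False
    return True
```
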
|
|
||
|
|
||
| def check_fields(data): | ||
|     if len(data) != len(REQUIRED_FIELDS): | ||
|
Collaborator
I think this should test that there are at least this many fields. I would explicitly want people to add new fields for interesting properties we do not collect yet. Then:
|
||
|         print(f"YAML file should contain exactly {len(REQUIRED_FIELDS)} fields.") | ||
|         return False | ||
|     missing = [field for field in REQUIRED_FIELDS if field not in data] | ||
|     if missing: | ||
|         print(f"Missing required fields: {', '.join(missing)}") | ||
|         return False | ||
|     # Check that the name is not still template | ||
|     if data.get("name") == "template": | ||
|         print("Please change the 'name' field from 'template' to a unique name.") | ||
|         return False | ||
|     return True | ||
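
The review comment on `check_fields` asks for a check that tolerates additional fields. One way this could look, as a sketch rather than the reviewer's or the author's actual code (the name `check_fields_relaxed` is hypothetical, and `REQUIRED_FIELDS` is shortened here for illustration):

```python
# Shortened for illustration; the script's full list would be used instead.
REQUIRED_FIELDS = ["name", "objectives", "reference"]


def check_fields_relaxed(data):
    """Require every field in REQUIRED_FIELDS, but tolerate extra ones."""
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        print(f"Missing required fields: {', '.join(missing)}")
        return False
    # Check that the name is not still the template placeholder.
    if data.get("name") == "template":
        print("Please change the 'name' field from 'template' to a unique name.")
        return False
    extra = [field for field in data if field not in REQUIRED_FIELDS]
    if extra:
        # Extra fields are allowed; report them so reviewers notice them.
        print(f"Additional fields found: {', '.join(map(str, extra))}")
    return True
```

Dropping the length comparison is enough here, because the `missing` check already guarantees that at least the required fields are present.
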
|
|
||
|
|
||
| def check_novelty(data): | ||
|     # Load existing problems | ||
|     read_status, existing_data = read_data(PROBLEMS_FILE) | ||
|     if read_status != 0: | ||
|         print("Could not read existing problems for novelty check.") | ||
|         return False | ||
|     assert existing_data is not None | ||
|     for field in UNIQUE_FIELDS: | ||
|         existing_values = { | ||
|             entry.get(field) for entry in existing_data if isinstance(entry, dict) | ||
|         } | ||
|         if data.get(field) in existing_values: | ||
|             print( | ||
|                 f"Field '{field}' with value '{data.get(field)}' already exists. Please choose a unique value." | ||
|             ) | ||
|             return False | ||
|     return True | ||
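
The template leaves `reference` and `implementation` as empty strings, and both are listed in `UNIQUE_FIELDS`, so an empty value collides with any existing entry that also leaves that field empty. A sketch of a uniqueness check that skips empty values, assuming that is the desired behavior (an assumption, not something stated in this PR; `find_duplicates` is a hypothetical name):

```python
UNIQUE_FIELDS = ["name", "reference", "implementation"]


def find_duplicates(new_entry, existing_entries, unique_fields=UNIQUE_FIELDS):
    """Return the unique fields whose non-empty value already exists."""
    duplicates = []
    for field in unique_fields:
        value = new_entry.get(field)
        if value is None or value == "":
            continue  # assumption: empty values are not meant to be unique
        # A list (not a set) so that unhashable YAML values do not raise.
        existing_values = [
            entry.get(field) for entry in existing_entries if isinstance(entry, dict)
        ]
        if value in existing_values:
            duplicates.append(field)
    return duplicates
```

Returning the list of offending fields, instead of stopping at the first one, lets the caller report all collisions in a single run.
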
|
|
||
|
|
||
| def validate_yaml(filepath): | ||
|     status, data = read_data(filepath) | ||
|     if status != 0: | ||
|         sys.exit(1) | ||
|     if not check_format(data): | ||
|         sys.exit(1) | ||
|     assert data is not None and len(data) == 1 | ||
|     new_data = data[0]  # Extract the single top-level entry | ||
|
|
||
|     # Check required and unique fields | ||
|     if not check_fields(new_data) or not check_novelty(new_data): | ||
|         sys.exit(1) | ||
|
|
||
|     # YAML is valid if we reach this point | ||
|     print("YAML syntax is valid.") | ||
|     sys.exit(0) | ||
|
|
||
|
|
||
| if __name__ == "__main__": | ||
|     if len(sys.argv) < 2: | ||
|         print("Usage: python validate_yaml.py <yourfile.yaml>") | ||
|         sys.exit(1) | ||
|
|
||
|     filepath = sys.argv[1] | ||
|     validate_yaml(filepath) | ||
We have the same list of fields in `yaml_to_html.py` (called `default_columns`). We should probably maintain it in a single place, and/or let one inherit the other?
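
One possible way to keep the field list in a single place, sketched under the assumption that both scripts can import from a shared module. The module name `opl_fields.py` and the tuple name `FIELDS` are hypothetical; only `REQUIRED_FIELDS`, `UNIQUE_FIELDS`, and `default_columns` come from this thread, and the actual shape of `default_columns` in `yaml_to_html.py` is not shown here:

```python
# opl_fields.py (hypothetical shared module)
# Single source of truth for the problem fields, in template order.
FIELDS = (
    "name",
    "suite/generator/single",
    "objectives",
    "dimensionality",
    "variable type",
    "constraints",
    "dynamic",
    "noise",
    "multimodal",
    "multi-fidelity",
    "reference",
    "implementation",
    "source (real-world/artificial)",
    "textual description",
)

# validate_yaml.py could derive its list from the shared tuple ...
REQUIRED_FIELDS = list(FIELDS)
UNIQUE_FIELDS = ["name", "reference", "implementation"]

# ... and yaml_to_html.py could do the same for its columns.
default_columns = list(FIELDS)

# Guard against typos: every unique field must be a known field.
unknown = [field for field in UNIQUE_FIELDS if field not in FIELDS]
if unknown:
    raise ValueError(f"Unknown unique fields: {', '.join(unknown)}")
```

Using an immutable tuple for the shared definition and copying it into lists keeps one script from accidentally modifying the other's view of the fields.
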