31 changes: 31 additions & 0 deletions utils/README.md
@@ -0,0 +1,31 @@
# OPL YAML utils

This folder contains utility scripts for working with the YAML format used to describe problems in the context of OPL. They are mainly intended to be run automatically via GitHub Actions to make collaboration easier.

The intended way of adding a new problem to the repository is therefore as follows:

* Change the [new_problem.yaml](new_problem.yaml) template file to fit the new problem.
* Create a PR with the changes (for example, from a fork).

What happens in the background then is:

* On PR creation and on new commits to the PR, the [validate_yaml.py](validate_yaml.py) script is run to check that the YAML file is valid and consistent. It expects the changes to be in the [new_problem.yaml](new_problem.yaml) file.
* Then the PR should be reviewed manually.
* When the PR is merged into the main branch, a second script (which doesn't exist yet) runs, appending the content of [new_problem.yaml](new_problem.yaml) to the [problems.yaml](../problems.yaml) file and resetting [new_problem.yaml](new_problem.yaml) to the template (see the sketch below).

:warning: Note that the GitHub Actions do not exist yet either; this is a WIP.
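
As a rough illustration of the merge step described above, here is a minimal sketch. It only uses the file names mentioned in this README; the script name, the separate pristine template file, and the function name are hypothetical and not part of this PR.

```python
# Hypothetical sketch of the (not yet existing) merge step, e.g. utils/merge_problem.py.
import shutil

import yaml

NEW_PROBLEM_FILE = "utils/new_problem.yaml"
PROBLEMS_FILE = "problems.yaml"
# Hypothetical pristine copy of the template, used to reset new_problem.yaml.
TEMPLATE_FILE = "utils/new_problem_template.yaml"


def merge_new_problem():
    # new_problem.yaml is expected to be a list with exactly one problem entry.
    with open(NEW_PROBLEM_FILE) as f:
        new_entries = yaml.safe_load(f)

    with open(PROBLEMS_FILE) as f:
        problems = yaml.safe_load(f) or []

    # Append the new entry and write problems.yaml back out.
    problems.extend(new_entries)
    with open(PROBLEMS_FILE, "w") as f:
        yaml.safe_dump(problems, f, sort_keys=False, allow_unicode=True)

    # Reset new_problem.yaml to its template state for the next contribution.
    shutil.copyfile(TEMPLATE_FILE, NEW_PROBLEM_FILE)


if __name__ == "__main__":
    merge_new_problem()
```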

## validate_yaml.py

This script checks the new content for the following:

* The YAML syntax is valid and in the expected format.
* The required fields are present.
* Specific fields are unique across the new set of problems (e.g. name).

:warning: Execute from the root of the repository. Tested with Python 3.12.

```bash
pip install -r utils/requirements.txt
python utils/validate_yaml.py utils/new_problem.yaml
```
14 changes: 14 additions & 0 deletions utils/new_problem.yaml
@@ -0,0 +1,14 @@
- name: template
suite/generator/single: suite
objectives: '1'
dimensionality: scalable
variable type: continuous
constraints: 'no'
dynamic: 'no'
noise: 'no'
multimodal: 'yes'
multi-fidelity: 'no'
reference: ''
implementation: ''
source (real-world/artificial): ''
textual description: 'This is a dummy template'
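
For reference, this template parses to a YAML list containing a single mapping, which is exactly the structure that validate_yaml.py's format check expects. A quick way to confirm this locally (paths as in this PR):

```python
import yaml

# Parse the template; safe_load returns a list with one dict for this file.
with open("utils/new_problem.yaml") as f:
    data = yaml.safe_load(f)

assert isinstance(data, list) and len(data) == 1
assert isinstance(data[0], dict)
print(sorted(data[0].keys()))  # the fields checked by validate_yaml.py
```
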
1 change: 1 addition & 0 deletions utils/requirements.txt
@@ -0,0 +1 @@
pyyaml
107 changes: 107 additions & 0 deletions utils/validate_yaml.py
@@ -0,0 +1,107 @@
import yaml
import sys

# Define the required fields your YAML must have
REQUIRED_FIELDS = [
Collaborator: We have the same list of fields in yaml_to_html.py (called default_columns). We should probably maintain it in a single place, and/or let one inherit the other?

    "name",
    "suite/generator/single",
    "objectives",
    "dimensionality",
    "variable type",
    "constraints",
    "dynamic",
    "noise",
    "multimodal",
    "multi-fidelity",
    "reference",
    "implementation",
    "source (real-world/artificial)",
    "textual description",
]

UNIQUE_FIELDS = ["name", "reference", "implementation"]
Collaborator: This triggers an error when:

* reference is '' (empty), which is probably not what we want. Thinking about it: a reference can also introduce multiple problems, so it does not need to be unique in any case? What do you think?
* Similarly, I expect the same happens for the implementation field, which may also not need to be unique, because a single package may implement multiple problems/benchmarks. (I guess it might depend a bit on how specific we want the reference to the implementation to be, but it probably cannot always be specific enough to be unique.)

PROBLEMS_FILE = "problems.yaml"


def read_data(filepath):
    try:
        with open(filepath, "r") as f:
            data = yaml.safe_load(f)
        return 0, data
    except FileNotFoundError:
        print(f"File not found: {filepath}")
        return 1, None
    except yaml.YAMLError as e:
        print(f"YAML syntax error: {e}")
        return 1, None


def check_format(data):
    if len(data) != 1:
        print("YAML file should contain exactly one top-level document.")
        return False
    if not isinstance(data[0], dict):
        print("Top-level document should be a dictionary.")
        return False
    return True


def check_fields(data):
    if len(data) != len(REQUIRED_FIELDS):
Collaborator: I think this should test that there are at least this many fields. I would explicitly want people to add new fields for interesting properties we do not collect yet. Then:

* Properties not in REQUIRED_FIELDS should be checked against a (to be created) NOT_REQUIRED_FIELDS, which would contain all other fields (might be empty for now).
* A message should be returned listing the new fields (found in neither the required nor the not-required list), to be verified by an OPL maintainer as actually new (not just a new name for an existing property), and then either added to the not-required list or fixed (or requested as a change on the PR) to use the correct existing name.
* Ideally all other checks are still done before such a list is returned, so we know everything else already passes the checks, and verifying new fields (or maybe other similar things) is all that needs to be done.

        print(f"YAML file should contain exactly {len(REQUIRED_FIELDS)} fields.")
        return False
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        print(f"Missing required fields: {', '.join(missing)}")
        return False
    # Check that the name is not still template
    if data.get("name") == "template":
        print("Please change the 'name' field from 'template' to a unique name.")
        return False
    return True


def check_novelty(data):
    # Load existing problems
    read_status, existing_data = read_data(PROBLEMS_FILE)
    if read_status != 0:
        print("Could not read existing problems for novelty check.")
        return False
    assert existing_data is not None
    for field in UNIQUE_FIELDS:
        existing_values = {
            entry.get(field) for entry in existing_data if isinstance(entry, dict)
        }
        if data.get(field) in existing_values:
            print(
                f"Field '{field}' with value '{data.get(field)}' already exists. Please choose a unique value."
            )
            return False
    return True


def validate_yaml(filepath):
    status, data = read_data(filepath)
    if status != 0:
        sys.exit(1)
    if not check_format(data):
        sys.exit(1)
    assert data is not None and len(data) == 1
    new_data = data[0]  # Extract the single top-level entry

    # Check required and unique fields
    if not check_fields(new_data) or not check_novelty(new_data):
        sys.exit(1)

    # YAML is valid if we reach this point
    print("YAML syntax is valid.")
    sys.exit(0)


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python validate_yaml.py <yourfile.yaml>")
        sys.exit(1)

    filepath = sys.argv[1]
    validate_yaml(filepath)
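
The inline review comments above suggest two changes: tolerate extra (new) fields beyond REQUIRED_FIELDS, and skip empty values in the uniqueness check. A minimal sketch of how both could look, assuming a new NOT_REQUIRED_FIELDS list; the revised functions are hypothetical and not part of this PR.

```python
# Hypothetical revision reflecting the review comments; not part of this PR.
NOT_REQUIRED_FIELDS = []  # known optional fields; empty for now


def check_fields(data):
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        print(f"Missing required fields: {', '.join(missing)}")
        return False
    # Allow extra fields, but surface them for an OPL maintainer to verify.
    known = set(REQUIRED_FIELDS) | set(NOT_REQUIRED_FIELDS)
    unknown = [field for field in data if field not in known]
    if unknown:
        print(f"New fields to be verified by a maintainer: {', '.join(unknown)}")
    if data.get("name") == "template":
        print("Please change the 'name' field from 'template' to a unique name.")
        return False
    return True


def check_novelty(data):
    read_status, existing_data = read_data(PROBLEMS_FILE)
    if read_status != 0:
        print("Could not read existing problems for novelty check.")
        return False
    for field in UNIQUE_FIELDS:
        value = data.get(field)
        if not value:
            # Skip empty values such as an unset 'reference' or 'implementation'.
            continue
        existing_values = {
            entry.get(field) for entry in existing_data if isinstance(entry, dict)
        }
        if value in existing_values:
            print(f"Field '{field}' with value '{value}' already exists.")
            return False
    return True
```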