Description
We have the regression tests, which are amazing for finding out if a change breaks SVGs. However, they're not good at measuring whether a change was actually meaningful.
For example:
- After a bug fix, do the metrics reflect what was expected?
- If we have a proposed or revised optimization, is it actually reducing bytes?
- How long did it take to run the regression test pipeline?
- Is a change doing anything else to the output of the SVGs? (Change in formatting, attribute order, etc.)
Before fixing more bugs/optimizations, let's finally address this so we have more insight.
Scope
Allow Test Failures ✅
The regression tests should test all SVGs that they can, including ones that may fail. We should instead define three lists:
| List | Description |
|---|---|
| `expect-error` | SVGs that we know are broken by SVGO. The build will fail if, during the regression pipeline, we determine that one of them is actually not broken. This is for when we've intentionally or incidentally fixed an SVG. |
| `ignore` | Ignore the results of these SVGs as they are finicky. They sometimes pass, sometimes fail; we'll figure out why eventually. We'll report their status regardless, but the result has no effect on the pipeline status. |
| `skip` | SVGs that we shouldn't extract. We only have one SVG for this scenario right now, which takes too long to optimize to be practical for CI environments. |
Instead of defining these in code, let's make them separate config files that are read from the regression tests directory. They'll be standalone text files, one entry per line, with support for comments.
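As a minimal sketch of the idea (the directory, file names, and `#` comment syntax are assumptions, not decisions), reading one of these lists could be as simple as:

```js
import fs from 'node:fs';
import path from 'node:path';

/**
 * Reads one of the list files (expect-error, ignore, skip) from the
 * regression test directory. One SVG path per line; blank lines and
 * lines starting with "#" are treated as comments.
 */
function readListFile(name, dir = './test/regression') {
  return new Set(
    fs
      .readFileSync(path.join(dir, name), 'utf-8')
      .split('\n')
      .map((line) => line.trim())
      .filter((line) => line !== '' && !line.startsWith('#')),
  );
}

// Hypothetical usage:
// const expectError = readListFile('expect-error');
// const skip = readListFile('skip');
```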
Metrics ✅
The regression tests must output the following metrics:
- How many tests passed or failed, so we know the progress towards acing the regression tests.
- The total number of bytes saved, so we can confirm changes are actually optimizing SVGs and that fixes aren't undoing optimizations beyond what's necessary.
- The hash of every individual SVG, so we can keep track of which files are affected by changes in SVGO.
- The maximum memory footprint the process had during the run.
The results will be written to STDOUT and a file.
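A rough sketch of how those metrics could be gathered during a run; the field names, hash algorithm, and memory-sampling approach are all assumptions:

```js
import crypto from 'node:crypto';

// Aggregated per-run metrics; field names are illustrative only.
const metrics = { passed: 0, failed: 0, bytesSaved: 0, peakRss: 0, checksums: {} };

// Approximate the peak memory footprint by sampling the resident set size.
const memSampler = setInterval(() => {
  metrics.peakRss = Math.max(metrics.peakRss, process.memoryUsage().rss);
}, 250);
// clearInterval(memSampler) once the run finishes.

/** Record one optimized SVG: bytes saved plus a checksum of the output. */
function recordResult(name, original, optimized) {
  metrics.bytesSaved += Buffer.byteLength(original) - Buffer.byteLength(optimized);
  metrics.checksums[name] = crypto.createHash('md5').update(optimized).digest('hex');
}
```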
STDOUT
The resulting output should look something like this, but happy to consider alternative presentations. The data contained in the example is fictional.
Normal Run:
SVGO Test Suite Version: 30ba6479ef7570d3748a30bbbd39f8c7
▶ Test Results
Match: 4,799 / 4,799
Expected Mismatch: 159 / 159
Ignored: 1 / 1
Skipped: 1
▶ Metrics
Bytes Saved: 40.4 MiB
Time Taken: 23m56s
Peak Memory Alloc: 2.305 GB
▶ Relative to svg/svgo.git#main
Files Affected: 277 / 4,959
Bytes Saved Δ: +0.4 MiB
Time Taken Δ: +20s
Peak Memory Alloc Δ: +0.001 GB
If there is no previous report to compare results to:
…
▶ Relative to svg/svgo.git#main
Previous test report not available.
If the SVGO Test Suite version is different from the version specified in the last test report:
…
▶ Relative to svg/svgo.git#main
Previous test report used a different version of SVGO Test Suite.
Rerun regression tests on main to regenerate test report, then try again.
If there are failing test cases, show them in a bullet point list, probably in red as one would expect in other test runners:
▶ Expected match, but actually mismatched:
✖ oxygen-icons-5.116.0/scalable/actions/hidef/tools-rip-audio-cd.svg
✖ oxygen-icons-5.116.0/scalable/actions/im-ban-kick-user.svg
Implementation Details:
- The SVGO Test Suite version will be included in the download from the svg/svgo-test-suite bundle, in a file named `VERSION`.
- In `Relative to …`, any `… Δ` line must use the same units as the field it's relative to.
- Numbers must be localized according to the system locale (i.e., `1,000` not `1000` for en-GB). A formatting sketch follows this list.
- Files Affected can be determined by including a checksum of each optimized SVG in the test report, and comparing the checksums.
- It's expected that we'll use a library to make text bold or colored, etc. The capabilities of the library may influence the design. For example, if we make headings bold, there's no need to underline them.
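One way to satisfy the locale and unit rules above; the helper names are hypothetical:

```js
// Localize counts per the system locale, e.g. "4,959" on en-GB.
const formatCount = (n) => n.toLocaleString();

// A delta keeps the same unit as the field it's relative to.
const formatDelta = (value, unit) =>
  `${value >= 0 ? '+' : '-'}${Math.abs(value).toLocaleString()} ${unit}`;

// formatCount(4959)       -> "4,959" (on an en-GB system)
// formatDelta(0.4, 'MiB') -> "+0.4 MiB"
```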
File
We will write a more verbose report to …/svgo-test-report.json, containing raw data for the run. The JSON file should not be concerned with presentation.
This will include everything, including the checksum of every file that was optimized. This file will not be committed to the Git repository.
The tests will run on a merge to main, and will upload an artifact to GitHub (or cloud storage if GitHub isn't practical for whatever reason). Then, on CI and perhaps locally, we'll generate the same report. The regression tests can compare the results to the last report for main, and print the relative summary.
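A sketch of how the comparison against the last report from main could work; the report schema, field names, and file locations are assumptions:

```js
import fs from 'node:fs';

// svgo-test-report.json is assumed to contain raw data along these lines:
// { suiteVersion, bytesSaved, timeTakenMs, peakRss, checksums: { [path]: hash } }
function compareReports(current, previousPath) {
  if (!fs.existsSync(previousPath)) {
    return { status: 'missing' }; // "Previous test report not available."
  }
  const previous = JSON.parse(fs.readFileSync(previousPath, 'utf-8'));
  if (previous.suiteVersion !== current.suiteVersion) {
    return { status: 'version-mismatch' }; // prompt a rerun on main
  }
  const filesAffected = Object.keys(current.checksums).filter(
    (file) => current.checksums[file] !== previous.checksums[file],
  ).length;
  return {
    status: 'ok',
    filesAffected,
    bytesSavedDelta: current.bytesSaved - previous.bytesSaved,
    timeTakenDelta: current.timeTakenMs - previous.timeTakenMs,
    peakRssDelta: current.peakRss - previous.peakRss,
  };
}
```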
Show Progress ✅
While we're doing this revamp anyway, perhaps we should show a loading state of some kind until the results are ready. Better than looking frozen!
It should be a progress bar of some kind that shows how many of the total SVGs have been processed so far.
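A minimal sketch of the idea; in practice a library such as cli-progress (or whatever we adopt for terminal styling) could handle this:

```js
// Redraws a single line in place: [██████░░░░░░] 1,234 / 4,959
function renderProgress(done, total) {
  const width = 30;
  const filled = Math.round((done / total) * width);
  const bar = '█'.repeat(filled) + '░'.repeat(width - filled);
  process.stdout.write(`\r[${bar}] ${done.toLocaleString()} / ${total.toLocaleString()}`);
  if (done === total) process.stdout.write('\n');
}
```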
Related
- johnkenny54 did something like this before in Add statistics to regression.js #1932
Feedback
Keep Metrics more Focused ✅
Lorfdail suggested that we should run the bulk optimizations and Playwright in separate processes so that we can get a more accurate read on the duration and memory usage without the overhead of the testing method.
This is a great idea, so that will be done as part of this epic as well.
This will hopefully give us slightly more consistent metrics, since it keeps what we're timing/measuring more focused.
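A hedged sketch of what splitting the steps could look like; the script names are placeholders, not real files in the repo:

```js
import { fork } from 'node:child_process';

// Run a step in its own Node process so its duration and peak memory
// aren't inflated by the other step or by the test harness itself.
function runStep(script, args = []) {
  return new Promise((resolve, reject) => {
    const child = fork(script, args);
    child.on('exit', (code) =>
      code === 0 ? resolve() : reject(new Error(`${script} exited with code ${code}`)),
    );
  });
}

// await runStep('./regression-optimize.js'); // bulk optimization + metrics
// await runStep('./regression-compare.js');  // Playwright comparison step
```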
Add Steps to STDOUT
Lorfdail suggested that we should add "Step X" logs to the STDOUT while running. We'll consider this as it's true the current UX isn't the best.
We go from 0 to ~4,800 during the optimization step, only to go back to 0 again for the comparison step. This gives the contributor a false sense of the work remaining.