Conversation
gkreitz
left a comment
Looks like a good feature to add. One problem with the structure of how you run it, though: I don't get why it's tied to the sample cases, but perhaps I'm missing something.
Here are a couple more cases I think we want to throw in (either in this PR, or a follow-up):
- NaN
- inf
- a null byte
- invalid UTF-8
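For concreteness, a minimal sketch of how such entries could slot into the hard-coded list; the (description, content) shape is taken from the loop quoted below, while the exact values and the use of bytes are assumptions:

```python
# Sketch only: deliberately malformed "team output" meant to crash fragile validators.
_JUNK_CASES_CRASH = [
    ('NaN', b'NaN\n'),
    ('inf', b'inf\n'),
    ('null byte', b'\x00\n'),
    ('invalid UTF-8', b'\xc3\x28\n'),  # truncated multi-byte sequence
]
```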
I think a hard-coded list of test cases like this is likely a bit too simplistic to yield good results, but it's a simple enough addition, and if it helps catch one or a few bugs, it's worth it.
problemtools/verifyproblem.py
```python
# Note that these might be valid output, so we only check if it crashes
sample_cases = [tc for tc in self.problem.testdata.get_all_testcases() if tc.is_in_sample_group()]
for desc, junk_case_content in _JUNK_CASES_CRASH:
    run_junk_case(desc, junk_case_content, sample_cases)
```
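run_junk_case itself isn't shown in the quoted diff. As a rough, standalone illustration of the idea (feed malformed team output to the output validator and only check that it exits cleanly), assuming the usual problem-package validator convention of exit codes 42/43, it could look something like:

```python
import subprocess
import tempfile

def looks_like_crash(validator_path, input_file, answer_file, junk_output):
    """Return True if the output validator crashed on the given junk team output.

    Hypothetical helper, not the PR's run_junk_case: the validator is given the
    input file, the judge answer and a feedback directory, reads the team output
    on stdin, and is expected to exit 42 (accept) or 43 (reject); any other exit
    code is treated as a crash.
    """
    with tempfile.TemporaryDirectory() as feedback_dir:
        result = subprocess.run(
            [validator_path, input_file, answer_file, feedback_dir],
            input=junk_output,        # junk team output, as bytes, on stdin
            capture_output=True,
        )
    return result.returncode not in (42, 43)
```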
If I parse the code correctly, it looks to me like you'll run each fuzzing case the same number of times as there are sample cases? This feels wasteful if there are multiple sample cases, and bad if there are none (then these tests won't be run at all).
Don't we always just want to run them once (e.g., just grab the first test case instead of all samples)?
You're correct in your reading.
I was kinda hoping that by checking multiple samples, we'd increase the probability that the junk case is semi-valid output for one of them and gets further into the validator. I'm still inclined to arbitrarily take the first 3 samples, which adds about 120 ms of overhead with a C++ validator. With a Python validator this grows to about 2.5 seconds, so in that case I changed it to take only one testcase, bringing the overhead down to roughly 800 ms.
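Roughly, the decision described above could look like this (sample_cases, _JUNK_CASES_CRASH and run_junk_case are the names from the quoted diff; validator_is_python and the other names are made up for illustration):

```python
# Illustrative sketch: cap the per-junk-case overhead by limiting how many
# sample cases each junk input is run against.
max_cases = 1 if validator_is_python else 3   # roughly 800 ms vs 120 ms total overhead
cases_to_fuzz = sample_cases[:max_cases]
for desc, junk_case_content in _JUNK_CASES_CRASH:
    run_junk_case(desc, junk_case_content, cases_to_fuzz)
```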
You'd be surprised at the number of new authors who write output validators that will blow up if you sneeze on them. I've seen this at least 5 times.
I suspect that many output validators out there are not crash-proof against all contestant output. I think that adding a couple of evil testcases might catch some poorly written ones.
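For illustration, a contrived example (not taken from any real problem) of the kind of validator these junk cases are meant to break; team_output_path is a placeholder for however the validator receives the team output, and a UTF-8 locale is assumed:

```python
# Contrived fragile output validator: assumes the team printed exactly one
# well-formed integer, and crashes instead of rejecting otherwise.
with open(team_output_path) as f:   # UnicodeDecodeError on invalid UTF-8
    value = int(f.readline())       # ValueError on 'NaN', 'inf', a null byte or empty output
```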
I decided to only run it on the samples, since performance actually becomes a concern if you have lots of testcases (especially if symlinked testcases aren't deduplicated; I'm not sure whether we do that, I'll have to check in the future).
For the contents of the cases, I just tried to think of ways to incorrectly write some existing validators. Perhaps @niemela or @gkreitz could look into judge errors on Kattis and add a minimal version of whatever killed the output validator?