
Fuzz output validators #348

Merged
gkreitz merged 6 commits into Kattis:master from Matistjati:fuzz-output-validators
Aug 20, 2025

Conversation

@Matistjati (Contributor) commented Aug 19, 2025

I suspect that many output validators out there are not crash-proof against all contestant output. I think that adding a couple of evil testcases might catch some poorly written ones.

I decided to only run it on the samples, since performance actually becomes a concern if you have lots of testcases (especially if symlinks aren't deduplicated; I'm not sure whether we do this, will have to check in the future).

For the contents of the cases, I just tried to think of ways one might incorrectly write some existing validators. Perhaps @niemela or @gkreitz could look into judge errors on Kattis and add a minimal version of whatever killed the output validator?

@gkreitz (Contributor) left a comment

Looks like a good feature to add. There's one problem with how you run it (I don't get why it's tied to sample cases, but perhaps I'm missing something).

Here are a couple more cases I think we want to throw in (either in this PR or a follow-up):

  • NaN
  • inf
  • a null byte
  • invalid UTF-8

I think a hard-coded list of test cases like this is probably a bit too simplistic to yield good results, but it's a simple enough addition, and if it helps catch one or a few bugs, it's worth it.
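
For concreteness, here's a minimal sketch of what such a hard-coded junk-case list could look like. The name _JUNK_CASES_CRASH matches the snippet below, but the exact entries and the use of raw bytes are assumptions, not the merged code:

# Illustrative junk outputs to feed the output validator; we only check
# that the validator doesn't crash, since some of these may be valid output.
# Raw bytes are used so a null byte and invalid UTF-8 can be represented.
_JUNK_CASES_CRASH = [
    ("empty output", b""),
    ("NaN", b"nan\n"),
    ("infinity", b"inf\n"),
    ("null byte", b"\x00\n"),
    ("invalid UTF-8", b"\xff\xfe\n"),
]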

# Note that these might be valid output, so we only check if it crashes
sample_cases = [tc for tc in self.problem.testdata.get_all_testcases() if tc.is_in_sample_group()]
for desc, junk_case_content in _JUNK_CASES_CRASH:
    run_junk_case(desc, junk_case_content, sample_cases)
@gkreitz (Contributor) commented on this snippet:

If I parse the code correctly, it looks to me like you'll run the fuzzing cases the same number of times as there are sample cases? This feels wasteful if there are multiple sample cases, and bad if there are none (then these tests won't be run).

Don't we always just want to run them once (e.g., just grab the first test case instead of all samples)?

@Matistjati (Contributor, Author) replied:

You're correct in your reading.
I was hoping that by checking multiple samples, we increase the probability that the junk case is semi-valid output for one of them and gets further into the validator. I'm still inclined to arbitrarily take the first 3, for an overhead of about 120 ms with a C++ validator. With Python this increases to 2.5 seconds, so I changed it to take only one testcase when the validator is written in Python, resulting in ~800 ms.
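
A minimal sketch of the selection logic described above (the validator_is_python flag is hypothetical; the merged code may detect the validator's language differently):

# Limit how many sample cases each junk output is fuzzed against, to bound
# the overhead: roughly 120 ms for a C++ validator with three samples,
# roughly 800 ms for a Python validator with one.
sample_cases = [tc for tc in self.problem.testdata.get_all_testcases()
                if tc.is_in_sample_group()]
limit = 1 if validator_is_python else 3  # hypothetical language check
for desc, junk_case_content in _JUNK_CASES_CRASH:
    run_junk_case(desc, junk_case_content, sample_cases[:limit])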

@Matistjati (Contributor, Author) commented Aug 20, 2025

> I think a hard-coded list of test cases like this is probably a bit too simplistic to yield good results, but it's a simple enough addition, and if it helps catch one or a few bugs, it's worth it.

You'd be surprised at the number of new authors who write output validators that will blow up if you sneeze on them. I've seen this at least 5 times.

@gkreitz merged commit 81844c4 into Kattis:master Aug 20, 2025
5 checks passed
@Matistjati deleted the fuzz-output-validators branch August 20, 2025 13:43