Conversation
gkreitz
left a comment
Looks like a good feature to add. One problem with the structure of how you run it, though: I don't get why it's tied to the sample cases, but perhaps I'm missing something.
Here are a couple more cases I think we want to throw in (either in this PR, or a follow-up):
- NaN
- inf
- a null byte
- invalid UTF-8
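For concreteness, a minimal sketch of how such entries could slot into the hard-coded list; the (description, content) shape is taken from the loop quoted below, while the exact values and the use of bytes are assumptions:

```python
# Sketch only: deliberately malformed "team output" meant to crash fragile validators.
_JUNK_CASES_CRASH = [
    ('NaN', b'NaN\n'),
    ('inf', b'inf\n'),
    ('null byte', b'\x00\n'),
    ('invalid UTF-8', b'\xc3\x28\n'),  # truncated multi-byte sequence
]
```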
I think a hard-coded list of test cases like this is likely a bit too simplistic to yield good results, but it's a simple enough addition, and if it helps catch one or a few bugs, it's worth it.
problemtools/verifyproblem.py
```python
# Note that these might be valid output, so we only check if it crashes
sample_cases = [tc for tc in self.problem.testdata.get_all_testcases() if tc.is_in_sample_group()]
for desc, junk_case_content in _JUNK_CASES_CRASH:
    run_junk_case(desc, junk_case_content, sample_cases)
```
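run_junk_case itself isn't shown in the quoted diff. As a rough, standalone illustration of the idea (feed malformed team output to the output validator and only check that it exits cleanly), assuming the usual problem-package validator convention of exit codes 42/43, it could look something like:

```python
import subprocess
import tempfile

def looks_like_crash(validator_path, input_file, answer_file, junk_output):
    """Return True if the output validator crashed on the given junk team output.

    Hypothetical helper, not the PR's run_junk_case: the validator is given the
    input file, the judge answer and a feedback directory, reads the team output
    on stdin, and is expected to exit 42 (accept) or 43 (reject); any other exit
    code is treated as a crash.
    """
    with tempfile.TemporaryDirectory() as feedback_dir:
        result = subprocess.run(
            [validator_path, input_file, answer_file, feedback_dir],
            input=junk_output,        # junk team output, as bytes, on stdin
            capture_output=True,
        )
    return result.returncode not in (42, 43)
```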
If I parse the code correctly, it looks to me like you'll run each fuzzing case the same number of times as there are sample cases? This feels wasteful if there are multiple sample cases, and bad if there are none (then these tests won't be run at all).
Don't we always just want to run them once (e.g., just grab the first test case instead of all samples)?
You're correct in your reading.
I was kinda hoping that by checking multiple samples, we'd increase the probability that the junk case is semi-valid output for one of them and gets further into the validator. I'm still inclined to arbitrarily take the first 3 samples, which adds about 120 ms of overhead with a C++ validator. With a Python validator this grows to about 2.5 seconds, so in that case I changed it to take only one testcase, bringing the overhead down to roughly 800 ms.
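Roughly, the decision described above could look like this (sample_cases, _JUNK_CASES_CRASH and run_junk_case are the names from the quoted diff; validator_is_python and the other names are made up for illustration):

```python
# Illustrative sketch: cap the per-junk-case overhead by limiting how many
# sample cases each junk input is run against.
max_cases = 1 if validator_is_python else 3   # roughly 800 ms vs 120 ms total overhead
cases_to_fuzz = sample_cases[:max_cases]
for desc, junk_case_content in _JUNK_CASES_CRASH:
    run_junk_case(desc, junk_case_content, cases_to_fuzz)
```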
You'd be surprised at the number of new authors who write output validators that will blow up if you sneeze on them. I've seen this at least 5 times.
I suspect that many output validators out there are not crash-proof against all contestant output. I think that adding a couple of evil testcases might catch some poorly written ones.
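For illustration, a contrived example (not taken from any real problem) of the kind of validator these junk cases are meant to break; team_output_path is a placeholder for however the validator receives the team output, and a UTF-8 locale is assumed:

```python
# Contrived fragile output validator: assumes the team printed exactly one
# well-formed integer, and crashes instead of rejecting otherwise.
with open(team_output_path) as f:   # UnicodeDecodeError on invalid UTF-8
    value = int(f.readline())       # ValueError on 'NaN', 'inf', a null byte or empty output
```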
I decided to only run it on the samples, since performance actually becomes a concern if you have lots of testcases (especially if symlinked testcases aren't deduplicated; I'm not sure whether we do that, I'll have to check in the future).
For the contents of the cases, I just tried to think of ways to incorrectly write some existing validators. Perhaps @niemela or @gkreitz could look into judge errors on Kattis and add a minimal version of whatever killed the output validator?