Add uniqueness restriction on names #567
Open: niemela wants to merge 1 commit into `master` from `stricter-naming`
Maybe we should be a tiny bit more specific with the definition of equivalence? Surely we don't care about output validator flags in this definition, right?
> For any two test cases, if the contents of their `.in` and `.files` directory are equivalent, as well as the `args` sequence in the `.yaml` file, then the input of the two test cases is equivalent. For any two test cases, if their input, output validator arguments and the contents of their `.ans` files are equivalent, then the test cases are equivalent.

At the very least, we should say "if their inputs are equivalent". Additionally, we should probably either copy-paste the definition or link to it.
---
Agreed, I want to be able to reuse a `.in` file with different output validator flags.

---
OK, maybe we do, but then we have two different kinds of equivalence. The one you want to use is "the inputs are equivalent"; the one we already have defined, and that I used, is "the test cases are equivalent". The latter allows a judge system to reuse the results of judging the test case; the former does not. This is why I would like to use that definition.
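The two notions of equivalence in this thread can be sketched in Python as follows. This is only an illustration under stated assumptions: the `TestCase` class, its field names and the `max_only` flag are hypothetical, and files are compared byte-for-byte. It shows why test-case equivalence, and not mere input equivalence, is what a judge would need in order to reuse a judging result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TestCase:
    # Hypothetical in-memory model of a test case (not the spec's data model).
    in_data: bytes                          # contents of <name>.in
    files: tuple[tuple[str, bytes], ...]    # <name>.files/ as (relative path, bytes) pairs
    args: tuple[str, ...]                   # args sequence from <name>.yaml
    output_validator_args: tuple[str, ...]  # flags passed to the output validator
    ans_data: bytes                         # contents of <name>.ans


def inputs_equivalent(a: TestCase, b: TestCase) -> bool:
    # Same .in, same .files directory (order-insensitive) and same args.
    return (a.in_data == b.in_data
            and sorted(a.files) == sorted(b.files)
            and a.args == b.args)


def cases_equivalent(a: TestCase, b: TestCase) -> bool:
    # Equivalent inputs plus equal output validator args and .ans contents.
    return (inputs_equivalent(a, b)
            and a.output_validator_args == b.output_validator_args
            and a.ans_data == b.ans_data)


def cache_key(tc: TestCase) -> tuple:
    # Everything that can affect the verdict of a deterministic submission;
    # two keys are equal exactly when cases_equivalent() holds.
    return (tc.in_data, tuple(sorted(tc.files)), tc.args,
            tc.output_validator_args, tc.ans_data)


if __name__ == "__main__":
    # Same input, but one case only checks the maximum (hypothetical flag).
    max_only = TestCase(b"3\n1 2 3\n", (), (), ("max_only",), b"1 3\n")
    min_and_max = TestCase(b"3\n1 2 3\n", (), (), (), b"1 3\n")
    print(inputs_equivalent(max_only, min_and_max))  # True
    print(cases_equivalent(max_only, min_and_max))   # False
```

Keying a result cache on `cache_key` rather than on the input alone is what keeps a cached verdict from leaking between cases that share an `.in` file but use different output validator flags.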
---
Yes. For a concrete example: Sweden has a problem asking "find the min and max possible thing for a given input", with a subtask "you only need to find the max correctly". I would argue that the most correct solution in this instance is that this property is part of the group via `output_validator_flags`, not of any test case itself, and we want to be able to reuse the same inputs (problem in question: https://po2punkt0.kattis.com/problems/robottavlingen).

---
Ah... I see you have not read the entirety of @thorehusfeldt's and my discussion in #523 😏.

In that discussion the consensus seems to go towards `output_validator_flags` being part of "the test case". I think @thorehusfeldt is arguing from the point of view that "the sameness of a test case should imply the sameness of the judgement of said test case", and I would agree with that. It feels strange to say that you could pass a test case and then fail "the same" test case? They are quite obviously not the same then.

So, IMO, what you are talking about is identical input, not identical test cases. I would argue that that can be sufficiently handled by symlinks or copying?
---
I am a bit late to the party, but I have to agree with @Matistjati and @Tagl. I would go even further and say that the current proposal sounds pretty useless and does not fit the workflow used in the past. Let me explain why:

This is the exact part that makes this proposal useless. If the judgement is the same, is there even a need for the "same test case" in the first place?

I would also argue that this statement is fundamentally broken, since the judgement heavily depends on the submission, which is not necessarily deterministic?

Even with your definition, that could happen for nondeterministic submissions?

Yes, that is what I would be talking about, since identical inputs do actually appear. But this proposal would force me to name the link differently from the file it points to? IMO this is very bad design.
Suppose you have multiple test groups (`A`, `B`, `C`, ...) that should all include the file `easy/1.in`. Then you would end up with these weird symlinks:

- `A/1a.in` -> `easy/1.in`
- `B/1b.in` -> `easy/1.in`
- `C/1c.in` -> `easy/1.in`

In the past we could just call them all `1.in`, which made clear that they are the same `.in` file.

I would strongly advise against this. I always considered all files with the same base name to be part of the test case, and nothing else. So if the `output_validator_flags` live in `<test case>.yaml`, they are part of the test case, but if they come from the `test_group.yaml`, they are not.

---
I feel we are talking past each other here. I'm trying to define "same" here; how is "is there even a need for ..." relevant, and what do you mean by it?
I would argue that there are two reasonable definitions of "same" (or maybe we should use the term "identical", but I digress) we could use:
I am arguing against a definition that does not include some information that affects judging. That would mean we say two test cases are the "same" while not expecting them to be judged the same (even assuming deterministic submissions). I don't think you want that either, so I think we agree on this part? @mzuenni?
The problem format explicitly allows systems to assume determinism, though. So I strongly disagree that the statement is "fundamentally broken". Maybe it is a bit sloppy, and there should be an added "...assuming deterministic submissions".
Yes, but as stated above, systems may assume determinism, and also, is that a problem?
Only if you want identical input and different other settings. Is that what you want?
Why?
(I think you meant to write A/B/C and not A/A/A above, right? I fixed it above.)
This is only the case if you want some other parts of the test case to be different. Why do you want/need that? (That's not a rhetorical question; I'm not implying that you couldn't reasonably want that. I know of some reasons to want it, and I'm wondering what use cases you have in mind.)
Well, currently, it does not make clear that they are the same file; there is no such requirement. You could have a file named `D/1.in` with completely different contents, so it's not safe at all to assume that files named the same are the same.

The suggestion here would be to make such a requirement, so that you could make such assumptions, i.e., the same (local) name would imply the "same" test case.
Wow... this last part is IMO crazy! You are saying that `output_validator_flags` is sometimes part of the test case and sometimes not?!? Why would you possibly want that definition?

---
OK, I did not know that, but that makes things even worse. IMO the judging system should not be allowed to do that. It easily breaks the assumptions not only of problem setters but also of participants?
If the judging system decides to run only one instance of this test case, this is bad... I obviously wanted the test to be run multiple times. Why else should it be there?
Yes, I know that there is no such requirement, but that is not the issue I wanted to point out. My problem is very much the other way around: with this proposal I would need to make the name of a symlink different from the name of the file it points to?
Yes and no. What I want to say is that if we make file names unique, then the name of a test case should only refer to files associated with that name? (So `<test case>` refers to all files of the form `<test case>.<ext>`.)

If we want some kind of uniqueness constraint on the names of test cases, I would want it to mean the following: "If two files have the same base name + extension, they should have the same content".
If that is not what you want I would argue in favor of not adding such a restriction at all.
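One way to read the constraint proposed here ("if two files have the same base name + extension they should have the same content") is as a check over the whole `data/` tree. The sketch below uses a hypothetical helper, `conflicting_names`, that is not part of any existing tool. It assumes only `.in` and `.ans` files are subject to the rule (so `test_group.yaml` files are ignored), and it follows symlinks, so a link named like its target (e.g. `A/1.in` -> `easy/1.in`) never counts as a conflict.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def conflicting_names(data_dir: Path,
                      suffixes: tuple[str, ...] = (".in", ".ans")) -> dict[str, list[Path]]:
    """Map each file name (base name + extension) that appears under data_dir
    with more than one distinct content to all paths carrying that name."""
    # name -> content digest -> paths with that name and that content
    by_name: dict = defaultdict(lambda: defaultdict(list))
    for path in sorted(data_dir.rglob("*")):
        # is_file() and read_bytes() both follow symlinks, so a symlink is
        # compared by the content of its target.
        if path.suffix in suffixes and path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_name[path.name][digest].append(path)
    return {
        name: [p for paths in digests.values() for p in paths]
        for name, digests in by_name.items()
        if len(digests) > 1
    }
```

Under this reading, `conflicting_names(Path("data"))` returns an empty dict for the `A/1.in`, `B/1.in`, `C/1.in` -> `easy/1.in` layout, and would only complain about a `D/1.in` with different contents.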
---
One more point regarding this: this was not true for the legacy format, right? So if I upload a problem in the legacy format it cannot be cached, but with the new format it can... That is terrible for every user group...
I even remember BAPC problems where the input was randomly generated... so this would no longer be possible at all??
---
This is also broken in a different way: it breaks every single randomized submission. The judge is basically allowed to rerun a submission an infinite number of times and check that it fails 0 times (so basically, until it fails). Clearly that is not what we want, right?
We very specifically mean: run this test case, and run it exactly once.
So I guess the discussion here is: can we change it to at most once for "identical" test cases (for some definition of identical)?
I would suggest that every `data/**/*.in` corresponds to exactly one run, as I have always understood it. If you want to avoid this, use `require_pass: easier_group` instead, to be explicit.

This sounds reasonable. Note that this does not imply the converse: it does not require two files with the same content to have the same base name.
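To make the two run semantics in this thread concrete, here is a sketch contrasting "every `data/**/*.in` is exactly one run" with running "identical" inputs at most once. Both helpers are hypothetical. The deduplicating one deliberately simplifies "identical" to "same `.in` bytes", ignoring `.ans`, `.files`, `args` and output validator flags, which the discussion argues would also have to match.

```python
import hashlib
from pathlib import Path


def runs_per_file(data_dir: Path) -> list[Path]:
    # One run per .in path under data_dir, recursively. Symlinks and
    # content-identical copies each count as their own run.
    return sorted(p for p in data_dir.rglob("*.in") if p.is_file())


def runs_if_deduplicated(data_dir: Path) -> list[Path]:
    # The alternative under discussion: run "identical" inputs at most once.
    # Simplification: "identical" means identical .in bytes only.
    seen: set[str] = set()
    runs: list[Path] = []
    for p in runs_per_file(data_dir):
        digest = hashlib.sha256(p.read_bytes()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            runs.append(p)
    return runs
```

For the symlink layout described earlier in the thread (`A/1a.in`, `B/1b.in` and `C/1c.in` all pointing at `easy/1.in`), `runs_per_file` yields four runs while `runs_if_deduplicated` yields one, which is exactly the difference that matters for randomized submissions.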