For context, I've had at least one install fail because a submission got TLE on Kattis' machine, even though it was (barely) accepted on mine:
ERROR PAC submission gemini2.cpp (C++) got TLE [testcase: testcase secret/group3/015-g1-smart, CPU: 6.90s @ testcase secret/group1/015-g1-smart]
I suspect that this issue is even worse for pass-fail problems (in legacy, partially_accepted never errors as long as the submissions in it get accepted). I claim that having installs fail midway through because a submission is sensitive to resource limits is highly undesirable.
The spec reads as follows:
Every submission matched by the glob pattern must satisfy:
- all test cases must have only verdicts present in permitted;
- at least one test case must have a verdict in required;
...
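A minimal sketch of how a tool might check the two quoted rules for one submission. The function name is hypothetical, and verdicts are assumed to be plain strings such as "AC", "WA", "TLE", "RTE". It also shows why a single TLE on a slower judge machine is enough to fail the whole check when TLE is not permitted:

```python
from collections.abc import Iterable


def satisfies_expectation(
    verdicts: list[str],
    permitted: Iterable[str],
    required: Iterable[str],
) -> bool:
    """Check the two quoted rules for one submission (hypothetical helper).

    verdicts: the verdict of each test case, e.g. ["AC", "AC", "TLE"].
    """
    permitted_set = set(permitted)
    required_set = set(required)
    # Rule 1: every test case verdict must be in `permitted`.
    all_permitted = all(v in permitted_set for v in verdicts)
    # Rule 2: at least one test case verdict must be in `required`.
    some_required = any(v in required_set for v in verdicts)
    return all_permitted and some_required


# Same submission, two machines: barely fast enough locally, one TLE on the judge.
local = ["AC", "AC", "AC"]
judge = ["AC", "TLE", "AC"]
print(satisfies_expectation(local, permitted=["AC"], required=["AC"]))  # True
print(satisfies_expectation(judge, permitted=["AC"], required=["AC"]))  # False
```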
Some different ways of avoiding this problem:
1: Just don't add such submissions to be judged
Well, this feels bad, because it's better to know exactly what ends up happening on the judge system.
2: Just hope for the best
IMO, a bad idea. It can also easily break forward compatibility (although that is trivially fixable).
3: Create a folder time_limit_sensitive_rejected or similar
Pretty much isomorphic to 1, but arguably better, since the tools can then still report what happens. HOWEVER: I claim that this option is actually terrible. There's a risk that some groups of problemsetters will start using something like
partially_accepted:
  permitted: [AC, RTE, TLE, WA]
  required: [AC, RTE, TLE, WA]
In that case, they have shot themselves in the foot hard enough to lose all warnings/errors for wrong verdicts. This anti-pattern can of course be warned about at the tooling level.
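As a sketch of such a tooling-level warning, assuming for illustration that the only verdicts are AC, WA, TLE and RTE (the real spec may define more) and that every submission runs on at least one test case, a linter could flag expectations that no outcome can violate. The function names are hypothetical:

```python
# Verdict universe assumed for illustration only.
ALL_VERDICTS = frozenset({"AC", "WA", "TLE", "RTE"})


def is_vacuous(permitted: set[str], required: set[str]) -> bool:
    """Return True if no combination of verdicts can violate the expectation.

    Assumes every submission is run on at least one test case.
    - If some verdict v is not permitted, a run where any test case gets v fails.
    - If some permitted verdict v is not required, a run where every test case
      gets v fails the "at least one required" rule.
    So the expectation is vacuous exactly when both sets cover every verdict.
    """
    return ALL_VERDICTS <= permitted and ALL_VERDICTS <= required


def lint_expectation(name: str, permitted: set[str], required: set[str]) -> list[str]:
    """Collect warnings for one expectation (hypothetical linter hook)."""
    warnings = []
    if is_vacuous(permitted, required):
        warnings.append(
            f"{name}: permitted/required accept every verdict; "
            "wrong verdicts will never be reported"
        )
    return warnings


print(lint_expectation(
    "partially_accepted",
    permitted={"AC", "RTE", "TLE", "WA"},
    required={"AC", "RTE", "TLE", "WA"},
))
```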
4: solve it at the tooling level
For example, intelligently decide what should be a warning vs error. I don't love this, since then my problem might install on some tools, but not others.
5: Explicitly specify what should be a warning vs error
One sensible way to spec it would be to make it an error if:
- an AC submission does not get AC on all test cases;
- the score expectation is violated;
- the message expectation is violated.
And a warning for most other things.
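A minimal sketch of how tooling might apply this split. The violation kinds and function names are hypothetical labels chosen for illustration, not anything the spec defines; the point is that only the three listed cases block an install, while a verdict mismatch such as a judge-side TLE is only reported:

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    ERROR = "error"
    WARNING = "warning"


@dataclass
class Violation:
    # Hypothetical kinds: "accepted_not_all_ac", "score", "message", or
    # "verdicts" for any other permitted/required mismatch.
    kind: str
    detail: str


# Kinds that should stop an install under option 5; everything else warns.
ERROR_KINDS = frozenset({"accepted_not_all_ac", "score", "message"})


def severity(v: Violation) -> Severity:
    return Severity.ERROR if v.kind in ERROR_KINDS else Severity.WARNING


def install_should_fail(violations: list[Violation]) -> bool:
    # Only errors block the install; warnings are reported but tolerated.
    return any(severity(v) is Severity.ERROR for v in violations)


tle_on_judge = Violation("verdicts", "PAC submission got TLE on one test case")
print(severity(tle_on_judge).value)  # warning
print(install_should_fail([tle_on_judge]))  # False
```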
6: specify that errors should change to warnings for resource-sensitive submissions
I hate this. It places an unreasonable burden on tooling to detect a "resource-sensitive submission".
Have I missed something? Otherwise, I would consider option 5 to be the most sensible one (modulo exactly what we spec).