Our `correctness()` function is somewhat ad hoc; we need to check how these things are usually scored.