CI: shift core-library tests to a separate "job"#283
Conversation
This PR shifts the core-library tests to a separate "job." It has come to my attention in recent PRs (e.g. grackle-project#273, grackle-project#279), that our current choice to run the core-library test-suite after the Pygrackle test-suite is somewhat undesirable. Specifically, the core-library test-suite is more likely to include unit-tests (while there may not be many unit-tests yet, many forthcoming PRs have proposed adding them). This is important for 2 reasons: 1. Unit-tests are inherently faster than answer-tests. We should run these as soon as possible so we can fail quickly. 2. Unit-tests are also more specialized than answer-tests. If a unit-test fails, it is likely that an answer-test will also fail. The way things have been configured (before this PR) will cause the answer-test to fail and CI to abort before the unit-tests can be reun. As I write this up, I realize that I probably just could have re-ordered the tests. I'm totally willing to do that. But, I think in some ways this may be better (if we aren't too worried about using up resources)
1d20c17 to
4693736
Compare
…ion flags seem to make a difference.
I realized that the running the code on MacOS machine with CMAKE_BUILD_TYPE=Release and CMAKE_BUILD_TYPE=Debug produces slightly different answers. The flags used in these cases are: - Release: -O3 - Debug: -g This is fairly disturbing since it should not make a difference (-O3 does not enable any floating point optimizations). While I haven't tested it, I'm **extremely** confident that this isn't a CMake issue. But I'm not really sure what we can do... Let's keep track of this
brittonsmith
left a comment
There was a problem hiding this comment.
This looks good. I had one minor comment. Feel free to deal with that and merge. Only slightly related to this, one thing I don't like is that we run tests with the classic build system in generate mode prior to creating the gold standard. It probably had marginal benefit, but I think it would make sense for gold standard generation to happen first and for as much of the testing as is possible to run in compare mode.
Co-authored-by: Britton Smith <brittonsmith@gmail.com>
I 100% agree. I've been tempted to change this a couple of times. But, I was previously under the false impression that you had picked the current order (I now suspect that the current order originated from a typo OR the fact that you and I were making large changes to the answer testing framework around the same time as each other) |
|
I also suspect this was simply a typo. |
|
As we discussed offline, I adjusted the continuous integration so that now run tests with the classic build system in compare-mode AFTER we record the test-answers. While I doing this I made 3 really minor tweaks:
More importantly, I caught a bug where we weren't properly running the gold-standard in compare mode (this was due to a typo on my part in a previous PR).
|
This PR shifts the core-library tests to a separate "job."
It has come to my attention in recent PRs (e.g. #273, #279), that our current choice to run the core-library test-suite after the Pygrackle test-suite is somewhat undesirable. Specifically, the core-library test-suite is more likely to include unit-tests (while there may not be many unit-tests yet, many forthcoming PRs have proposed adding them).
This is important for 2 reasons:
As I write this up, I realize that I probably just could have re-ordered the tests. I'm totally willing to do that. But, I think in some ways this may be better (if we aren't too worried about using up resources)