Skip to content

CI: shift core-library tests to a separate "job"#283

Merged
mabruzzo merged 11 commits into
grackle-project:mainfrom
mabruzzo:ci-corelibtests-job
Apr 3, 2025
Merged

CI: shift core-library tests to a separate "job"#283
mabruzzo merged 11 commits into
grackle-project:mainfrom
mabruzzo:ci-corelibtests-job

Conversation

@mabruzzo

Copy link
Copy Markdown
Collaborator

This PR shifts the core-library tests to a separate "job."

It has come to my attention in recent PRs (e.g. #273, #279), that our current choice to run the core-library test-suite after the Pygrackle test-suite is somewhat undesirable. Specifically, the core-library test-suite is more likely to include unit-tests (while there may not be many unit-tests yet, many forthcoming PRs have proposed adding them).

This is important for 2 reasons:

  1. Unit-tests are inherently faster than answer-tests. We should run these as soon as possible so we can fail quickly.
  2. Unit-tests are also more specialized than answer-tests. If a unit-test fails, it is likely that an answer-test will also fail. The way things have been configured (before this PR) will cause the answer-test to fail and CI to abort before the unit-tests can be reun.

As I write this up, I realize that I probably just could have re-ordered the tests. I'm totally willing to do that. But, I think in some ways this may be better (if we aren't too worried about using up resources)

@mabruzzo mabruzzo added the ci Related to Continuous Integration label Mar 26, 2025
This PR shifts the core-library tests to a separate "job."

It has come to my attention in recent PRs (e.g. grackle-project#273, grackle-project#279), that our
current choice to run the core-library test-suite after the Pygrackle
test-suite is somewhat undesirable. Specifically, the core-library
test-suite is more likely to include unit-tests (while there may not
be many unit-tests yet, many forthcoming PRs have proposed adding them).

This is important for 2 reasons:
1. Unit-tests are inherently faster than answer-tests. We should run
   these as soon as possible so we can fail quickly.
2. Unit-tests are also more specialized than answer-tests. If a
   unit-test fails, it is likely that an answer-test will also fail. The
   way things have been configured (before this PR) will cause the
   answer-test to fail and CI to abort before the unit-tests can be
   reun.

As I write this up, I realize that I probably just could have re-ordered
the tests. I'm totally willing to do that. But, I think in some ways
this may be better (if we aren't too worried about using up resources)
@mabruzzo mabruzzo force-pushed the ci-corelibtests-job branch from 1d20c17 to 4693736 Compare March 26, 2025 04:26
@brittonsmith brittonsmith added this to the 3.4 milestone Mar 26, 2025
I realized that the running the code on MacOS machine with
CMAKE_BUILD_TYPE=Release and CMAKE_BUILD_TYPE=Debug produces slightly
different answers.

The flags used in these cases are:

  - Release: -O3
  - Debug: -g

This is fairly disturbing since it should not make a difference (-O3
does not enable any floating point optimizations). While I haven't
tested it, I'm **extremely** confident that this isn't a CMake issue.

But I'm not really sure what we can do... Let's keep track of this

@brittonsmith brittonsmith left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. I had one minor comment. Feel free to deal with that and merge. Only slightly related to this, one thing I don't like is that we run tests with the classic build system in generate mode prior to creating the gold standard. It probably had marginal benefit, but I think it would make sense for gold standard generation to happen first and for as much of the testing as is possible to run in compare mode.

Comment thread tests/scripts/code_example_checker.py Outdated
Co-authored-by: Britton Smith <brittonsmith@gmail.com>
@mabruzzo

mabruzzo commented Apr 2, 2025

Copy link
Copy Markdown
Collaborator Author

Only slightly related to this, one thing I don't like is that we run tests with the classic build system in generate mode prior to creating the gold standard. It probably had marginal benefit, but I think it would make sense for gold standard generation to happen first and for as much of the testing as is possible to run in compare mode.

I 100% agree. I've been tempted to change this a couple of times. But, I was previously under the false impression that you had picked the current order (I now suspect that the current order originated from a typo OR the fact that you and I were making large changes to the answer testing framework around the same time as each other)

@brittonsmith

Copy link
Copy Markdown
Contributor

I also suspect this was simply a typo.

@mabruzzo

mabruzzo commented Apr 3, 2025

Copy link
Copy Markdown
Collaborator Author

As we discussed offline, I adjusted the continuous integration so that now run tests with the classic build system in compare-mode AFTER we record the test-answers.

While I doing this I made 3 really minor tweaks:

  • I fixed a typo in the name of one of the new CI "steps"
  • I made sure the CI will explicitly fail if we make a particular mistake when modify config.yml (we were already printing an error message for this case)

More importantly, I caught a bug where we weren't properly running the gold-standard in compare mode (this was due to a typo on my part in a previous PR).

  • While trying to fix that bug, I found and fixed a separate minor bug that was messing with string comparisons when we use circle-ci's parameters (as luck would have it, I actually think everything was previously working as intended). The fix was simple: add some quotes

@mabruzzo mabruzzo merged commit 0405d21 into grackle-project:main Apr 3, 2025
@mabruzzo mabruzzo deleted the ci-corelibtests-job branch April 3, 2025 17:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci Related to Continuous Integration

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants