Add a consistency check for the precision of gr_float#279
Conversation
|
@hsinhaoHHuang, thank you for making this PR! And for providing such a detailed description! This looks fantastic! I'll circle back and give this a more careful review in the next day (but I doubt I will have much to add). Somewhere along the lines, we are going to need to document the |
Co-authored-by: Matthew Abruzzo <matthewabruzzo@gmail.com>
|
@mabruzzo Thank you for your comments. I have just tried to add descriptions in the documentation. Please let me know if I still need to add anything. |
mabruzzo
left a comment
There was a problem hiding this comment.
Thanks for making those changes! This morning, I had a moment to take a careful look at the PR. Again, it looks fantastic! Thank you again for making the PR.
I have 2 sets of important (largely superficial) suggestions. You should obviously feel free to push back if you disagree:
- I suggested a slight change to the description of
gr_check_consistency - In the code examples, I think we should be calling
gr_check_consistencybefore we touch any part of Grackle's API (i.e. before we modify thegrackle_verboseor declare themy_unitsvariable).- ASIDE: As I explain in a comment, you are going to encounter a merge-conflict with
cxx_grid_example.C(the comment provides a suggestion on how to resolve that conflict).
- ASIDE: As I explain in a comment, you are going to encounter a merge-conflict with
Most of the remaining suggestions are totally optional. They mostly pertain to the fact that I think we should start encouraging applications to only compare return-codes against GR_SUCCESS, and to avoid comparing with GR_FAIL (aka 0). As I note in the comments:
- this is necessary for one potential approach (i.e. associating specialized return-codes with detailed error-messages) that might be used to improve Grackle's error-reporting
- with that said, this isn't a formal policy (there are other approaches to improve error-reporting, that we may prefer). Importantly, we would need to do a bunch of work to gracefully convert (i.e. in a backwards compatible manner) most of Grackle's existing API functions to this sort of approach. Thus, it doesn't really matter which approach
gr_check_consistencyuses (converting it later would not involve much more effort)
reference of the Grackle
|
One last thought:
Should we be aborting the program? HDF5 has the added incentive that they really want to avoid corruption of existing files. I think the answer is "no" for us. But, I am open to the idea of eagerly aborting. @brittonsmith do you have any thoughts? |
|
@mabruzzo Thank you for the thorough review. I will modify the code based on your suggestions in a few days. |
This PR shifts the core-library tests to a separate "job." It has come to my attention in recent PRs (e.g. grackle-project#273, grackle-project#279), that our current choice to run the core-library test-suite after the Pygrackle test-suite is somewhat undesirable. Specifically, the core-library test-suite is more likely to include unit-tests (while there may not be many unit-tests yet, many forthcoming PRs have proposed adding them). This is important for 2 reasons: 1. Unit-tests are inherently faster than answer-tests. We should run these as soon as possible so we can fail quickly. 2. Unit-tests are also more specialized than answer-tests. If a unit-test fails, it is likely that an answer-test will also fail. The way things have been configured (before this PR) will cause the answer-test to fail and CI to abort before the unit-tests can be reun. As I write this up, I realize that I probably just could have re-ordered the tests. I'm totally willing to do that. But, I think in some ways this may be better (if we aren't too worried about using up resources)
This PR shifts the core-library tests to a separate "job." It has come to my attention in recent PRs (e.g. grackle-project#273, grackle-project#279), that our current choice to run the core-library test-suite after the Pygrackle test-suite is somewhat undesirable. Specifically, the core-library test-suite is more likely to include unit-tests (while there may not be many unit-tests yet, many forthcoming PRs have proposed adding them). This is important for 2 reasons: 1. Unit-tests are inherently faster than answer-tests. We should run these as soon as possible so we can fail quickly. 2. Unit-tests are also more specialized than answer-tests. If a unit-test fails, it is likely that an answer-test will also fail. The way things have been configured (before this PR) will cause the answer-test to fail and CI to abort before the unit-tests can be reun. As I write this up, I realize that I probably just could have re-ordered the tests. I'm totally willing to do that. But, I think in some ways this may be better (if we aren't too worried about using up resources)
Co-authored-by: Matthew Abruzzo <matthewabruzzo@gmail.com>
…o ValidateFloatPrecision
|
@mabruzzo Thank you again for the review and suggestions. I agree with your comments and have applied your suggestions. Regarding program abortion during consistency checks, here are some of my thoughts: In the future, our consistency check process may identify two categories of inconsistencies:
For non-critical inconsistencies, maybe we can print a warning message (or other preferred error reporting methods) and still return |
mabruzzo
left a comment
There was a problem hiding this comment.
Thanks for making the changes! LGTM
(I think we are going to leave it to the user to decide whether we abort)
Issue Description
I installed two Grackle (via building with classic build system on Linux) separately,
one with single precision (
make precision-32) and the other with double precision (make precision-64),at
/path/to/grackle_install_single/and/path/to/grackle_install_double/, respectively.Then, I called the Grackle library in a simple C++ code:
my_grackle_example.cpp
I compiled the code by linking to the installed double-precision Grackle
If I also link to the double-precision Grackle library before the execution via
LD_LIBRARY_PATH, there will be no problem:But if I run it by linking to the single-precision Grackle library instead:
I will get the wrong result, probably due to the inconsistent floating-point precisions of
gr_floatduring the compilation and the runtime of my program:Goal
To avoid the above issue, we may need to check the consistency of Grackle's floating-point precisions between the linked libraries during compilation and runtime.
Previous Discussion
During the discussion on Slack, Matthew Abruzzo kindly provided the following solution for me, which is the main idea of this pull request:
quotes from Slack
Changes made in this PR
gr_check_consistency()to check whether the precisions of the linked Grackle libraries are consistent.Tests for the changes
In the following,
"Compile with double" means compiling with
-L/path/to/grackle_install_double/lib, while"Run with single" means
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/grackle_install_single/libbefore execution.my_grackle_example.cppprovided abovesrc/example/cxx_example.CCompile with double and run with double
results
Compile with double and run with single, before modifications
results
Compile with double and run with single, after modifications
It successfully detected the consistency problem.