[newchem-cpp] transcribe lookup_cool_rates0d 1/2 #378
… for transcription
…now use the standard GrainSpeciesCollection type.
…teCollection type.
This includes all collisional and recombination rates that behave "normally". We now use the standard ColRecRxnRateCollection type. This involved a lot of search-and-replace.
…N_NAME with wrappers
…with calls to C++ wrappers
…' into gen2024transcribe/lookup_cool_rates0d
Ugh, ... The tests (freefall metal chemistry) are failing when I build this locally on macOS. I have confirmed that the bug is introduced in one of the first 5 commits of this PR... (it is NOT an issue in #375). Unfortunately, the fact that we merged in both
AFTER I wrote the original PR is making this EXTREMELY annoying to debug. I'm starting to think that the only way to solve this is to redo the transcription of
@brittonsmith you should move ahead with your review of this PR, and ignore the failing tests on macOS. I'll expand more on this down below. Looking into the failing tests on macOS has been a giant PITA. The bottom line is there is some kind of undefined behavior. I strongly suspect we're using an uninitialized variable, since that has been a common issue that we've dealt with multiple times with Gen's branch. Evidence that the code has undefined behavior:
To summarize: the undefined behavior triggering the failing test is present in the Fortran logic. Since the tests are all passing in CI, I think we should move ahead with this PR, and maybe bump the gold-standard after this PR is merged (I'm fairly confident that the C++ transcription "preserves logic"). We obviously need to track down this undefined behavior, but we have a far better chance of doing that in C++ than in Fortran (there are far fewer variables...). With that said, the "right answer" is probably to try running valgrind...
@brittonsmith, I ran src/python/tests/test_models.py::test_model[freefall-metal_dust_chemistry_variants-2-0] under valgrind (with memcheck) today. I was using my debug build of CPython and a debug build of grackle (the latest version of the newchem-cpp branch). It took 2 hours 10 minutes to run this single test under valgrind. And... valgrind didn't identify any errors. It's worth noting that I was running valgrind on Linux even though I first encountered the issues on macOS. That's because valgrind isn't compatible with ARM-based Macs. So, I think this means we should move forward and create gold-standard-nccv5 after this PR is merged. This is for a few reasons:
brittonsmith
left a comment
I've looked at the differences for the failing tests using the --model-comparison-dir option. Generally speaking, the differences are very small. The freefall model we use has a variable timestep that accounts for the effective equation of state. Significant differences will result in a different number of timesteps, making it impossible to measure a relative difference (we're not interpolating). Therefore, it is very encouraging that this only happens in a few cases, and even in those I cannot see a difference in any of the plots by eye.
In the rest of the cases, the differences are quite small. The majority of them (say 8/10 or so) have differences for species densities of the order ~1e-13 with an occasional jump to ~1e-9 for a couple species. In one case, the differences reach ~1e-3, but even then not high enough to result in different timestepping. In this case, the maximum relative difference in the temperature was 2e-3. I didn't see anything systematic. I'm happy to merge this and move on.
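For readers unfamiliar with this kind of comparison, here is a minimal sketch of the idea described above: a maximum relative difference is only defined when both runs took the same number of timesteps, since nothing is interpolated. The function name and signature are hypothetical and written in C++ purely for illustration; this is not the test suite's actual comparison code.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (an assumption, not part of Grackle or its tests).
// Computes the maximum relative difference between a reference series and
// a test series of the same quantity (e.g. a species density per timestep).
// Returns false when the two runs have a different number of timesteps,
// because the samples then do not line up and no interpolation is done.
bool max_rel_diff(const std::vector<double>& ref,
                  const std::vector<double>& test,
                  double& out) {
  if (ref.size() != test.size()) return false;
  double worst = 0.0;
  for (std::size_t i = 0; i < ref.size(); ++i) {
    const double abs_diff = std::fabs(test[i] - ref[i]);
    if (abs_diff == 0.0) continue;  // identical values, including 0 vs 0
    const double denom = std::fabs(ref[i]);
    // A nonzero difference against a zero reference is treated as infinite.
    const double rel = (denom > 0.0)
                           ? abs_diff / denom
                           : std::numeric_limits<double>::infinity();
    worst = std::max(worst, rel);
  }
  out = worst;
  return true;
}
```

With this convention, a result such as the 2e-3 temperature difference mentioned above would be reported through `out`, while a run with a different timestep count would be flagged by the `false` return rather than compared.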
This PR was originally proposed as brittonsmith#31
This PR must be reviewed after #375
This was step 1 of 2 for transcribing lookup_cool_rates0d.
- … 0 or 1). Essentially, we introduced a few local variables to temporarily store these values and then passed the variables as arguments (this overcomes a limitation of the transcription tool)
- … FORTRAN_NAME(cool1d_multi_g), FORTRAN_NAME(lookup_cool_rates1d_g), FORTRAN_NAME(rate_timestep_g) with the C++ wrappers (that use the reduced arg lists)
- … grackle_field_data, this needed to change: … grackle_field_data (distinct from the instance passed to Grackle's API).
- … step_rate_g and then heavily modified. That logic is more directly addressed in the next PR
- … ColRecRxnRateCollection, PhotoRxnRateCollection, and GrainSpeciesCollection types since the C++ code uses these types to pass around large bundles of quantities (there was a bunch of search-and-replace done to avoid breaking logic at the end of the function)
- … IndexRange

The fact that the cool1d_multi_g, lookup_cool_rates1d_g, and rate_timestep_g routines are only called from C++ source files AND the fact that they are only called through the C++ wrappers (i.e. grackle::impl::fortran_wrapper::cool1d_multi_g, grackle::impl::fortran_wrapper::lookup_cool_rates1d_g, and grackle::impl::rate_timestep_g) is an important milestone. It means that they can be replaced with transcribed functions.

The resulting code in this PR is still very messy. But I chose to make this PR at this point because it was a point when @ChristopherBignamini could start working from this branch in order to transcribe cool1d_multi_g in parallel with my efforts. In fact, it was originally my intention to break this particular PR into smaller (slightly more logical) pieces, but I prioritized "getting done" in order to stop being a bottleneck for @ChristopherBignamini.

Another PR has been posted to further clean up lookup_cool_rates0d.
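
To illustrate the wrapper pattern this description refers to (a wrapper with a reduced argument list that forwards to a routine with a long, flat argument list), here is a rough sketch. Every name in it (`legacy_rate_sum`, `RateBundle`, `IndexSpan`, `rate_sum`) is hypothetical and invented for illustration; none of them are Grackle's actual types or routines, and the real wrappers bundle far more quantities.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a Fortran-style routine: every array is passed
// as a separate pointer argument, together with explicit loop bounds.
void legacy_rate_sum(const double* k1, const double* k2, const double* k3,
                     int start, int end, double* out) {
  for (int i = start; i < end; ++i) {
    out[i] = k1[i] + k2[i] + k3[i];
  }
}

// Hypothetical bundle of related arrays, so callers pass one object
// instead of one argument per array.
struct RateBundle {
  std::vector<double> k1, k2, k3;
};

// Hypothetical half-open index interval [start, end).
struct IndexSpan {
  int start;
  int end;
};

// Wrapper with the reduced argument list: it unpacks the bundles and
// forwards to the long-argument routine. Once all callers go through the
// wrapper, the routine behind it can be swapped for a transcribed version
// without touching any call site.
void rate_sum(const RateBundle& rates, IndexSpan span,
              std::vector<double>& out) {
  legacy_rate_sum(rates.k1.data(), rates.k2.data(), rates.k3.data(),
                  span.start, span.end, out.data());
}
```

A caller would build a `RateBundle` once and call `rate_sum(rates, IndexSpan{0, n}, out)`, which keeps call sites short and makes it harder to pass arrays in the wrong order.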