Tweaking code example tests#258

Merged
mabruzzo merged 9 commits into
grackle-project:mainfrom
mabruzzo:tweaking-code-example-tests
Mar 26, 2025
@mabruzzo mabruzzo commented Feb 28, 2025

Collaborator

This depends on #254


Overview

The purpose of this PR is to move the code-example tests out of the pytest test-suite and into the test suite associated with the core library. Additionally, I

  • reordered code in the Fortran example to more closely resemble the C example[1]
  • fixed a MAJOR, crippling bug in the Fortran bindings

Note

The collapsible sections in this PR are mostly here for the sake of completeness and clarity. This PR has been written in a way that you should be able to skip over them.

Describing "the problem" with these tests

In my experience, the code_examples tests have always been an extremely flaky part of Grackle’s test suite! (Way back when I first started contributing to Grackle, it was a major struggle to get them working)

These tests consist of compiling our code examples, running the examples, and then comparing the outputs to reference results.

I provide a detailed description of the tests' issues in the following collapsed section. The main highlights are summarized at the start of the next subsection.

Detailed description of issues with easily running these tests

Detailed description of the tests' issues

Compiling anything is obviously complex. In order to make these tests work as “automatically as possible,” the tests have made a LOT of assumptions. These assumptions include:

  • The build artifacts haven’t been modified or cleaned since we built pygrackle. Specifically, we need to find the auto-generated headers at compile-time and the copy of the shared library for linking.
  • The Grackle library is fully installed in a location that the system can locate at runtime (i.e. in a system library directory or in a directory specified by LD_LIBRARY_PATH). This may not seem like a problem since this was historically a requirement for running pygrackle (this isn't the case anymore). But it makes it extremely difficult to develop Grackle without breaking your current installation (historically, I would always delete my current installation of Grackle before I started developing it).

If things aren’t set up “just so” on your system, the tests won’t run properly.

When I introduced the CMake build-system, these tests were a major pain. One of CMake’s killer features is its support for out-of-source builds.[2] This is fundamentally at odds with the way these tests work. But I came up with a work-around that let us temporarily limp along.

This problem has been further exacerbated now that I shifted our python build-system from setuptools to scikit-build-core. This change made it extremely easy for somebody to install pygrackle:

  • If you have hdf5 and its headers installed, you just clone the grackle repository, and perform pip install . or pip install -e .. This performs an "embedded" build of the grackle library, which is packaged within the pygrackle module
  • 2 features of (modern) python packaging exacerbate the issues with these tests: (i) the idea of isolated build-environments and (ii) the fact that pip (and uv) perform builds and installation in one step
  • these issues would still be present even if we still used setuptools.
  • Currently, you simply cannot run the code-example tests if you install pygrackle in this way

Summary of the tests' issues

To summarize the issues with easily running these tests:

  1. They require you to have access to grackle's build-artifacts (the autogenerated headers are the biggest issue here)
  2. They currently require the copy of grackle being tested to be fully installed (which complicates a lot!)
  3. These issues are particularly problematic for the modern "easy way" of installing pygrackle, where the core grackle library (libgrackle.so) is automatically built as part of pygrackle (and is packaged into the pygrackle wheel) when you invoke pip install . or pip install -e .

I've given a lot of thought to how to solve these issues. At a glance, issues 1 and 2 both seem solvable. As I detail in the following collapsed section, it doesn't make sense to solve them in this way. The common feature among all of these "solutions" is that they require us to bend over backwards (and the result would still be somewhat flaky).

Flawed "Solutions"
To solve issue 1: I originally asked "why not package the required headers as part of pygrackle?" The short answer for why we shouldn't do this is that this simply isn't done (e.g. h5py doesn't do that). More importantly, the presence of these headers may falsely suggest to users that they could link their applications against the copy of libgrackle.so packaged with pygrackle.[3] This is something we NEED TO ACTIVELY DISCOURAGE!! I don't think that simply explaining to people that they shouldn't do this is enough (when there are better alternatives):

  • First, under this scenario, the code-examples tests would be doing exactly what we don't want users to do: compiling executables using the version of grackle headers packaged into pygrackle (and using the copy of libgrackle.so packaged into pygrackle, if present).
  • The deeper issue is that linking applications against the contents of pygrackle will actually work sometimes. This depends on:
    • whether a copy of libgrackle.so is actually packaged with pygrackle. While the easy/simple way of installing pygrackle does this, we will almost certainly want to maintain support for linking pygrackle against an external copy of grackle.
    • whether the application's dependencies conflict with grackle's. Our code examples would work because they have minimal dependencies and we have full control. Lots of simulation codes directly link against hdf5, OpenMP runtime libraries, or a Fortran runtime. It's possible for the copy of libgrackle.so packaged in pygrackle to be compiled/linked against versions of shared dependencies that conflict with the versions used by the application.[4] If we ever distribute binary wheels of pygrackle (which will include copies of these runtime libraries), these conflicts would be unavoidable.

To solve issue 2: I thought about using --rpath to avoid requiring a global installation. The issue with this solution is that it is not portable. --rpath is not a standard flag across compilers/linkers (although most toolchains provide similar functionality). Plus, the behavior differs across platforms. For example, there are some differences between macOS and Linux (I think they come from differences between Mach-O and ELF). Abstracting over this machinery is possible (CMake has support for doing this!), but I don't think it's worth our time to do that.

An alternative way to solve both issues involves having CMake directly build the code examples and having the pytest test-suite directly invoke those tests. You can already build the code-examples this way (SPOILER: this has some resemblance to the final solution).

  • in general, this would require us to pass an argument to pytest telling it where it can find the build-directory. We would also need to make sure that the code-examples are built ahead of time.
  • we could accomplish this for the easy pip install approach using some scikit-build-core features (but it requires that the user pass some special flags while installing pygrackle). I actually had a discussion with the scikit-build-core developers about how I could wire this all up in a fairly seamless manner... While I'm confident my plan would work, the discussion helped me conclude that there is a better solution.

Motivating The Solution

It's instructive to step back and consider how this repository is essentially a monorepo that provides source code for 2 distinct entities:

  1. the core grackle library (libgrackle)[5]
  2. the pygrackle python package

To drive home this point, consider that this is pretty unusual for the large community-maintained open-source scientific software packages (with python interfaces) that we typically encounter as astronomers:

  • Usually, a python package whose purpose is to wrap a C library lives in a separate repository from that library (think hdf5/h5py OR mpi implementations and mpi4py). While there are plenty of python package repositories that include C/C++/Fortran code (e.g. numpy, scipy, yt), the python package is the main "product" (the compiled code isn't separately distributed).[5]
  • The facts that (i) we explicitly maintain separate version numbers for the core library (libgrackle) and pygrackle and that (ii) grackle users can install libgrackle without pygrackle are indicative of this distinction.

The distinction between the core library and pygrackle is also reflected by the source code separation, by the fact that pygrackle consumes grackle's public API (and has no special access to internal functions), and by the clear distinction in the build-systems.[6]

The point of this drawn-out discussion is to emphasize that having distinct test-suites for both entities is sensible. While the majority of tests could belong to either suite, a subset of tests must belong to one suite or the other. Obviously, tests of pygrackle's python-logic must go into pygrackle's test-suite (the pytest test-suite). Likewise, unit-tests of internal core logic or of parts of grackle's API that aren't wrapped by pygrackle need to go into the core-library's test-suite. (We have already started to account for this by introducing unit-tests written with googletest, but we never actually hashed any of this out.)

We resolve the "problems" with these tests by transferring the code-example tests to the core library's test suite.

The Solution

In more detail:

  • When the CMake build-system is configured with -DGRACKLE_BUILD_TESTS=ON, the code-examples (and all other test-code) are all built alongside libgrackle. CMake is smart enough to link all of the code-examples against the libgrackle library in the build-directory in a way that they can be run without fully installing libgrackle (it leverages machinery related to --rpath under the hood)
  • These tests are actually driven by the ctest program that is shipped as part of CMake.
    • the CTest test-driver is integrated with CMake and it is designed to flexibly run various kinds of tests by invoking the command-line
    • ASIDE: the existing googletest unit-tests integrate nicely with CTest (CMake provides nice built-in functionality to automatically detect each unit-test written with that framework and support running them as distinct test-cases)
  • for cxx_omp_example, we just compile and run the program (which is what we currently do)
  • the logic for testing the code-examples is largely encapsulated by a python script[7] that executes the specified code-example executable and compares the printed result against expectations (consistent with what we already do)
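As a rough illustration of the last bullet, here is a minimal sketch of a driver that runs an executable and collects the numbers it prints. This is not the script added by this PR: the function name `run_and_parse` and the `name = value` output format are assumptions made purely for illustration, and a Python one-liner stands in for a compiled code-example.

```python
import subprocess
import sys


def run_and_parse(command):
    """Run `command` and collect every 'name = value' line printed to stdout.

    The 'name = value' format is an assumption made for this sketch; the real
    code-examples may print their results differently.
    """
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    results = {}
    for line in completed.stdout.splitlines():
        name, sep, value = line.partition("=")
        if not sep:
            continue  # skip lines that don't contain '='
        try:
            results[name.strip()] = float(value)
        except ValueError:
            continue  # skip lines whose right-hand side isn't a number
    return results


if __name__ == "__main__":
    # Stand-in for a compiled code-example (hypothetical output).
    fake_example = [sys.executable, "-c", "print('cooling_time = -1.5e+14')"]
    print(run_and_parse(fake_example))
```

Because the driver only needs a command line and the text that command prints, the same sketch would apply unchanged to an example written in C, C++, or Fortran.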

There is an important detail I have largely ignored until now: these tests (other than cxx_omp_example) are currently answer-tests because they don't have an obvious "correct answer". My solution is as follows:

  • We have 4 code-examples here. Each example was designed to perform exactly the same operation (just with a different API or using a different language). Consequently, we can check correctness for 3 of the examples by ensuring that they produce exactly the same result as the 4th example.
  • For the 4th example, we just store a json-file within the Grackle repository that corresponds with the expected result. I think this is acceptable since there are only 5 numbers we care about and we adopt some relatively loose tolerances for this case. In my mind, all we really care about here is ensuring that the code-examples continue to provide reasonable results.
  • If you really want the "correct answer" of these examples to be stored in the answer-tests, I have an idea for how we can do it, but it's going to take a fair amount of work.
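The two-level comparison described above can be sketched as follows. The quantity names, the numerical values, the JSON layout, and the tolerances are all hypothetical; only the overall scheme (one example checked loosely against a stored reference, the others checked against that example) comes from the description.

```python
import json
import math


def find_mismatches(actual, expected, rtol):
    """Return the names whose values disagree by more than `rtol` (relative).

    With rtol=0.0 this demands exact equality.
    """
    if actual.keys() != expected.keys():
        raise ValueError("the two results don't report the same quantities")
    return [
        name
        for name in sorted(expected)
        if not math.isclose(actual[name], expected[name], rel_tol=rtol, abs_tol=0.0)
    ]


# Hypothetical stored reference (in practice this would be read from a json-file).
reference = json.loads('{"cooling_time": -1.0e14, "temperature": 1000.0}')

# Hypothetical parsed outputs of two code-examples.
c_example = {"cooling_time": -1.0000001e14, "temperature": 1000.0}
cxx_example = dict(c_example)

# Step 1: one example is checked against the stored reference (loose tolerance).
print(find_mismatches(c_example, reference, rtol=1e-3))

# Step 2: every other example is checked against that example (exact match).
print(find_mismatches(cxx_example, c_example, rtol=0.0))
```

Keeping the stored reference loose and the cross-example comparison strict means only one small file has to be maintained, while any divergence between the examples is still caught.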

To actually achieve this outcome, I needed to slightly modify some of the code-examples. All I did for c_local_example, c_example, and cxx_example was adjust the precision used to print out the result.

fortran_example needed more work

  • the inputs were slightly different
  • to remove discrepancies in the results, I reordered the operations of fortran_example so that the logical structure more closely resembled c_example

I ultimately realized that the annotation of the dt argument in Fortran's interface declaration of solve_chemistry(units, fields, dt) was wrong.

  • When you would call this function from Fortran, the Fortran compiler would pass the pointer to the specified dt value into the C function.
  • the C function always assumes that the dt argument is a double passed by value. Consequently, the C function would essentially treat the pointer address specified by Fortran as a double (it didn't know it had to dereference it).
  • In other words, all Fortran calls to this function would always evolve chemistry over some random/arbitrary timestep
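To make the effect of this bug concrete, the sketch below reinterprets the bits of a pointer as a double, which is a simplified picture of what the bullets above describe. It assumes 64-bit pointers, and the timestep value is hypothetical. What the callee actually reads in a real by-reference/by-value mismatch depends on the platform's calling convention, so treat this as an illustration rather than a reproduction of the bug.

```python
import ctypes
import struct


def reinterpret_address_as_double(address):
    """Interpret the 8 bytes of a 64-bit address as an IEEE-754 double."""
    return struct.unpack("=d", struct.pack("=Q", address))[0]


dt = ctypes.c_double(3.15e7)    # hypothetical timestep
address = ctypes.addressof(dt)  # what a by-reference call hands to the callee

# A callee that correctly expects a pointer dereferences it and recovers dt.
dereferenced = ctypes.c_double.from_address(address).value

# A callee that expects a by-value double never dereferences anything; here we
# mimic that by treating the address bits themselves as the timestep.
misread = reinterpret_address_as_double(address)

print(f"intended dt: {dereferenced!r}")
print(f"misread dt:  {misread!r}")
```

The misread value has nothing to do with the intended timestep and changes with wherever the variable happens to live in memory.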

Some discrepancies still persist between fortran_example and the other examples, but the relative error has a magnitude of ~6e-7. I think this is acceptable for the moment (to fix this, I think we would need to remove the dynamic calculation of all input values and instead hardcode all inputs with literals).

Important

This solution doesn't actually support running the code-examples when you use the easy-install mode of pygrackle (it is definitely possible, if you pass a bunch of extra flags). The main point is that there is no longer an expectation that the code-example tests should run this way.

Instead, the code-example tests are now run alongside the rest of the core-library tests:

~/ $ cd grackle
~/grackle $ cmake -DGRACKLE_BUILD_TESTS=ON -GNinja -Bbuild
~/grackle $ cmake --build build && cd build && ctest

Other thoughts

I realize that we need documentation for the core-library test-suite. That's on my todo list

If you really don't want us to use ctest, I could move all the code logic into googletest and invoke the examples with the POSIX popen function. There are a few reasons why we may not want to do this:

  • This is somewhat irregular and would probably be a little surprising to people familiar with googletest (these tests are effectively integration-tests and googletest explicitly exists to support unit-testing).
  • We may need to take some special care when configuring googletest and using popen (to avoid breaking things). It's definitely possible (Cholla does it), but it doesn't seem like a great idea.[8]
  • Furthermore, I have some partially written tests to automatically confirm that the various documented integration strategies (and example snippets) don't get broken. These tests are going to need to basically run entire fresh builds of grackle. Therefore, they will need to be managed by ctest or they will need to exist entirely outside of googletest and pytest (Issue Add tests for linking #249)

Footnotes

  1. I suspect that the C and Fortran examples originally had much more similar structure. Then over the years, as the structure of the C example improved, the Fortran example wasn’t updated. (I suspect this was at least partially my fault)

  2. Out-of-source builds are extremely useful. It’s also the reason CMake projects (like Grackle) are able to support automatic dependency management. Creating multiple different build directories can radically speed up development of large projects (if you maintain a different directory for each branch you are working on, rebuilds will be much faster). It also allows easier IDE support.

  3. I originally thought that this could be a really easy, simple, attractive way for us to distribute pre-compiled copies of grackle. It seems like a fantastic idea before you understand the full picture.

  4. Imagine the application that uses parallel HDF5 while pygrackle got compiled with serial HDF5. Or that different fortran compilers got used with incompatible Fortran Runtime libraries. Or that compilers with incompatible OpenMP runtimes are used. These incompatibilities don't actually create problems when different python extension modules are compiled with incompatible dependencies because python internally uses dlopen.

  5. A notable exception to all of this is something like pytorch. It has an underlying C++ library, libtorch, that you can use without the rest of pytorch (my impression is that libtorch is much less feature-rich than pytorch).

  6. Aside: scikit-build-core is explicitly designed to support cases where the line between the python extension-modules and a C library is much blurrier and the two are much more tightly coupled. When I adopted scikit-build-core, I made sure to maintain the clear division between pygrackle and grackle.

  7. Note: this didn't have to be a python script. It could theoretically be written in any portable scripting language (bash/awk), or I could have written a C/C++ program (that CMake would compile) in order to run this test for us.

  8. In the future, I actually want to migrate the logic of the test_chemistry_struct_synched.py test into googletest (doing this will let us write a bunch of useful new tests that require access to the internal details of grackle). Doing that will require us to use popen. (so this undermines my point to some degree)

Previously, the logic invoking core-lib tests (i.e. through `ctest`) was
implemented as a bunch of shell commands directly embedded into a
"run-step" of the test-suite job.

This commit relocates the logic to a named command called
"run-core-tests." This command contains logic for running the
cxx_omp_example example (while this commit simply duplicates the logic,
the next commit will rectify the duplication)
I renamed the "run-tests" command to be called "run-pygrackle-tests". I
also removed the logic for running the OpenMP example from this command

For context, the prior commit configured the "run-core-tests" command
to have the logic for running the OpenMP example. But, to make changes
more atomic, that commit didn't remove the logic from the command now
known as "run-pygrackle-tests"
@mabruzzo mabruzzo added the testing test suite, regression tests, ci infrastructure label Feb 28, 2025
@mabruzzo mabruzzo added the bug Something isn't working label Mar 12, 2025
@mabruzzo mabruzzo modified the milestones: 3.4, 3.5 Mar 12, 2025

@brittonsmith brittonsmith left a comment

Contributor

Looks fine to me. If you could create an issue to make documentation for how to run these tests locally, that would be great. Otherwise, merge when ready.

@mabruzzo mabruzzo merged commit bfcec78 into grackle-project:main Mar 26, 2025
@mabruzzo mabruzzo deleted the tweaking-code-example-tests branch March 26, 2025 01:10
mabruzzo added a commit to mabruzzo/grackle that referenced this pull request Apr 12, 2025