Reimplement support for Link Time Optimization #4666
|
jenkins build this serial please |
b50f96c to 78931ff
|
I did not find how to install the new CMake file. Where do I have to add it? |
|
cmake files are globbed, or well, entire folder is installed https://github.com/OPM/opm-common/blob/master/CMakeLists.txt#L420 |
78931ff to 5df4bc4
|
@akva2 I just applied the requested changes plus an improved version of the documentation. |
5df4bc4 to f86cdb6
akva2
left a comment
This does the advertised job and looks good as such.
That being said, it's a bit unfortunate that the jobs option does not apply to the standard cmake support. Modifying CMAKE_${lang}_COMPILE_OPTIONS_IPO could work (i.e., I hacked it in and it did work): replace -flto=auto with -flto=jobs for GNU and add -flto-jobs=jobs for clang, so if you feel up to doing that cleanly... But I can merge as-is, it is still an improvement.
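A minimal sketch of the hack described above, not the PR's implementation. It assumes that CMake's compiler modules have already populated the internal variable CMAKE_${lang}_COMPILE_OPTIONS_IPO (so it must run after project()), and that the GNU value contains -flto=auto. OPM_LTO_JOBS is a hypothetical cache variable introduced only for illustration.

```cmake
# Hypothetical user-facing knob; the name is an assumption, not from the PR.
set(OPM_LTO_JOBS 1 CACHE STRING "Number of parallel LTO jobs")

foreach(lang C CXX)
  # Only touch the variable if CMake's compiler module actually defined it.
  if(NOT DEFINED CMAKE_${lang}_COMPILE_OPTIONS_IPO)
    continue()
  endif()

  if(CMAKE_${lang}_COMPILER_ID STREQUAL "GNU")
    # Swap the default -flto=auto for an explicit job count.
    list(TRANSFORM CMAKE_${lang}_COMPILE_OPTIONS_IPO
         REPLACE "^-flto=auto$" "-flto=${OPM_LTO_JOBS}")
  elseif(CMAKE_${lang}_COMPILER_ID MATCHES "Clang")
    # Add the job count next to whatever -flto flag is already set.
    list(APPEND CMAKE_${lang}_COMPILE_OPTIONS_IPO
         "-flto-jobs=${OPM_LTO_JOBS}")
  endif()
endforeach()
```

Because these variables are internal to CMake, their names and contents may differ between CMake versions; checking the installed Modules/Compiler files first is advisable.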
|
Where did you find As for the linker jobs, what I understand is that the linker needs to use something called a "server protocol" (no idea what exactly that is) so that it can coordinate with the build system and not oversubscribe the system. Otherwise, every linker job from the build system may try to use the whole system at the same time. As far as I know, As for the GNU incremental LTO, I just learned about it. I will have a look. If you could tell me how you used |
|
Admittedly by digging into internal details. You find the compiler support in /usr/share/cmake-xxx/Modules/Compiler/(GNU|Clang).cmake, where you can see the variables it sets up. I agree about the default to 1. The core of the problem is in fact that cmake defaults to auto (or 0 for clang/lld) which means use all you can find. Remedying that is really what I'm after, because as-is enabling LTO is kinda "dangerous" on, say, a laptop.. |
|
Ah ok, that's good to know. But we definitely cannot use this for setting up the LTO. I also didn't know that the default for GNU is |
|
Alright, just by reading the GNU documentation, one has to set up the archiver and the ranlib tool to be the GNU ones. Otherwise, static libraries will not contain the LTO data. Even on Debian, they seem to be different from the ones provided by the system. This is precisely what the "old" code implements. The problem is that these tools need to be changed for the whole cmake configuration, not per target. This could override some user or toolchain settings too, so it is definitely not ideal. Even more, the old code says that those options did trigger some bugs in the linker plugins. I will try to get some sensible defaults here too, but I would just suggest keeping it disabled for GCC and enabling it by default for Clang. |
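For illustration, a global override of this kind could look roughly like the sketch below. It is not the old code itself. It assumes GCC's wrapper tools gcc-ar and gcc-ranlib are on the search path, and the two *_EXECUTABLE variable names are hypothetical.

```cmake
# Sketch only: switch the whole configuration to GCC's LTO-aware wrappers.
# GCC_AR_EXECUTABLE and GCC_RANLIB_EXECUTABLE are hypothetical names.
if(CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
  find_program(GCC_AR_EXECUTABLE NAMES gcc-ar)
  find_program(GCC_RANLIB_EXECUTABLE NAMES gcc-ranlib)

  if(GCC_AR_EXECUTABLE AND GCC_RANLIB_EXECUTABLE)
    # These settings are global: they affect every static library in the
    # build and override any value from the user or a toolchain file.
    set(CMAKE_AR "${GCC_AR_EXECUTABLE}")
    set(CMAKE_RANLIB "${GCC_RANLIB_EXECUTABLE}")
  endif()
endif()
```

The unconditional set() calls are exactly the drawback mentioned above: there is no per-target equivalent in this approach.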
|
yeah i know, i wrote it after all. as far as I see cmake handles that mess in its GNU-FindBinUtils.cmake |
|
Alright, thanks for looking. The GCC documentation also says: "Note that modern binutils provide plugin auto-load mechanism." That means that we could replicate the options set up by CMake, like ThinLTO but for GCC. My only question would be: do you know what this part of CMake with LTO does? Do we need to include that too? I do not know that syntax nor what it would be used for. Also, I just realized that ThinLTO is already enabled by default with cmake LTO. So what is missing in both cases is the parallelism and cache. |
9d2f03e to 019a031
|
@akva2 I just pushed the implementation for GCC LTO, including its incremental support, which was added in GCC 15. I managed to test this on macOS with GCC 15 and Clang 17, where it works flawlessly. However, I also tried it on a Debian machine with GCC 12, and the LTO insisted on discarding an "unused" function when archiving the library. Of course, executables or libraries that needed that function later on did not find anything. I am unsure how to proceed. It seems to be a linker issue, so I left AppleClang enabled by default (there is only one linker on macOS AFAIK). But it would also be nice to know which versions of LD and GCC work well together so that this can be enabled there too. |
|
jenkins build this serial please |
akva2
left a comment
A few comments, but this looks really good now
019a031 to 5d29371
|
jenkins build this please |
|
This is ready from my side. I could not test the combinations that work well for Linux, so the default behavior is to disable LTO for cases other than AppleClang. As soon as I find the conditions where we can enable more cases by default I will try to submit that in another PR. |
I just built flow on my Mac using clang with this PR and all seems good. However, this PR noticeably increases the compilation time (no free lunch, as usual). Is the plan to add a cmake flag to turn this off/on? e.g., |
|
There already is an option -DOPM_INTERPROCEDURAL_OPTIMIZATION_TYPE=NONE |
|
@daavid00 is the increase in compilation time very substantial? If everything works correctly, there should be a cache that reduces compilation times for subsequent modifications of the code in the same build. |
Building all of opm-common takes ca. 2 min 30 s with master, while with this PR it takes 5 minutes. It is always good to have the option to turn features on/off, which this PR already has, so I am looking forward to this being merged. |
|
Now I'm curious about compile time on Linux. I might check that. Please indicate in the description that the default has changed and how to go back to it, for user and manual writer convenience. |
|
@akva2 could you please push your changes to let this work on Linux? |
|
We are in the process but moving step by step as it needs some cleanups, you can use #4706 |
|
everything is now in master. we need to decide on the default. i think the easiest way to get this in is to make it off by default everywhere, and then change the default separately. |
|
Great, thank you! I agree, let's merge with the default off and I will later make a table with measurements for different linkers and compilers... then we decide if it's worth it for everyone. |
|
So, will you disable it for (apple)-clang as well, then I will hit the shiny green. Also I think the default mode output is just confusing, so maybe remove that as well, and replace it with the actual used mode? |
5439d24 to a6588fe
|
I just pushed a new version that:
|
|
One small final request: opm-upscaling is a bit messy and enables the Fortran language. For reasons I don't care to investigate, the check_ipo_supported check fails in that case. This is a bit annoying if you use a 'super'-build, i.e., https://github.com/OPM/opm-utilities/tree/master/opm-super, since opm-upscaling is the last module built, resulting in the drop-down list only showing the 'NONE' entry for optimization type (and making LTO unavailable in opm-upscaling in general). The easy fix is to add LANGUAGES C CXX to the check_ipo_supported call. |
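A minimal sketch of that fix, assuming the check uses CMake's standard CheckIPOSupported module. The result variable names are illustrative and not taken from the PR.

```cmake
include(CheckIPOSupported)

# Restrict the check to C and C++ so that an enabled Fortran language
# cannot make the whole check fail.
check_ipo_supported(RESULT ipo_supported
                    OUTPUT ipo_output
                    LANGUAGES C CXX)

if(NOT ipo_supported)
  message(STATUS "IPO/LTO not available: ${ipo_output}")
endif()
```

Without the LANGUAGES argument, the check covers every language enabled in the project, which is how the Fortran case comes into play.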
|
Other than that this has full sign-off now, tested
The two failures are not really important: clang+bfd isn't really supported, and gcc+mold fails due to a known bug in my mold version. |
This implements LTO for specific configuration types like ThinLTO and incremental LTO from clang and gcc. It also removes old whole program optimization in favor of this target based mode.
a6588fe to 60b6e6d
Fair, I only tested this with C++ anyways, which I guess also extends to C. In any case, that is done. Thanks for the testing! |
|
jenkins build this please |
|
tested, green - in she goes. thanks for efforts and patience! |
This implements LTO for specific configuration types and enables ThinLTO in case that is found. It also removes old whole program optimization in favor of this target based mode.
See OPM/opm-grid#885 (comment).