Skip to content

Reimplement support for Link Time Optimization#4666

Merged
akva2 merged 1 commit into
OPM:masterfrom
SoilRos:reimplement-lto-support
Sep 5, 2025
Merged

Reimplement support for Link Time Optimization#4666
akva2 merged 1 commit into
OPM:masterfrom
SoilRos:reimplement-lto-support

Conversation

@SoilRos

@SoilRos SoilRos commented Jul 24, 2025

Copy link
Copy Markdown
Member

This implements LTO for specific configuration types and enables ThinLTO in case that is found. It also removes old whole program optimization in favor of this target based mode.

See OPM/opm-grid#885 (comment).

@SoilRos SoilRos added enhancement manual:irrelevant This PR is a minor fix and should not appear in the manual labels Jul 25, 2025
@SoilRos

SoilRos commented Jul 25, 2025

Copy link
Copy Markdown
Member Author

jenkins build this serial please

@SoilRos SoilRos force-pushed the reimplement-lto-support branch 2 times, most recently from b50f96c to 78931ff Compare July 25, 2025 00:12
@SoilRos

SoilRos commented Jul 25, 2025

Copy link
Copy Markdown
Member Author

I did not find how to install the new CMake file. Where do I have to add it?

@akva2

akva2 commented Aug 19, 2025

Copy link
Copy Markdown
Member

cmake files are globbed, or well, entire folder is installed https://github.com/OPM/opm-common/blob/master/CMakeLists.txt#L420

Comment thread cmake/Modules/OpmInterproceduralOptimization.cmake Outdated
Comment thread cmake/Modules/UseOptimization.cmake Outdated
@SoilRos SoilRos force-pushed the reimplement-lto-support branch from 78931ff to 5df4bc4 Compare August 21, 2025 10:13
@SoilRos

SoilRos commented Aug 21, 2025

Copy link
Copy Markdown
Member Author

@akva2 I just applied the requested changes plus an improved version of the documentation.

@SoilRos SoilRos requested a review from akva2 August 21, 2025 10:15
@SoilRos SoilRos force-pushed the reimplement-lto-support branch from 5df4bc4 to f86cdb6 Compare August 21, 2025 10:22

@akva2 akva2 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does the advertised job and looks good as such.

That being said, it's a bit unfortunate that the jobs option do not apply to the standard cmake support. Modifying CMAKE_${lang}_COMPILE_OPTIONS_IPO could work (ie I hacked it in and it did work), ie, replace -flto=auto with -flto=jobs for GNU and add -flto-jobs=jobs for clang so if you feel up to doing that cleanly... But I can merge as-is, it is still an improvement.

@SoilRos

SoilRos commented Aug 21, 2025

Copy link
Copy Markdown
Member Author

Where did you find CMAKE_${lang}_COMPILE_OPTIONS_IPO? I cannot see this in the documentation. I did not find a way to configure CMake's IPO, and that's why I ended up implementing it "manually" for clang.

As for the linker jobs, what I understand is that the linker needs to use something called a "server protocol" (no idea what exactly is that) so that it can coordinate with the build system and not oversubscribe the system. Otherwise, every linker job from the build system may try to use the whole system at the same time. As far as I know, make implements it, whereas ninja only started to implement it not very long ago. I have no idea if they are finished and if other build systems can do it, I guess most don't. So, to enable the number of jobs option properly for GNU and Clang one needs to carefully craft the cases where is possible. Meanwhile, is just better to fallback to 1 process for all the cases.

As for the GNU incremental LTO, I just learned about it. I will have a look. If you could tell me how you used CMAKE_${lang}_COMPILE_OPTIONS_IPO, maybe I can get a more homogeneous interface...

@akva2

akva2 commented Aug 21, 2025

Copy link
Copy Markdown
Member

Admittedly by digging into internal details. You find the compiler support in /usr/share/cmake-xxx/Modules/Compiler/(GNU|Clang).cmake, where you can see the variables it sets up.

I agree about the default to 1. The core of the problem is in fact that cmake defaults to auto (or 0 for clang/lld) which means use all you can find. Remedying that is really what I'm after, because as-is enabling LTO is kinda "dangerous" on, say, a laptop..

@SoilRos

SoilRos commented Aug 21, 2025

Copy link
Copy Markdown
Member Author

Ah ok, that's good to know. But we definitely cannot use this for setting up the LTO. I also didn't know that the default for GNU is auto. Thanks for the hint. I will try to fix that...

@SoilRos

SoilRos commented Aug 21, 2025

Copy link
Copy Markdown
Member Author

Alright, just by reading the GNU documentation, one has to set up the archiver and the randomizer tool to be the gnu ones. Otherwise, static libraries will not contain the LTO data. Even on debian, they seem to be different from the one provided by the system. This is precisely what the "old" code implements. The problem is that these tools need to be changed for the whole cmake configuration, not target based. This could override some user or toolchain settings too, so is definitely not ideal. Even more, the old code says that those options did trigger some bugs in the linker plugins. I will try to get some sensible defaults here too, but I would just suggest to keep it disabled for GCC and enable it by default for Clang.

@akva2

akva2 commented Aug 21, 2025

Copy link
Copy Markdown
Member

yeah i know, i wrote it after all. as far as I see cmake handles that mess in its GNU-FindBinUtils.cmake

@SoilRos

SoilRos commented Aug 21, 2025

Copy link
Copy Markdown
Member Author

Alright, thanks for looking. The GCC documentation also says that "Note that modern binutils provide plugin auto-load mechanism." That means that we could replicate the options set up by CMake, like ThinLTO but for GCC. My only question would be: do you know what this part of CMake with LTO does? Do we need to also include that too? I do not know that syntax nor for what it would be used for. Also, I just realized that ThinLTO is already enabled by default with cmake LTO. So what is missing in both cases is the parallelism and cache.

@SoilRos SoilRos force-pushed the reimplement-lto-support branch 7 times, most recently from 9d2f03e to 019a031 Compare August 22, 2025 14:05
@SoilRos

SoilRos commented Aug 22, 2025

Copy link
Copy Markdown
Member Author

@akva2 I just pushed the implementation for GCC LTO, including its incremental support which was added in GCC 15. I managed to test this in macOS with GCC 15 and Clang 17, which works flawlessly. However, I also tried it in a Debian machine with GCC 12, and the LTO insisted on discarding an "unused" function when archiving the library. Of course, executables or libraries that needed that function later on did not find anything.

I am unsure how to proceed. It seems to be a linker issue, so I left AppleClang enabled by default (there is only one linker in macOS AFIK). But it would also be nice to know which versions of LD and GCC also work well together so that this can be enabled there too.

@SoilRos

SoilRos commented Aug 22, 2025

Copy link
Copy Markdown
Member Author

jenkins build this serial please

@akva2 akva2 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments, but this looks really good now

Comment thread cmake/Modules/OpmInterproceduralOptimization.cmake Outdated
Comment thread cmake/Modules/OpmInterproceduralOptimization.cmake
@SoilRos SoilRos force-pushed the reimplement-lto-support branch from 019a031 to 5d29371 Compare August 28, 2025 08:50
@akva2

akva2 commented Aug 28, 2025

Copy link
Copy Markdown
Member

jenkins build this please

@SoilRos

SoilRos commented Aug 28, 2025

Copy link
Copy Markdown
Member Author

This is ready from my side. I could not test the combinations that work well for Linux, so the default behavior is to disable LTO for cases other than AppleClang. As soon as I find the conditions where we can enable more cases by default I will try to submit that in another PR.

@blattms

blattms commented Aug 28, 2025

Copy link
Copy Markdown
Member

@atgeirr @daavid00 We are thinking about activating limk time optimization with this PR for Mac. Is that Ok for you?

@daavid00

Copy link
Copy Markdown
Member

@atgeirr @daavid00 We are thinking about activating limk time optimization with this PR for Mac. Is that Ok for you?

I just built flow in my mac using clang with this PR and it seems all is good. However, this PR quite increases the compilation time (no free lunch as usual). Is the plan to add a cmake flag to turn this off/on? e.g., -DWHOLE_PROG_OPTIM?

@akva2

akva2 commented Aug 28, 2025

Copy link
Copy Markdown
Member

There already is an option -DOPM_INTERPROCEDURAL_OPTIMIZATION_TYPE=NONE

@SoilRos

SoilRos commented Aug 28, 2025

Copy link
Copy Markdown
Member Author

@daavid00 is the increase in compilation time very substantial? If everything works correctly, there must be a cache that should reduce compilation times in subsequent modifications of the code on the same build.

@daavid00

daavid00 commented Aug 28, 2025

Copy link
Copy Markdown
Member

@daavid00 is the increase in compilation time very substantial? If everything works correctly, there must be a cache that should reduce compilation times in subsequent modifications of the code on the same build.

For building all opm-common with master takes ca 2 min 30 s, while with this PR takes 5 minutes. It is always good to have the option to turn features on/off which this PR already has, so looking forward to this being merged.

@blattms

blattms commented Sep 1, 2025

Copy link
Copy Markdown
Member

Now I'm curious about compile time on Linux. I might check that.

Please indicate in the decription that the default has changed and how to go back to it for user and manual writer convenience.

@SoilRos

SoilRos commented Sep 2, 2025

Copy link
Copy Markdown
Member Author

@akva2 could you please push your changes to let this work on Linux?

@akva2

akva2 commented Sep 2, 2025

Copy link
Copy Markdown
Member

We are in the process but moving step by step as it needs some cleanups, you can use #4706

@akva2

akva2 commented Sep 3, 2025

Copy link
Copy Markdown
Member

everything is now in master. we need to decide on the default. i think the easiest way to get this in is to make it off by default everywhere, and then change the default separately.

@SoilRos

SoilRos commented Sep 3, 2025

Copy link
Copy Markdown
Member Author

Great, thank you! I agree, let's merge with default off and I later make some table with measurements with different linkers and compilers... then we decide if it's worth it for everyone.

@akva2

akva2 commented Sep 4, 2025

Copy link
Copy Markdown
Member

So, will you disable it for (apple)-clang as well, then I will hit the shiny green. Also I think the default mode output is just confusing, so maybe remove that as well, and replace it with the actual used mode?

@SoilRos SoilRos force-pushed the reimplement-lto-support branch 2 times, most recently from 5439d24 to a6588fe Compare September 4, 2025 14:14
@SoilRos

SoilRos commented Sep 4, 2025

Copy link
Copy Markdown
Member Author

I just pushed a new version that:

  • Sets default to NONE in all cases (and updates the documentation for that).
  • Replaces the output to print the used IPO type.
  • Rebases to current master and squashed commits.

@akva2

akva2 commented Sep 5, 2025

Copy link
Copy Markdown
Member

One small final request: opm-upscaling is a bit messy and enables the Fortran language. For reasons I don't care to investigate, the check_ipo_support fails in that case. This is a bit annoying if you use a 'super'-build, ie, https://github.com/OPM/opm-utilities/tree/master/opm-super since opm-upscaling is the last module built, resulting in the drop-down list only showing the 'NONE' entry for optimization type (and making LTO unavailable in opm-upscaling in general). The easy fix is to add LANGUAGES C CXX to the check_ipo_support call.

@akva2

akva2 commented Sep 5, 2025

Copy link
Copy Markdown
Member

Other than that this has full sign-off now, tested

  • clang+lld
  • clang+bfd (fails for lens_immiscible_ecfv_ad_mcu)
  • clang+mold
  • clang+gold
  • gcc+bfd
  • gcc+gold
  • gcc+mold (fails for a lot of targets)

The two failures are not really important, clang+bfd isn't really supported, gcc+mold fails to a known bug in my mold version.

This implements LTO for specific configuration types like ThinLTO and incremental LTO from clang and gcc. It also removes old whole program optimization in favor of this target based mode.
@SoilRos SoilRos force-pushed the reimplement-lto-support branch from a6588fe to 60b6e6d Compare September 5, 2025 08:44
@SoilRos

SoilRos commented Sep 5, 2025

Copy link
Copy Markdown
Member Author

The easy fix is to add LANGUAGES C CXX to the check_ipo_support call.

Fair, I only tested this with C++ anyways, which I guess also extends to C. In any case, that is done.

Thanks for the testing!

@akva2

akva2 commented Sep 5, 2025

Copy link
Copy Markdown
Member

jenkins build this please

@akva2

akva2 commented Sep 5, 2025

Copy link
Copy Markdown
Member

tested, green - in she goes.

thanks for efforts and patience!

@akva2 akva2 merged commit e24b3fd into OPM:master Sep 5, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

manual:irrelevant This PR is a minor fix and should not appear in the manual

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants