LTO (Link Time Optimization)? #52

Closed
opened 2021-11-04 11:33:23 +01:00 by Lehner82 · 23 comments

This should probably add a similar performance boost as v3. Any plans to add this?
Owner

Hi @Lehner82. That should only be a minor change, so yes, that can be enabled fairly quickly.

I wouldn't rebuild all packages because of this, but newly built packages would have it enabled.
anonfunc added the enhancement label 2021-11-04 11:37:51 +01:00
Author

OK, that's great to hear.
Lehner82 reopened this issue 2021-11-04 11:45:35 +01:00
Author

-falign-functions=32 is also used in the official Intel Clear Linux repos and is recommended on Gentoo to be used alongside LTO. It should boost performance mainly on Intel, with no regressions on AMD.
Owner

Implemented in [7eb1be8371](https://git.harting.dev/anonfunc/ALHP.GO/commit/7eb1be8371ec3f15e678c1c23b98aa8e2a0168c5) and [715063281a](https://git.harting.dev/anonfunc/ALHP.GO/commit/715063281ab45aeb986364ee1094b4ccb88a4d69).
Owner

[knot](https://alhp.harting.dev/packages.html#community-x86-64-v3-knot) is the first package built with LTO.

Should we follow [GentooLTO](https://github.com/InBetweenNames/gentooLTO) on this?
Author

It's probably the best resource for performance-focused compiling without placebo out there. GentooLTO also has lists of workarounds for LTO, O3, Graphite, and even Ofast. It recommends using Graphite alongside LTO, so that's probably the next thing to look into.
Owner

@Gontier-Julien, follow on what exactly? Regarding patches?

Regarding this part:

> The default configuration of GentooLTO enables the following:
>
> O3
> Graphite (requires gcc to be built with the graphite use flag)
> -fno-semantic-interposition
> -fipa-pta
> -fdevirtualize-at-ltrans
> LTO

And I agree with @Lehner82 on this: they have already done a lot of work, so that is probably going to help with bugs etc.

> It's probably the best resource for performance-focused compiling without placebo out there. GentooLTO also has lists of workarounds for LTO, O3, Graphite, and even Ofast. It recommends using Graphite alongside LTO, so that's probably the next thing to look into.
Owner

That could be a great source indeed, but if I understood correctly using GentooLTO would require the following:

  • some sort of automatic syncing of said patches or build/linker flags
  • matching Gentoo packages to Arch packages
  • using Graphite?

An alternative to syncing the patches would be to port them to our own format (in this case a structure for ALHP to match patch->pkgbase and the needed patch or modification).
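
A rough sketch of what such a pkgbase-keyed structure might look like, purely as an illustration: the `Override` type, the `OVERRIDES` table, the package names and the patch file name are all hypothetical and do not reflect ALHP's actual format.

```python
from dataclasses import dataclass, field


@dataclass
class Override:
    """Hypothetical per-pkgbase modification: flags to drop or add, patches to apply."""
    remove_flags: list = field(default_factory=list)
    add_flags: list = field(default_factory=list)
    patches: list = field(default_factory=list)


# Hypothetical entries; package names and the patch file are made up.
OVERRIDES = {
    "examplepkg": Override(remove_flags=["-flto"]),
    "otherpkg": Override(add_flags=["-ffat-lto-objects"], patches=["otherpkg-lto.patch"]),
}


def flags_for(pkgbase, default_flags):
    """Return the flag list for `pkgbase` after applying its override, if any."""
    override = OVERRIDES.get(pkgbase)
    if override is None:
        return list(default_flags)
    flags = [f for f in default_flags if f not in override.remove_flags]
    flags.extend(f for f in override.add_flags if f not in flags)
    return flags


print(flags_for("examplepkg", ["-O3", "-flto"]))
```

Keying on pkgbase means a Gentoo exemption only has to be translated once into an Arch package name, after which the build logic never needs to know where the entry came from.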

I don't think we need to match Gentoo packages to Arch packages: if a package fails to build, the ALHP structure (as you mentioned) could look for a potential patch and apply it.

And even if that were the case, I don't think most packages would fail to build. If one does fail and no patch is available, try to build it with simple LTO.
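
The fallback idea could be sketched roughly as follows. `run_build` stands in for whatever actually performs the build, and the two flag sets (including `-O2` for the simple tier) are assumptions chosen for illustration only.

```python
# Hypothetical flag tiers, tried in order: aggressive flags first, simple LTO second.
ATTEMPTS = [
    ("full", ["-O3", "-flto", "-fno-semantic-interposition", "-fipa-pta", "-fdevirtualize-at-ltrans"]),
    ("simple-lto", ["-O2", "-flto"]),
]


def build_with_fallback(pkgbase, attempts, run_build):
    """Try each flag set in order; return the label of the first that builds, else None.

    `run_build(pkgbase, flags)` is a caller-supplied function returning True on success.
    """
    for label, flags in attempts:
        if run_build(pkgbase, flags):
            return label
    return None


def fake_build(pkgbase, flags):
    # Stand-in for a real build: pretend this package breaks with -fipa-pta.
    return "-fipa-pta" not in flags


print(build_with_fallback("examplepkg", ATTEMPTS, fake_build))
```

Recording which tier succeeded would also show which packages are candidates for a proper patch later on.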
Author

> And even if that were the case, I don't think most packages would fail to build. If one does fail and no patch is available, try to build it with simple LTO.

That has also been my experience with GentooLTO. There were only 2 or 3 packages that failed to build with LTO. I never had any trouble with O3 or Graphite, only with Ofast.
Owner

Well, LTO is enabled now, so we'll see what fails. If there is something that fails with -O3/LTO we can discuss what needs to be done in that specific case.

Is there something else we can do here? Otherwise we can close this for now.

> Well, LTO is enabled now, so we'll see what fails. If there is something that fails with -O3/LTO we can discuss what needs to be done in that specific case.
>
> Is there something else we can do here? Otherwise we can close this for now.

I think we can close this now.
Owner

I noticed significantly more memory usage building with LTO enabled. For example, `xf86-video-intel` is currently building, using a total of 43.6GiB RES.

That could overwhelm the buildserver, which only has ~64G available.

I'll continue monitoring. It could result in some packages not being built because the (cgroup) OOM-killer acts on the build process.

Could also be a bug or weird behavior with some packages.

*EDIT*: Checked the build history, and it seems only some packages have that **huge** memory usage. Potential blacklist candidates.
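
As an illustration of how blacklist candidates might be picked from a build history, here is a small sketch. The function name, the 50% threshold and the second history entry are assumptions; only the 43.6GiB figure and the ~64G of RAM come from the comment above.

```python
def blacklist_candidates(peak_rss_gib, total_ram_gib=64.0, threshold=0.5):
    """Return packages whose peak memory exceeded `threshold` of the build server's RAM."""
    limit = total_ram_gib * threshold
    return sorted(pkg for pkg, peak in peak_rss_gib.items() if peak > limit)


# Peak resident memory per package in GiB; "examplepkg" is a made-up entry.
history = {"xf86-video-intel": 43.6, "examplepkg": 2.0}

print(blacklist_candidates(history))
```

A relative threshold keeps the check meaningful if the build server's memory changes, although a fixed cgroup limit would work just as well.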

> I noticed significantly more memory usage building with LTO enabled. For example, `xf86-video-intel` is currently building, using a total of 43.6GiB RES.
>
> That could overwhelm the buildserver, which only has ~64G available.
>
> I'll continue monitoring. It could result in some packages not being built because the (cgroup) OOM-killer acts on the build process.
>
> Could also be a bug or weird behavior with some packages.
>
> *EDIT*: Checked the build history, and it seems only some packages have that **huge** memory usage. Potential blacklist candidates.

I think it's a bug. I just had some packages to build and one had 3x the memory usage.

It seems to be fixed with the next gcc version (11.2):

```
Link-time optimization improvements:

    The LTO bytecode format was optimized for smaller object files and faster streaming.
    Memory allocation of the linking stage was improved to reduce peak memory use.
```
Owner

> I think it's a bug. I just had some packages to build and one had 3x the memory usage.
>
> It seems to be fixed with the next gcc version (11.2):
>
> ```
> Link-time optimization improvements:
>
>     The LTO bytecode format was optimized for smaller object files and faster streaming.
>     Memory allocation of the linking stage was improved to reduce peak memory use.
> ```

Thanks for doing the research.
Owner

I applied most of [gentooLTO's package exemptions](https://github.com/InBetweenNames/gentooLTO/blob/master/sys-config/ltoize/files/package.cflags/lto.conf). The package status page also shows the LTO status for packages built from now on (and whether they are excluded from building with LTO).

I'm reopening this issue for now, since there was some news about LTO in the monthly report for December.

> ## Devtools
>
> A new devtools version 20211129 has been released [7]. The new devtools package is waiting in [staging] for the python 3.10 rebuild to finish in order to avoid additional hassle with an already cumbersome rebuild.
>
> LTO has additionally been enabled in this release as the issues should be easy to spot and extremely simple to work around by just setting an `options` knob inside the PKGBUILD.
>
> Once devtools has been moved out of [testing], a wide ranged rebuild is planned to make use of the new flags on a broad scale.
Owner

I have seen this mail. The only thing that should change on our side is how LTO is enabled. If upstream now checks whether LTO works or needs workarounds/disabling, we can drop some of the logic handling these cases.
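
For reference, the `options` knob follows makepkg's convention that `name` enables a feature and `!name` disables it, with a PKGBUILD entry overriding the makepkg.conf default. A minimal sketch of that resolution, assuming LTO is on by default, could look like this; the function name and defaults are hypothetical and this is not ALHP's actual logic.

```python
def lto_enabled(pkg_options, default_options=("lto",)):
    """Resolve the effective LTO setting: defaults first, then PKGBUILD options; last one wins."""
    enabled = False
    for opt in list(default_options) + list(pkg_options):
        if opt == "lto":
            enabled = True
        elif opt == "!lto":
            enabled = False
    return enabled


print(lto_enabled(["!lto"]))    # package opted out in its PKGBUILD
print(lto_enabled(["!strip"]))  # unrelated option, default applies
```

With upstream packages opting out themselves, a separate exemption list would only be needed for packages that fail solely with the additional flags.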

Just as an info, though I think you might have seen it: the new devtools [is now in stable](https://archlinux.org/packages/extra/any/devtools/).
Owner

> Just as an info, though I think you might have seen it: the new devtools [is now in stable](https://archlinux.org/packages/extra/any/devtools/).

I adjusted ALHP's logic to account for LTO being enabled by default in [5432ea326d](https://somegit.dev/ALHP/ALHP.GO/commit/5432ea326d4f8e0c107d08480378d763057c50c0).

> > Just as an info, though I think you might have seen it: the new devtools [is now in stable](https://archlinux.org/packages/extra/any/devtools/).
>
> I adjusted ALHP's logic to account for LTO being enabled by default in [5432ea326d](https://git.harting.dev/ALHP/ALHP.GO/commit/5432ea326d4f8e0c107d08480378d763057c50c0).

Awesome!
Also, thank you for your great work! ^^