compiler plugin

Stephan Bergmann sbergman at redhat.com
Wed Jul 13 09:18:40 UTC 2016


On 07/12/2016 09:32 AM, Norbert Thiebaud wrote:
> so I ran some numbers... after upgrading ccache
>
> clang+plugins+dbgutil  (time result in minutes.. elapsed/user/system)
>
> cold: 33/840/50
> hot: 9/79/14
> no-op: 4/46/1
>
> clang+dbgutil (no plugins)
>
> cold: 26/605/46
> hot: 9/79/14
> no-op: 4/46/1
>
> gcc-dbgutil
>
> cold: 28/621/97
> hot: 9/79/14
> no-op: 4/45/1
>
> note: none of these include make check
>
> so the cost of the plugins on a full build is 7 minutes elapsed, ~240
> minutes cpu.
>
> ccache works fine... on the other hand any change in any of the
> plugins invalidates the cache...

Looking at the ccache documentation at 
<https://ccache.samba.org/manual.html#_configuration_settings> and the 
source code at <https://github.com/ccache/ccache/blob/master/ccache.c>:

ccache detects that clang is called in a way that makes it load our 
compilerplugins/obj/plugin.so, and then includes information about both
the clang executable itself and the plugin.so in its hash (which 
determines whether a cached object has been built with the same 
toolchain as a newly requested object).

ccache offers several choices for what kind of information about a 
toolchain entity (the clang executable itself and the plugin.so) to 
include in the hash:  The default is compiler_check=mtime, which 
includes the entity's mtime and size.
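To make the mtime behaviour concrete, here is a simplified Python model of what compiler_check=mtime amounts to.  It is not ccache's actual code (that lives in ccache.c); the function names and the temporary stand-in for plugin.so are made up for illustration.  What it shows: rewriting a file with byte-identical content but a new mtime already changes the key.

```python
import hashlib
import os
import tempfile


def mtime_check_key(paths):
    # Simplified model (not ccache's code): fold each entity's size and
    # mtime into one digest, without ever reading the file contents.
    h = hashlib.sha1()
    for path in paths:
        st = os.stat(path)
        h.update(f"{st.st_size}:{st.st_mtime_ns}\n".encode())
    return h.hexdigest()


def rebuild_identical_plugin():
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical stand-in for compilerplugins/obj/plugin.so.
        plugin = os.path.join(d, "plugin.so")
        keys = []
        for mtime_ns in (1_000_000_000, 2_000_000_000):
            # Same bytes each time, as a reproducible rebuild would produce.
            with open(plugin, "wb") as f:
                f.write(b"identical plugin bytes")
            os.utime(plugin, ns=(mtime_ns, mtime_ns))
            keys.append(mtime_check_key([plugin]))
        return keys


key_first, key_rebuilt = rebuild_identical_plugin()
print(key_first == key_rebuilt)  # False: only the mtime changed
```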

Now imagine the Gerrit/Jenkins bot does three builds A, B, C in 
sequence, where A and C are based on the same revision of 
compilerplugins/, while B is based on a different revision.  That means 
that compilerplugins/obj/plugin.so will be built anew for each of A, B, 
C, and will have different mtimes in A and C.  That in turn means that
/any/ objects cached during build A will not be taken into account 
during build C, even if they would still be in the cache after build B.

Another ccache configuration option is compiler_check=content, which 
uses a hash of the entity's content instead of its mtime/size.  If the 
bot's underlying toolchain produces sufficiently reproducible builds, so 
that compilerplugins/obj/plugin.so from builds A and C have identical 
content, then build C should be able to reuse the objects from build A 
that are still in the cache (given a large enough cache).
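The A/B/C scenario can be modelled the same way for compiler_check=content.  Again this is a hypothetical sketch, not ccache's implementation: build_plugin pretends to be a perfectly reproducible build whose output bytes depend only on the compilerplugins/ revision ("rev1" and "rev2" are made-up labels), while every build gets a different mtime.

```python
import hashlib
import os
import tempfile


def content_check_key(paths):
    # Simplified model of compiler_check=content: hash the bytes of each
    # toolchain entity and ignore its mtime/size metadata entirely.
    h = hashlib.sha1()
    for path in paths:
        with open(path, "rb") as f:
            h.update(hashlib.sha1(f.read()).digest())
    return h.hexdigest()


def build_plugin(directory, source_revision, mtime_ns):
    # Hypothetical reproducible build: output depends only on the revision,
    # never on when the build ran (the mtime differs on every build).
    path = os.path.join(directory, "plugin.so")
    with open(path, "wb") as f:
        f.write(f"plugin built from {source_revision}".encode())
    os.utime(path, ns=(mtime_ns, mtime_ns))
    return path


def simulate_builds():
    keys = {}
    with tempfile.TemporaryDirectory() as d:
        # Builds A and C share a compilerplugins/ revision; B does not.
        for build, rev, t in [("A", "rev1", 1_000), ("B", "rev2", 2_000), ("C", "rev1", 3_000)]:
            keys[build] = content_check_key([build_plugin(d, rev, t)])
    return keys


keys = simulate_builds()
print(keys["A"] == keys["C"], keys["A"] == keys["B"])  # True False
```

With the content key, objects cached during A remain usable in C; with an mtime key they would not be, since each build's plugin.so carries a new mtime.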

compiler_check=content computes the hash of both the clang executable 
itself and the plugin.so for each ccache request.  Should that turn out 
to slow things down too much compared to compiler_check=mtime, a third 
option would be compiler_check=string:X, which simply includes the
information "X" for each entity (without looking at the entity's real 
characteristics at all).  For each build done by the bot, that X could 
e.g. be determined as the SHA1 of the latest git commit that modified 
compilerplugins/ (and passed from the build to ccache via the 
CCACHE_COMPILERCHECK environment variable corresponding to the 
compiler_check configuration setting).  (That would use the same "X" 
when determining the characteristics of both the clang executable itself 
and the plugin.so, which would be fine assuming that the clang 
executable itself never changes anyway, at least not in a way that 
necessarily requires rebuilds.  At worst, the ccache would need to be 
cleared by the bot's admin when installing a new version of Clang.) 
This option would also work if compiler_check=content should turn out 
not to help because the bot's builds of compilerplugins/obj/plugin.so 
do not produce exactly the same content.
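A sketch of how the bot's build script could derive X, assuming Python is available on the bot and the build runs from a LibreOffice git checkout.  The git log invocation is standard git; the helper names latest_plugin_commit, compiler_check_setting and ccache_env are hypothetical.  CCACHE_COMPILERCHECK is the environment variable ccache documents for the compiler_check setting, and "string:" is its prefix for this mode.

```python
import os
import subprocess


def latest_plugin_commit(repo_dir="."):
    # SHA1 of the most recent commit that touched compilerplugins/.
    return subprocess.run(
        ["git", "log", "-1", "--format=%H", "--", "compilerplugins/"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout.strip()


def compiler_check_setting(commit_sha):
    # Value for CCACHE_COMPILERCHECK in the compiler_check=string:X form.
    if not commit_sha:
        raise ValueError("no commit found for compilerplugins/")
    return f"string:{commit_sha}"


def ccache_env(commit_sha, base_env=None):
    # Copy of the environment with CCACHE_COMPILERCHECK set, for passing
    # to whatever command starts the build.
    env = dict(os.environ if base_env is None else base_env)
    env["CCACHE_COMPILERCHECK"] = compiler_check_setting(commit_sha)
    return env
```

The bot would then start its build with something like subprocess.run(build_command, env=ccache_env(latest_plugin_commit()), check=True), where build_command is whatever it already runs.  Keeping the git lookup apart from the string formatting makes the latter testable without a repository.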

> I've enabled an additional build for gerrit doing clang + plugins on linux
> we will see how that performs on average.
> preliminary observation is that there is way too much churn in the
> plugins for this to be viable at this time....

There are generally two ways to address bot performance issues caused by 
changes to compilerplugins/:  One, make the builds faster.  Two, make 
changes to compilerplugins/ less frequent.

For one, one option should be to make ccache more effective by using 
compiler_check=content or compiler_check=string:X as described above 
(and potentially also increasing the ccache size if necessary and 
possible).  Another option might be to just throw more computing power 
at the problem.

For two, two options have been discussed so far:  Either restrict 
commits to compilerplugins/ to certain points in time (when they come in 
in batches).  Or break compilerplugins/ out into its own git repo.

The main problem I see with the first choice is that new plugins, or 
substantial changes to existing ones, typically also require changes to 
very many files across the "real" LO code base.  Doing such changes on a 
branch (so that it can be merged later, at the next point when commits 
to compilerplugins/ are allowed), would thus likely result in 
large-scale merge conflicts.  The solution might be to commit the 
resulting changes to the "real" LO code base directly, and only hold off the 
compilerplugins/ changes themselves on a branch.

The idea behind the second choice is to periodically update and rebuild 
the bot's compilerplugins repo, so that sequences of LO builds on the 
bot are guaranteed to all use the same plugin.so and be able to reuse 
cached objects across those builds.  However, that would mean that 
Gerrit changes based on a relatively old code base could be built 
against a newer version of the compilerplugins repo, running into 
warnings/errors from new plugins whose fixes across the LO code base 
are not yet included in the code base revision the change is 
based on.  So I think this approach isn't feasible.  (Another problem
would be that e.g. the name of a class from the LO code base can be 
whitelisted in one of the plugins, to suppress warnings/errors from that 
class.  If the name of the class changes in the LO code base, the 
compilerplugins repo must be changed in sync.)


So my proposal would be as follows:  First, check whether enabling 
compiler_check=content or the compiler_check=string:X setup (and 
increasing the ccache size if necessary and possible) gives good-enough 
performance.  If not, restrict commits to compilerplugins/ to be less 
frequent, and see whether that increases the ccache hit rate and results 
in good-enough performance.


More information about the LibreOffice mailing list