compiler plugin

Stephan Bergmann sbergman at redhat.com
Wed Jul 13 12:00:18 UTC 2016


On 07/13/2016 11:53 AM, Norbert Thiebaud wrote:
> On Wed, Jul 13, 2016 at 4:18 AM, Stephan Bergmann <sbergman at redhat.com> wrote:
>> The idea behind the second choice is to periodically update and rebuild the
>> bot's compilerplugins repo, so that sequences of LO builds on the bot are
>> guaranteed to all use the same plugin.so and be able to reuse cached objects
>> across those builds.  However, that would mean that Gerrit changes based on
>> a relatively old code base could be built against a newer version of the
>> compilerplugins repo, running into warnings/errors from new plugins whose
>> fixes across the LO code base are not yet included in the code base
>> revision the change is based on.  So I think this approach isn't
>> feasible.
>
>
> Actually that is very feasible, since
> 1/ the solution for these old-based changes is 'rebase'

Not necessarily.  Consider a change being discussed between an author 
and a reviewer, leading to the author generating multiple revisions of 
the change.  To track the differences between the revisions, it is best 
if there are no rebases in between.

>> (Another problem would be that e.g. the name of a class from the
>> LO code base can be whitelisted in one of the plugins, to suppress
>> warnings/errors from that class.  If the name of the class changes in the LO
>> code base, the compilerplugins repo must be changed in sync.)
>
> That sounds like the whole contraption is quite fragile and random... if
> you have to 'whitelist' random 'classes'
> directly in the plugin... I'd suggest there is something
> fundamentally wrong in the process.
> whitelisting, or more exactly a way to say to the plugin 'shut up, I know',
> should be done by annotating the source code, not by patching the plugin.

Even ignoring the negative connotation of "contraption", that's how the 
existing plugin code works, and apparently works reasonably well and 
successfully.

>> So my proposal would be as follows:  First, check whether enabling
>> compiler_check=content or the compiler_check=string:X setup (and increasing
>> the ccache size if necessary and possible)
>
> the cache size is 100GB dedicated to clang... that is quite a chunk of
> disk already.

The question would be whether it turns out to be enough in practice, 
and---if it is not---whether TDF could do something about it.
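For concreteness, the two settings under discussion can be expressed via 
ccache's environment variables (a sketch, not a verified setup; the 100G 
figure comes from the numbers above, the revision string is made up):

```shell
# Sketch, assuming ccache is configured through environment variables.

# compiler_check=content: hash the compiler binary's content instead of
# its mtime+size, so a rebuilt but bit-identical binary does not
# invalidate the cache:
export CCACHE_COMPILERCHECK=content

# Alternatively, compiler_check=string:X supplies a literal identity
# string (the revision id here is hypothetical), so nothing has to be
# hashed per invocation:
# export CCACHE_COMPILERCHECK="string:plugin-rev-1234abcd"

# And enough room for the cache; 100G matches the figure quoted above:
export CCACHE_MAXSIZE=100G
```

The same keys can equally go into ccache.conf as "compiler_check = 
content" and "max_size = 100G".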

>> gives good-enough performance.
>> If not, restrict commits to compilerplugins/ to be less frequent, and see
>> whether that increases the ccache hit rate and results in good-enough
>> performance.
>
> I do not favor compiler_check=content... as this means calculating a
> hash of the compiler _and_ the plugin every time.
> the plugin.so alone is 150MB (which is quite insane btw considering
> that clang itself is ~50MB)

(Not sure where you get those ~50MB for clang.  For example, 
/usr/bin/clang-3.8 on my F24 box is merely 92K, but isn't linked 
statically, which is atypical for Clang builds; on the other hand, my 
local CMAKE_BUILD_TYPE=RelWithDebInfo trunk build, statically linked, is 
even 1.5G.)

> I really do not need to waste too much time experimenting to know that
> hashing 15,000 X 200MB = 3TB per build is going to be a waste.

It should merely be a matter of setting one environment variable and 
watching the effect on 
<http://ci.libreoffice.org/job/lo_gerrit/Config=linux_clang_dbgutil_64> 
throughput.

> compiler_check=string  is no better since that would actually run
> these as external process for each ccache invocation
> bearing in mind that a build is north of 15K of these...

What would be run as an external process for each ccache invocation?  Are 
you confusing "compiler_check=string:value" with "compiler_check=a 
command string"?
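For context, ccache treats the two forms quite differently (a sketch of 
the documented behavior; the revision id is invented):

```shell
# string: form -- the text after "string:" is used directly as the
# compiler's identity; no external process is spawned for it:
export CCACHE_COMPILERCHECK="string:plugin-rev-1234abcd"  # hypothetical id

# command form -- a value containing %compiler% is treated as a command
# and is executed to compute the identity, e.g.:
# export CCACHE_COMPILERCHECK="%compiler% --version"
```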

> What I suggest is:
>
> 1/ allow the plugins to be built standalone and delivered (ideally,
> spin it into a sub-module). Allow configure.ac to use an 'external'
> version of the compiler plugin.
> 2/ work on the plugin in a branch (or ideally in a sub-module), push
> core.git needed fixes in advance of merging the plugin changes
> 3/ every so often merge the plugin changes... (if it is a submodule it
> is just a matter of moving the submodule ref in core.git) (if 2/ was
> followed, that does not break master, and master has been compatible
> with it for some time)
> 4/ at that point a jenkins job will use 1/ to deploy a new version of
> the plugin... everything will be built with that from that point on
> (again, if it is a submodule, it can be cloned and built
> stand-alone, which would be less wasteful than maintaining a full
> core.git workspace for that purpose on each concerned slave)
> 5/ too old gerrit patches could fail... but that is fine, they need to
> be rebased anyway, and hopefully that will give an incentive to
> people not to keep patches on an old base...

See above for why "aggressive rebasing" might not always be desirable. 
(Plus, it shifts one more burden onto (often newbie) contributors: how 
to react to "random" breakage of the changes they submit to Gerrit.)

I'm of course open to a more restricted approach to plugin changes, if 
that helps.  However, for one I would really love to also learn what 
impact "check whether enabling compiler_check=content or the 
compiler_check=string:X setup (and increasing the ccache size if 
necessary and possible)" will have on the problem.  And for another, my 
gut feeling is that we'll fare better if we have for each revision R of 
the LO code base one corresponding revision R' of the plugin code, and 
make sure that everybody (including the bot and people locally building 
with Clang, potentially excluding plugin developers using a "trunk" 
version of the plugin code) uses plugin R' when building code revision R.
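One low-tech way to realize that R/R' pairing would be to key ccache's 
compiler identity to the plugin revision (a sketch; it assumes the 
string form of compiler_check and a separate compilerplugins checkout, 
neither of which is current practice):

```shell
# Sketch: pin cached objects to the exact plugin revision R', so they
# are only reused when the same plugin was in effect.  The separate
# "compilerplugins" checkout is an assumption of this sketch.
plugin_rev=$(git -C compilerplugins rev-parse HEAD)
export CCACHE_COMPILERCHECK="string:${plugin_rev}"
```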


More information about the LibreOffice mailing list