Gbuild feature: a method to automatically generate binary packages of external libraries and re-use them whenever appropriate

Norbert Thiebaud nthiebaud at gmail.com
Sun Jul 7 15:46:53 PDT 2013


Issue: gerrit buildbots need to do full builds; as a consequence they
typically rebuild a bunch of 'external' libraries, most of which
rarely change. Windows, the slowest of all, is the most impacted
because it is also the one with the least possibility of using
'system' libraries, and the one for which ccache is of negligible
value, when it even works at all.

Feature:
+ add --enable-library-bin-tar support in configure to allow the use
of that feature.
+ a candidate library needs to follow the UnpackedTarball/ExternalProject pattern
+ for these eligible libraries, all that is needed is to add a 4th
argument to UnpackedTarball_set_tarball. That 4th argument should
be the name of the top-level directory of the module that builds that
library. For example, for redland, rasqal and raptor that 4th argument is
redland.  Bear in mind that the function takes a 3rd argument that is
very often omitted, so you would need to add ,,<module>) (2 commas).
Important: the top-level directory is assumed to also be the Module
name, so the 4th argument is used for both purposes.

If the config option is set, then for these eligible libraries gbuild
will detect whether a proper binary tarball exists in TARFILES_LOCATION.
If it does, gbuild untars it and bypasses the untarring of the source,
the patching and the build of the ExternalProject targets.  If no proper
tarball exists, then the normal operation is done based on the source
tarball and, after the ExternalProject targets are built, a tarball
containing the built libraries is created.
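
As a rough illustration of the decision described above (not the actual
gbuild code, which lives in makefiles), here is a minimal shell sketch.
The function name and the three result strings are made up for this
example; only the "YES" switch value and the lookup in TARFILES_LOCATION
come from the description.

```shell
# Hypothetical helper: report which path the build would take for one library.
#   $1 = directory holding the tarballs (TARFILES_LOCATION)
#   $2 = expected binary package file name
#   $3 = value of the feature switch ("YES" enables the feature)
choose_tarball_action() {
    if [ "$3" != "YES" ]; then
        # Feature disabled: behave exactly as before.
        echo "build-from-source"
    elif [ -f "$1/$2" ]; then
        # A matching binary package exists: unpack it, skip patch + build.
        echo "unpack-binary"
    else
        # No matching package: normal build, then tar up the result.
        echo "build-from-source-then-pack"
    fi
}
```

For instance, `choose_tarball_action "$TARFILES_LOCATION" "$name" YES`
reports "unpack-binary" only when a file with that exact name is already
present.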

Note: a git repo as a SRCDIR is required.
Important: this feature relies on the postulate that the only things
that can modify the result of an external library build are 1/ the source
tarball 2/ a variable in config_host.mk 3/ a file in core in the
top-level directory for that library.
IOW: using environment variables to change the behavior of the build
is not detected unless that environment variable is used to drive the
content of config_host.mk

Known Issue:
if an external library depends on another external library by statically
linking it, or if the dependency is upgraded in a non-ABI-compatible
way, the system does not detect the dependency and the need to
rebuild.
A possible workaround would be to maintain in config_host.mk a field
that represents the ABI of such libraries. Doing so would trigger a
rebuild due to a change in config_host.mk. The drawback is that this
would trigger a rebuild of _all_ libraries that use the binary package
feature.
icu, the biggest 'offender', already has such ABI info in config_host.mk.

When in use, this feature will accumulate binary packages in your
TARFILES_LOCATION. There is no mechanism to clean up 'old'
packages, just like there is no such mechanism to clean up old
source packages.
But since every change in config_host.mk is liable to trigger a
new build, these binary packages can end up taking a substantial
amount of disk space.
The feature was intended primarily to be used by gerrit buildbots,
mostly dedicated machines which have decent disk size and no other
purpose than to do lo builds. In this context, it is quite possible
to log in every month or two to do some clean-up.
But individual users seeking to take advantage of that feature should
monitor the size of their TARFILES_LOCATION.
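
Since no clean-up mechanism exists, something along these lines could
help with the monitoring. This is only a sketch: both function names are
invented here, and the '*.tar.gz' pattern also matches source tarballs
that happen to use that extension, so the list should be reviewed by
hand before anything is deleted.

```shell
# Hypothetical helper: disk usage of a tarball directory, in kilobytes.
#   $1 = directory to measure (e.g. "$TARFILES_LOCATION")
tarfiles_usage_kb() {
    du -sk "$1" | cut -f1
}

# Hypothetical helper: list *.tar.gz files not modified for more than N days.
#   $1 = directory to scan, $2 = age threshold in days
list_old_tarballs() {
    find "$1" -maxdepth 1 -type f -name '*.tar.gz' -mtime +"$2" | sort
}
```

For example, `list_old_tarballs "$TARFILES_LOCATION" 60` prints the
candidates for a clean-up pass every month or two.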


Results:
On a mac laptop, doing a ccache-disabled build (which is what Windows
does), when re-using the binaries the total elapsed build time went
from 71 minutes to 60 minutes, and the total cpu time from 420 minutes
to 378 minutes, while bypassing the build of
tomcat, liblangtag, hsqldb, cppunit, beanshell, xpdf, python3,
redland, openldap and postgresql

Note: this set of modules was subject to only 5 patches in the last month.
git log  --since="1 months ago" --format=oneline -- tomcat liblangtag \
    cppunit beanshell hsqldb xpdf python3 redland openldap postgresql | wc -l
5

or 0.03% of the commits in the last month impacted any one of these modules...
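
The total number of commits used for that percentage is not given here.
To recompute the share on a current checkout, a small helper like the
following could be used. It is a sketch; the function name is invented
for this example, and the git commands in the comments are one possible
way to obtain the two counts.

```shell
# Hypothetical helper: share of commits touching the modules, in percent.
#   $1 = number of commits touching the modules
#   $2 = number of all commits in the same period
commit_share_percent() {
    awk -v touched="$1" -v total="$2" 'BEGIN {
        if (total == 0) { print "n/a"; exit }
        printf "%.2f\n", 100 * touched / total
    }'
}

# Possible usage inside a checkout (not executed here):
#   touched=$(git log --since="1 month ago" --format=oneline -- redland | wc -l)
#   total=$(git rev-list --count --since="1 month ago" HEAD)
#   commit_share_percent "$touched" "$total"
```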


Internal/Implementation:

see
https://gerrit.libreoffice.org/4763
https://gerrit.libreoffice.org/4764
https://gerrit.libreoffice.org/4765

A binary package is the tar of $(WORKDIR)/UnpackedTarball/<item>/
after patching and build.
It is identified by 4 things:
1/ the source tarball filename (which already contains an md5
uniquely identifying a source tarball)
2/ the sha1 of the tree object in git of the top-level directory
passed as 4th argument to set_tarball(). This detects any change in the
library's gbuild operation, like adding a new patch etc.
3/ the sha1 of config_host.mk
4/ INPATH. This should allow support for cross-compiling (not
tested, but in theory it should work out-of-the-box)

The binary package filename contains all these elements, in the form:
(3)_(2)_(1).(4).tar.gz
That filename is created by a helper script: solenv/bin/bin_library_info.sh
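
To make the naming scheme concrete, here is a minimal sketch of how such
a name could be assembled. This is not the content of
bin_library_info.sh; the three function names are invented for this
example. Using `git rev-parse` and `git hash-object` to obtain (2) and
(3) is an assumption, and `git hash-object` yields a git blob id rather
than a plain sha1 of the file bytes.

```shell
# Hypothetical: compose the name as (3)_(2)_(1).(4).tar.gz
#   $1 = source tarball file name   (1)
#   $2 = tree sha1 of the module    (2)
#   $3 = sha1 of config_host.mk     (3)
#   $4 = INPATH                     (4)
compose_bin_package_name() {
    printf '%s_%s_%s.%s.tar.gz\n' "$3" "$2" "$1" "$4"
}

# Assumed way to get (2): id of the tree object of a top-level directory
# at HEAD.  $1 = SRCDIR (a git checkout), $2 = module directory
module_tree_sha() {
    git -C "$1" rev-parse "HEAD:$2"
}

# Assumed way to get (3): git object id of the file contents.
#   $1 = path to config_host.mk
config_host_sha() {
    git hash-object "$1"
}
```

Because the tree id changes whenever any file under the module directory
changes, adding a patch to the module yields a different package name,
and therefore a rebuild.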

Upon execution of UnpackedTarball_set_tarball() we detect whether we are
allowed to use the feature (USE_LIBRARY_BIN_TAR == YES); if so, we
check that the 4th argument is not empty. If either of these
conditions is false, the code does as it used to: no changes.
If both conditions are true, then we check whether a proper binary
tarball exists.
If so, we use that tarball instead of the source tarball, and we
set a target variable for UnpackedTarball_get_target and all the
state_target of the associated ExternalProject to indicate that we
are using a binary tarball.
Else we add a rule to build the binary tarball. Because some targets
below the ExternalPackage do occasionally mess with the workdir
resulting from the ExternalProject, we need to wait for all the targets
of a given Module to finish before we can tar the result. To that
end a new Module target 'almost' is introduced. The purpose is that
all regularly registered targets of the module are pre-reqs of 'almost'
and Module_get_target() has 'almost' as a pre-req. This allows us to
add a rule that has 'almost' as a pre-req and is itself a pre-req of
Module_get_target, to ensure that the tarring will occur after all
registered targets of the module are done.

Both UnpackedTarball__command and ExternalProject_run detect whether a
binary tarball was used, and essentially turn into a NOP in that case.


Norbert

