test-infra proposal: master-tested branch (was: test infrastructure ideas appreciated ...)

Norbert Thiebaud nthiebaud at gmail.com
Wed Jun 10 12:22:53 PDT 2015


On Wed, Jun 10, 2015 at 8:04 AM, Bjoern Michaelsen
<bjoern.michaelsen at canonical.com> wrote:
> Hi,
> As such, here is one idea for infrastructure:
> - Create a branch master-tested

We can get all that merely with git-notes, I think.
In other words, instead of a separate branch, just annotate commits,
or maybe even maintain a tag on master, to indicate the last 'green
master' in the sense you gave.

Today the Jenkins tinderboxes operate like their ancestors: they jump
around, moving forward... but not every commit gets built.
And since they are not all in sync, it is hard to guarantee that you
will find a given commit that has been validated for all configurations.
_But_ with more hardware coming online, I want to move to a more
'bibisect build model' where _every_ commit gets built.
Then I can have a matrix job so that we know the overall build result
of a given commit for all configs,
like we do for gerrit today,
and when a given commit is found all-green we annotate it as such
in git notes.
(git notes is more flexible than a tag, because it allows us to do
the builds somewhat out of order without pain, for instance to allow
two or more 'sets' of builders to work side by side. Each set moves
forward so it can build incrementally, but that means they can
_report_ out of order... so managing a tag would be extra pain.)
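
To make that concrete, here is a minimal sketch of the annotation
side, assuming a dedicated notes ref (the ref name 'green' and the
note text are illustrative, not an existing convention):

    # mark a commit as all-green across the build matrix
    git notes --ref green add -m "all-green: gcc, clang, msvc" <commit>
    # walk master from the tip; the first annotated commit found is
    # the last 'green master', even if builders reported out of order
    git log --notes=green --format='%H %N' master \
        | awk 'NF > 1 { print $1; exit }'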

Regular fast-turnaround tinderboxes would still be in the mix, to give
a quick alert when a breaker lands on a given platform.

TDF is beefing up the infrastructure: we have two nice and beefy 1U
servers on purchase order that will become Windows builders.
We are consolidating owned and lent Mac resources to improve network
bandwidth and stability, but I intend to push for the purchase of
Mac Pros.
(I got one myself and it performs quite well, to the point that it is
cost-effective compared to a Mac mini, especially the more recent
models.)
Linux boxes will have to ramp up too, but that is usually not much of
a problem; cloud-based offerings are fairly competitive for that need,
so we can be more reactive with resource capacity for Linux.

The one thing that everybody can pitch in to help with is this:

There are three kinds of failure in CI:

- a user-induced one (these are the ones we are looking for): a change
that makes something fail to build or fail test(s)
- an infra-induced one: the slave bot misbehaves for some reason and
fails despite there being nothing really wrong. For these I try to
have them reported as 'unstable' rather than 'failed',
as much as possible...
- a test auto-induced one: when a test is unstable and produces random
failures depending on circumstances... the infamous 'heisenbugs'.
A heisenbug can be a systemic/design problem, or it can be a real bug
that is hard to trigger. Either way these are not useful, and are in
fact harmful in a CI context, because human nature says 'if you can't
reproduce it, it is not a bug',
so the latter category of real but hard-to-trigger bugs always gets
labeled 'systemic error' and ignored anyway... and it makes people
numb to errors...
For automated testing, trust is paramount: heisenbug test failures are
the enemy; a false non-failure is bad, but actually less painful.
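
As an illustration of the infra-induced category, a small wrapper
around the build can translate known infrastructure hiccups into a
distinct exit code, so the CI layer can report the run as 'unstable'
rather than failed. A sketch only; the log patterns and the exit-code
convention are made up for illustration:

    #!/bin/sh
    # run the build and keep the log
    make build-nocheck > build.log 2>&1
    status=$?
    # known infra hiccups become exit code 2 ('unstable');
    # anything else stays a genuine failure
    if [ $status -ne 0 ] && \
       grep -qE 'No space left on device|Connection timed out' build.log
    then
        exit 2
    fi
    exit $status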

Today we have different categories of tests, but they are split mostly
by time-to-run rather than by 'stability'.
What I would like to see is a 'ci' target containing all the tests
that _shall_ and _will_ pass unless there is a code bug, no
exceptions.
Of course time-to-run is important, but it is not the first
criterion. Time-to-run can be mitigated relatively easily with
'money'; stability and trust in CI cannot.
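
A minimal sketch of what that could look like for a builder, assuming
a hypothetical top-level 'ci' target existed; the existing unitcheck
and slowcheck tiers are the kind of thing it would draw its vetted,
always-stable test list from:

    # hypothetical: a 'ci' target does not exist today
    make ci
    # which would boil down to roughly: build, then run only the
    # test tiers vetted as stable
    make build-nocheck && make unitcheck slowcheck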

If and when we have the nice-to-have problem of having so many tests
that it becomes impractical to run them all all the time, we will
conceive a two- or three-staged approach
where we still get a fast turnaround for run-of-the-mill problems,
and then deeper testing at a lower frequency.


All that being said, none of that matters if the culture does not
follow. No amount of CI can make people care. What sets the tone is
the core developer group; the rest of us look at how things are done
and emulate the behavior.
So we really need the core group of developers to lead by example with
respect to taking the state of master seriously. That includes, for
instance, resisting the fire-and-forget 'one-liner that can't possibly
break anything' at 5pm on a Friday.
That includes pro-actively doing revert-fix-resubmit when a breakage
is not obvious. That includes using gerrit more, especially for stuff
that is not super time-sensitive; in other words, does it really
matter if this patch lands in 5 hours, or tomorrow, rather than right
now? There is no hard and fast rule for this that would be flexible
enough to accommodate real-world situations... but the only
alternative to self-best-effort is pretty much black-and-white,
all-or-nothing machine-enforced rules, which is really not desirable.


PS: just to give an idea about the state of master: I recently built a
bibisect repository for Windows covering the 5.0 dev period, i.e. from
the libreoffice-4-4 branch point to the current head of the
libreoffice-5-0 branch.
That covered 10820 commits, of which 2168 were not _buildable_; that
is, they failed a build-only make with --disable-werror, which is as
lenient a build criterion as one can have.
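
For reference, that lenient criterion is roughly the following (my
reconstruction, using the usual configure flag and the no-tests build
target):

    ./autogen.sh --disable-werror
    make build-nocheck    # compile and link only, no tests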
That is, 1 in 5 commits to master did not even compile or link on
Windows!!! during roughly the November 2014 to May 2015 period.
For more detail: breakage can last quite long. Here are the longest
consecutive runs of broken commits for that period (only runs >= 50
are listed), out of 163 breakages with more than one consecutive
broken commit:
50
50
55
55
56
57
61
70
94
140
278

Bearing in mind once again that these are compile-and-link-only
builds... real CI builds (with werror and tests) fare much, much
worse.
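
For the curious, run lengths like the ones above can be derived from a
per-commit pass/fail list with a one-liner. A sketch; it assumes a
file 'results.txt' with one OK or FAIL per commit, in history order:

    awk '$1 == "FAIL" { n++; next }
         { if (n > 1) print n; n = 0 }
         END { if (n > 1) print n }' results.txt | sort -n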

Norbert

