test-infra proposal: master-tested branch (was: test infrastructure ideas appreciated ...)

Wed Jun 10 06:04:41 PDT 2015

Hi,

On Wed, Jun 03, 2015 at 02:33:23PM +0100, Michael Meeks wrote:
> 	Constructive thoughts appreciated in reply here.

Soo, looking at tests, regressions and the infra for them, let's first have a
look at the "regression pipeline". There are four stages a regression goes through:

1/ regression gets introduced by a code change
2/ regression is found by manual or automatic testing
3/ regression is triaged by QA/bibisect etc.
4/ regression is fixed by development

Looking at those, we are doing quite well with 3/ these days: the latest ESC
notes list 597 open regressions, 335 of which are bibisected, and the number of
bibisected open regressions is slowly rising over time.

We are also doing quite well in general with 2/: we find critical issues
quickly enough to fix them in time on the relevant branches of our release
schedule. We could be better at finding non-critical bugs; however, as we see
in the later stages (3/ and 4/), it's not as if those are at risk of running
out of work anytime soon.

On the other hand, both 1/ and 4/ seem to be our real pain points: we are still
creating too many regressions and are too slow at fixing them. As others have
noted, these are mostly cultural problems, and not easily addressed with an
infrastructure/throw-money-at-it approach.

However, there are _some_ things that can be improved with infra/money. While
we can't have machines write good tests, we can have machines _run_ the tests we
already have, and regularly. Looking at the time-broken numbers from our
tinderboxes, it is obvious that we don't do this nearly as much as we could.

Thanks to Norbert we have a very good test infrastructure that allows us to
test each and every commit, if we want to. At the very least, it allows us to
test branches on all platforms before merge/rebase/cherry-pick (see: [1]).

As master has to build and pass all tests on all platforms for this to be
really useful, both of those scenarios are severely hampered by master being
broken far too often. If we want contributors to embrace test-driven
development (or at least move in that direction), one essential prerequisite
is a rock-solid base that produces no false positives.

As such, here is one idea for infrastructure:
- Create a branch master-tested
- have some tinderboxes run a full build and all tests (ideally, say, ten times)
  on master, on all platforms (possibly for both dbgutil and non-dbgutil)
- once all platforms have built and tested a commit from master that way and it
  is all green, master-tested is forwarded to that commit
- nobody ever pushes or can push directly to master-tested, only to master
- the frequency/granularity of master-tested updates does not have to be
  super-fast: about once a day, if master is healthy, should be enough
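
The forwarding rule above could be sketched roughly like this. It is only a
minimal illustration, not the actual tinderbox setup: the platform names, the
"<platform> pass|fail" result lines (one per test run, so ten runs give ten
lines) and the remote name "origin" are all hypothetical assumptions.

```shell
#!/bin/sh
# Hypothetical sketch of the master-tested forwarding rule.
# ASSUMPTION: platform names and the result format are made up for
# illustration; real tinderboxes report results differently.

REQUIRED_PLATFORMS="linux windows macosx"

# all_green RESULTS
# RESULTS holds lines of the form "<platform> <pass|fail>", one per run.
# Succeeds only if every required platform has at least one "pass" line
# and no "fail" line (a single failed repeat run blocks forwarding).
all_green() {
    results=$1
    for platform in $REQUIRED_PLATFORMS; do
        # A platform that did not report at all is not green.
        printf '%s\n' "$results" | grep -qx "$platform pass" || return 1
        # Any failed run on this platform makes the commit not green.
        if printf '%s\n' "$results" | grep -qx "$platform fail"; then
            return 1
        fi
    done
    return 0
}

# forward_master_tested SHA RESULTS
# Push SHA to master-tested only if all platforms are green.
# A plain (non-forced) push is refused unless it is a fast-forward, so
# master-tested can only ever move forward along master's history.
# ASSUMPTION: "origin" is the hypothetical name of the shared remote.
forward_master_tested() {
    if all_green "$2"; then
        git push origin "$1:refs/heads/master-tested"
    fi
}
```

Deliberately not using --force means a bot bug could at worst stall
master-tested, never rewind it.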

With that, both branches and individual commits have a known good base to build
upon and can use our tests and CI to the fullest. For individual commits this
would work trivially for developers using gerrit, e.g.:
 git checkout -b mywork master-tested
 git commit -m "did stuff"
 git push logerrit HEAD:refs/for/master
at which point Norbert's CI would kick in and, once reviewed, the cherry-pick
would still go to master.

For feature branches it would work too, e.g.:
 git rebase master-tested
 git push logerrit HEAD:feature/foo
 git commit --allow-empty -m "testbuild of branch"
 git push logerrit HEAD:refs/for/feature/foo
 git reset --hard HEAD^ # delete the noop commit
And the CI would happily build and test the branch on a known good base.

Norbert's CI is _very_ helpful for individual commits and feature branches, _if_
master is stable. I use it in the latter way quite regularly.
- Once, master happened to be in a good state on all platforms, and the CI
  helped me find a problem on OSX before I pushed to master.
- Once, master was in a bad state, which I found out when I ran 'make check'
  on every commit of my branch as per [1]. The OSX build failed, but in a way
  very similar to the already broken master. So I gave up in frustration[2]
  and pushed to master as-is ... which stacked another (platform-dependent)
  breakage on top of the existing instability and made life a misery for
  everyone else honestly trying to use tests and CI. What a vicious cycle.

Having master-tested might help us a lot in getting more of the first case and
less of the second.

Best,

Bjoern

[1] https://skyfromme.wordpress.com/2015/05/26/death-or-glory-vs-continuous-integration/
[2] https://gerrit.libreoffice.org/#/c/16179/