[Libreoffice] subsequenttest hang ...

Norbert Thiebaud nthiebaud at gmail.com
Mon Sep 26 12:16:44 PDT 2011


On Mon, Sep 26, 2011 at 1:46 PM, Stephan Bergmann <sbergman at redhat.com> wrote:
> On 09/26/2011 08:24 PM, Michael Meeks wrote:
>>
>>        Looks like an improvement to me, thanks for that. I wonder why we
>> only see this now, surely smoketests have died in mid-flow before ?
>
> Not sure.  Maybe it was indeed always the case that cppunittester would hang
> should soffice.bin crash while in the BASIC smoke test code.

It is not new. The tendency of smoketest to hang was the reason it was
not run in tinderboxes.
At Stephan's request, I added subsequenttest (and therefore
smoketest) to one of my tinderboxes a couple of weeks ago...
and Murphy did not disappoint :-)

> Tinderboxes need to handle non-terminating builds, anyway (think a
> non-terminating, say, idlc), so no need to address non-terminating tests
> specifically for them.

The problem is that the 'build' as a whole can have enormous
legitimate variation in time within a given box, not to mention
enormous variation from box to box.

For instance, a build (the make itself, excluding make check) with a
very good ccache hit rate typically takes 13 minutes or so on my Linux
Tinderbox... but with a bad ccache ratio (say, after a change to a
particularly pervasive header), that build time can climb to 55 minutes
or so... Now with some activity on the box, that can easily double or
triple... all of that without indicating any problem at all.
Then at the other end of the scale, one of Fridrich's 'Release builds'
routinely takes 10+ hours, and I'm not even talking about Windows....

Besides, bugs that cause the actual build tools to loop are rare,
since the rate of change of these tools is very low compared to the
rate of change in the delivered product (one would hope :-) )

So I think the timeout safety should be in make test itself,
possibly with a disable switch to allow for debugging. Whether that
should be at the individual test level or at the top target level, I
have no strong opinion -- I think it might be much easier to come up
with a sane bound for an individual test than for the whole thing, due
to parallelism for instance -- but having such a mechanism in place is
very important.
A hung tinderbox will not tell you it is hung... so from everybody's
perspective a hung tinderbox is as silent as a happy green one...
which means that a long time can elapse before someone notices that
something is broken... which means that many patches, some of them
possibly breaking something else, may accumulate... making the
untangling of the situation even harder...
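
To make the idea concrete, here is a rough sketch of a per-test timeout
with a disable switch. It is only an illustration, not how the build
system does or should do it: the function name run_with_timeout and the
environment variable TEST_TIMEOUT_DISABLE are made up for this example.

```python
import os
import subprocess
import sys


def run_with_timeout(cmd, timeout_s, environ=None):
    """Run one test command with an upper bound on its wall-clock time.

    Returns a (status, returncode) pair where status is "ok", "failed"
    or "timeout". TEST_TIMEOUT_DISABLE is a hypothetical switch: when
    it is set to a non-empty value, no bound is applied, so a test can
    be left running under a debugger.
    """
    environ = os.environ if environ is None else environ
    if environ.get("TEST_TIMEOUT_DISABLE"):
        timeout_s = None  # debugging mode: wait for as long as it takes

    try:
        # subprocess.run kills the child process when the timeout expires
        # and then raises TimeoutExpired.
        proc = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Turn a silent hang into a loud, visible failure.
        print("TIMEOUT after %ss: %r" % (timeout_s, cmd), file=sys.stderr)
        return ("timeout", None)

    return ("ok" if proc.returncode == 0 else "failed", proc.returncode)
```

A wrapper like this would be called once per test, with a bound chosen
for that test. Note that only the direct child is killed on timeout; a
test that spawns its own processes (as a test driving a separate
application process might) would need process-group handling on top of
this.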

Norbert

