linux dbgutil tinderbox stuck -> backtrace

Fri Apr 1 11:22:18 UTC 2016

On Fri, Apr 1, 2016 at 2:59 AM, Stephan Bergmann <sbergman at redhat.com> wrote:
> On 03/31/2016 03:17 PM, Norbert Thiebaud wrote:
>>
>> On Thu, Mar 31, 2016 at 7:59 AM, Michael Stahl <mstahl at redhat.com> wrote:
>>>
>>> it's a pretty rare deadlock, i've hit it once and sberg too once AFAIK.
>
>
> Ah, <https://bugs.documentfoundation.org/show_bug.cgi?id=96387> "deadlock in
> HSQLDB" predates
> <https://cgit.freedesktop.org/libreoffice/core/commit/?id=03a271901c39d60e4519e67e258d565ad5e1e085>
> "Guard against globally shared UNO ref accessed from wrong UNO env", which
> was the only change that came to mind when I saw Norbert's original post.
>
> I've started to run into this a couple of times now, too.  But at least the
> one time I was alert enough to run jstack on the deadlocked process, all it
> gave me was an internal failure in jstack.
>
>> What I really wish for is a reliable hard timeout on all these tests.
>
>
> I think a better approach would be to let the bots do containerized builds
> that get automatically killed

Two problems:
1/ We _do_ have such a global-level timeout, but Jenkins being Java,
it is unreliable (the Jenkins plugin that provides the feature even
explains in detail that it is unreliable).
2/ Once a week the Linux debug build also rebuilds the documentation,
which takes a long time. So I had to bump that global timeout to the
maximum time that a full build plus building and uploading the docs
can take, which means it only kicks in at the 5-6 hour range. Not great.

> [...] or passed on to someone who can debug the problem, or...

There is a steady flow of tasks to do. I cannot block a slave for an
undetermined amount of time waiting for someone to take a look, or
the build queue piles up.
What is needed is a hard timeout, preferably with automatic
generation of useful and relevant diagnostic info. The latter makes
per-test timeouts more useful, since the watchdog can then first try
to gather diagnostics from the running test, which will depend on the
nature of the test.
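
To make the idea concrete, here is a rough sketch of what I mean by a
per-test hard timeout with a watchdog that grabs diagnostics before
killing. This is not what the tinderboxes run today. The names
run_with_watchdog and gdb_backtrace are made up for illustration, and
using gdb as the collector is just one assumption (a Java-heavy test
would want something else). POSIX only.

```python
import os
import signal
import subprocess


def gdb_backtrace(pid, limit_s=60):
    """Hypothetical collector: dump all thread backtraces of pid via gdb.

    Assumes gdb is installed and allowed to attach to the process.
    """
    result = subprocess.run(
        ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"],
        capture_output=True,
        text=True,
        timeout=limit_s,  # the collector itself must not hang forever
    )
    return result.stdout + result.stderr


def _signal_group(pgid, sig):
    """Send sig to a process group, ignoring a group that is already gone."""
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        pass


def run_with_watchdog(cmd, timeout_s, collect_diag=None, grace_s=5.0):
    """Run cmd with a hard timeout.

    Returns (exit_code, timed_out, diag). On timeout the optional
    collect_diag(pid) callback runs first, then the whole process group
    is terminated (SIGTERM, then SIGKILL after grace_s seconds).
    """
    # A new session gives the test its own process group (pgid == pid),
    # so helpers it spawned can be killed along with it.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_s), False, None
    except subprocess.TimeoutExpired:
        pass

    # Gather diagnostics while the stuck process is still alive.
    diag = None
    if collect_diag is not None:
        try:
            diag = collect_diag(proc.pid)
        except Exception as exc:  # a failing collector must not block the kill
            diag = f"diagnostics failed: {exc!r}"

    _signal_group(proc.pid, signal.SIGTERM)
    try:
        proc.wait(timeout=grace_s)
    except subprocess.TimeoutExpired:
        _signal_group(proc.pid, signal.SIGKILL)
        proc.wait()
    return proc.returncode, True, diag
```

A test harness would call something like
run_with_watchdog(test_cmd, 900, collect_diag=gdb_backtrace) and attach
the returned diag text to the build log. Passing the collector in as a
callback is the point: which collector makes sense depends on the
nature of the test.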

Norbert
