More information about hung Jenkins builds

Stephan Bergmann sbergman at redhat.com
Thu Sep 3 07:49:04 UTC 2020


On 30/06/2020 11:38, Stephan Bergmann wrote:
> On 19/06/2020 14:51, Stephan Bergmann wrote:
>> On 28/05/2020 22:19, Stephan Bergmann wrote:
>>> For now, I have updated 
>>> <https://ci.libreoffice.org/job/gerrit_linux_clang_dbgutil/> to use 
>>> the new kill-wrapper timeout feature instead of Jenkins' "Abort the 
>>> build if it's stuck" option.  (And am planning to roll it out to 
>>> other Linux Jenkins jobs that could benefit from it, once it has 
>>> proven sufficiently stable.)
>>
>> I have rolled out the kill-wrapper and its timeout feature now also 
>> for 
>> <https://ci.libreoffice.org/job/gerrit_linux_clang_dbgutil_branch/>, 
>> <https://ci.libreoffice.org/job/gerrit_linux_gcc_release/>, and 
>> <https://ci.libreoffice.org/job/lo_ubsan/>.
> 
> Just to note down the semi-obvious somewhere:  One scenario that 
> kill-wrapper apparently doesn't prevent is leftover processes after 
> Jenkins "has lost the connection" (for whatever reason, maybe a bug in 
> Jenkins itself?).
> 
> <https://ci.libreoffice.org/job/gerrit_linux_clang_dbgutil/62736/> had 
> gone down with
> 
> [...]
>> [build JUT] linguistic_unoapi
>> FATAL: command execution failed
>> java.io.EOFException
>>     at 
>> java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2738) 
[...]

That issue has now hit again on tb79, where 
<https://ci.libreoffice.org/job/gerrit_linux_clang_dbgutil/67758/> "lost 
the connection" and left behind zombie processes that then broke later 
builds.  (I have now killed those manually.)

I don't know how such lost-connection issues get fixed:  Do they 
magically self-heal within the Jenkins framework, or does fixing them 
involve manual intervention?  If the latter, would it be possible to 
include a step that removes such leftover zombie processes?
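
To make the idea of such a cleanup step concrete, here is a rough Python 
sketch (Linux only, reading /proc).  It is only an illustration under 
assumptions, not an existing part of the Jenkins setup or of 
kill-wrapper:  the names Proc, read_procs, leftovers, and kill_all are 
made up, and it assumes that leftover processes run as the build user 
and mention the job's workspace directory in their command line, which 
may not hold for every leftover process.

```python
import os
import signal
from typing import Iterable, List, NamedTuple


class Proc(NamedTuple):
    """Hypothetical record for one running process."""
    pid: int
    uid: int
    cmdline: str


def read_procs() -> List[Proc]:
    """Snapshot the processes visible under /proc (Linux only)."""
    procs = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            uid = os.stat(f"/proc/{entry}").st_uid
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited while we were looking at it
        # Arguments in /proc/<pid>/cmdline are separated by NUL bytes.
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        procs.append(Proc(int(entry), uid, cmdline))
    return procs


def leftovers(procs: Iterable[Proc], uid: int, workspace: str,
              keep: Iterable[int] = ()) -> List[Proc]:
    """Select processes of the given user whose command line mentions
    the workspace directory, skipping any pids listed in keep."""
    keep = set(keep)
    return [p for p in procs
            if p.uid == uid and workspace in p.cmdline and p.pid not in keep]


def kill_all(procs: Iterable[Proc]) -> None:
    """Send SIGKILL to each selected process, ignoring ones already gone."""
    for p in procs:
        try:
            os.kill(p.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # exited between the snapshot and the kill
```

A step run before a build starts (when nothing else should be using 
that workspace) could then call something like 
kill_all(leftovers(read_procs(), os.getuid(), os.environ["WORKSPACE"], 
keep={os.getpid()})), where WORKSPACE is the environment variable 
Jenkins sets for the job's workspace directory.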


