More information about hung Jenkins builds

Stephan Bergmann sbergman at redhat.com
Tue Jun 30 09:38:27 UTC 2020


On 19/06/2020 14:51, Stephan Bergmann wrote:
> On 28/05/2020 22:19, Stephan Bergmann wrote:
>> For now, I have updated 
>> <https://ci.libreoffice.org/job/gerrit_linux_clang_dbgutil/> to use 
>> the new kill-wrapper timeout feature instead of Jenkins' "Abort the 
>> build if it's stuck" option.  (And am planning to roll it out to other 
>> Linux Jenkins jobs that could benefit from it, once it has proven 
>> sufficiently stable.)
> 
> I have rolled out the kill-wrapper and its timeout feature now also for 
> <https://ci.libreoffice.org/job/gerrit_linux_clang_dbgutil_branch/>, 
> <https://ci.libreoffice.org/job/gerrit_linux_gcc_release/>, and 
> <https://ci.libreoffice.org/job/lo_ubsan/>.

Just to put the semi-obvious on record:  One scenario that 
kill-wrapper apparently doesn't prevent is leftover processes after 
Jenkins "has lost the connection" to the build machine (for whatever 
reason, maybe a bug in Jenkins itself?).

<https://ci.libreoffice.org/job/gerrit_linux_clang_dbgutil/62736/> had 
gone down with

[...]
> [build JUT] linguistic_unoapi
> FATAL: command execution failed
> java.io.EOFException
> 	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2738)
> 	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3213)
> 	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:896)
> 	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:358)
> 	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
> 	at hudson.remoting.Command.readFrom(Command.java:142)
> 	at hudson.remoting.Command.readFrom(Command.java:128)
> 	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
> 	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
> Caused: java.io.IOException: Unexpected termination of the channel
> 	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
> Caused: java.io.IOException: Backing channel 'tb75-lilith' is disconnected.
> 	at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:216)
> 	at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:285)
> 	at com.sun.proxy.$Proxy66.isAlive(Unknown Source)
> 	at hudson.Launcher$RemoteLauncher$ProcImpl.isAlive(Launcher.java:1147)
> 	at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:1139)
> 	at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:155)
> 	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:109)
> 	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)
> 	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
> 	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:741)
> 	at hudson.model.Build$BuildExecution.build(Build.java:206)
> 	at hudson.model.Build$BuildExecution.doRun(Build.java:163)
> 	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:504)
> 	at hudson.model.Run.execute(Run.java:1880)
> 	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
> 	at hudson.model.ResourceController.execute(ResourceController.java:97)
> 	at hudson.model.Executor.run(Executor.java:428)
> FATAL: Unable to delete script file /tmp/jenkins3180341342272089625.sh
> java.io.EOFException
> 	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2738)
> 	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3213)
> 	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:896)
> 	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:358)
> 	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
> 	at hudson.remoting.Command.readFrom(Command.java:142)
> 	at hudson.remoting.Command.readFrom(Command.java:128)
> 	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
> 	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
> Caused: java.io.IOException: Unexpected termination of the channel
> 	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
> Caused: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@629ec1e9:tb75-lilith": Remote call on tb75-lilith failed. The channel is closing down or has closed down
> 	at hudson.remoting.Channel.call(Channel.java:991)
> 	at hudson.FilePath.act(FilePath.java:1069)
> 	at hudson.FilePath.act(FilePath.java:1058)
> 	at hudson.FilePath.delete(FilePath.java:1543)
> 	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:123)
> 	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)
> 	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
> 	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:741)
> 	at hudson.model.Build$BuildExecution.build(Build.java:206)
> 	at hudson.model.Build$BuildExecution.doRun(Build.java:163)
> 	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:504)
> 	at hudson.model.Run.execute(Run.java:1880)
> 	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
> 	at hudson.model.ResourceController.execute(ResourceController.java:97)
> 	at hudson.model.Executor.run(Executor.java:428)
> Build step 'Execute shell' marked build as failure
> Finished: FAILURE

leaving behind some pstree forest of

> oosplash─┬─soffice.bin─┬─soffice.bin
>          │             └─182*[{soffice.bin}]
>          └─{oosplash}
> 
> sh───sh───python.bin─┬─oosplash─┬─soffice.bin─┬─soffice.bin
>                      │          │             └─294*[{soffice.bin}]
>                      │          └─{oosplash}
>                      └─2*[{python.bin}]
> 
> sh───sh───python.bin───oosplash
> 
> sh───sh───gdb-core-bt.sh───gdb
> 
> sh───sh───python.bin───oosplash

on tb75, where each of those processes belonged to the above build, as 
confirmed for each of their PIDs with

> $ cat /proc/$PID/environ | tr '\0' '\n' | grep BUILD_NUMBER
> BUILD_NUMBER=62736
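
The per-process check above can be generalized into a scan over all processes.  The following is only a minimal sketch, not part of kill-wrapper: the function name find_build_pids and its optional second argument (an alternative root directory, so the function can be tried against a fake tree instead of /proc) are hypothetical.  It assumes a Linux /proc layout and sufficient permissions to read the environ files of the build user's processes.

```shell
#!/bin/sh
# Hypothetical helper: print the PID of every process whose environment
# contains exactly BUILD_NUMBER=<n>.
# Usage: find_build_pids BUILD_NUMBER [PROC_ROOT]
find_build_pids() {
    build=$1
    proc_root=${2:-/proc}   # assumption: overridable only for trying it out
    for env_file in "$proc_root"/[0-9]*/environ; do
        # Skip entries that cannot be read (other users' processes,
        # processes that exited meanwhile, or an unmatched glob).
        [ -r "$env_file" ] || continue
        # environ is NUL-separated; turn it into one variable per line
        # and require a whole-line match, so that 62736 does not also
        # match 627360.
        if tr '\0' '\n' 2>/dev/null < "$env_file" \
                | grep -qx "BUILD_NUMBER=$build"; then
            pid_dir=${env_file%/environ}
            printf '%s\n' "${pid_dir##*/}"
        fi
    done
}
```

Calling `find_build_pids 62736` on the affected machine would list the PIDs still carrying that build number in their environment; what to do with them (inspect, kill) is left to whoever runs it.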

That caused later builds like 
<https://ci.libreoffice.org/job/gerrit_linux_clang_dbgutil/62758/> on 
tb75 to fail with "the test UITest_calc_demo failed".


