More information about hung Jenkins builds
Stephan Bergmann
sbergman at redhat.com
Tue Jun 30 09:38:27 UTC 2020
On 19/06/2020 14:51, Stephan Bergmann wrote:
> On 28/05/2020 22:19, Stephan Bergmann wrote:
>> For now, I have updated
>> <https://ci.libreoffice.org/job/gerrit_linux_clang_dbgutil/> to use
>> the new kill-wrapper timeout feature instead of Jenkins' "Abort the
>> build if it's stuck" option. (And am planning to roll it out to other
>> Linux Jenkins jobs that could benefit from it, once it has proven
>> sufficiently stable.)
>
> I have rolled out the kill-wrapper and its timeout feature now also for
> <https://ci.libreoffice.org/job/gerrit_linux_clang_dbgutil_branch/>,
> <https://ci.libreoffice.org/job/gerrit_linux_gcc_release/>, and
> <https://ci.libreoffice.org/job/lo_ubsan/>.
Just to note down the semi-obvious somewhere: one scenario that
kill-wrapper apparently does not prevent is processes left behind after
Jenkins "has lost the connection" to the build machine (for whatever
reason; maybe a bug in Jenkins itself?).
<https://ci.libreoffice.org/job/gerrit_linux_clang_dbgutil/62736/> had
gone down with
[...]
> [build JUT] linguistic_unoapi
> FATAL: command execution failed
> java.io.EOFException
> at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2738)
> at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3213)
> at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:896)
> at java.io.ObjectInputStream.<init>(ObjectInputStream.java:358)
> at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
> at hudson.remoting.Command.readFrom(Command.java:142)
> at hudson.remoting.Command.readFrom(Command.java:128)
> at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
> at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
> Caused: java.io.IOException: Unexpected termination of the channel
> at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
> Caused: java.io.IOException: Backing channel 'tb75-lilith' is disconnected.
> at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:216)
> at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:285)
> at com.sun.proxy.$Proxy66.isAlive(Unknown Source)
> at hudson.Launcher$RemoteLauncher$ProcImpl.isAlive(Launcher.java:1147)
> at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:1139)
> at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:155)
> at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:109)
> at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)
> at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
> at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:741)
> at hudson.model.Build$BuildExecution.build(Build.java:206)
> at hudson.model.Build$BuildExecution.doRun(Build.java:163)
> at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:504)
> at hudson.model.Run.execute(Run.java:1880)
> at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
> at hudson.model.ResourceController.execute(ResourceController.java:97)
> at hudson.model.Executor.run(Executor.java:428)
> FATAL: Unable to delete script file /tmp/jenkins3180341342272089625.sh
> java.io.EOFException
> at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2738)
> at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3213)
> at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:896)
> at java.io.ObjectInputStream.<init>(ObjectInputStream.java:358)
> at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
> at hudson.remoting.Command.readFrom(Command.java:142)
> at hudson.remoting.Command.readFrom(Command.java:128)
> at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
> at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
> Caused: java.io.IOException: Unexpected termination of the channel
> at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
> Caused: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@629ec1e9:tb75-lilith": Remote call on tb75-lilith failed. The channel is closing down or has closed down
> at hudson.remoting.Channel.call(Channel.java:991)
> at hudson.FilePath.act(FilePath.java:1069)
> at hudson.FilePath.act(FilePath.java:1058)
> at hudson.FilePath.delete(FilePath.java:1543)
> at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:123)
> at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)
> at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
> at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:741)
> at hudson.model.Build$BuildExecution.build(Build.java:206)
> at hudson.model.Build$BuildExecution.doRun(Build.java:163)
> at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:504)
> at hudson.model.Run.execute(Run.java:1880)
> at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
> at hudson.model.ResourceController.execute(ResourceController.java:97)
> at hudson.model.Executor.run(Executor.java:428)
> Build step 'Execute shell' marked build as failure
> Finished: FAILURE
leaving behind some pstree forest of
> oosplash─┬─soffice.bin─┬─soffice.bin
>          │             └─182*[{soffice.bin}]
>          └─{oosplash}
>
> sh───sh───python.bin─┬─oosplash─┬─soffice.bin─┬─soffice.bin
>                      │          │             └─294*[{soffice.bin}]
>                      │          └─{oosplash}
>                      └─2*[{python.bin}]
>
> sh───sh───python.bin───oosplash
>
> sh───sh───gdb-core-bt.sh───gdb
>
> sh───sh───python.bin───oosplash
on tb75, where each of those processes was confirmed to belong to the
above build by running, for its respective $PID,
> $ cat /proc/$PID/environ | tr '\0' '\n' | grep BUILD_NUMBER
> BUILD_NUMBER=62736
That caused later builds like
<https://ci.libreoffice.org/job/gerrit_linux_clang_dbgutil/62758/> on
tb75 to fail with "the test UITest_calc_demo failed".
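The per-PID check with /proc/$PID/environ above can be turned around to
find all processes of a given build in one go, instead of starting from
a pstree listing. What follows is only a minimal sketch for a Linux
machine like tb75, assuming nothing but /proc and standard tools; the
function name pids_for_build is hypothetical and not part of any existing
LibreOffice CI script.

```shell
#!/bin/sh
# Hypothetical helper, not part of any LibreOffice CI script: print the PID of
# every process whose environment contains exactly BUILD_NUMBER=<arg>.
# Linux-only, as it relies on /proc/<pid>/environ.
pids_for_build() {
    want="BUILD_NUMBER=$1"
    for env in /proc/[0-9]*/environ; do
        pid=${env#/proc/}    # "/proc/1234/environ" -> "1234/environ"
        pid=${pid%/environ}  # "1234/environ"       -> "1234"
        # environ is NUL-separated; entries that cannot be read (other users'
        # processes, or processes that exited during the scan) are skipped,
        # with the redirection error silenced by the outer 2>/dev/null.
        if { tr '\0' '\n' < "$env"; } 2>/dev/null | grep -qxF -- "$want"; then
            echo "$pid"
        fi
    done
}

# Example: list the processes left over from build 62736.
# pids_for_build 62736
```

Note that when this is run from a build step of that same build, the
calling shell itself inherits the same BUILD_NUMBER and will be listed
too. The resulting PIDs can then be inspected with pstree or passed to
kill.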