<html>
<head>
<base href="https://bugs.freedesktop.org/">
</head>
<body><span class="vcard"><a class="email" href="mailto:petri.latvala@intel.com" title="Petri Latvala <petri.latvala@intel.com>"> <span class="fn">Petri Latvala</span></a>
</span> changed
<a class="bz_bug_link
bz_status_NEW "
title="NEW - [CI][DRMTIP] igt@ - incomplete - Jenkins gives up"
href="https://bugs.freedesktop.org/show_bug.cgi?id=111747">bug 111747</a>
<br>
<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>What</th>
<th>Removed</th>
<th>Added</th>
</tr>
<tr>
<td style="text-align:right;">Component</td>
<td>IGT
</td>
<td>DRM/Intel
</td>
</tr>
<tr>
<td style="text-align:right;">Assignee</td>
<td>dri-devel@lists.freedesktop.org
</td>
<td>intel-gfx-bugs@lists.freedesktop.org
</td>
</tr>
<tr>
<td style="text-align:right;">Priority</td>
<td>not set
</td>
<td>medium
</td>
</tr>
<tr>
<td style="text-align:right;">Severity</td>
<td>not set
</td>
<td>normal
</td>
</tr>
<tr>
<td style="text-align:right;">QA Contact</td>
<td>
</td>
<td>intel-gfx-bugs@lists.freedesktop.org
</td>
</tr>
<tr>
<td style="text-align:right;">i915 features</td>
<td>GEM/Other
</td>
<td>CI Infra
</td>
</tr></table>
<p>
<div>
<b><a class="bz_bug_link
bz_status_NEW "
title="NEW - [CI][DRMTIP] igt@ - incomplete - Jenkins gives up"
href="https://bugs.freedesktop.org/show_bug.cgi?id=111747#c15">Comment # 15</a>
on <a class="bz_bug_link
bz_status_NEW "
title="NEW - [CI][DRMTIP] igt@ - incomplete - Jenkins gives up"
href="https://bugs.freedesktop.org/show_bug.cgi?id=111747">bug 111747</a>
from <span class="vcard"><a class="email" href="mailto:petri.latvala@intel.com" title="Petri Latvala <petri.latvala@intel.com>"> <span class="fn">Petri Latvala</span></a>
</span></b>
<pre>Happens to TGL in 5 / 16 runs (31.2%), last seen in: the previous build.
(I mention TGL since this bug seems to be for the TGL occurrences but it can
happen to any machine)
User impact for this issue in particular is N/A since it's a CI issue. However,
incompletes reduce coverage for every test that doesn't get run because of
them, so the impact is potentially very dire. It doesn't happen on every run
though, and it hits arbitrary tests, so the actual coverage loss stays below
that worst case.
What happens here is:
1) Jenkins connects to DUT through ssh and launches tests
2) Jenkins loses ssh connection
3) The Jenkins job for executing the test finishes, because the ssh command
completed
4) At the end of finishing a test, a reboot-and-collect job is executed
5) The reboot-and-collect job connects through ssh and reboots the machine
The remote reboot job got a logging step added: tests that die because the
reboot command was invoked prematurely get a log entry in dmesg stating that
power.sh is taking this machine down. From that we can determine that the
network didn't die completely, just the ssh connection.
There is a plan to solve this. igt_runner will be changed to expose an AF_LOCAL
socket for outside control, and the Jenkins job for executing tests will then
no longer be required to maintain an ssh connection active for the duration of
the whole test round. Instead tests will be launched in the background (with
screen or tmux or just nohup) and the Jenkins job will reconnect the ssh
connection when/if it fails and check through igt_runner's control channel if a
test is still running.
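A rough sketch of what the control-channel check could look like, in Python for
brevity. Everything protocol-related here is an assumption for illustration
only: the socket path, the STATUS request, the RUNNING/IDLE replies and the
test name are hypothetical, and fake_runner is a stand-in so the sketch can run
without igt_runner. The real control channel may look nothing like this.

```python
import os
import socket
import tempfile
import threading

# Hypothetical wire protocol, NOT igt_runner's real one: the client sends
# b"STATUS\n" and the runner replies b"RUNNING name\n" or b"IDLE\n".


def query_runner(sock_path, timeout=2.0):
    """Ask the runner what it is doing; return None if it cannot be reached."""
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
            client.settimeout(timeout)
            client.connect(sock_path)
            client.sendall(b"STATUS\n")
            reply = client.makefile("rb").readline()
    except OSError:
        # Covers a missing socket file, a refused connection and a timeout.
        return None
    return reply.decode().strip() or None


def fake_runner(server, current_test):
    """Stand-in for igt_runner: answer one STATUS request, then exit."""
    conn, _ = server.accept()
    with conn:
        if conn.makefile("rb").readline().strip() == b"STATUS":
            conn.sendall(f"RUNNING {current_test}\n".encode())


def demo():
    """Query the stand-in runner once, then query a socket that is not there."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "runner.sock")
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
            server.bind(path)
            server.listen(1)
            worker = threading.Thread(
                target=fake_runner, args=(server, "igt@example@subtest")
            )
            worker.start()
            alive = query_runner(path)
            worker.join()
        gone = query_runner(os.path.join(tmp, "missing.sock"))
    return alive, gone
```

Since an AF_LOCAL socket is only reachable on the DUT itself, the Jenkins job
would run something like query_runner on the DUT over the re-established ssh
connection. A None result would then suggest the runner is gone, and only in
that case would the reboot-and-collect job need to kick in.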
Moving this bug to CI infra.</pre>
</div>
</p>
</body>
</html>