<html>
<head>
<base href="https://bugs.freedesktop.org/">
</head>
<body><span class="vcard"><a class="email" href="mailto:martin.peres@free.fr" title="Martin Peres <martin.peres@free.fr>"> <span class="fn">Martin Peres</span></a>
</span> changed
<a class="bz_bug_link
bz_status_NEW "
title="NEW - [CI][SHARDS] igt@i915_selftest@live_hangcheck - dmesg-fail - igt_atomic_reset_engine timed out, cancelling test."
href="https://bugs.freedesktop.org/show_bug.cgi?id=110429">bug 110429</a>
<br>
<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>What</th>
<th>Removed</th>
<th>Added</th>
</tr>
<tr>
<td style="text-align:right;">Priority</td>
<td>high
</td>
<td>medium
</td>
</tr></table>
<p>
<div>
<b><a class="bz_bug_link
bz_status_NEW "
title="NEW - [CI][SHARDS] igt@i915_selftest@live_hangcheck - dmesg-fail - igt_atomic_reset_engine timed out, cancelling test."
href="https://bugs.freedesktop.org/show_bug.cgi?id=110429#c5">Comment # 5</a>
on <a class="bz_bug_link
bz_status_NEW "
title="NEW - [CI][SHARDS] igt@i915_selftest@live_hangcheck - dmesg-fail - igt_atomic_reset_engine timed out, cancelling test."
href="https://bugs.freedesktop.org/show_bug.cgi?id=110429">bug 110429</a>
from <span class="vcard"><a class="email" href="mailto:martin.peres@free.fr" title="Martin Peres <martin.peres@free.fr>"> <span class="fn">Martin Peres</span></a>
</span></b>
<pre>(In reply to Chris Wilson from <a href="show_bug.cgi?id=110429#c4">comment #4</a>)
<span class="quote">> (In reply to Arek Hiler from <a href="show_bug.cgi?id=110429#c3">comment #3</a>)
> > (In reply to Francesco Balestrieri from <a href="show_bug.cgi?id=110429#c2">comment #2</a>)
> > > From:
> > >
> > > <3> [2204.524458] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset
> > > request timeout
> > >
> > > It seems that the HW failed to respond and the test timed out. We should
> > > increase the timeout of the test to get the actual failure.
> >
> > Was there anything done already? Did we increase the timeout? The issue was
> > seen only once in CI_DRM_5922.
>
> The error has occurred quite rarely across the reset tests over the years,
> and over the years we have applied whatever w/a we could find to reduce the
> rate of incidence. We haven't increased the timeout applied to the selftest
> yet -- and it would not fix the bug, just make the cause more obvious.</span>
So this failure has been seen twice, once on CFL and once on ICL.
I think your explanation makes sense, and we should try to reduce the
reproduction rate as much as possible, but this does not look like a new
regression, more like an architectural issue.
Dropping the priority to medium so that we can periodically check that this does
not become more apparent.</pre>
</div>
</p>
</body>
</html>