<html>
<head>
<base href="https://bugs.freedesktop.org/">
</head>
<body><span class="vcard"><a class="email" href="mailto:chris@chris-wilson.co.uk" title="Chris Wilson <chris@chris-wilson.co.uk>"> <span class="fn">Chris Wilson</span></a>
</span> changed
<a class="bz_bug_link
bz_status_RESOLVED bz_closed"
title="RESOLVED FIXED - [CI] igt@drv_selftest@live_hangcheck - igt_reset_engines failed with error"
href="https://bugs.freedesktop.org/show_bug.cgi?id=106560">bug 106560</a>
<br>
<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>What</th>
<th>Removed</th>
<th>Added</th>
</tr>
<tr>
<td style="text-align:right;">Resolution</td>
<td>---
</td>
<td>FIXED
</td>
</tr>
<tr>
<td style="text-align:right;">Status</td>
<td>NEW
</td>
<td>RESOLVED
</td>
</tr></table>
<p>
<div>
<b><a class="bz_bug_link
bz_status_RESOLVED bz_closed"
title="RESOLVED FIXED - [CI] igt@drv_selftest@live_hangcheck - igt_reset_engines failed with error"
href="https://bugs.freedesktop.org/show_bug.cgi?id=106560#c2">Comment # 2</a>
on <a class="bz_bug_link
bz_status_RESOLVED bz_closed"
title="RESOLVED FIXED - [CI] igt@drv_selftest@live_hangcheck - igt_reset_engines failed with error"
href="https://bugs.freedesktop.org/show_bug.cgi?id=106560">bug 106560</a>
from <span class="vcard"><a class="email" href="mailto:chris@chris-wilson.co.uk" title="Chris Wilson <chris@chris-wilson.co.uk>"> <span class="fn">Chris Wilson</span></a>
</span></b>
<pre>Hopefully,
commit 9a4dc80399b1630cea0f1ad8ef0417436cbb95d0
Author: Chris Wilson <<a href="mailto:chris@chris-wilson.co.uk">chris@chris-wilson.co.uk</a>>
Date: Fri May 18 11:09:33 2018 +0100
drm/i915: Flush the ring stop bit after clearing RING_HEAD in reset
Inside the live_hangcheck (reset) selftests, we occasionally see
failures like
<7>[ 239.094840] i915_gem_set_wedged rcs0
<7>[ 239.094843] i915_gem_set_wedged current seqno 19a98, last 19a9a,
hangcheck 0 [5158 ms]
<7>[ 239.094846] i915_gem_set_wedged Reset count: 6239 (global 1)
<7>[ 239.094848] i915_gem_set_wedged Requests:
<7>[ 239.095052] i915_gem_set_wedged first 19a99 [e8c:5f]
prio=1024 @ 5159ms: (null)
<7>[ 239.095056] i915_gem_set_wedged last 19a9a [e81:1a]
prio=139 @ 5159ms: igt/rcs0[5977]/1
<7>[ 239.095059] i915_gem_set_wedged active 19a99 [e8c:5f]
prio=1024 @ 5159ms: (null)
<7>[ 239.095062] i915_gem_set_wedged [head 0220, postfix 0280,
tail 02a8, batch 0xffffffff_ffffffff]
<7>[ 239.100050] i915_gem_set_wedged ring->start: 0x00283000
<7>[ 239.100053] i915_gem_set_wedged ring->head: 0x000001f8
<7>[ 239.100055] i915_gem_set_wedged ring->tail: 0x000002a8
<7>[ 239.100057] i915_gem_set_wedged ring->emit: 0x000002a8
<7>[ 239.100059] i915_gem_set_wedged ring->space: 0x00000f10
<7>[ 239.100085] i915_gem_set_wedged RING_START: 0x00283000
<7>[ 239.100088] i915_gem_set_wedged RING_HEAD: 0x00000260
<7>[ 239.100091] i915_gem_set_wedged RING_TAIL: 0x000002a8
<7>[ 239.100094] i915_gem_set_wedged RING_CTL: 0x00000001
<7>[ 239.100097] i915_gem_set_wedged RING_MODE: 0x00000300 [idle]
<7>[ 239.100100] i915_gem_set_wedged RING_IMR: fffffefe
<7>[ 239.100104] i915_gem_set_wedged ACTHD: 0x00000000_0000609c
<7>[ 239.100108] i915_gem_set_wedged BBADDR: 0x00000000_0000609d
<7>[ 239.100111] i915_gem_set_wedged DMA_FADDR: 0x00000000_00283260
<7>[ 239.100114] i915_gem_set_wedged IPEIR: 0x00000000
<7>[ 239.100117] i915_gem_set_wedged IPEHR: 0x02800000
<7>[ 239.100120] i915_gem_set_wedged Execlist status: 0x00044052
00000002
<7>[ 239.100124] i915_gem_set_wedged Execlist CSB read 5 [5 cached],
write 5 [5 from hws], interrupt posted? no, tasklet queued? no (enabled)
<7>[ 239.100128] i915_gem_set_wedged ELSP[0] count=1,
ring->start=00283000, rq: 19a99 [e8c:5f] prio=1024 @ 5164ms: (null)
<7>[ 239.100132] i915_gem_set_wedged ELSP[1] count=1,
ring->start=00257000, rq: 19a9a [e81:1a] prio=139 @ 5164ms: igt/rcs0[5977]/1
<7>[ 239.100135] i915_gem_set_wedged HW active? 0x5
<7>[ 239.100250] i915_gem_set_wedged E 19a99 [e8c:5f] prio=1024
@ 5164ms: (null)
<7>[ 239.100338] i915_gem_set_wedged E 19a9a [e81:1a] prio=139 @
5164ms: igt/rcs0[5977]/1
<7>[ 239.100340] i915_gem_set_wedged Queue priority: 139
<7>[ 239.100343] i915_gem_set_wedged Q 0 [e98:19] prio=132 @
5164ms: igt/rcs0[5977]/8
<7>[ 239.100346] i915_gem_set_wedged Q 0 [e84:19] prio=121 @
5165ms: igt/rcs0[5977]/2
<7>[ 239.100349] i915_gem_set_wedged Q 0 [e87:19] prio=82 @
5165ms: igt/rcs0[5977]/3
<7>[ 239.100352] i915_gem_set_wedged Q 0 [e84:1a] prio=44 @
5164ms: igt/rcs0[5977]/2
<7>[ 239.100356] i915_gem_set_wedged Q 0 [e8b:19] prio=20 @
5165ms: igt/rcs0[5977]/4
<7>[ 239.100362] i915_gem_set_wedged drv_selftest [5894] waiting for
19a99
where the GPU saw an arbitration point and idles; AND HAS NOT BEEN RESET!
The RING_MODE indicates that is idle and has the STOP_RING bit set, so
try clearing it.
v2: Only clear the bit on restarting the ring, as we want to be sure the
STOP_RING bit is kept if reset fails on wedging.
v3: Spot when the ring state doesn't make sense when re-initialising the
engine and dump it to the logs so that we don't have to wait for an
error later and try to guess what happened earlier.
v4: Prepare to print all the unexpected state, not just the first.
Signed-off-by: Chris Wilson <<a href="mailto:chris@chris-wilson.co.uk">chris@chris-wilson.co.uk</a>>
Cc: Tvrtko Ursulin <<a href="mailto:tvrtko.ursulin@linux.intel.com">tvrtko.ursulin@linux.intel.com</a>>
Reviewed-by: Tvrtko Ursulin <<a href="mailto:tvrtko.ursulin@intel.com">tvrtko.ursulin@intel.com</a>>
Link:
<a href="https://patchwork.freedesktop.org/patch/msgid/20180518100933.2239-1-chris@chris-wilson.co.uk">https://patchwork.freedesktop.org/patch/msgid/20180518100933.2239-1-chris@chris-wilson.co.uk</a></pre>
</div>
</p>
<hr>
<span>You are receiving this mail because:</span>
<ul>
<li>You are the assignee for the bug.</li>
<li>You are the QA Contact for the bug.</li>
<li>You are on the CC list for the bug.</li>
</ul>
</body>
</html>