<html>
<head>
<base href="https://bugs.freedesktop.org/" />
</head>
<body>
<p>
<div>
<b><a class="bz_bug_link
bz_status_ASSIGNED "
title="ASSIGNED - [bsw] execlists causes machine lockups"
href="https://bugs.freedesktop.org/show_bug.cgi?id=93467#c13">Comment # 13</a>
on <a class="bz_bug_link
bz_status_ASSIGNED "
title="ASSIGNED - [bsw] execlists causes machine lockups"
href="https://bugs.freedesktop.org/show_bug.cgi?id=93467">bug 93467</a>
from <span class="vcard"><a class="email" href="mailto:tvrtko.ursulin@linux.intel.com" title="Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>"> <span class="fn">Tvrtko Ursulin</span></a>
</span></b>
<pre>On my BDW I need a kernel with no debugging whatsoever to trigger this
reliably: no tracing, no lockdep, and even basic spinlock debugging has to be
turned off.

In that setup gem_exec_nop/basic generates ~340k interrupts per second and
causes numerous 10-20 second system-wide stalls.

I've tried to measure how long various sections of the code in intel_lrc.c
take. Although the averages and maximums are bad (if my numbers are correct,
we can spend 10-25% of the elapsed test time with interrupts off), I can't
find anything that would block for 10-20 seconds in one go. Nor can I work
out from inspecting the code how that could happen.

Also, these lockups generally seem to come and go in batches. Sometimes the
system is happily chugging along at 340k irq/s, and sometimes it stalls all
the time. I have no idea what makes it latch into one mode or the other.

At one point I thought I saw a correlation with the retire worker, but then it
went away. It still feels like replacing the trylock there with a real lock
improves things (more short lockups instead of long ones), but I can't figure
out why that would make sense.</pre>
</div>
</p>
</body>
</html>