[Mesa-dev] Possible Sandybridge GPU hang fixes
daniel at ffwll.ch
Sun Oct 27 15:24:28 CET 2013
On Sat, Oct 26, 2013 at 01:07:48PM -0700, Kenneth Graunke wrote:
> These patches add some missing flushing, which appears to help. I'm
> still getting GPU hangs, but they're much less frequent, and now have
> an IPEHR of MI_SEMAPHORE_MBOX. I suspect those may be due to bugs in
> my performance monitoring code, rather than upstream problems.
The new bug is https://bugs.freedesktop.org/show_bug.cgi?id=54226. We can
work around that by disabling semaphores, but that tends to bring the
slightly non-coherent seqno write/irq signalling to light again. Note that
a number of kernels failed to detect the missed interrupts correctly and
instead just stalled for one second.
Since both bugs can be fixed up by just unblocking the rings (after the
kernel has checked that semaphore signals/irqs have indeed been lost) and
don't require a full gpu reset, we've somewhat stopped bothering.
If these bugs get in the way of your testing you can clear the error state
by just writing something to the debugfs/sysfs file, making way for the
next gpu hang.
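As a rough sketch of that clearing step: the snippet below writes an arbitrary byte to the first error state file it finds. The two paths are assumptions on my side (commonly seen i915 locations, not something spelled out in this mail) and they vary with kernel version and card index, so adjust them for your machine. The helper names are made up for illustration, and writing to these files normally needs root.

```python
from pathlib import Path

# Assumed locations of the i915 error state file; these are NOT given in
# the mail and differ between kernel versions and card indices.
CANDIDATE_PATHS = (
    "/sys/class/drm/card0/error",                # sysfs (assumed)
    "/sys/kernel/debug/dri/0/i915_error_state",  # debugfs (assumed)
)


def clear_error_state(path):
    """Write an arbitrary byte to the file at `path`.

    The content does not matter; any write is what clears the state.
    """
    with open(path, "w") as f:
        f.write("1")


def clear_first_available(paths=CANDIDATE_PATHS):
    """Clear the first existing file in `paths`.

    Returns the path that was written to, or None if none of them exist.
    """
    for p in paths:
        if Path(p).exists():
            clear_error_state(p)
            return p
    return None
```

Calling clear_first_available() as root after saving the old error state should leave room for the next hang to be recorded.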
> Xinkai Chen reported that DOTA 2 used to hang every 8-10 minutes, but
> after applying this patch series, it had not hung after 3 hours.
Awesome! Afaics the snb blorp death is currently the most often reported
dupe bug we see for gpu hangs reported against the kernel ;-)
> I'm not sure if these should go to stable or not. Probably, but adding
> more flushes could introduce hangs just as easily as it could fix them
> (at least on Sandybridge), so I'm always nervous about that.
Most reports I've seen fly by mentioned that stability was markedly reduced
when upgrading from mesa 9.1 to 9.2. And these regressions seem to happen
with lots of different gl apps. So imo this should go to stable, but maybe
after an extended soaking time.
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch