[Mesa-dev] Possible Sandybridge GPU hang fixes

Kenneth Graunke kenneth at whitecape.org
Mon Oct 28 18:44:19 CET 2013


On 10/27/2013 07:24 AM, Daniel Vetter wrote:
> On Sat, Oct 26, 2013 at 01:07:48PM -0700, Kenneth Graunke wrote:
>> These patches add some missing flushing, which appears to help.  I'm
>> still getting GPU hangs, but they're much less frequent, and now have
>> an IPEHR of MI_SEMAPHORE_MBOX.  I suspect those may be due to bugs in
>> my performance monitoring code, rather than upstream problems.
> 
> The new bug is https://bugs.freedesktop.org/show_bug.cgi?id=54226

Oh, fantastic!  I'm glad it's a known issue and not me making things
worse... :) Thanks for the pointer.

> We can
> work around that by disabling semaphores, but that tends to bring the
> slightly non-coherent seqno write/irq signalling to light again. Note that
> there's been a bunch of kernels that failed to correctly detect the missed
> interrupt stuff and instead just stalled for one second.
> 
> Since both bugs can be fixed up by just unblocking the rings (after the
> kernel checks that semaphore signals/irqs have indeed been lost) and don't
> require a full gpu reset we've somewhat stopped bothering.
>
> If these bugs get in the way of your testing you can clear the error state
> by just writing something to the debugfs/sysfs file, making way for the
> next gpu hang.

Yep, that's a really useful feature.
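
A minimal sketch of the "write something to the debugfs/sysfs file" step
described above. The thread does not name the file, so both paths below are
assumptions (the usual i915 locations on a single-GPU machine with debugfs
mounted), and `clear_error_state` is a hypothetical helper name, not an
existing tool.

```python
from pathlib import Path

# Assumed locations of the i915 error state file; neither path is given
# in the thread, and the card/minor index may differ on your machine.
CANDIDATE_PATHS = (
    Path("/sys/class/drm/card0/error"),
    Path("/sys/kernel/debug/dri/0/i915_error_state"),
)


def clear_error_state(paths=CANDIDATE_PATHS):
    """Write to the first candidate file that exists.

    Returns the path that was written, or None if no candidate exists.
    The content written is arbitrary: any write is meant to discard the
    stored error state so the next hang can be captured.
    """
    for path in paths:
        if path.exists():
            with open(path, "w") as f:
                f.write("1\n")
            return path
    return None


if __name__ == "__main__":
    written = clear_error_state()
    if written is None:
        print("no error state file found")
    else:
        print(f"cleared error state via {written}")
```

Writing to these files normally requires root. Save a copy of the error
state first if it is still needed for a bug report, since clearing it
discards the captured dump.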

>> Xinkai Chen reported that DOTA 2 used to hang every 8-10 minutes, but
>> after applying this patch series, it had not hung after 3 hours.
> 
> Awesome! Afaics the snb blorp death is currently the most often reported
> dupe bug we see for gpu hangs reported against the kernel ;-)
> 
>> I'm not sure if these should go to stable or not.  Probably, but adding
>> more flushes could introduce hangs just as easily as it could fix them
>> (at least on Sandybridge), so I'm always nervous about that.
> 
> Most reports I've seen fly by mentioned that stability was markedly reduced
> when upgrading from mesa 9.1 to 9.2. And these regressions seem to happen
> with lots of different gl apps. So imo this should go to stable, but maybe
> after an extended soaking time.
> 
> Cheers, Daniel

Sounds like a plan.  Thanks, Daniel.

