[Intel-gfx] [PATCH 0/5] robustify reset state transitions
Daniel Vetter
daniel.vetter at ffwll.ch
Mon Nov 12 23:07:48 CET 2012
Hi all,
So I've noticed again that the hangman test was failing on some machines here,
and tracked it down to the new lockless wait code. Closer inspection showed that
we've relied on the single dev->struct_mutex ordering things correctly between
waiters and the reset code. But with that lock grabbing gone, the entire reset
could happen before the waiter wakes up and hence the waiter never sees a
non-zero wedged value. Which means it'll go right back to sleep, waiting for a
seqno which just got cleared out by the reset code.
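To make the race concrete, here is a minimal single-threaded userspace model in
C. It is only a sketch, not the i915 code: the struct, its fields and all
function names are made up for illustration, and the "counter that never goes
back" check is an assumption about one way to make detection race-free, not a
description of what the patches actually do. Real code would also need atomics
and memory barriers, which this model leaves out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the reset state; all names are illustrative only. */
struct reset_model {
	unsigned int wedged;        /* non-zero only while a reset is in flight */
	unsigned int reset_counter; /* bumped once per reset, never cleared */
	unsigned int seqno;         /* last seqno known to the driver */
};

/* Model of a complete reset running before the waiter gets to look. */
static void model_reset(struct reset_model *m)
{
	m->wedged = 1;      /* reset starts */
	m->seqno = 0;       /* reset clears out the seqno */
	m->reset_counter++; /* permanent record that a reset happened */
	m->wedged = 0;      /* reset done, flag is gone again */
}

/* Flag-based check: only works if the waiter looks while the reset runs. */
static bool waiter_sees_reset_by_flag(const struct reset_model *m)
{
	return m->wedged != 0;
}

/* Counter-based check: compare against the value sampled before sleeping. */
static bool waiter_sees_reset_by_counter(const struct reset_model *m,
					 unsigned int sampled)
{
	return m->reset_counter != sampled;
}

/* Walk through the scenario: sample, full reset, then the waiter wakes up. */
static void demo_missed_reset(void)
{
	struct reset_model m = { .wedged = 0, .reset_counter = 0, .seqno = 41 };
	unsigned int sampled = m.reset_counter; /* taken before going to sleep */

	model_reset(&m); /* entire reset happens while the waiter sleeps */

	assert(!waiter_sees_reset_by_flag(&m));            /* reset is missed */
	assert(waiter_sees_reset_by_counter(&m, sampled)); /* reset is noticed */
}
```

Calling demo_missed_reset() runs through the sequence described above: the
flag check comes back false because wedged is already zero again, while the
counter comparison still shows that a reset happened in between.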
Looking at the code I've declared the entire thing too ad-hoc and revamped it,
adding comments explaining what's going on all over the place and auditing for
tiny races everywhere. Hopefully I've caught them all, at least the machines
that previously hung after reset are now happily going through a few hundred
reset cycles!
Comments, flames and especially review highly welcome.
For fun (hey, let me have it!) I've thrown in some "let's move stuff around a
bit" patches at the beginning ;-)
Cheers, Daniel
Daniel Vetter (5):
  drm/i915: move dev_priv->mm out of line
  drm/i915: extract hangcheck/reset/error_state state into substruct
  drm/i915: move wedged to the other gpu error handling stuff
  drm/i915: clear up wedged transitions
  drm/i915: create a race-free reset detection

 drivers/gpu/drm/i915/i915_debugfs.c     |  12 +-
 drivers/gpu/drm/i915/i915_dma.c         |   9 +-
 drivers/gpu/drm/i915/i915_drv.c         |   8 +-
 drivers/gpu/drm/i915/i915_drv.h         | 274 ++++++++++++++++++--------------
 drivers/gpu/drm/i915/i915_gem.c         | 110 +++++++------
 drivers/gpu/drm/i915/i915_irq.c         |  89 +++++++----
 drivers/gpu/drm/i915/intel_display.c    |   4 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c |   8 +-
 8 files changed, 297 insertions(+), 217 deletions(-)
--
1.7.11.4