[Intel-gfx] [PATCH 0/3] Per Engine hang detection and recovery
Siluvery, Arun
arun.siluvery at intel.com
Mon Nov 11 16:49:01 CET 2013
On Mon, 2013-11-11 at 16:31 +0100, Daniel Vetter wrote:
> On Mon, Nov 11, 2013 at 02:58:31PM +0000, Siluvery, Arun wrote:
> > From: "Siluvery, Arun" <arun.siluvery at intel.com>
> >
> > This patchset contains changes for Timeout detection and recovery (TDR) which
> > provides per-engine hang detection and recovery.
> > The current driver performs a full GPU reset in case of a hang. TDR attempts to
> > reset only the engine that is hung and falls back to a full reset if that fails.
> >
> > A full GPU reset can leave the system in a state where the display updates
> > intermittently and possibly locks up, depending on the workload at the time of
> > the hang. TDR can help recover the system in those cases, thus increasing stability.
>
> Are these hw lockups you've seen with full gpu reset or just kernel
> deadlocks? If it's the latter we've recently (re-)fixed a bunch of those,
> and if there are new ones we definitely want to fix them and add testcases
> to igt. So if you could share some of these hangs and their
> analysis/testcases that'd be very interesting.
>
> That's of course on top of any other reset improvements.
I think these are kernel lockups. Unfortunately, when this happens there
is no response from the kernel, and sending a break does not help either. I
will try to get more details on this.
>
> > The changes are split in multiple patches.
> > 1. Ring utility functions to save/restore context, reset ring etc
> > 2. TDR hang detection logic and error recovery function
> > 3. Debugfs changes to export TDR statistics.
> >
> > I have tested these changes on drm-intel-nightly with a simple test which
> > inserts a bad batch buffer on the specific ring to trigger a hang. TDR logic
> > then detects this and recovers from it by skipping the bad batch.
>
> I want this testcase (as a patch to igt).
ok, I will send it to the mailing list.
>
> > Please review and give your comments.
>
> I'll try to have a look later this week, atm still busy with bdw
> upstreaming. One more meta-comment though: Something with your git setup
> seems to be broken, the patches don't have in-reply-to headers pointing at
> this cover letter and hence the threading is a bit broken.
OK, thanks.
Yes, my mistake; I missed an option while generating the patches.
Do you suggest resending all the patches?
>
> Cheers, Daniel
> >
> > regards
> > Arun
> >
> > Siluvery, Arun (3):
> > drm/i915: Add ring functions to save/restore context for per-ring
> > reset
> > drm/i915: Per-engine Timeout detection and recovery on HSW
> > drm/i915: Export TDR hang count to debugfs
> >
> > drivers/gpu/drm/i915/i915_debugfs.c | 68 +++-
> > drivers/gpu/drm/i915/i915_dma.c | 16 +-
> > drivers/gpu/drm/i915/i915_drv.c | 195 +++++++++-
> > drivers/gpu/drm/i915/i915_drv.h | 92 ++++-
> > drivers/gpu/drm/i915/i915_gem.c | 77 +++-
> > drivers/gpu/drm/i915/i915_gpu_error.c | 25 +-
> > drivers/gpu/drm/i915/i915_irq.c | 556 ++++++++++++++++-------------
> > drivers/gpu/drm/i915/i915_reg.h | 7 +
> > drivers/gpu/drm/i915/intel_display.c | 25 +-
> > drivers/gpu/drm/i915/intel_ringbuffer.c | 607 +++++++++++++++++++++++++++++++-
> > drivers/gpu/drm/i915/intel_ringbuffer.h | 51 +++
> > drivers/gpu/drm/i915/intel_uncore.c | 31 +-
> > include/drm/drmP.h | 7 +
> > 13 files changed, 1467 insertions(+), 290 deletions(-)
> >
> > --
> > 1.8.4
> >
> >
>