[Intel-gfx] [RFC 00/13] TDR and Watchdog Reset

Daniel Vetter daniel at ffwll.ch
Mon Dec 16 17:35:58 CET 2013


On Mon, Dec 16, 2013 at 04:02:20PM +0000, Lister, Ian wrote:
> This patchset contains TDR and Watchdog reset against a 3.13
> drm-intel-nightly tree that is now about 2 weeks old.
> 
> I have re-worked the TDR and Watchdog Reset features to integrate
> them more closely with the existing TDR and scoring mechanism.
> 
> This is still a work-in-progress and I am currently debugging a couple
> of issues but I would like to get some early feedback.
> 
> Thanks,
> Ian.
> 
> From f71a7de85e9d81be3aa3962c8fe2557235ff21c1 Mon Sep 17 00:00:00 2001
> Message-Id: <cover.1387201899.git.ian.lister at intel.com>
> From: ian-lister <ian.lister at intel.com>
> Date: Mon, 16 Dec 2013 13:51:39 +0000
> Subject: [RFC 00/13] TDR and Watchdog Reset

I think I spoke too soon when I told you that the cover letter looks
sane. You need to send this out with git send-email, since that will then
use the mail headers above as the real headers when transmitting the
patches over SMTP. If you just paste the patches as-is into the mail
yourself, these headers will be part of the commit message.

Also, since the Message-Id: doesn't match what git assumed when generating
the patches, the threading is all broken, i.e. the actual patches should
look like replies to the cover letter. That's the default if you have a
working git format-patch and git send-email setup.

Finally, your git author field needs a bit of polish. Oscar's i-g-t patches
have this all working correctly, so it's probably best if he does a quick
session with all the VPG London guys about how to set things up.

I'll look at the patches later on, should have some time tomorrow.

Cheers, Daniel

> 
> This patchset adds support for per-engine timeout detection and recovery
> and adds batch specific watchdog reset.
> 
> Per-ring TDR
> The detection logic has been modified to detect hangs on individual
> engines and pass this information through to the recovery handler.
> Rather than a global reset it will attempt a per-engine reset. The
> registers associated with the ring are saved and restored so that
> when the ring restarts it continues from the next instruction in the
> ring. For example, if it was executing an MI_BATCH_BUFFER_START command
> it will advance to the next instruction which is likely to be the
> mailbox updates and user interrupt. This means that no extra effort
> is required to deal with synchronisation. From the perspective of the
> driver it looks like the batch buffer completed normally as all the
> normal signalling will take place, however the context stats will
> have been updated to flag up the guilty context.
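
To make the per-engine flow above concrete, here is a minimal user-space
sketch in C. It is not code from the patchset: struct engine, its fields,
HANG_THRESHOLD, engine_sample() and engine_recover() are all hypothetical
names, and the "reset" is only simulated. It assumes a hang shows up as a
ring HEAD that stops advancing across periodic samples, and that recovery
restores the saved HEAD advanced past the hung command.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HANG_THRESHOLD 3    /* hypothetical: stalled samples before reset */

/* Hypothetical per-engine state; not the driver's real structures. */
struct engine {
    const char *name;
    uint32_t head;          /* simulated ring HEAD offset, in dwords */
    uint32_t last_head;     /* HEAD seen at the previous sample */
    unsigned int stalled;   /* consecutive samples without progress */
    unsigned int guilty;    /* hangs charged to the offending context */
    bool busy;              /* work is outstanding on this engine */
};

/* Periodic sample: returns true once this engine alone looks hung. */
static bool engine_sample(struct engine *e)
{
    assert(e != NULL);
    if (!e->busy || e->head != e->last_head) {
        /* Idle or making progress: restart the stall count. */
        e->last_head = e->head;
        e->stalled = 0;
        return false;
    }
    return ++e->stalled >= HANG_THRESHOLD;
}

/*
 * Per-engine recovery: save the ring state, reset only this engine, then
 * restore the state with HEAD moved past the hung command so that the
 * commands following it in the ring still execute.
 */
static void engine_recover(struct engine *e, uint32_t hung_cmd_len)
{
    uint32_t saved_head;

    assert(e != NULL);
    saved_head = e->head;                   /* save */
    e->head = 0;                            /* stands in for the hw reset */
    e->head = saved_head + hung_cmd_len;    /* restore, skipping the hang */
    e->last_head = e->head;
    e->stalled = 0;
    e->guilty++;                            /* flag the guilty context */
}
```

A caller would invoke engine_sample() from a periodic timer for each engine
and call engine_recover() only on the engine that returned true, leaving
the other engines untouched.
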
> 
> Watchdog Reset
> This is requested via flags to the batch buffer submission IOCTL.
> It is currently only supported for the render and video rings.
> The batch buffer command is surrounded by a hardware timer start
> command and stop command. If the batch completes before the timer
> expires then the timer is cancelled and no interrupt is generated
> so everything continues normally. However if the batch hangs then
> the timer will generate an interrupt and it will trigger an engine
> reset. This feature requires per-ring TDR to do the recovery work.
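
The bracketing described above can be sketched as follows. This is a
user-space simulation under stated assumptions, not the patchset's
implementation: the opcodes, EXEC_FLAG_WATCHDOG, emit_batch() and
run_ring() are hypothetical names and do not correspond to real i915
command encodings or ioctl flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for this sketch; not real hardware encodings. */
enum cmd { CMD_WDT_START = 1, CMD_BATCH_START, CMD_WDT_STOP };

#define EXEC_FLAG_WATCHDOG (1u << 0)    /* hypothetical submission flag */

/*
 * Emit the commands for one batch into 'ring'. With the watchdog flag
 * set, the batch start is bracketed by timer start and stop commands.
 * Returns the number of dwords written.
 */
static size_t emit_batch(uint32_t *ring, size_t cap, uint32_t batch_addr,
                         uint32_t flags, uint32_t timeout_ticks)
{
    size_t n = 0;
    bool wdt = (flags & EXEC_FLAG_WATCHDOG) != 0;

    assert(ring != NULL && cap >= 5);
    if (wdt) {
        ring[n++] = CMD_WDT_START;
        ring[n++] = timeout_ticks;
    }
    ring[n++] = CMD_BATCH_START;
    ring[n++] = batch_addr;
    if (wdt)
        ring[n++] = CMD_WDT_STOP;
    return n;
}

/*
 * Simulate executing the emitted commands, where 'batch_ticks' is how
 * long the batch would run. Returns true if the watchdog expires first,
 * i.e. an interrupt would fire and an engine reset would be requested.
 */
static bool run_ring(const uint32_t *ring, size_t len, uint32_t batch_ticks)
{
    bool armed = false;
    uint32_t limit = 0;

    for (size_t i = 0; i < len; i++) {
        switch (ring[i]) {
        case CMD_WDT_START:
            armed = true;
            limit = ring[++i];      /* operand: timeout in ticks */
            break;
        case CMD_BATCH_START:
            i++;                    /* skip the batch address operand */
            if (armed && batch_ticks > limit)
                return true;        /* timer expired before completion */
            break;
        case CMD_WDT_STOP:
            armed = false;          /* finished in time: timer cancelled */
            break;
        }
    }
    return false;
}
```

Without the flag no timer commands are emitted, so a long-running batch
never trips the watchdog in this model; it would be left to the periodic
hang detection instead.
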
> 
> ian-lister (13):
>   drm/i915: Periodic sampling for hang detection
>   drm/i915: Improved hang detection logic
>   drm/i915: Additional ring operations for TDR
>   drm/i915: Force wake restore for TDR
>   drm/i915: Per-engine recovery
>   drm/i915: Communicating reset requests
>   drm/i915: Additional debug for TDR
>   drm/i915: TDR loose ends
>   drm/i915: Watchdog timer support functions
>   drm/i915: MI_LOAD_REGISTER_IMM fix
>   drm/i915: Added watchdog interrupt handling
>   drm/i915: Enabled watchdog timer interrupts
>   drm/i915: Exec buffer inserts watchdog commands
> 
>  drivers/gpu/drm/i915/i915_debugfs.c        |  67 ++++
>  drivers/gpu/drm/i915/i915_dma.c            |   3 +
>  drivers/gpu/drm/i915/i915_drv.c            |  46 +++
>  drivers/gpu/drm/i915/i915_drv.h            |  41 ++-
>  drivers/gpu/drm/i915/i915_gem.c            |  26 +-
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  30 +-
>  drivers/gpu/drm/i915/i915_irq.c            | 531 ++++++++++++++++++---------
>  drivers/gpu/drm/i915/i915_reg.h            |  21 ++
>  drivers/gpu/drm/i915/intel_display.c       |  30 +-
>  drivers/gpu/drm/i915/intel_ringbuffer.c    | 557 +++++++++++++++++++++++++++--
>  drivers/gpu/drm/i915/intel_ringbuffer.h    |  53 +++
>  drivers/gpu/drm/i915/intel_uncore.c        | 373 ++++++++++++++++++-
>  include/drm/drmP.h                         |   7 +
>  13 files changed, 1575 insertions(+), 210 deletions(-)
> 
> -- 
> 1.8.5.1
> 
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


