[Intel-gfx] [PATCH] drm/i915: Check hangcheck is functioning before indefinite waits
Jesse Barnes
jbarnes at virtuousgeek.org
Thu Jul 3 17:44:20 CEST 2014
On Thu, 3 Jul 2014 08:09:01 +0100
Chris Wilson <chris at chris-wilson.co.uk> wrote:
> Since we rely on hangcheck to wait up and kick us out of an indefinite
> wait should the GPU ever stop functioning, it appears sensible that we
> should check that hangcheck is indeed active before starting that wait.
> This just prevents a driver error in the processing of hangcheck from
> appearing to hang the machine.
>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> ---
> drivers/gpu/drm/i915/i915_drv.h | 1 +
> drivers/gpu/drm/i915/i915_gem.c | 2 ++
> drivers/gpu/drm/i915/i915_irq.c | 11 +++++++++++
> drivers/gpu/drm/i915/intel_ringbuffer.c | 2 ++
> 4 files changed, 16 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 8670d05..c326a99 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2149,6 +2149,7 @@ extern void intel_console_resume(struct work_struct *work);
>
> /* i915_irq.c */
> void i915_queue_hangcheck(struct drm_device *dev);
> +void i915_check_hangcheck(struct drm_device *dev);
> __printf(3, 4)
> void i915_handle_error(struct drm_device *dev, bool wedged,
> const char *fmt, ...);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index ecb835b..120ed6d 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1455,6 +1455,8 @@ static int __wait_seqno(struct intel_engine_cs *ring, u32 seqno,
> break;
> }
>
> + i915_check_hangcheck(ring->dev);
> +
> timer.function = NULL;
> if (timeout || missed_irq(dev_priv, ring)) {
> unsigned long expire;
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 9e5a295..daa590e 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -3063,6 +3063,17 @@ void i915_queue_hangcheck(struct drm_device *dev)
> round_jiffies_up(jiffies + DRM_I915_HANGCHECK_JIFFIES));
> }
>
> +void i915_check_hangcheck(struct drm_device *dev)
> +{
> + struct drm_i915_private *dev_priv = dev->dev_private;
> + if (!i915.enable_hangcheck)
> + return;
> +
> + if (!timer_pending(&dev_priv->gpu_error.hangcheck_timer))
> + mod_timer(&dev_priv->gpu_error.hangcheck_timer,
> + round_jiffies_up(jiffies + DRM_I915_HANGCHECK_JIFFIES));
> +}
> +
> static void ibx_irq_reset(struct drm_device *dev)
> {
> struct drm_i915_private *dev_priv = dev->dev_private;
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 4f3397f..6365d4d 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1634,6 +1634,8 @@ static int ring_wait_for_space(struct intel_engine_cs *ring, int n)
> master_priv->sarea_priv->perf_boxes |= I915_BOX_WAIT;
> }
>
> + i915_check_hangcheck(dev);
> +
> io_schedule_timeout(1);
>
> if (dev_priv->mm.interruptible && signal_pending(current)) {
Are there any bugs associated with this?
i915_rearm_hangcheck() or something might more accurately describe
what's going on here.
I suppose both of these paths are protected by the struct_mutex? If
not, might we race and mod_timer() this twice from two threads in
succession? I guess that's harmless...
--
Jesse Barnes, Intel Open Source Technology Center
More information about the Intel-gfx
mailing list