[Intel-gfx] [PATCH 1/2] drm/i915: Convert hangcheck from a timer into a delayed work item
Chris Wilson
chris at chris-wilson.co.uk
Fri Jan 23 12:58:35 PST 2015
On Fri, Jan 23, 2015 at 02:44:07PM +0200, Mika Kuoppala wrote:
> From: Chris Wilson <chris at chris-wilson.co.uk>
>
> When run as a timer, i915_hangcheck_elapsed() must adhere to all the
> rules of running in a softirq context. This is advantageous to us as we
> want to minimise the risk that a driver bug will prevent us from
> detecting a hung GPU. However, that is irrelevant if the driver bug
> prevents us from resetting and recovering. Still it is prudent not to
> rely on mutexes inside the checker, but given the coarseness of
> dev->struct_mutex doing so is extremely hard.
>
> Give in and run from a work queue, i.e. outside of softirq.
>
> v2:
>
> The conversion does have one significant change, from the use of
> mod_timer to schedule_delayed_work, means that the time that we execute
> the first hangcheck is fixed and not continually deferred by later work.
> This has the advantage of not allowing userspace to fill the ring before
> hangcheck can finally run. At the same time, it removes the ability for
> the interrupt to defer the hangcheck as well. This is sensible for that
> an interrupt is only for a single engine, whereas we perform hangcheck
> globally, so whilst one ring may have hung, the other could be running
> normally and preventing the hangcheck from firing.
We can drop this comment since we have already applied this change in an
earlier patch.
> @@ -3097,17 +3099,11 @@ static void i915_hangcheck_elapsed(unsigned long data)
>
> void i915_queue_hangcheck(struct drm_device *dev)
> {
> - struct drm_i915_private *dev_priv = dev->dev_private;
> - struct timer_list *timer = &dev_priv->gpu_error.hangcheck_timer;
> -
> if (!i915.enable_hangcheck)
> return;
>
> - /* Don't continually defer the hangcheck, but make sure it is active */
Keep the comment, or more preferrably let's capture the reason why we
don't want to continually defer the hangcheck:
/* Don't continually defer the hangcheck so that it is always run at
* least once after work has been scheduled on any ring. Otherwise,
* we will ignore a hung ring if a second ring is kept busy.
*/
> - if (timer_pending(timer))
> - return;
> - mod_timer(timer,
> - round_jiffies_up(jiffies + DRM_I915_HANGCHECK_JIFFIES));
> + schedule_delayed_work(&to_i915(dev)->gpu_error.hangcheck_work,
> + round_jiffies_up_relative(DRM_I915_HANGCHECK_JIFFIES));
> }
--
Chris Wilson, Intel Open Source Technology Centre
More information about the Intel-gfx
mailing list