[Intel-gfx] [PATCH 02/20] drm/i915: Delay queuing hangcheck to wait-request
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Fri Jul 1 15:32:16 UTC 2016
On 01/07/16 12:22, Chris Wilson wrote:
> We can forgo queuing the hangcheck from the start of every request to
> until we wait upon a request. This reduces the overhead of every
> request, but may increase the latency of detecting a hang. Howeever, if
> nothing every waits upon a hang, did it ever hang? It also improves the
> robustness of the wait-request by ensuring that the hangchecker is
> indeed running before we sleep indefinitely (and thereby ensuring that
> we never actually sleep forever waiting for a dead GPU).
>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> ---
> drivers/gpu/drm/i915/i915_gem.c | 9 +++++----
> drivers/gpu/drm/i915/i915_irq.c | 10 ++++------
> 2 files changed, 9 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 1d9878258103..34f724cc40b8 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1532,6 +1532,9 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
> break;
> }
>
> + /* Ensure that even if the GPU hangs, we get woken up. */
> + i915_queue_hangcheck(dev_priv);
> +
> timer.function = NULL;
> if (timeout || missed_irq(dev_priv, engine)) {
> unsigned long expire;
> @@ -2919,8 +2922,6 @@ void __i915_add_request(struct drm_i915_gem_request *request,
> /* Not allowed to fail! */
> WARN(ret, "emit|add_request failed: %d!\n", ret);
>
> - i915_queue_hangcheck(engine->i915);
> -
> queue_delayed_work(dev_priv->wq,
> &dev_priv->mm.retire_work,
> round_jiffies_up_relative(HZ));
> @@ -3264,8 +3265,8 @@ i915_gem_retire_requests(struct drm_i915_private *dev_priv)
>
> if (idle)
> mod_delayed_work(dev_priv->wq,
> - &dev_priv->mm.idle_work,
> - msecs_to_jiffies(100));
> + &dev_priv->mm.idle_work,
> + msecs_to_jiffies(100));
>
> return idle;
> }
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 4378a659d962..5614582ca240 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -3135,10 +3135,10 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
> intel_uncore_arm_unclaimed_mmio_detection(dev_priv);
>
> for_each_engine_id(engine, dev_priv, id) {
> + bool busy = waitqueue_active(&engine->irq_queue);
> u64 acthd;
> u32 seqno;
> unsigned user_interrupts;
> - bool busy = true;
>
> semaphore_clear_deadlocks(dev_priv);
>
> @@ -3161,12 +3161,11 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
> if (engine->hangcheck.seqno == seqno) {
> if (ring_idle(engine, seqno)) {
> engine->hangcheck.action = HANGCHECK_IDLE;
> - if (waitqueue_active(&engine->irq_queue)) {, the
> + if (busy) {
> /* Safeguard against driver failure */
> user_interrupts = kick_waiters(engine);
> engine->hangcheck.score += BUSY;
> - } else
> - busy = false;
> + }
> } else {
> /* We always increment the hangcheck score
> * if the ring is busy and still processing
> @@ -3240,9 +3239,8 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
> goto out;
> }
>
> + /* Reset timer in case GPU hangs without another request being added */
> if (busy_count)
> - /* Reset timer case chip hangs without another request
> - * being added */
> i915_queue_hangcheck(dev_priv);
>
> out:
>
I thought I see a problem here but I was just confused. I think it is
OK. Just won't re-queue the hangcheck if no one is waiting and no new
requests get submitted. It is unlikely that would cause a problem in
practice. It sounds very unlucky that the last submitted request ever
hangs. Balance with the benefit of not running while GPU is processing
stuff I think we can give it a go.
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
Regards,
Tvrtko
More information about the Intel-gfx
mailing list