[Intel-gfx] [PATCH 1/4] drm/i915/execlists: Kick start request processing after a reset

Chris Wilson chris at chris-wilson.co.uk
Mon Sep 18 09:10:34 UTC 2017


Quoting Michał Winiarski (2017-09-18 09:53:50)
> On Sat, Sep 16, 2017 at 09:44:11PM +0100, Chris Wilson wrote:
> > During a reset, we may skip over completed requests and lose
> > context-switch interrupts. Following the reset, we may then end up
> > with no active requests in the ELSP (and so do not resubmit to restart
> > the engine), but have a queue of requests ready for execution. This is
> > unlikely, it requires the last request to complete after the hang is
> > detected, but not impossible. The outcome of this is that the engine
> > stalls, possibly leading to full ring and indefinite wait under
> > struct_mutex, eventually leading to a full driver hang.
> > 
> > Alternatively, we can solve this by unsubmitting the incomplete requests
> > and just kickstarting the tasklet. Michał has patches for that, which I
> > initially disliked due to the extra complexity, but the complexity of
> > this "simple" restart is growing...
> 
> You are doing exactly that in 4/4.
> Perhaps squash the two together to avoid moving code around, although this one
> is a genuine fix, so I guess it's also fine on its own.

It was a fix that introduced the concept of calling tasklet_schedule
during restart, which is then expanded on by 4/4 to do everything. I
liked the progression.
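
To make the stall concrete, here is a toy userspace model of it. This is
only a sketch, not driver code: struct toy_engine, TOY_PORTS and the
toy_*() helpers are invented for illustration, and the "tasklet" is just
a flag standing in for tasklet_schedule(). It assumes two ports and that
a completing request is the only thing that normally reschedules the
tasklet.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model of the stall; every name here is hypothetical. */
struct toy_engine {
	int active;             /* requests occupying the modelled ELSP ports */
	int queued;             /* requests ready to run but not yet submitted */
	bool tasklet_scheduled; /* stands in for a pending tasklet_schedule() */
};

#define TOY_PORTS 2

/* Submission tasklet: move queued requests into any free ports. */
static void toy_tasklet(struct toy_engine *e)
{
	e->tasklet_scheduled = false;
	while (e->active < TOY_PORTS && e->queued > 0) {
		e->queued--;
		e->active++;
	}
}

/* A request completes; its context-switch interrupt schedules the tasklet. */
static void toy_complete_one(struct toy_engine *e)
{
	assert(e->active > 0);
	e->active--;
	e->tasklet_scheduled = true;
}

/* Run the model until neither a tasklet nor a request is pending. */
static void toy_run(struct toy_engine *e)
{
	for (;;) {
		if (e->tasklet_scheduled)
			toy_tasklet(e);
		else if (e->active > 0)
			toy_complete_one(e);
		else
			break;
	}
}

/*
 * Model the state after a reset where the last request completed after
 * the hang was detected: nothing active, three requests still queued.
 * Returns how many requests remain stuck in the queue.
 */
static int toy_reset_scenario(bool kick)
{
	struct toy_engine e = { .active = 0, .queued = 3, .tasklet_scheduled = false };

	if (kick)
		e.tasklet_scheduled = true; /* the post-reset kick */
	toy_run(&e);
	return e.queued;
}
```

In this model, toy_reset_scenario(false) leaves all three requests
queued because nothing ever runs the tasklet, whereas
toy_reset_scenario(true) drains the queue.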

> If you rebase the whole thing on top of coalesced GuC requests (which now is all
> reviewed and ready to be merged), we'll have uniform reset handling for GuC
> and execlists.

Bugfix wins :-p

Are you happy if I pull in the coalesced guc requests with this
amendment:

@@ -1181,7 +1182,7 @@ int i915_guc_submission_enable(struct drm_i915_private *dev_priv)
                 */
                engine->irq_tasklet.func = i915_guc_irq_handler;
                clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
-               i915_guc_submit(engine);
+               tasklet_schedule(&engine->irq_tasklet);
        }
 
        return 0;

With the desc->tail fix, GuC has been stable for a day of mixed hang
testing.
-Chris

