[Intel-gfx] [PATCH] drm/i915/gt: Unlock engine-pm after queuing the kernel context switch

Mon Nov 18 17:02:56 UTC 2019

Quoting Chris Wilson (2019-11-18 16:23:42)
> In commit a79ca656b648 ("drm/i915: Push the wakeref->count deferral to
> the backend"), I erroneously concluded that we last modify the engine
> inside __i915_request_commit() meaning that we could enable concurrent
> submission for userspace as we enqueued this request. However, this
> falls into a trap with other users of the engine->kernel_context waking
> up and submitting their request before the idle-switch is queued, with
> the result that the kernel_context is executed out-of-sequence most
> likely upsetting the GPU and certainly ourselves when we try to retire
> the out-of-sequence requests.
> 
> As such we need to hold onto the effective engine->kernel_context mutex
> lock (via the engine pm mutex proxy) until we have finished queuing the
> request to the engine.
> 
> Fixes: a79ca656b648 ("drm/i915: Push the wakeref->count deferral to the backend")
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_engine_pm.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> index 3c0f490ff2c7..2d2a21752ae4 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> @@ -116,11 +116,12 @@ static bool switch_to_kernel_context(struct intel_engine_cs *engine)
>         rq->sched.attr.priority = I915_PRIORITY_BARRIER;
>         __i915_request_commit(rq);
>  
> -       /* Release our exclusive hold on the engine */
> -       __intel_wakeref_defer_park(&engine->wakeref);
>         __i915_request_queue(rq, NULL);
>  
> +       /* Release our exclusive hold on the engine */
> +       __intel_wakeref_defer_park(&engine->wakeref);
>         result = false;

Gah, now I remember why I put it before:

If a concurrent request retirement is running, it may now see the request
complete before we have marked the engine as awake => counter underflow.

Watch this space for more tricks :(
-Chris
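
To make the ordering problem above concrete, here is a minimal toy model in
plain C. It is not the i915 code: every toy_* name is hypothetical, the
"wakeref" is reduced to a bare integer, and the concurrent retirer is
simulated by calling it at the point where it could slip in. It only
illustrates why queuing the request before marking the engine awake can
underflow the counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a wakeref: number of holders keeping the engine awake. */
struct toy_wakeref {
	int count;
};

/* Hypothetical engine with at most one request in flight. */
struct toy_engine {
	struct toy_wakeref wakeref;
	bool request_queued;
	bool underflow;
};

/* Toy analogue of deferring the park: mark the engine awake on behalf of the request. */
static void toy_defer_park(struct toy_engine *e)
{
	/* We are assumed to hold the engine exclusively, so nobody else holds a ref. */
	assert(e->wakeref.count == 0);
	e->wakeref.count++;
}

/* Toy analogue of queuing: the request becomes visible to the retirer. */
static void toy_queue(struct toy_engine *e)
{
	e->request_queued = true;
}

/* Toy retirer: if a request is visible, retire it and drop the ref it should own. */
static void toy_retire(struct toy_engine *e)
{
	if (!e->request_queued)
		return;

	e->request_queued = false;
	if (e->wakeref.count == 0)
		e->underflow = true;	/* dropping a ref that was never taken */
	else
		e->wakeref.count--;
}

/*
 * Run one interleaving in which the retirer runs between the two steps.
 * Returns true if the retirer tried to drop a reference that did not exist.
 */
static bool toy_run(bool queue_first)
{
	struct toy_engine e = { .wakeref = { .count = 0 } };

	if (queue_first) {
		toy_queue(&e);
		toy_retire(&e);		/* sees the request before the engine is awake */
		toy_defer_park(&e);
	} else {
		toy_defer_park(&e);
		toy_retire(&e);		/* nothing queued yet, nothing to retire */
		toy_queue(&e);
	}

	return e.underflow;
}
```

In this model toy_run(true), the ordering the patch introduces, reports an
underflow, while toy_run(false), the original ordering, does not. The model
deliberately ignores the other race described in the commit message (other
kernel_context users submitting before the idle-switch is queued), which is
what the original ordering suffers from.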