[Intel-gfx] [PATCH v3 3/3] drm/i915: Fix premature LRC unpin in GuC mode
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Fri Jan 22 08:33:18 PST 2016
On 21/01/16 11:02, Chris Wilson wrote:
> On Thu, Jan 21, 2016 at 10:10:42AM +0000, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>
>> In GuC mode LRC pinning lifetime depends exclusively on the
>> request liftime. Since that is terminated by the seqno update
>> that opens up a race condition between GPU finishing writing
>> out the context image and the driver unpinning the LRC.
>>
>> To extend the LRC lifetime we will employ a similar approach
>> to what legacy ringbuffer submission does.
>>
>> We will start tracking the last submitted context per engine
>> and keep it pinned until it is replaced by another one.
>>
>> Note that the driver unload path is a bit fragile and could
>> benefit greatly from efforts to unify the legacy and exec
>> list submission code paths.
>>
>> At the moment i915_gem_context_fini has special casing for the
>> two which are potentialy not needed, and also depends on
>> i915_gem_cleanup_ringbuffer running before itself.
>>
>> v2:
>> * Move pinning into engine->emit_request and actually fix
>> the reference/unreference logic. (Chris Wilson)
>>
>> * ring->dev can be NULL on driver unload so use a different
>> route towards it.
>>
>> v3:
>> * Rebase.
>> * Handle the reset path. (Chris Wilson)
>> * Exclude default context from the pinning - it is impossible
>> to get it right before default context special casing in
>> general is eliminated.
>
> Since you have the refactoring in place, eliminating the special case is
> trivial. So patch 2.5?
>
> Looks like http://patchwork.freedesktop.org/patch/69972/ but you already
> have half of that patch in place.
>
> Also if we take a step to add patch 2.1:
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index c25083c78ba7..d43c886fd95a 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -321,6 +321,14 @@ err_destroy:
> return ERR_PTR(ret);
> }
>
> +static void i915_gem_context_unpin(struct intel_context *ctx,
> + struct intel_engine_cs *rcs)
> +{
> + if (rcs->id == RCS && ctx->legacy_hw_ctx.rcs_state)
> + i915_gem_object_ggtt_unpin(ctx->legacy_hw_ctx.rcs_state);
> + i915_gem_context_unreference(ctx);
> +}
> +
> void i915_gem_context_reset(struct drm_device *dev)
> {
> struct drm_i915_private *dev_priv = dev->dev_private;
> @@ -338,13 +346,9 @@ void i915_gem_context_reset(struct drm_device *dev)
>
> for (i = 0; i < I915_NUM_RINGS; i++) {
> struct intel_engine_cs *ring = &dev_priv->ring[i];
> - struct intel_context *lctx = ring->last_context;
>
> - if (lctx) {
> - if (lctx->legacy_hw_ctx.rcs_state && i == RCS)
> - i915_gem_object_ggtt_unpin(lctx->legacy_hw_ctx.rcs_state);
> -
> - i915_gem_context_unreference(lctx);
> + if (ring->last_context) {
> + i915_gem_context_unpin(ring->last_context, ring);
> ring->last_context = NULL;
> }
>
> @@ -424,25 +428,18 @@ void i915_gem_context_fini(struct drm_device *dev)
> * to offset the do_switch part, so that i915_gem_context_unreference()
> * can then free the base object correctly. */
> WARN_ON(!dev_priv->ring[RCS].last_context);
> - if (dev_priv->ring[RCS].last_context == dctx) {
> - /* Fake switch to NULL context */
> - WARN_ON(dctx->legacy_hw_ctx.rcs_state->active);
> - i915_gem_object_ggtt_unpin(dctx->legacy_hw_ctx.rcs_state);
> - i915_gem_context_unreference(dctx);
> - dev_priv->ring[RCS].last_context = NULL;
> - }
> -
> i915_gem_object_ggtt_unpin(dctx->legacy_hw_ctx.rcs_state);
> }
>
> for (i = 0; i < I915_NUM_RINGS; i++) {
> struct intel_engine_cs *ring = &dev_priv->ring[i];
>
> - if (ring->last_context)
> - i915_gem_context_unreference(ring->last_context);
> + if (ring->last_context) {
> + i915_gem_context_unpin(ring->last_context, ring);
> + ring->last_context = NULL;
> + }
>
> ring->default_context = NULL;
> - ring->last_context = NULL;
> }
>
> i915_gem_context_unreference(dctx);
>
> Then we just end up with
>
> if (i915.enable_execlists)
> intel_lr_context_unpin(ring->last_context, ring);
> else
> i915_gem_context_unpin(ring->last_context, ring);
>
> which is much less messy in this patch and one step closer to unifying
> engine->pin_context()/engine->unpin_context() vfuns.
>
> The only tricky part there is having to disentangle the legacy paths,
> but doing so should allow us to see clearly that the sequence is just
>
> intel_gpu_reset();
> for_each_engine() unpin(engine->last_context);
> unpin(dev_priv->default_context);
>
> After a few more passes, we should be able to call the reset and cleanup
> functions higher up (as part of the stopping the GPU upon suspend
> procedure before tearing down the sw structs) at which point we only
> need the unpin(default_context) here.
I was confused by the current code which does the reset after unpinning:
...
i915_gem_reset(dev);
simulated = dev_priv->gpu_error.stop_rings != 0;
ret = intel_gpu_reset(dev);
...
What is right then?
Sounds bad to be unpinning with the GPU in unknown state. But obviously
it has been like this for who knows how long. So I have no idea. :(
Asking since GuC is at the moment struggling with reset, but then again
that might be something completely else...
Regards,
Tvrtko
More information about the Intel-gfx
mailing list