[Intel-gfx] [PATCH] drm/i915/gem: Consolidate ctx->engines[] release

Mon Mar 2 14:20:33 UTC 2020

On 28/02/2020 12:19, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-02-28 12:08:18)
>>
>> On 27/02/2020 11:01, Chris Wilson wrote:
>>> +static void engines_idle_release(struct i915_gem_context *ctx,
>>> +                              struct i915_gem_engines *engines)
>>> +{
>>> +     struct i915_gem_engines_iter it;
>>> +     struct intel_context *ce;
>>> +
>>> +     i915_sw_fence_init(&engines->fence, engines_notify);
>>> +     INIT_LIST_HEAD(&engines->link);
>>> +
>>> +     engines->ctx = i915_gem_context_get(ctx);
>>> +
>>> +     for_each_gem_engine(ce, engines, it) {
>>> +             struct dma_fence *fence;
>>> +             int err = 0;
>>> +
>>> +             /* serialises with execbuf */
>>> +             RCU_INIT_POINTER(ce->gem_context, NULL);
>>
>> What is the purpose of this? Looks dodgy - like it can't really
>> guarantee much.
> 
> It should be fine if we do:
> 
> execbuf:				context_close:
> 					ce->gem_context = NULL;
> add_to_timeline();			get(&ce->timeline->last_request);
> if (!ce->gem_context)			
> 	return -ENOENT;
> 
> If add is before the get, we will wait on it.
> If add is after the get, we will wait on the earlier request, and skip
> this one -- it will not execute.

What guarantees we see the latest last_request here, or that execbuf 
sees the NULL before we try the get? The code elsewhere isn't assuming 
unstable ce->gem_context I think.. engines we can change, but I don't 
remember we accounted for this.

>>> +             if (!intel_context_pin_if_active(ce))
>>> +                     continue;
>>> +
>>> +             fence = i915_active_fence_get(&ce->timeline->last_request);
>>> +             if (fence) {
>>> +                     err = i915_sw_fence_await_dma_fence(&engines->fence,
>>> +                                                         fence, 0,
>>> +                                                         GFP_KERNEL);
>>> +                     dma_fence_put(fence);
>>> +             }
>>> +             intel_context_unpin(ce);
>>> +             if (err < 0)
>>> +                     goto kill;
>>> +     }
>>> +
>>> +     spin_lock_irq(&ctx->stale.lock);
>>> +     if (!i915_gem_context_is_closed(ctx))
>>> +             list_add_tail(&engines->link, &ctx->stale.engines);
>>> +     spin_unlock_irq(&ctx->stale.lock);
>>> +
>>> +kill:
>>> +     if (list_empty(&engines->link)) /* raced, already closed */
>>> +             kill_engines(engines);
>>
>> Raced with.. ? context_close? Can't be the fence yet, before it has been
>> committed.
> 
> Yes, there's still the set_engines vs context_close to worry about.

I'd just say "raced with context_close" then.

Regards,

Tvrtko