[Intel-gfx] [PATCH 1/3] drm/i915/gem: Don't leak non-persistent requests on changing engines
Chris Wilson
chris at chris-wilson.co.uk
Fri Feb 7 16:55:35 UTC 2020
Quoting Tvrtko Ursulin (2020-02-07 16:46:55)
>
> If you want quick & dirty feedback, read below; if you want something
> smarter, wait some more. :)
>
> On 07/02/2020 11:11, Chris Wilson wrote:
> > +static void kill_stale_engines(struct i915_gem_context *ctx)
> > +{
> > +	struct i915_gem_engines *pos, *next;
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&ctx->stale.lock, flags);
> > +	list_for_each_entry_safe(pos, next, &ctx->stale.engines, link) {
> > +		if (!i915_sw_fence_await(&pos->fence))
> > +			continue;
>
> When is this path hit?
Race with the interrupt callback.
> > +
> > +		spin_unlock_irqrestore(&ctx->stale.lock, flags);
> > +
> > +		kill_engines(pos);
> > +
> > +		spin_lock_irqsave(&ctx->stale.lock, flags);
> > +		list_safe_reset_next(pos, next, link);
> > +		list_del_init(&pos->link);
> > +
> > +		i915_sw_fence_complete(&pos->fence);
>
> This will trigger FENCE_FREE below?
Yes, the final completion sends both notifications.
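To make the locking dance above easier to follow, here is a minimal userspace sketch of the same walk (toy list helpers and names, not the kernel API): the lock is dropped around the potentially sleeping kill step, so after re-taking it the `next` cursor must be refreshed from `pos` before unlinking, which is what `list_safe_reset_next()` does in the kernel.

```c
#include <pthread.h>
#include <stdbool.h>

/* Toy doubly linked list mirroring the kernel's list.h idioms. */
struct list_head { struct list_head *prev, *next; };

static void list_init(struct list_head *h) { h->prev = h->next = h; }

static void list_add(struct list_head *n, struct list_head *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

struct engines {
	struct list_head link;	/* must stay the first member for the cast below */
	bool killed;
};

static pthread_mutex_t stale_lock = PTHREAD_MUTEX_INITIALIZER;
static struct list_head stale_list;

/*
 * Sketch of the kill_stale_engines() walk. In the real code an extra
 * fence reference (i915_sw_fence_await) keeps 'pos' alive while the
 * lock is dropped; here nothing else mutates the list concurrently.
 */
static void kill_stale(void)
{
	struct list_head *pos, *next;

	pthread_mutex_lock(&stale_lock);
	for (pos = stale_list.next; pos != &stale_list; pos = next) {
		struct engines *e = (struct engines *)pos;

		pthread_mutex_unlock(&stale_lock);
		e->killed = true;	/* stands in for kill_engines(pos) */
		pthread_mutex_lock(&stale_lock);

		next = pos->next;	/* list_safe_reset_next(pos, next, link) */
		list_del_init(pos);
	}
	pthread_mutex_unlock(&stale_lock);
}
```

The point of re-reading `pos->next` only after the lock is re-taken is that the list may have been modified while unlocked; a `next` cached before unlocking could point at a freed or unlinked node.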
> > +static int engines_notify(struct i915_sw_fence *fence,
> > +			  enum i915_sw_fence_notify state)
> > +{
> > +	struct i915_gem_engines *engines =
> > +		container_of(fence, typeof(*engines), fence);
> > +
> > +	switch (state) {
> > +	case FENCE_COMPLETE:
> > +		if (!list_empty(&engines->link)) {
>
> Why it is safe to look at the state of engines->link outside the lock?
> We can have a race between context close and completion event on a stale
> engine, right?
There is no race :)
It's just coordination with kill_stale_engines().
> > +static void engines_idle_release(struct i915_gem_engines *engines)
> > +{
> > +	struct i915_gem_engines_iter it;
> > +	struct intel_context *ce;
> > +	unsigned long flags;
> > +
> > +	GEM_BUG_ON(!engines);
> > +	i915_sw_fence_init(&engines->fence, engines_notify);
> > +
> > +	spin_lock_irqsave(&engines->ctx->stale.lock, flags);
> > +	list_add(&engines->link, &engines->ctx->stale.engines);
> > +	spin_unlock_irqrestore(&engines->ctx->stale.lock, flags);
> > +
> > +	for_each_gem_engine(ce, engines, it) {
> > +		struct dma_fence *fence;
> > +		int err;
> > +
> > +		if (!ce->timeline)
> > +			continue;
>
> When does this happen?
Replacing the default engines before use. Or any engine set prior to
use.
> > +
> > +		fence = i915_active_fence_get(&ce->timeline->last_request);
> > +		if (!fence)
> > +			continue;
> > +
> > +		err = i915_sw_fence_await_dma_fence(&engines->fence,
> > +						    fence, 0,
> > +						    GFP_KERNEL);
> > +
> > +		dma_fence_put(fence);
> > +		if (err < 0) {
> > +			kill_engines(engines);
> > +			break;
>
> Okay to leave already setup awaits active in this case?
Yes. They will be signaled. It may seem a bit harsh, but we fell into an
unlikely error path and have to do something.
> > -void i915_sw_fence_await(struct i915_sw_fence *fence)
> > +bool i915_sw_fence_await(struct i915_sw_fence *fence)
> >  {
> > -	debug_fence_assert(fence);
> > -	WARN_ON(atomic_inc_return(&fence->pending) <= 1);
> > +	int old, new;
> > +
> > +	new = atomic_read(&fence->pending);
> > +	do {
> > +		if (new < 1)
> > +			return false;
> > +
> > +		old = new++;
> > +	} while ((new = atomic_cmpxchg(&fence->pending, old, new)) != old);
> > +
> > +	return true;
>
> No idea what's happening here. Why was the existing code inadequate and
> what are you changing?
I needed an await_if_busy to handle the race with the interrupts.
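The await-if-busy idea can be sketched in userspace with C11 atomics (the function name here is illustrative, not the i915 API): take an extra reference on `pending` only while it is still at least 1; once the fence has signalled and `pending` has reached 0, refuse rather than resurrect it. That is what closes the race with the interrupt callback completing the fence concurrently.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace analogue of the new i915_sw_fence_await():
 * atomically increment 'pending' only if it is still >= 1, i.e. the
 * fence has outstanding waits and has not yet signalled.
 */
static bool fence_await_if_busy(atomic_int *pending)
{
	int old = atomic_load(pending);

	do {
		if (old < 1)
			return false;	/* already signalled: too late to await */
	} while (!atomic_compare_exchange_weak(pending, &old, old + 1));

	return true;
}
```

On a failed compare-exchange, `old` is reloaded with the current value, so the `< 1` check is re-evaluated each iteration; this mirrors the `atomic_cmpxchg()` loop in the patch, where the returned value feeds the next round.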
-Chris