[Intel-gfx] [PATCH 4/7] drm/i915/gem: Don't leak non-persistent requests on changing engines

Tue Feb 11 14:15:57 UTC 2020

Quoting Tvrtko Ursulin (2020-02-11 13:41:22)
> 
> On 10/02/2020 20:57, Chris Wilson wrote:
> > +static void kill_context(struct i915_gem_context *ctx)
> > +{
> > +     if (!list_empty(&ctx->stale.engines))
> > +             kill_stale_engines(ctx);
> 
> Lets see.. set_engines can freely race with context_close. The former is 
> adding entries to the list under the lock, the latter is here inspecting 
> list state unlocked. But then proceeds to lock it and all is good if 
> false negative are not an issue. But looks like it could happen and then 
> it fails to clean up. All that is needed is for this thread to not see 
> the addition to the list. And since this is not a hot path how about you 
> just always call kill_state_engines?

Hmm. I didn't consider the race between close context and set-engines.

We would also need to reject the late addition of engines to a closed
context under the spinlock.

Ta.

> >   #endif /* __I915_GEM_CONTEXT_TYPES_H__ */
> > diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
> > index 51ba97daf2a0..bc6d4f8b78f0 100644
> > --- a/drivers/gpu/drm/i915/i915_sw_fence.c
> > +++ b/drivers/gpu/drm/i915/i915_sw_fence.c
> > @@ -211,10 +211,23 @@ void i915_sw_fence_complete(struct i915_sw_fence *fence)
> >       __i915_sw_fence_complete(fence, NULL);
> >   }
> >   
> > -void i915_sw_fence_await(struct i915_sw_fence *fence)
> > +bool i915_sw_fence_await(struct i915_sw_fence *fence)
> >   {
> > -     debug_fence_assert(fence);
> > -     WARN_ON(atomic_inc_return(&fence->pending) <= 1);
> > +     int old, new;
> > +
> > +     /*
> > +      * It is only safe to add a new await to the fence while it has
> > +      * not yet been signaled.
> > +      */
> > +     new = atomic_read(&fence->pending);
> > +     do {
> > +             if (new < 1)
> > +                     return false;
> > +
> > +             old = new++;
> > +     } while ((new = atomic_cmpxchg(&fence->pending, old, new)) != old);
> 
> Simplify with atomic_try_cmpxchg?

I was under the mistaken impression we didn't have atomic_try_cmpxchg.

> I need a refresher on sw_fence->pending. (See your new comments and 
> raise you lack of old! ;)
> 
> -1 = completed
> 0 = ??

-1 = completed (all listeners awoken)
0 = signaled
1+ = pending waits (and yes we always start with 1 pending wait on behalf
of the caller)

> 1 = new, not waited upon
> 2 = waited upon

Maybe we don't really need -1? That was originally to avoid passing
FENCE_COMPLETE, FENCE_FREE but since we have the state, we don't need to
encode it? That would lead to a few small simplifications.
-Chris