[Intel-gfx] [PATCH] drm/i915/selftest: If reconfigure_sseu is busy, try again

Tue Nov 26 17:19:14 UTC 2019

Quoting Tvrtko Ursulin (2019-11-26 17:11:04)
> 
> On 26/11/2019 17:08, Tvrtko Ursulin wrote:
> > 
> > On 26/11/2019 17:05, Chris Wilson wrote:
> >> Quoting Tvrtko Ursulin (2019-11-26 17:00:53)
> >>>
> >>> On 26/11/2019 16:47, Chris Wilson wrote:
> >>>> Following 58b4c1a07ada ("drm/i915: Reduce nested 
> >>>> prepare_remote_context()
> >>>> to a trylock"), we punt to the caller if the local intel_context
> >>>> happens to be busy as we try to rewrite the sseu (due to retiring in
> >>>> another thread). As the interlude should be short, spin until the lock
> >>>> is available.
> >>>>
> >>>> The regret for using mutex_trylock() and not an atomic insertion of the
> >>>> barrier is growing...
> >>>>
> >>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >>>> ---
> >>>>    drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c | 5 ++++-
> >>>>    1 file changed, 4 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c 
> >>>> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
> >>>> index 2ea4790f3721..571cc996577c 100644
> >>>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
> >>>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
> >>>> @@ -1197,7 +1197,10 @@ __sseu_test(const char *name,
> >>>>        if (ret)
> >>>>                goto out_pm;
> >>>> -     ret = intel_context_reconfigure_sseu(ce, sseu);
> >>>> +     do {
> >>>> +             ret = intel_context_reconfigure_sseu(ce, sseu);
> >>>> +             cond_resched();
> >>>> +     } while (ret == -EAGAIN);
> >>>>        if (ret)
> >>>>                goto out_spin;
> >>>>
> >>>
> >>> Hm I looked at the selftests, saw error is correctly propagated, and
> >>> concluded it will be fine. I missed the problem selftests will not
> >>> actually retry. But wait, can we even count that userspace will if all
> >>> of a sudden ctx.set_param starts returning -EAGAIN sporadically? Feels
> >>> like we may need to revert.
> >>
> >> We invoke the principle of drmIoctl() catches -EAGAIN.
> > 
> > I'm not comfortable with that. :( Not least how we are saying not to use 
> > libdrm. man 2 ioctl does not mention -EAGAIN. :(
> 
> Or duct tape by looping in set_sseu as well?

Could do, the actual atomic insertion doesn't look to horrendous (albeit
yet another atomic op along that path), but will take a fair amount of
rationalising. Certainly another cup of tea worth.

diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index dca15ace88f6..181c6a3aaafa 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -753,11 +753,13 @@ __i915_active_fence_set(struct i915_active_fence *active,
        struct dma_fence *prev;
        unsigned long flags;

-       /* NB: must be serialised by an outer timeline mutex (active->lock) */
+       if (fence == rcu_access_pointer(active->fence))
+               return fence;
+
        spin_lock_irqsave(fence->lock, flags);
        GEM_BUG_ON(test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags));

-       prev = rcu_dereference_protected(active->fence, active_is_held(active));
+       prev = xchg((struct dma_fence ** __force)&active->fence, fence);
        if (prev) {
                GEM_BUG_ON(prev == fence);
                spin_lock_nested(prev->lock, SINGLE_DEPTH_NESTING);
@@ -775,12 +777,13 @@ __i915_active_fence_set(struct i915_active_fence *active,
                 * our list_del() [decoupling prev from the callback] or
                 * the callback...
                 */
-               prev = rcu_access_pointer(active->fence);
+               if (!rcu_access_pointer(active->fence)) {
+                       RCU_INIT_POINTER(active->fence, fence);
+                       prev = NULL;
+               }
        }

-       rcu_assign_pointer(active->fence, fence);
        list_add_tail(&active->cb.node, &fence->cb_list);
-
        spin_unlock_irqrestore(fence->lock, flags);

        return prev;

-Chris