[Intel-gfx] [CI 13/19] drm/i915: Remove (struct_mutex) locking for busy-ioctl
Chris Wilson
chris at chris-wilson.co.uk
Fri Aug 5 19:30:42 UTC 2016
On Fri, Aug 05, 2016 at 09:08:34PM +0200, Daniel Vetter wrote:
> On Fri, Aug 05, 2016 at 10:14:18AM +0100, Chris Wilson wrote:
> > By applying the same logic as for wait-ioctl, we can query whether a
> > request has completed without holding struct_mutex. The biggest impact
> > system-wide is removing the flush_active and the contention that causes.
> >
> > Testcase: igt/gem_busy
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Akash Goel <akash.goel at intel.com>
> > Reviewed-by: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > ---
> > drivers/gpu/drm/i915/i915_gem.c | 131 +++++++++++++++++++++++++++++++---------
> > 1 file changed, 101 insertions(+), 30 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index ceb00970b2da..b99d64bfb7eb 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -3736,49 +3736,120 @@ i915_gem_object_ggtt_unpin_view(struct drm_i915_gem_object *obj,
> >          i915_vma_unpin(i915_gem_obj_to_ggtt_view(obj, view));
> >  }
> >
> > +static __always_inline unsigned __busy_read_flag(unsigned int id)
> > +{
> > +       /* Note that we could alias engines in the execbuf API, but
> > +        * that would be very unwise as it prevents userspace from
> > +        * fine control over engine selection. Ahem.
> > +        *
> > +        * This should be something like EXEC_MAX_ENGINE instead of
> > +        * I915_NUM_ENGINES.
> > +        */
> > +       BUILD_BUG_ON(I915_NUM_ENGINES > 16);
> > +       return 0x10000 << id;
> > +}
> > +
> > +static __always_inline unsigned int __busy_write_id(unsigned int id)
> > +{
> > +       return id;
> > +}
> > +
> > +static __always_inline unsigned
> > +__busy_set_if_active(const struct i915_gem_active *active,
> > +                    unsigned int (*flag)(unsigned int id))
> > +{
> > +       /* For more discussion about the barriers and locking concerns,
> > +        * see __i915_gem_active_get_rcu().
> > +        */
> > +       do {
> > +               struct drm_i915_gem_request *request;
> > +               unsigned int id;
> > +
> > +               request = rcu_dereference(active->request);
> > +               if (!request || i915_gem_request_completed(request))
> > +                       return 0;
> > +
> > +               id = request->engine->exec_id;
> > +
> > +               /* Check that the pointer wasn't reassigned and overwritten. */
>
> cf. our discussion in active_get_rcu - there's no fence_get_rcu in sight
> anywhere here, hence this needs an smp_rmb().
I toyed with smp_rmb(). The rcu_dereference() followed by
rcu_access_pointer() is ordered, so I was back to dancing around the
question of "were the dependent reads ordered by the first
rcu_dereference() ordered in front of the second access, which was
itself ordered after the first?" I probably should have stuck in the
smp_rmb() and stopped worrying - it is still going to be cheaper than
the refcount traffic.
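
For reference, the barrier would slot in just before the re-check of
active->request, i.e. something along these lines (an untested sketch
of the idea, not what was sent):

static __always_inline unsigned
__busy_set_if_active(const struct i915_gem_active *active,
                     unsigned int (*flag)(unsigned int id))
{
        do {
                struct drm_i915_gem_request *request;
                unsigned int id;

                request = rcu_dereference(active->request);
                if (!request || i915_gem_request_completed(request))
                        return 0;

                id = request->engine->exec_id;

                /* No fence_get_rcu() here, so explicitly order the
                 * dependent reads above (engine->exec_id, the completed
                 * check) before re-reading active->request.
                 */
                smp_rmb();

                /* Check that the pointer wasn't reassigned and overwritten. */
                if (request == rcu_access_pointer(active->request))
                        return flag(id);
        } while (1);
}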
> Also a nitpick: the two
> rcu_dereference(active->request) calls feel a bit silly. If we move the
> first in front of the loop, and update the local request pointer (using
> a tmp), it would look tidier, and we could even move the loop
> termination condition into the while () check (and move the
> return flag(id) to the end of the function).
I was quite content with only having to think about one pass through
the loop and not have to worry about state being carried forward.
__busy_set_if_active(const struct i915_gem_active *active,
                     unsigned int (*flag)(unsigned int id))
{
+       struct drm_i915_gem_request *request;
+       unsigned int id;
+
        /* For more discussion about the barriers and locking concerns,
         * see __i915_gem_active_get_rcu().
         */
+       request = rcu_dereference(active->request);
        do {
-               struct drm_i915_gem_request *request;
-               unsigned int id;
+               struct drm_i915_gem_request *tmp;
-               request = rcu_dereference(active->request);
                if (!request || i915_gem_request_completed(request))
                        return 0;
                id = request->engine->exec_id;
                /* Check that the pointer wasn't reassigned and overwritten. */
-               if (request == rcu_access_pointer(active->request))
-                       return flag(id);
+               tmp = rcu_dereference(active->request);
+               if (tmp == request)
+                       break;
+
+               request = tmp;
        } while (1);
+
+       return flag(id);
}
is also not as well optimised by gcc, apparently.
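
As an aside, the flag helpers above make the userspace side of this
trivial (a sketch only, assuming the combined flags land in args->busy
unchanged): the last writer's exec_id sits in the low 16 bits and each
busy reader sets one bit in the high 16 bits, so

        /* sketch: decode the busy-ioctl result built from
         * __busy_write_id() and __busy_read_flag()
         */
        __u32 busy = args.busy;
        bool idle = busy == 0;
        unsigned int write_engine = busy & 0xffff; /* exec_id of last writer */
        __u32 read_engines = busy >> 16;           /* one bit per reading engine */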
-Chris
--
Chris Wilson, Intel Open Source Technology Centre