[Intel-gfx] [CI 13/19] drm/i915: Remove (struct_mutex) locking for busy-ioctl
Chris Wilson
chris at chris-wilson.co.uk
Fri Aug 5 19:30:42 UTC 2016
On Fri, Aug 05, 2016 at 09:08:34PM +0200, Daniel Vetter wrote:
> On Fri, Aug 05, 2016 at 10:14:18AM +0100, Chris Wilson wrote:
> > By applying the same logic as for wait-ioctl, we can query whether a
> > request has completed without holding struct_mutex. The biggest impact
> > system-wide is removing the flush_active and the contention that causes.
> >
> > Testcase: igt/gem_busy
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Akash Goel <akash.goel at intel.com>
> > Reviewed-by: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > ---
> > drivers/gpu/drm/i915/i915_gem.c | 131 +++++++++++++++++++++++++++++++---------
> > 1 file changed, 101 insertions(+), 30 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index ceb00970b2da..b99d64bfb7eb 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -3736,49 +3736,120 @@ i915_gem_object_ggtt_unpin_view(struct drm_i915_gem_object *obj,
> >          i915_vma_unpin(i915_gem_obj_to_ggtt_view(obj, view));
> >  }
> >
> > +static __always_inline unsigned __busy_read_flag(unsigned int id)
> > +{
> > +       /* Note that we could alias engines in the execbuf API, but
> > +        * that would be very unwise as it prevents userspace from
> > +        * fine control over engine selection. Ahem.
> > +        *
> > +        * This should be something like EXEC_MAX_ENGINE instead of
> > +        * I915_NUM_ENGINES.
> > +        */
> > +       BUILD_BUG_ON(I915_NUM_ENGINES > 16);
> > +       return 0x10000 << id;
> > +}
> > +
> > +static __always_inline unsigned int __busy_write_id(unsigned int id)
> > +{
> > +       return id;
> > +}
> > +
> > +static __always_inline unsigned
> > +__busy_set_if_active(const struct i915_gem_active *active,
> > +                    unsigned int (*flag)(unsigned int id))
> > +{
> > +       /* For more discussion about the barriers and locking concerns,
> > +        * see __i915_gem_active_get_rcu().
> > +        */
> > +       do {
> > +               struct drm_i915_gem_request *request;
> > +               unsigned int id;
> > +
> > +               request = rcu_dereference(active->request);
> > +               if (!request || i915_gem_request_completed(request))
> > +                       return 0;
> > +
> > +               id = request->engine->exec_id;
> > +
> > +               /* Check that the pointer wasn't reassigned and overwritten. */
>
> cf. our discussion in active_get_rcu - there's no fence_get_rcu in sight
> anywhere here, hence this needs an smp_rmb().
I toyed with smp_rmb(). The rcu_dereference() followed by
rcu_access_pointer() is ordered, so I was back to dancing around the
question of "were the dependent reads ordered by the first
rcu_dereference() ordered in front of the second access, which was
itself ordered after the first?" I probably should have stuck in the
smp_rmb() and stopped worrying - it is still going to be cheaper than
the refcount traffic.
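
For reference, the barrier would slot in just before the re-check of
active->request, i.e. something along these lines (an untested sketch
of the idea, not what was sent):

static __always_inline unsigned
__busy_set_if_active(const struct i915_gem_active *active,
                     unsigned int (*flag)(unsigned int id))
{
        do {
                struct drm_i915_gem_request *request;
                unsigned int id;

                request = rcu_dereference(active->request);
                if (!request || i915_gem_request_completed(request))
                        return 0;

                id = request->engine->exec_id;

                /* No fence_get_rcu() here, so explicitly order the
                 * dependent reads above (engine->exec_id, the completed
                 * check) before re-reading active->request.
                 */
                smp_rmb();

                /* Check that the pointer wasn't reassigned and overwritten. */
                if (request == rcu_access_pointer(active->request))
                        return flag(id);
        } while (1);
}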
> Also a nitpick: the two
> rcu_dereference(active->request) calls feel a bit silly. If we move the
> first in front of the loop, and update the local request pointer (using
> a tmp), it would look tidier, and we could even move the loop
> termination condition into the while () check (and move the
> return flag(id) to the end of the function).
I was quite content with only having to think about one pass through
the loop and not have to worry about state being carried forward.
__busy_set_if_active(const struct i915_gem_active *active,
                     unsigned int (*flag)(unsigned int id))
{
+       struct drm_i915_gem_request *request;
+       unsigned int id;
+
        /* For more discussion about the barriers and locking concerns,
         * see __i915_gem_active_get_rcu().
         */
+       request = rcu_dereference(active->request);
        do {
-               struct drm_i915_gem_request *request;
-               unsigned int id;
+               struct drm_i915_gem_request *tmp;
-               request = rcu_dereference(active->request);
                if (!request || i915_gem_request_completed(request))
                        return 0;
                id = request->engine->exec_id;
                /* Check that the pointer wasn't reassigned and overwritten. */
-               if (request == rcu_access_pointer(active->request))
-                       return flag(id);
+               tmp = rcu_dereference(active->request);
+               if (tmp == request)
+                       break;
+
+               request = tmp;
        } while (1);
+
+       return flag(id);
}
is also not as well optimised by gcc, apparently.
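
As an aside, the flag helpers above make the userspace side of this
trivial (a sketch only, assuming the combined flags land in args->busy
unchanged): the last writer's exec_id sits in the low 16 bits and each
busy reader sets one bit in the high 16 bits, so

        /* sketch: decode the busy-ioctl result built from
         * __busy_write_id() and __busy_read_flag()
         */
        __u32 busy = args.busy;
        bool idle = busy == 0;
        unsigned int write_engine = busy & 0xffff; /* exec_id of last writer */
        __u32 read_engines = busy >> 16;           /* one bit per reading engine */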
-Chris
--
Chris Wilson, Intel Open Source Technology Centre