[Intel-gfx] [PATCH] drm/i915: Use rcu instead of stop_machine

Daniel Vetter daniel at ffwll.ch
Thu Oct 5 16:24:45 UTC 2017


On Thu, Oct 05, 2017 at 03:55:19PM +0100, Tvrtko Ursulin wrote:
> 
> On 05/10/2017 15:09, Daniel Vetter wrote:
> > stop_machine is not really a locking primitive we should use, except
> > when the hw folks tell us the hw is broken and that's the only way to
> > work around it.
> > 
> > This patch here is just a suggestion for how to fix it up, possible
> > changes needed to make it actually work:
> > 
> > - Set the nop_submit_request first for _all_ engines, before
> >    proceeding.
> > 
> > - Make sure engine->cancel_requests copes with the possibility that
> >    not all threads have consistently used the new or old version. I don't
> >    think this is a problem, since the same can really happen with the
> >    stop_machine() locking - stop_machine also doesn't give you any kind
> >    of global ordering against other cpu threads, it just makes them
> >    stop.
> > 
> > This patch tries to address the locking snafu from
> > 
> > commit 20e4933c478a1ca694b38fa4ac44d99e659941f5
> > Author: Chris Wilson <chris at chris-wilson.co.uk>
> > Date:   Tue Nov 22 14:41:21 2016 +0000
> > 
> >      drm/i915: Stop the machine as we install the wedged submit_request handler
> > 
> > Chris said part of the reason for going with stop_machine() was that
> > it has no overhead for the fast-path. But these callbacks use irqsave
> > spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.
> > 
> > Cc: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala at intel.com>
> > Signed-off-by: Daniel Vetter <daniel.vetter at intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_gem.c                   | 18 +++++-------------
> >   drivers/gpu/drm/i915/i915_gem_request.c           |  2 ++
> >   drivers/gpu/drm/i915/selftests/i915_gem_request.c |  2 ++
> >   3 files changed, 9 insertions(+), 13 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index ab8c6946fea4..0b260e576b4b 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -3022,13 +3022,13 @@ static void nop_submit_request(struct drm_i915_gem_request *request)
> >   static void engine_set_wedged(struct intel_engine_cs *engine)
> >   {
> > +	engine->submit_request = nop_submit_request;
> 
> Should this be rcu_assign_pointer?

Those provide the additional barriers needed when you change/allocate the
stuff you're pointing to. We point to immutable functions, so it shouldn't
be necessary (and would be confusing imo).
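
To make the distinction concrete, here is a toy sketch (made-up names, not
code from the patch): rcu_assign_pointer() is for publishing freshly
initialised memory so readers can't observe it half-constructed; flipping a
pointer between two functions that already exist in the kernel image has
nothing to publish, so a plain store is enough.

#include <linux/errno.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct config {
	int threshold;
};

static struct config __rcu *active_config;

/* Publishing freshly allocated data: the release barrier in
 * rcu_assign_pointer() guarantees that readers dereferencing
 * active_config see ->threshold fully initialised.
 */
static int publish_config(int threshold)
{
	struct config *c = kmalloc(sizeof(*c), GFP_KERNEL);

	if (!c)
		return -ENOMEM;
	c->threshold = threshold;		/* initialise first ...  */
	rcu_assign_pointer(active_config, c);	/* ... then publish      */
	return 0;
}

/* engine->submit_request only ever switches between functions that are
 * already there, so there is no "initialisation" for readers to miss.
 */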

> > +
> >   	/* We need to be sure that no thread is running the old callback as
> >   	 * we install the nop handler (otherwise we would submit a request
> > -	 * to hardware that will never complete). In order to prevent this
> > -	 * race, we wait until the machine is idle before making the swap
> > -	 * (using stop_machine()).
> > +	 * to hardware that will never complete).
> >   	 */
> > -	engine->submit_request = nop_submit_request;
> > +	synchronize_rcu();
> 
> Consumers of this are running in irq disabled or softirq. Does this mean we
> would need synchronize_rcu_bh? Would either guarantee all tasklets and irq
> handlers have exited?

Oh ... tbh I didn't even dig that deep (much less run this stuff). This
really is an RFC so that people with real clue can say whether it has a
chance of working or not.

Looking at the RCU docs we don't want the _bh variants, since rcu_read_lock
should be safe even in hardirq context. _bh and _sched otoh require that
all critical sections are either in bottom halves or hardirq context, since
they treat a context switch as a quiescent state.
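
Rough sketch of the scheme I have in mind, with made-up names and completely
untested: the reader side wraps the indirect call in plain rcu_read_lock(),
which is fine from hardirq/softirq too, and the writer flips the pointer and
only then does synchronize_rcu() before relying on the old callback being
gone.

#include <linux/compiler.h>
#include <linux/rcupdate.h>

static void real_submit(void *request)
{
	/* would queue the request to the hardware */
}

static void nop_submit(void *request)
{
	/* drops the request on the floor once we're wedged */
}

static void (*submit)(void *request) = real_submit;

/* Reader: may run in process, softirq or hardirq context; plain
 * rcu_read_lock() covers all of them.
 */
static void do_submit(void *request)
{
	rcu_read_lock();
	READ_ONCE(submit)(request);
	rcu_read_unlock();
}

/* Writer: once synchronize_rcu() returns, no CPU can still be inside
 * real_submit() through the old pointer value, so it is safe to cancel
 * whatever was already submitted.
 */
static void set_wedged(void)
{
	WRITE_ONCE(submit, nop_submit);
	synchronize_rcu();
}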

Cheers, Daniel

> >   	/* Mark all executing requests as skipped */
> >   	engine->cancel_requests(engine);
> > @@ -3041,9 +3041,8 @@ static void engine_set_wedged(struct intel_engine_cs *engine)
> >   				       intel_engine_last_submit(engine));
> >   }
> > -static int __i915_gem_set_wedged_BKL(void *data)
> > +void i915_gem_set_wedged(struct drm_i915_private *i915)
> >   {
> > -	struct drm_i915_private *i915 = data;
> >   	struct intel_engine_cs *engine;
> >   	enum intel_engine_id id;
> > @@ -3052,13 +3051,6 @@ static int __i915_gem_set_wedged_BKL(void *data)
> >   	set_bit(I915_WEDGED, &i915->gpu_error.flags);
> >   	wake_up_all(&i915->gpu_error.reset_queue);
> > -
> > -	return 0;
> > -}
> > -
> > -void i915_gem_set_wedged(struct drm_i915_private *dev_priv)
> > -{
> > -	stop_machine(__i915_gem_set_wedged_BKL, dev_priv, NULL);
> >   }
> >   bool i915_gem_unset_wedged(struct drm_i915_private *i915)
> > diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> > index b100b38f1dd2..ef78a85cb845 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_request.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> > @@ -556,7 +556,9 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
> >   	switch (state) {
> >   	case FENCE_COMPLETE:
> >   		trace_i915_gem_request_submit(request);
> > +		rcu_read_lock();
> >   		request->engine->submit_request(request);
> > +		rcu_read_unlock();
> 
> And _bh for these? Although this already runs with preemption off, I
> guess it is good for documentation.
> 
> Regards,
> 
> Tvrtko
> 
> >   		break;
> >   	case FENCE_FREE:
> > diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_request.c b/drivers/gpu/drm/i915/selftests/i915_gem_request.c
> > index 78b9f811707f..a999161e8db1 100644
> > --- a/drivers/gpu/drm/i915/selftests/i915_gem_request.c
> > +++ b/drivers/gpu/drm/i915/selftests/i915_gem_request.c
> > @@ -215,7 +215,9 @@ static int igt_request_rewind(void *arg)
> >   	}
> >   	i915_gem_request_get(vip);
> >   	i915_add_request(vip);
> > +	rcu_read_lock();
> >   	request->engine->submit_request(request);
> > +	rcu_read_unlock();
> >   	mutex_unlock(&i915->drm.struct_mutex);
> > 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

