[Intel-gfx] [PATCH] drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection
Chris Wilson
chris at chris-wilson.co.uk
Mon Mar 5 14:35:50 UTC 2018
Quoting Chris Wilson (2018-03-05 14:34:42)
> Quoting Mika Kuoppala (2018-03-05 13:59:43)
> > Chris Wilson <chris at chris-wilson.co.uk> writes:
> >
> > > Similar to the staging around handling of engine->submit_request, we
> > > need to stop adding to the execlists->queue prior to calling
> > > engine->cancel_requests. cancel_requests will move requests from the
> > > queue onto the timeline, so if we add a request onto the queue after that
> > > point, it will be lost.
> > >
> > > Fixes: af7a8ffad9c5 ("drm/i915: Use rcu instead of stop_machine in set_wedged")
> > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > > Cc: Mika Kuoppala <mika.kuoppala at intel.com>
> > > ---
> > > drivers/gpu/drm/i915/i915_gem.c | 13 +++++++------
> > > drivers/gpu/drm/i915/i915_request.c | 2 ++
> > > 2 files changed, 9 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > index a5bd07338b46..8d913d833ab9 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > @@ -471,10 +471,11 @@ static void __fence_set_priority(struct dma_fence *fence, int prio)
> > >
> > >  	rq = to_request(fence);
> > >  	engine = rq->engine;
> > > -	if (!engine->schedule)
> > > -		return;
> > >
> > > -	engine->schedule(rq, prio);
> > > +	rcu_read_lock();
> > > +	if (engine->schedule)
> > > +		engine->schedule(rq, prio);
> > > +	rcu_read_unlock();
> > >  }
> > >
> > > static void fence_set_priority(struct dma_fence *fence, int prio)
> > > @@ -3214,8 +3215,11 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
> > >  	 */
> > >  	for_each_engine(engine, i915, id) {
> > >  		i915_gem_reset_prepare_engine(engine);
> > > +
> > >  		engine->submit_request = nop_submit_request;
> > > +		engine->schedule = NULL;
> >
> > Why are we not using the rcu_assign_pointer and rcu_dereference pair
> > in the upper part where we check the schedule?
>
> We are not using RCU protection. RCU here is being abused as a
> free-flowing stop-machine.
I'm sorely tempted to put it back to stop_machine as the races are just
plain weird and proving hard to fix :(
-Chris
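
For readers following the thread, the ordering problem from the commit
message can be modelled in plain userspace C. This is a single-threaded
sketch, not i915 code: struct model_engine, model_schedule(),
model_cancel_requests(), model_set_priority() and model_wedge() are all
hypothetical names, and the assumption that the schedule hook adds an
entry to the queue is a simplification of what the commit message
describes. It only shows why the hook has to be cleared before
cancellation runs; it does not model concurrency or the RCU grace period.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an engine; not the i915 structure. */
struct model_engine {
	int queued;	/* requests sitting on the priority queue */
	int timeline;	/* requests moved to the timeline by cancellation */
	void (*schedule)(struct model_engine *e);
};

/* Assumption: rescheduling places one request on the queue. */
static void model_schedule(struct model_engine *e)
{
	e->queued++;
}

/* Moves everything on the queue to the timeline, as the commit message says. */
static void model_cancel_requests(struct model_engine *e)
{
	e->timeline += e->queued;
	e->queued = 0;
}

/*
 * Mirrors the shape of the patched __fence_set_priority(): test the hook,
 * then call it. The kernel patch brackets this with rcu_read_lock() and
 * rcu_read_unlock(); there is no equivalent in this single-threaded model.
 */
static void model_set_priority(struct model_engine *e)
{
	assert(e != NULL);
	if (e->schedule)
		e->schedule(e);
}

/* Returns the number of requests left stranded on the queue. */
static int model_wedge(int clear_hook_first)
{
	struct model_engine e = {
		.queued = 2,
		.timeline = 0,
		.schedule = model_schedule,
	};

	if (clear_hook_first)
		e.schedule = NULL;	/* what the patch adds to set-wedged */

	/* The kernel would wait for in-flight readers at this point. */
	model_cancel_requests(&e);

	/* A late caller arrives after cancellation has already run. */
	model_set_priority(&e);

	return e.queued;
}
```

With the hook left in place, the late caller puts a request on a queue
that nothing will drain again; with the hook cleared first, the late
caller becomes a no-op and nothing is stranded.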