[Intel-gfx] [PATCH] drm/i915: Use rcu instead of stop_machine

Daniel Vetter daniel at ffwll.ch
Thu Oct 5 16:12:35 UTC 2017


On Thu, Oct 05, 2017 at 03:30:12PM +0100, Chris Wilson wrote:
> Quoting Daniel Vetter (2017-10-05 15:09:48)
> > stop_machine is not really a locking primitive we should use, except
> > when the hw folks tell us the hw is broken and that's the only way to
> > work around it.
> > 
> > This patch here is just a suggestion for how to fix it up, possible
> > changes needed to make it actually work:
> > 
> > - Set the nop_submit_request first for _all_ engines, before
> >   proceeding.
> > 
> > - Make sure engine->cancel_requests copes with the possibility that
> >   not all tests have consistently used the new or old version. I dont
> >   think this is a problem, since the same can happen really with the
> >   stop_machine() locking - stop_machine also doesn't give you any kind
> >   of global ordering against other cpu threads, it just makes them
> >   stop.
> > 
> > This patch tries to address the locking snafu from
> 
> There's a locking snafu in the code?
>  
> > commit 20e4933c478a1ca694b38fa4ac44d99e659941f5
> > Author: Chris Wilson <chris at chris-wilson.co.uk>
> > Date:   Tue Nov 22 14:41:21 2016 +0000
> > 
> >     drm/i915: Stop the machine as we install the wedged submit_request handler
> > 
> > Chris said parts of the reasons for going with stop_machine() was that
> > it's no overhead for the fast-path.
> 
> More than that, you don't even have to think about it. It's a one off
> event that changes execution paths. I actually never thought about
> putting the lock mechanism around the caller (that does prevent the issue
> I was dreading of being inside the callback as it changed), it is still
> magic that has nothing to do with the code flow. What variable should we
> document as being rcu protected, (*engine->submit_request)()?

Yeah, engine->submit_request would be the one, except it shouldn't use
rcu_assign_pointer/rcu_dereference since we don't need those barriers,
ever - the .text doesn't change after all. That's why I didn't splatter
the sparse annotations all over it. In a way we don't need the data
barriers of rcu, but really only the "everyone has passed the critical
section now" part of it.

> I'm definitely not sold on having set-wedge dictate terms to the rest of
> the code.

Well, it sounds like tglx would very much prefer we clean this locking
inversion up on our side and not get them to break the cycle. Not what I
wanted Thomas to look at, but since I botched it and attached the wrong
splat we got the answer for this patch here, and it seems to be "don't do
that".
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


More information about the Intel-gfx mailing list