[Intel-gfx] [PATCH] drm/i915: Use rcu instead of stop_machine

Thu Oct 5 14:30:12 UTC 2017

Quoting Daniel Vetter (2017-10-05 15:09:48)
> stop_machine is not really a locking primitive we should use, except
> when the hw folks tell us the hw is broken and that's the only way to
> work around it.
> 
> This patch here is just a suggestion for how to fix it up, possible
> changes needed to make it actually work:
> 
> - Set the nop_submit_request first for _all_ engines, before
>   proceeding.
> 
> - Make sure engine->cancel_requests copes with the possibility that
>   not all tests have consistently used the new or old version. I don't
>   think this is a problem, since the same can happen really with the
>   stop_machine() locking - stop_machine also doesn't give you any kind
>   of global ordering against other cpu threads, it just makes them
>   stop.
> 
> This patch tries to address the locking snafu from

There's a locking snafu in the code?

> commit 20e4933c478a1ca694b38fa4ac44d99e659941f5
> Author: Chris Wilson <chris at chris-wilson.co.uk>
> Date:   Tue Nov 22 14:41:21 2016 +0000
> 
>     drm/i915: Stop the machine as we install the wedged submit_request handler
> 
> Chris said part of the reason for going with stop_machine() was that
> it adds no overhead to the fast-path.

More than that, you don't even have to think about it. It's a one-off
event that changes execution paths. I actually never thought about
putting the lock mechanism around the caller, which does prevent the
issue I was dreading of being inside the callback as it changed. But it
is still magic that has nothing to do with the code flow. What variable
should we document as being RCU protected: (*engine->submit_request)()?
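
To make the question concrete, here is a rough userspace sketch of what
"putting the lock mechanism around the caller" could look like. It is not
the patch and not kernel code: C11 atomics and a plain reader counter stand
in for rcu_read_lock(), rcu_dereference() and synchronize_rcu(), and the
names struct engine, engine_submit(), set_wedged(), real_submit() and
nop_submit() are all hypothetical. Unlike a real RCU read side, the counter
does add cost to the fast path; it is only there to show the ordering.

```c
#include <stdatomic.h>

typedef int (*submit_fn)(int request);

/* Hypothetical handlers: the normal path and the "wedged" no-op path. */
static int real_submit(int request) { return request; }
static int nop_submit(int request) { (void)request; return -1; }

struct engine {
	_Atomic(submit_fn) submit_request; /* the pointer being protected */
	atomic_int readers;                /* crude stand-in for an RCU read section */
};

static void engine_init(struct engine *e)
{
	atomic_init(&e->submit_request, real_submit);
	atomic_init(&e->readers, 0);
}

/* Caller side: the read section wraps both the load and the call. */
static int engine_submit(struct engine *e, int request)
{
	atomic_fetch_add(&e->readers, 1);               /* ~ rcu_read_lock() */
	submit_fn fn = atomic_load(&e->submit_request); /* ~ rcu_dereference() */
	int ret = fn(request);
	atomic_fetch_sub(&e->readers, 1);               /* ~ rcu_read_unlock() */
	return ret;
}

/* Writer side: switch every engine first, then wait for old callers. */
static void set_wedged(struct engine *engines, int count)
{
	for (int i = 0; i < count; i++)
		atomic_store(&engines[i].submit_request, nop_submit);

	/* ~ synchronize_rcu(): anyone who loaded the old pointer has left. */
	for (int i = 0; i < count; i++)
		while (atomic_load(&engines[i].readers) != 0)
			;
}
```

Once set_wedged() returns in this sketch, no caller can still be inside the
old handler, so whatever cancels outstanding requests afterwards only has to
cope with requests that were already submitted.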

I'm definitely not sold on having set-wedge dictate terms to the rest of
the code.
-Chris