[Intel-gfx] [PATCH] drm/i915: Use rcu instead of stop_machine
Thomas Gleixner
tglx at linutronix.de
Fri Oct 6 10:01:57 UTC 2017
On Fri, 6 Oct 2017, Daniel Vetter wrote:
> On Fri, Oct 6, 2017 at 10:42 AM, Chris Wilson <chris at chris-wilson.co.uk> wrote:
> >> > > This patch tries to address the locking snafu from
> >> >
> >> > There's a locking snafu in the code?
The lock problem exists definitely. We did not introduce new dependencies
through the hotplug locking rework. They've been there forever.
What's new is that lockdep became smarter and we exposed the hotplug lock
to lockdep fully. So yes, we need to address them no matter what.
While we might have a solution for that devfs issue, there are other ways
to expose this kind of problem when you have code pathes which come from user
space and end up taking the hotplug lock.
> >> > > Chris said parts of the reasons for going with stop_machine() was that
> >> > > it's no overhead for the fast-path.
> >> >
> >> > More than that, you don't even have to think about it. It's a one off
> >> > event that changes execution paths. I actually never thought about
> >> > putting the lock mechanism around the caller (that does prevent the issue
> >> > I was dreading of being inside the callback as it changed), it is still
> >> > magic that has nothing to do with the code flow. What variable should we
> >> > document as being rcu protected, (*engine->submit_request)()?
stop_machine() is one of the biggest hammers we have in the kernel. We try
to avoid it where ever we can. It's the least resort for problems which
cannot be solved otherwise.
> >> Well, it sounds like tglx would very much prefer we clean this locking
> >> inversion up on our side and not get them to break the cycle. Not what I
> >> wanted Thomas to look at, but since I botched it and attached the wrong
> >> splat we got the answer for this patch here, and it seems to be "don't do
> >> that".
> >
> > The analogy is with kernel live-patching (which is what we are doing
> > here). Should every module in the kernel manually markup itself to
> > support the esoteric feature, or should the feature support the wider
> > kernel?
>
> This is completely different from kernel live patching, where the
> actual instructions get changed while we execute them. You'd need to
> annotate every single IP load, which is impossible. This here seems to
> be much more a bog standard critical section, restrict to a very small
> piece of code (the hw submit functions essentially), and if we can't
> get that to work then our locking is seriously in bad shape.
> stop_machine really isn't a valid locking mechanism, except in very
> extreme cases where there's really no other solution.
Correct. This has absolutely nothing to do with life patching. It's a bug
standard serialization issue which does not require the big hammer of
stop_machine().
Thanks,
tglx
More information about the Intel-gfx
mailing list