[PATCH 6/7] drm/i915/pmu: Lazy unregister

Wed Jul 24 15:39:57 UTC 2024

On Wed, Jul 24, 2024 at 02:41:05PM GMT, Peter Zijlstra wrote:
>On Tue, Jul 23, 2024 at 10:30:08AM -0500, Lucas De Marchi wrote:
>> On Tue, Jul 23, 2024 at 09:03:25AM GMT, Tvrtko Ursulin wrote:
>> >
>> > On 22/07/2024 22:06, Lucas De Marchi wrote:
>> > > Instead of calling perf_pmu_unregister() when unbinding, defer that to
>> > > the destruction of i915 object. Since perf itself holds a reference in
>> > > the event, this only happens when all events are gone, which guarantees
>> > > i915 is not unregistering the pmu with live events.
>> > >
>> > > Previously, running the following sequence would crash the system after
>> > > ~2 tries:
>> > >
>> > > 	1) bind device to i915
>> > > 	2) wait events to show up on sysfs
>> > > 	3) start perf  stat -I 1000 -e i915/rcs0-busy/
>> > > 	4) unbind driver
>> > > 	5) kill perf
>> > >
>> > > Most of the time this crashes in perf_pmu_disable() while accessing the
>> > > percpu pmu_disable_count. This happens because perf_pmu_unregister()
>> > > destroys it with free_percpu(pmu->pmu_disable_count).
>> > >
>> > > With a lazy unbind, the pmu is only unregistered after (5) as opposed to
>> > > after (4). The downside is that if a new bind operation is attempted for
>> > > the same device/driver without killing the perf process, i915 will fail
>> > > to register the pmu (but still load successfully). This seems better
>> > > than completely crashing the system.
>> >
>> > So effectively allows unbind to succeed without fully unbinding the
>> > driver from the device? That sounds like a significant drawback and if
>> > so, I wonder if a more complicated solution wouldn't be better after
>> > all. Or is there precedence for allowing userspace keeping their paws on
>> > unbound devices in this way?
>>
>> keeping the resources alive but "unplunged" while the hardware
>> disappeared is a common thing to do... it's the whole point of the
>> drmm-managed resource for example. If you bind the driver and then
>> unbind it while userspace is holding a ref, next time you try to bind it
>> will come up with a different card number. A similar thing that could be
>> done is to adjust the name of the event - currently we add the mangled
>> pci slot.
>>
>> That said, I agree a better approach would be to allow
>> perf_pmu_unregister() to do its job even when there are open events. On
>> top of that (or as a way to help achieve that), make perf core replace
>> the callbacks with stubs when pmu is unregistered - that would even kill
>> the need for i915's checks on pmu->closed (and fix the lack thereof in
>> other drivers).
>>
>> It can be a can of worms though and may be pushed back by perf core
>> maintainers, so it'd be good have their feedback.
>
>I don't think I understand the problem. I also don't understand drivers
>much -- so that might be the problem.

We can bind/unbind the driver to/from the pci device. Example:

	echo -n "0000:00:02.0" > /sys/bus/pci/drivers/i915/unbind

This is essentially unplugging the HW from the kernel.  This will
trigger the driver to deinitialize and free up all resources use by that
device.

So when the driver is binding it does:

	perf_pmu_register();

When it's unbinding:

	perf_pmu_unregister();

Reasons to unbind include:

	- driver testing so then we can unload the module and load it
	  again
	- device is toast - doing an FLR and rebinding may
	  fix/workaround it
	- For SR-IOV, in which we are exposing multiple instances of the
	  same underlying PCI device, we may need to bind/unbind
	  on-demand  (it's not yet clear if perf_pmu_register() would be
	  called on the VF instances, but listed here just to explain
	  the bind/unbind)
	- Hotplug

>
>So the problem appears to be that the device just disappears without
>warning? How can a GPU go away like that?
>
>Since you have a notion of this device, can't you do this stubbing you
>talk about? That is, if your internal device reference becomes NULL, let
>the PMU methods preserve the state like no-ops.

It's not clear if you are suggesting these stubs to be in each driver or
to be assigned by perf in perf_pmu_unregister(). Some downsides
of doing it in the driver:

	- you can't remove the module as perf will still call module
	  code
	- need to replicate the stubs in every driver (or the
	  equivalent of pmu->closed checks in i915_pmu.c)

I *think* the stubs would be quiet the same for every device, so could
be feasible to share them inside perf. Alternatively it could simply
shortcut the call, without stubs, by looking at event->hw.state and
a new pmu->state. I can give this a try.

thanks
Lucas De Marchi

>
>And then when the last event goes away, tear down the whole thing.
>
>Again, I'm not sure I'm following.