[PATCH 6/7] drm/i915/pmu: Lazy unregister

Mon Sep 9 21:03:07 UTC 2024

Hi Peter,

On Wed, Jul 24, 2024 at 10:39:57AM GMT, Lucas De Marchi wrote:
>On Wed, Jul 24, 2024 at 02:41:05PM GMT, Peter Zijlstra wrote:
>>On Tue, Jul 23, 2024 at 10:30:08AM -0500, Lucas De Marchi wrote:
>>>On Tue, Jul 23, 2024 at 09:03:25AM GMT, Tvrtko Ursulin wrote:
>>>>
>>>> On 22/07/2024 22:06, Lucas De Marchi wrote:
>>>> > Instead of calling perf_pmu_unregister() when unbinding, defer that to
>>>> > the destruction of i915 object. Since perf itself holds a reference in
>>>> > the event, this only happens when all events are gone, which guarantees
>>>> > i915 is not unregistering the pmu with live events.
>>>> >
>>>> > Previously, running the following sequence would crash the system after
>>>> > ~2 tries:
>>>> >
>>>> > 	1) bind device to i915
>>>> > 	2) wait events to show up on sysfs
>>>> > 	3) start perf  stat -I 1000 -e i915/rcs0-busy/
>>>> > 	4) unbind driver
>>>> > 	5) kill perf
>>>> >
>>>> > Most of the time this crashes in perf_pmu_disable() while accessing the
>>>> > percpu pmu_disable_count. This happens because perf_pmu_unregister()
>>>> > destroys it with free_percpu(pmu->pmu_disable_count).
>>>> >
>>>> > With a lazy unbind, the pmu is only unregistered after (5) as opposed to
>>>> > after (4). The downside is that if a new bind operation is attempted for
>>>> > the same device/driver without killing the perf process, i915 will fail
>>>> > to register the pmu (but still load successfully). This seems better
>>>> > than completely crashing the system.
>>>>
>>>> So effectively allows unbind to succeed without fully unbinding the
>>>> driver from the device? That sounds like a significant drawback and if
>>>> so, I wonder if a more complicated solution wouldn't be better after
>>>> all. Or is there precedence for allowing userspace keeping their paws on
>>>> unbound devices in this way?
>>>
>>>keeping the resources alive but "unplunged" while the hardware
>>>disappeared is a common thing to do... it's the whole point of the
>>>drmm-managed resource for example. If you bind the driver and then
>>>unbind it while userspace is holding a ref, next time you try to bind it
>>>will come up with a different card number. A similar thing that could be
>>>done is to adjust the name of the event - currently we add the mangled
>>>pci slot.
>>>
>>>That said, I agree a better approach would be to allow
>>>perf_pmu_unregister() to do its job even when there are open events. On
>>>top of that (or as a way to help achieve that), make perf core replace
>>>the callbacks with stubs when pmu is unregistered - that would even kill
>>>the need for i915's checks on pmu->closed (and fix the lack thereof in
>>>other drivers).
>>>
>>>It can be a can of worms though and may be pushed back by perf core
>>>maintainers, so it'd be good have their feedback.
>>
>>I don't think I understand the problem. I also don't understand drivers
>>much -- so that might be the problem.
>
>We can bind/unbind the driver to/from the pci device. Example:
>
>	echo -n "0000:00:02.0" > /sys/bus/pci/drivers/i915/unbind
>
>This is essentially unplugging the HW from the kernel.  This will
>trigger the driver to deinitialize and free up all resources use by that
>device.
>
>So when the driver is binding it does:
>
>	perf_pmu_register();
>
>When it's unbinding:
>
>	perf_pmu_unregister();
>	
>Reasons to unbind include:
>
>	- driver testing so then we can unload the module and load it
>	  again
>	- device is toast - doing an FLR and rebinding may
>	  fix/workaround it
>	- For SR-IOV, in which we are exposing multiple instances of the
>	  same underlying PCI device, we may need to bind/unbind
>	  on-demand  (it's not yet clear if perf_pmu_register() would be
>	  called on the VF instances, but listed here just to explain
>	  the bind/unbind)
>	- Hotplug
>
>>
>>So the problem appears to be that the device just disappears without
>>warning? How can a GPU go away like that?
>>
>>Since you have a notion of this device, can't you do this stubbing you
>>talk about? That is, if your internal device reference becomes NULL, let
>>the PMU methods preserve the state like no-ops.
>
>It's not clear if you are suggesting these stubs to be in each driver or
>to be assigned by perf in perf_pmu_unregister(). Some downsides
>of doing it in the driver:
>
>	- you can't remove the module as perf will still call module
>	  code
>	- need to replicate the stubs in every driver (or the
>	  equivalent of pmu->closed checks in i915_pmu.c)
>
>I *think* the stubs would be quiet the same for every device, so could
>be feasible to share them inside perf. Alternatively it could simply
>shortcut the call, without stubs, by looking at event->hw.state and
>a new pmu->state. I can give this a try.

trying to revive these patches to get this fixed. Are you ok with one of
those approaches, i.e. a) either add the stub in the perf side or b)
shortcut the calls after perf_pmu_unregister() is called ?

Lucas De Marchi

>
>thanks
>Lucas De Marchi
>
>>
>>And then when the last event goes away, tear down the whole thing.
>>
>>Again, I'm not sure I'm following.