[PATCH 0/5] perf: Fix pmu for drivers with bind/unbind

Umesh Nerlige Ramappa umesh.nerlige.ramappa at intel.com
Fri Oct 11 22:21:18 UTC 2024


On Tue, Oct 08, 2024 at 01:34:56PM -0500, Lucas De Marchi wrote:
>v2 of my attempt at fixing how i915 interacts with perf events.
>
>v1 - https://lore.kernel.org/all/20240722210648.80892-1-lucas.demarchi@intel.com/
>
>From other people:
>1) https://lore.kernel.org/lkml/20240115170120.662220-1-tvrtko.ursulin@linux.intel.com/
>2) https://lore.kernel.org/all/20240213180302.47266-1-umesh.nerlige.ramappa@intel.com/
>
>WARNING: patches 1, 4 and 5 are NOT intended to be applied as is. More
>on this below.
>
>This series basically builds on the idea of the first patch of my
>previous series, but extends it in a way that
>
>	a) the other patches are not needed  (at least, not as is) and
>	b) driver can rebind just fine - no issues with the new call to
>	   perf_pmu_register()

I have 2 broad questions:

(1) I am curious how (b) works. You seem to have a notion of instances of 
devices and then are you using the instance number to create the name 
used for the sysfs entry? Did I get that right?

If so, should the application discover what the name is each time it is 
run? In the failure case that I am seeing, I am running an application 
that does not work when I rename the sysfs entry to something else.

(2) Similar to Patch 1 of your v1 series where you modified _free_event:

static void _free_event(struct perf_event *event)
{
	struct module *module;
...
	module = event->pmu->module;
...
	if (event->destroy)
		event->destroy(event);
...
	module_put(module);
...
}

With the above code, the kref to i915->pmu can be taken from the below 
points in i915 code (just to point out the sequence):

i915_pmu_register()
{
	perf_pmu_register()
	drm_dev_get()
	kref_init()
}

i915_pmu_unregister()
{
	kref_put(&ref, pmu_cleanup)
} 

i915_pmu_event_init()
{
	kref_get()
}

i915_pmu_event_destroy()
{
	kref_put(&ref, pmu_cleanup)
}

void pmu_cleanup(struct kref *ref)
{
	i915_pmu_unregister_cpuhp_state(pmu);
	perf_pmu_unregister(&pmu->base);
	pmu->base.event_init = NULL;
	kfree(pmu->base.attr_groups);
	if (!is_igp(i915))
		kfree(pmu->name);
	free_event_attributes(pmu);
	drm_dev_put(&i915->drm);
}

Would this work? Do you see any gaps that may need the ref counting code 
you added in perf?

Thanks,
Umesh

>
>Another difference is that rather than mixing i915 cleanups this just
>adds a dummy pmu with no backing HW. Intention for dummy_pmu is for
>experimental purpose and to demonstrate changes tha need to be applied
>to i915 (and probably amdgpu, and also in the pending xe patch).
>If desired to have an example like that in tree, then we should hide it
>behind a config option and I'd need to re-check the error handling.
>
>With this set I could run the following test script multiple times with
>no issues observed:
>
>	#!/bin/bash
>
>	set -e
>
>	rand_sleep() {
>		sleep $(bc -l  <<< "$(shuf -i 0-3000 -n 1) / 1000")
>	}
>
>	test_noperf() {
>		echo 0 > /sys/kernel/debug/dummy_pmu/bind
>
>		echo 0 > /sys/kernel/debug/dummy_pmu/unbind
>	}
>
>	test_perf_before() {
>		echo 0 > /sys/kernel/debug/dummy_pmu/bind
>
>		perf stat --interval-count 2 -e dummy_pmu_0/test-event-1/ -I500
>		echo 0 > /sys/kernel/debug/dummy_pmu/unbind
>	}
>
>	test_kill_perf_later() {
>		echo 0 > /sys/kernel/debug/dummy_pmu/bind
>
>		perf stat -e dummy_pmu_0/test-event-1/ -I500 &
>		pid=$!
>		rand_sleep
>		echo 0 > /sys/kernel/debug/dummy_pmu/unbind
>		rand_sleep
>		kill $pid
>	}
>
>	test_kill_perf_laaaaaaater() {
>		echo 0 > /sys/kernel/debug/dummy_pmu/bind
>
>		perf stat -e dummy_pmu_0/test-event-1/ -I500 &
>		pid=$!
>		rand_sleep
>		echo 0 > /sys/kernel/debug/dummy_pmu/unbind
>		rand_sleep
>		rand_sleep
>		rand_sleep
>		kill $pid
>	}
>
>	test_kill_perf_with_leader() {
>		echo 0 > /sys/kernel/debug/dummy_pmu/bind
>
>		perf stat -e '{dummy_pmu_0/test-event-1/,dummy_pmu_0/test-event-2/}:S' -I500 &
>		pid=$!
>		rand_sleep
>		echo 0 > /sys/kernel/debug/dummy_pmu/unbind
>		rand_sleep
>		rand_sleep
>		kill $pid
>	}
>
>	N=${1:-1}
>
>	for ((i=0; i<N; i++)); do
>		printf "%04u/%04u\n" "$((i+1))" "$N" >&2
>		test_noperf
>		test_perf_before
>		test_kill_perf_later
>		test_kill_perf_laaaaaaater
>		test_kill_perf_with_leader
>		echo >&2
>	done
>
>Last patch is optional for a driver and not needed for the fix.
>
>Open topics:
>
>	- If something like the last patch is desirable, should it be
>	  done from inside perf_pmu_unregister()?
>
>	- Should we have a dummy_pmu (or whatever name) behind a config,
>	  or maybe in Documentation/ ?
>
>thanks,
>Lucas De Marchi
>
>Lucas De Marchi (5):
>  perf: Add dummy pmu module
>  perf: Move free outside of the mutex
>  perf: Add pmu get/put
>  perf/dummy_pmu: Tie pmu to device lifecycle
>  perf/dummy_pmu: Track and disable active events
>
> include/linux/perf_event.h |  12 +
> kernel/events/Makefile     |   1 +
> kernel/events/core.c       |  39 ++-
> kernel/events/dummy_pmu.c  | 492 +++++++++++++++++++++++++++++++++++++
> 4 files changed, 539 insertions(+), 5 deletions(-)
> create mode 100644 kernel/events/dummy_pmu.c
>
>-- 
>2.46.2
>


More information about the dri-devel mailing list