[PATCH] drm/i915/pmu: Drop custom hotplug code

Sat Jan 25 16:32:11 UTC 2025

On Fri, Jan 24, 2025 at 04:46:21PM -0800, Umesh Nerlige Ramappa wrote:
>Hi Lucas,
>
>Mostly a bunch of questions since I think I am missing something.
>
>On Tue, Jan 21, 2025 at 10:59:08AM -0600, Lucas De Marchi wrote:
>>On Tue, Jan 21, 2025 at 10:53:31AM -0500, Liang, Kan wrote:
>>>
>>>
>>>On 2025-01-21 9:29 a.m., Lucas De Marchi wrote:
>>>>On Mon, Jan 20, 2025 at 08:42:41PM -0500, Liang, Kan wrote:
>>>>>>>>-static int i915_pmu_cpu_offline(unsigned int cpu, struct hlist_node
>>>>>>>>*node)
>>>>>>>>-{
>>>>>>>>-    struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu),
>>>>>>>>cpuhp.node);
>>>>>>>>-    unsigned int target = i915_pmu_target_cpu;
>>>>>>>>-
>>>>>>>>-    /*
>>>>>>>>-     * Unregistering an instance generates a CPU offline event which
>>>>>>>>we must
>>>>>>>>-     * ignore to avoid incorrectly modifying the shared
>>>>>>>>i915_pmu_cpumask.
>>>>>>>>-     */
>>>>>>>>-    if (!pmu->registered)
>>>>>>>>-        return 0;
>>>>>>>>-
>>>>>>>>-    if (cpumask_test_and_clear_cpu(cpu, &i915_pmu_cpumask)) {
>>>>>>>>-        target = cpumask_any_but(topology_sibling_cpumask(cpu), cpu);
>>>>>>>>-
>>>>>>>
>>>>>>>I'm not familiar with the i915 PMU, but this seems to suggest a
>>>>>>>core-scope PMU, not a system-wide scope.
>>>>>>
>>>>>>the counter is in a completely separate device - it doesn't depend on
>>>>>>core or die or pkg - not sure why it cared about
>>>>>>topology_sibling_cpumask here.
>>>>>
>>>>>OK. But it's still a behavior change. Please make it clear in the
>>>>>description that the patch also changes/fixes the scope from core scope
>>>>>to system-wide.
>>>>
>>>>sure... do you have a suggestion how to test the hotplug? For testing
>>>>purposes, can I force the perf cpu assigned to be something other than
>>>>the cpu0?
>>>
>>>Yes, it's a bit tricky to verify the hotplug if the assigned CPU is
>>>CPU0. I don't know a way to force another CPU without changing the code.
>>>You may have to instrument the code for the test.
>>>
>>>Another test you may want to do is the perf system-wide test, e.g., perf
>>>stat -a -e i915/actual-frequency/ sleep 1.
>>>
>>>The existing code assumes the counter is core scope. So the result
>>>should be huge, since perf will read the counter on each core and add
>>>them up.
>>
>>that is not allowed and it simply fails to init the counter:
>>
>>static int i915_pmu_event_init(struct perf_event *event)
>>	...
>>	if (event->cpu < 0)
>>		return -EINVAL;
>>	if (!cpumask_test_cpu(event->cpu, &i915_pmu_cpumask))
>>		return -EINVAL;
>>	...
>>}
>>
>>the event only initializes successfully on the assigned cpu. I see no
>
>Confused here - The above code check (cpumask_test_cpu) is removed in 
>this patch. Are you explaining how it was behaving before this patch?

yes. This is to explain that the scope is system-wide and not core-wide.
The confusion arose because our hotplug handling in i915 uses the
wrong mask to migrate the event, which led to the question "is this
counter really system-wide if it's doing that on migration?"
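
To illustrate why the sibling mask is the wrong one for a system-wide
counter, here is a rough userspace model. It is not the kernel code:
model_any_but() is a made-up stand-in for cpumask_any_but(), CPUs are
just bits in a 64-bit word, and -1 stands in for "no CPU found".

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for cpumask_any_but(): return the lowest CPU
 * set in @mask other than @cpu, or -1 if there is none.
 */
static int model_any_but(uint64_t mask, int cpu)
{
	assert(cpu >= 0 && cpu < 64);

	/* Drop the CPU that is going offline from the candidates. */
	mask &= ~(UINT64_C(1) << cpu);

	for (int i = 0; i < 64; i++)
		if (mask & (UINT64_C(1) << i))
			return i;

	return -1;
}
```

With SMT off, the sibling mask of CPU 0 is just {0}, so
model_any_but(0x1, 0) finds no migration target, while picking from an
online mask of 0xf gives CPU 1.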

>
>>differences in results (using i915/interrupts/ since freq is harder to
>>compare):
>>
>>$ sudo perf stat -e i915/interrupts/  sleep 1
>>
>>Performance counter stats for 'system wide':
>>
>>              253      i915/interrupts/
>>
>>      1.002215175 seconds time elapsed
>>
>>$ sudo perf stat -a  -e i915/interrupts/  sleep 1
>>
>>Performance counter stats for 'system wide':
>>
>>              251      i915/interrupts/
>>
>>      1.000900818 seconds time elapsed
>>
>>Note that our cpumask attr already returns just the assigned cpu and
>
>I don't see the cpumask attr anymore since this patch removes it, so
>I'm still confused on this part.

cpumask attr is now added by core perf infra. See how pmu_dev_attrs
is handled in kernel/events/core.c. If you load the driver with this
patch you will still have a cpumask attr in sysfs and the value depends
on what scope you give it.

The validation when creating an event (with perf_event_open) also moves
to core: it calls pmu->event_init() and then validates the cpu:

kernel/events/core.c:
	perf_try_init_event() {
		ret = pmu->event_init(event);
		...

		if (pmu->scope != PERF_PMU_SCOPE_NONE && event->cpu >= 0) {
			// check that the cpu matches the mask for that
			// scope
		}
	}

>
>>perf-stat only tries to open on that cpu:
>>
>>$ strace --follow -s 1024 -e perf_event_open --  perf stat -a  -e i915/interrupts/  sleep 1
>>
>>[pid 55777] perf_event_open({type=0x24 /* PERF_TYPE_??? */, size=0x88 /* PERF_ATTR_SIZE_??? */, config=0x100002, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, precise_ip=0 /* arbitrary skid */, exclude_guest=1, ...}, -1, 0, -1, PERF_FLAG_FD_CLOEXEC) = 3
>>
>>Lucas De Marchi
>>
>>>But this patch claims that the counter is system-wide. With the patch,
>>>the same perf command should only read the counter on the assigned CPU.
>>>
>>>Please also post the test results in the changelog. That's the reason
>>>why the scope has to be changed.
>>
>>it seems that migration code is simply wrong, not that we are changing
>>the scope here - it was already considered system-wide. I can add a
>>paragraph in the commit message explaining it.
>
>The prior code was enforcing one CPU assignment to all the i915 
>events.  If the event was read from some other CPU it would fail 
>(based on this check in event initialization).
>
>	if (!cpumask_test_cpu(event->cpu, &i915_pmu_cpumask))
>		return -EINVAL;
>
>That's not the case anymore. Right? If yes, how do counters read from 

see above, the validation moved to perf core, after the event_init. If
it doesn't match, we get a call to event->destroy() and -ENODEV is
returned to userspace

>different CPUs get reported to the user? Sum of all counts on 
>different CPUs?

value is still the same. There's still only one valid CPU and that CPU
is always cpu 0 in our x86 case.

I hope this clarifies.

Lucas De Marchi

>
>Thanks,
>Umesh
>
>>
>>thanks
>>Lucas De Marchi
>>
>>>
>>>Thanks,
>>>Kan
>>>
>>>