[PATCH] drm/i915/pmu: Drop custom hotplug code

Thu Feb 13 14:28:56 UTC 2025

On 13/02/2025 13:36, Liang, Kan wrote:
> On 2025-02-12 11:13 p.m., Lucas De Marchi wrote:
>> On Tue, Jan 21, 2025 at 12:19:15PM -0500, Liang, Kan wrote:
>>>
>>>
>>> On 2025-01-21 11:59 a.m., Lucas De Marchi wrote:
>>>> On Tue, Jan 21, 2025 at 10:53:31AM -0500, Liang, Kan wrote:
>>>>>
>>>>>
>>>>> On 2025-01-21 9:29 a.m., Lucas De Marchi wrote:
>>>>>> On Mon, Jan 20, 2025 at 08:42:41PM -0500, Liang, Kan wrote:
>>>>>>>>>> -static int i915_pmu_cpu_offline(unsigned int cpu, struct
>>>>>>>>>> hlist_node
>>>>>>>>>> *node)
>>>>>>>>>> -{
>>>>>>>>>> -    struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu),
>>>>>>>>>> cpuhp.node);
>>>>>>>>>> -    unsigned int target = i915_pmu_target_cpu;
>>>>>>>>>> -
>>>>>>>>>> -    /*
>>>>>>>>>> -     * Unregistering an instance generates a CPU offline event
>>>>>>>>>> which
>>>>>>>>>> we must
>>>>>>>>>> -     * ignore to avoid incorrectly modifying the shared
>>>>>>>>>> i915_pmu_cpumask.
>>>>>>>>>> -     */
>>>>>>>>>> -    if (!pmu->registered)
>>>>>>>>>> -        return 0;
>>>>>>>>>> -
>>>>>>>>>> -    if (cpumask_test_and_clear_cpu(cpu, &i915_pmu_cpumask)) {
>>>>>>>>>> -        target = cpumask_any_but(topology_sibling_cpumask(cpu),
>>>>>>>>>> cpu);
>>>>>>>>>> -
>>>>>>>>>
>>>>>>>>> I'm not familar with the i915 PMU, but it seems suggest a core
>>>>>>>>> scope
>>>>>>>>> PMU, not a system-wide scope.
>>>>>>>>
>>>>>>>> counter is in a complete separate device - it doesn't depend on
>>>>>>>> core or
>>>>>>>> die or pkg - not sure why it cared about topology_sibling_cpumask
>>>>>>>> here.
>>>>>>>
>>>>>>> OK. But it's still a behavior change. Please make it clear in the
>>>>>>> description that the patch also changes/fixes the scope from core
>>>>>>> scope
>>>>>>> to system-wide.
>>>>>>
>>>>>> sure... do you have a suggestion how to test the hotplug? For testing
>>>>>> purposes, can I force the perf cpu assigned to be something other than
>>>>>> the cpu0?
>>>>>
>>>>> Yes, it's a bit tricky to verify the hotplug if the assigned CPU is
>>>>> CPU0. I don't know a way to force another CPU without changing the
>>>>> code.
>>>>> You may have to instrument the code for the test.
>>>>>
>>>>> Another test you may want to do is the perf system-wide test, e.g.,
>>>>> perf
>>>>> stat -a -e i915/actual-frequency/ sleep 1.
>>>>>
>>>>> The existing code assumes the counter is core scope. So the result
>>>>> should be huge, since perf will read the counter on each core and add
>>>>> them up.
>>>>
>>>> that is not allowed and it simply fails to init the counter:
>>>>
>>>> static int i915_pmu_event_init(struct perf_event *event)
>>>>      ...
>>>>      if (event->cpu < 0)
>>>>          return -EINVAL;
>>>>      if (!cpumask_test_cpu(event->cpu, &i915_pmu_cpumask))
>>>>          return -EINVAL;
>>>>      ...
>>>> }
>>>>
>>>> event only succeeds the initialization in the assigned cpu. I see no
>>>> differences in results (using i915/interrupts/ since freq is harder to
>>>> compare):
>>>>
>>>> $ sudo perf stat -e i915/interrupts/  sleep 1
>>>>
>>>>   Performance counter stats for 'system wide':
>>>>
>>>>                 253      i915/
>>>> interrupts/
>>>>         1.002215175 seconds time elapsed
>>>>
>>>> $ sudo perf stat -a  -e i915/interrupts/  sleep 1
>>>>
>>>>   Performance counter stats for 'system wide':
>>>>
>>>>                 251      i915/
>>>> interrupts/
>>>>         1.000900818 seconds time elapsed
>>>>
>>>> Note that our cpumask attr already returns just the assigned cpu and
>>>> perf-stat only tries to open on that cpu:
>>>>
>>>> $ strace --follow -s 1024 -e perf_event_open --  perf stat -a  -e i915/
>>>> interrupts/  sleep 1
>>>>
>>>> [pid 55777] perf_event_open({type=0x24 /* PERF_TYPE_??? */, size=0x88 /*
>>>> PERF_ATTR_SIZE_??? */, config=0x100002, sample_period=0,
>>>> sample_type=PERF_SAMPLE_IDENTIFIER,
>>>> read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|
>>>> PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, precise_ip=0 /*
>>>> arbitrary skid */, exclude_guest=1, ...}, -1, 0, -1,
>>>> PERF_FLAG_FD_CLOEXEC) = 3
>>>>
>>>
>>> I see. The behavior is not changed with the patch. It should be just the
>>
>> humn... the behavior doesn't change when using perf because perf will
>> read the cpumask and use it accordingly. However apparently now it's not
>> working anymore to reject calls to perf_event_open() that have a cpu
>> that doesn't match the cpumask.
>>
>> Just like before I have this output:
>>
>> $ sudo cat /sys/devices/i915/cpumask 0
>>
>> However if perf_event_open() is called with cpu == 1, it succeeds.
>> Example:
>>
>>      attr_init(&attr);
>>      perf_event_open(&attr, -1, 1, -1, 0);
>>
>> I was expecting it to fail and set errno to ENODEV, but that is not the
>> case. For this particular system I'm seeing these values in
>> perf_try_init_event():
>>
>>      event->cpu == 1
>>      cpumask=0-19
>>      pmu_cpumask=0
>>
>> Re-reading this: it will accept any (online) cpu of the system. Same
>> behavior occurs with other scopes: any cpu of that scope is accepted and
>> event->cpu will still keep what the user passed in (rather than the
>> calculated by perf_try_init_event(). Is that expected?
> 
> Yes, for a system-wide event, it can be read from any CPU. The CPU mask
> in the sysfs only tells the perf tool that only 1 CPU is required to get
> system-wide information. It doesn't have to be the advised CPU. It can
> be any CPU in the scope.
> 
> https://lore.kernel.org/all/20240802151643.1691631-3-kan.liang@linux.intel.com/

I was asking about this during review - will it also allow for group 
reads to mix cpus and if yes are there any downsides etc?

Regards,

Tvrtko