[PATCH v4] drm/xe/pmu: Add GT frequency events
Belgaumkar, Vinay
vinay.belgaumkar at intel.com
Wed Mar 26 22:02:49 UTC 2025
On 3/25/2025 9:14 PM, Lucas De Marchi wrote:
> On Tue, Mar 25, 2025 at 11:09:55PM -0500, Lucas De Marchi wrote:
>> On Tue, Mar 25, 2025 at 03:45:06PM -0700, Ashutosh Dixit wrote:
>>> On Tue, 25 Mar 2025 15:01:47 -0700, Belgaumkar, Vinay wrote:
>>>>
>>>>
>>>> On 3/25/2025 2:53 PM, Dixit, Ashutosh wrote:
>>>>> On Tue, 25 Mar 2025 13:33:43 -0700, Belgaumkar, Vinay wrote:
>>>>>> On 3/25/2025 10:15 AM, Dixit, Ashutosh wrote:
>>>>>>> On Mon, 24 Mar 2025 19:37:32 -0700, Belgaumkar, Vinay wrote:
>>>>>>> Hi Vinay,
>>>>>>>
>>>>>>>> On 3/24/2025 5:18 PM, Dixit, Ashutosh wrote:
>>>>>>>>> On Mon, 24 Mar 2025 16:24:02 -0700, Vinay Belgaumkar wrote:
>>>>>>>>>> @@ -266,11 +274,24 @@ static u64 __xe_pmu_event_read(struct
>>>>>>>>>> perf_event *event)
>>>>>>>>>> case XE_PMU_EVENT_ENGINE_ACTIVE_TICKS:
>>>>>>>>>> case XE_PMU_EVENT_ENGINE_TOTAL_TICKS:
>>>>>>>>>> return read_engine_events(gt, event);
>>>>>>>>>> + case XE_PMU_EVENT_GT_ACTUAL_FREQUENCY:
>>>>>>>>>> + return xe_guc_pc_get_act_freq(>->uc.guc.pc);
>>>>>>>>>> + case XE_PMU_EVENT_GT_REQUESTED_FREQUENCY:
>>>>>>>>>> + if (!xe_guc_pc_get_cur_freq(>->uc.guc.pc,
>>>>>>>>>> &cur_gt_freq))
>>>>>>>>> This is unconditionally taking the forcewake and waking the
>>>>>>>>> card up just to
>>>>>>>>> get the sample. Do we really want to do that?
>>>>>>>>>
>>>>>>>>> So if we don't do that, both the actual and requested freq
>>>>>>>>> will be 0 if gt
>>>>>>>>> is in C6.
>>>>>>>> For actual frequency, the register(0xc60) does not belong to
>>>>>>>> any fw domain -
>>>>>>>>
>>>>>>>> GEN_FW_RANGE(0xc00, 0xfff, 0),
>>>>>>>>
>>>>>>>> HW will report 0 when GT is in C6.
>>>>>>> Yes, no issue about act_freq, see commit 22009b6dad66. I was
>>>>>>> referring only
>>>>>>> to requested freq.
>>>>>>>
>>>>>>>> The requested freq register is a
>>>>>>>> shadowed register (0xa008), so that will not accrue fwake either.
>>>>>>>>
>>>>>>>> static const struct i915_range mtl_shadowed_regs[] = {
>>>>>>>> { .start = 0x2030, .end = 0x2030 },
>>>>>>>> { .start = 0x2510, .end = 0x2550 },
>>>>>>>> { .start = 0xA008, .end = 0xA00C },
>>>>>>> So this still doesn't make sense because:
>>>>>>>
>>>>>>> 1. The fact is that xe_guc_pc_get_cur_freq() *is* taking
>>>>>>> forcewake
>>>>>>> 2. And that is in accord with the following comment in
>>>>>>> i915/intel_uncore.c
>>>>>>>
>>>>>>> * Shadowing only applies to writes; forcewake
>>>>>>> * must still be acquired when reading from registers in
>>>>>>> these ranges.
>>>>>>>
>>>>>>> Also see intel_rps_read_punit_req() which is called from i915 PMU
>>>>>>> (frequency_sample()) and uses with_intel_runtime_pm_if_in_use(),
>>>>>>> so we'd
>>>>>>> need to do use the equivalent in xe.
>>>>>> Hi Ashutosh,
>>>>>>
>>>>>> As part of a previous decision, in the Xe PMU implementation,
>>>>>> we are
>>>>>> doing a runtime_get() during pmu_init for all PMU sessions. So,
>>>>>> device is
>>>>>> going to be awake anyways. In this case, it does not make sense
>>>>>> to just
>>>>>> read the register without a fwake.
>>>>>
>>>>> Even if that is ok... Let us take a completely idle device. What
>>>>> should the
>>>>> actual and requested freq's be for this case? Both should be zero,
>>>>> correct?
>>>>> Now, what are we going to show if we are taking fwake? At least the
>>>>> requested freq will be non-zero? Is that ok? Or the requested freq
>>>>> will
>>>>> show zero even if we take fwake? Thanks.
>>>>
>>>> If we take fwake before reading, requested frequency will not be
>>>> zero even
>>>> if GT is idle. HW always retains the last requested frequency in that
>>>> register, never zeroes it like it does for actual_freq. So, showing
>>>> the
>>>> non-zero value is the correct thing to do instead of artifically
>>>> zeroing
>>>> it.
>>>
>>> So now, this I disagree with. For an idle device, hopefully in C6,
>>> there
>>> should be no reason to request a non-zero freq. So the requested freq
>>> should show 0. As i915 does (using gt parked and unparked states and
>>> using
>>> with_intel_runtime_pm_if_in_use()).
>>
>> wrt locking and interaction with runtime_pm and forcewake, the perf
>> implementation in i915 is completely broken. You can't take a runtime_pm
>> or forcewake when you are in an atomic context. And when you have a
>> raw_spin_lock taken like is the case with perf, you can't even look at
>> it (if it involves taking a spin_lock, which it does).
>>
>> perf is holding a raw_spin_lock at that time. The only reason why the
>> CI isn't on fire over there is because we are papering it over with this
>> commit in topic/core-for-CI:
>>
>> dc17a830ffb5 ("Revert "lockdep: Enable PROVE_RAW_LOCK_NESTING with
>> PROVE_LOCKING."")
>>
>>>
>>> If we are already "doing a runtime_get() during pmu_init" (as mentioned
>>> above) we need to devise some other way of making this happen (maybe by
>>> looking at xe_force_wake_domain ref field?).
>>
>> you can't do that safely.
>
> which reminds me... Vinay can you test this patch directly on
> drm-xe-next? By code inspection it seems we are doing:
>
> __xe_pmu_event_read()
> xe_guc_pc_get_cur_freq()
> xe_force_wake_get()
> spin_lock_irqsave(&fw->lock, flags);
>
> and that spin_lock can't be called at this point.
Hi Lucas,
Tried it on drm-xe-next, not seeing any deadlocks/warnings. Is that
because CONFIG_PREEMPT is not set? Doesn't look like upstream config has
it set either -
https://gitlab.freedesktop.org/drm/xe/ci/-/blob/main/kernel/kconfig?ref_type=heads
Thanks,
Vinay.
>
> Lucas De Marchi
>
>>
>> Lucas De Marchi
>>
>>>
>>> Let us wait for a second opinion on this. Thanks.
More information about the Intel-xe
mailing list