[PATCH v4] drm/xe/pmu: Add GT frequency events

Lucas De Marchi lucas.demarchi at intel.com
Wed Mar 26 15:34:18 UTC 2025


On Wed, Mar 26, 2025 at 08:02:52AM -0700, Ashutosh Dixit wrote:
>On Tue, 25 Mar 2025 21:14:53 -0700, Lucas De Marchi wrote:
>>
>> On Tue, Mar 25, 2025 at 11:09:55PM -0500, Lucas De Marchi wrote:
>> > On Tue, Mar 25, 2025 at 03:45:06PM -0700, Ashutosh Dixit wrote:
>> >> On Tue, 25 Mar 2025 15:01:47 -0700, Belgaumkar, Vinay wrote:
>> >>>
>> >>>
>> >>> On 3/25/2025 2:53 PM, Dixit, Ashutosh wrote:
>> >>>> On Tue, 25 Mar 2025 13:33:43 -0700, Belgaumkar, Vinay wrote:
>> >>>>> On 3/25/2025 10:15 AM, Dixit, Ashutosh wrote:
>> >>>>>> On Mon, 24 Mar 2025 19:37:32 -0700, Belgaumkar, Vinay wrote:
>> >>>>>> Hi Vinay,
>> >>>>>>
>> >>>>>>> On 3/24/2025 5:18 PM, Dixit, Ashutosh wrote:
>> >>>>>>>> On Mon, 24 Mar 2025 16:24:02 -0700, Vinay Belgaumkar wrote:
>> >>>>>>>>> @@ -266,11 +274,24 @@ static u64 __xe_pmu_event_read(struct perf_event *event)
>> >>>>>>>>>	case XE_PMU_EVENT_ENGINE_ACTIVE_TICKS:
>> >>>>>>>>>	case XE_PMU_EVENT_ENGINE_TOTAL_TICKS:
>> >>>>>>>>>		return read_engine_events(gt, event);
>> >>>>>>>>> +	case XE_PMU_EVENT_GT_ACTUAL_FREQUENCY:
>> >>>>>>>>> +		return xe_guc_pc_get_act_freq(&gt->uc.guc.pc);
>> >>>>>>>>> +	case XE_PMU_EVENT_GT_REQUESTED_FREQUENCY:
>> >>>>>>>>> +		if (!xe_guc_pc_get_cur_freq(&gt->uc.guc.pc, &cur_gt_freq))
>> >>>>>>>> This is unconditionally taking the forcewake and waking the card up just to
>> >>>>>>>> get the sample. Do we really want to do that?
>> >>>>>>>>
>> >>>>>>>> So if we don't do that, both the actual and requested freq will be 0 if gt
>> >>>>>>>> is in C6.
>> >>>>>>> For actual frequency, the register(0xc60) does not belong to any fw domain -
>> >>>>>>>
>> >>>>>>> GEN_FW_RANGE(0xc00, 0xfff, 0),
>> >>>>>>>
>> >>>>>>> HW will report 0 when GT is in C6.
>> >>>>>> Yes, no issue about act_freq, see commit 22009b6dad66. I was referring only
>> >>>>>> to requested freq.
>> >>>>>>
>> >>>>>>> The requested freq register is a
>> >>>>>>> shadowed register (0xa008), so that will not accrue fwake either.
>> >>>>>>>
>> >>>>>>> static const struct i915_range mtl_shadowed_regs[] = {
>> >>>>>>>           { .start =   0x2030, .end =   0x2030 },
>> >>>>>>>           { .start =   0x2510, .end =   0x2550 },
>> >>>>>>>           { .start =   0xA008, .end =   0xA00C },
>> >>>>>> So this still doesn't make sense because:
>> >>>>>>
>> >>>>>>     1. The fact is that xe_guc_pc_get_cur_freq() *is* taking forcewake
>> >>>>>>     2. And that is in accord with the following comment in i915/intel_uncore.c
>> >>>>>>
>> >>>>>>        * Shadowing only applies to writes; forcewake
>> >>>>>>        * must still be acquired when reading from registers in these ranges.
>> >>>>>>
>> >>>>>> Also see intel_rps_read_punit_req() which is called from i915 PMU
>> >>>>>> (frequency_sample()) and uses with_intel_runtime_pm_if_in_use(), so we'd
>> >>>>>> need to do use the equivalent in xe.
>> >>>>> Hi Ashutosh,
>> >>>>>
>> >>>>>    As part of a previous decision, in the Xe PMU implementation, we are
>> >>>>> doing a runtime_get() during pmu_init for all PMU sessions. So, device is
>> >>>>> going to be awake anyways. In this case, it does not make sense to just
>> >>>>> read the register without a fwake.
>> >>>>
>> >>>> Even if that is ok... Let us take a completely idle device. What should the
>> >>>> actual and requested freq's be for this case? Both should be zero, correct?
>> >>>> Now, what are we going to show if we are taking fwake? At least the
>> >>>> requested freq will be non-zero? Is that ok? Or the requested freq will
>> >>>> show zero even if we take fwake? Thanks.
>> >>>
>> >>> If we take fwake before reading, requested frequency will not be zero even
>> >>> if GT is idle. HW always retains the last requested frequency in that
>> >>> register, never zeroes it like it does for actual_freq. So, showing the
>> >>> non-zero value is the correct thing to do instead of artifically zeroing
>> >>> it.
>> >>
>> >> So now, this I disagree with. For an idle device, hopefully in C6, there
>> >> should be no reason to request a non-zero freq. So the requested freq
>> >> should show 0. As i915 does (using gt parked and unparked states and using
>> >> with_intel_runtime_pm_if_in_use()).
>> >
>> > wrt locking and interaction with runtime_pm and forcewake, the perf
>> > implementation in i915 is completely broken. You can't take a runtime_pm
>> > or forcewake when you are in an atomic context. And when you have a
>> > raw_spin_lock taken like is the case with perf, you can't even look at
>> > it (if it involves taking a spin_lock, which it does).
>> >
>> > perf is holding a raw_spin_lock at that time. The only reason why the
>> > CI isn't on fire over there is because we are papering it over with this
>> > commit in topic/core-for-CI:
>> >
>> > dc17a830ffb5 ("Revert "lockdep: Enable PROVE_RAW_LOCK_NESTING with PROVE_LOCKING."")
>> >
>> >>
>> >> If we are already "doing a runtime_get() during pmu_init" (as mentioned
>> >> above) we need to devise some other way of making this happen (maybe by
>> >> looking at xe_force_wake_domain ref field?).
>> >
>> > you can't do that safely.
>>
>> which reminds me... Vinay can you test this patch directly on
>> drm-xe-next? By code inspection it seems we are doing:
>>
>> __xe_pmu_event_read()
>>   xe_guc_pc_get_cur_freq()
>>     xe_force_wake_get()
>>       spin_lock_irqsave(&fw->lock, flags);
>>
>> and that spin_lock can't be called at this point.
>
>Hi Lucas, sorry, didn't follow why the spin_lock can't be taken at this
>point? Because we should be able to take a spin_lock in an atomic

you can, unless you have a raw_spin_lock taken.  With PREEMPT_RT
spinlocks can sleep, raw_spinlocks can´t.
See https://docs.kernel.org/locking/locktypes.html#spinning-locks,
https://docs.kernel.org/locking/locktypes.html#raw-spinlock-t-and-spinlock-t

The read call from the perf side is coming with a raw_spin_lock taken,
has always been and will not change. See
https://lore.kernel.org/intel-gfx/20241212093123.GV21636@noisy.programming.kicks-ass.net/

>context. The actual waiting for the domain to wake up also seems to be
>atomic friendly. Thanks.

are we talking i915 or xe here?

For xe, I'd say it's a hack-friendly way. Waiting up to 50msec like that
is not a design I like and we only get away with that because we don't
really wait that long unless the device is faulty.

For i915, it pretty much isn't as the intel_ref_tracker implementation
tries to allocate memory.

I've been going back and forth on the i915 side on how to fix it, but
there are multiple changes needed and AFAICS we will need to balance
some locks moving to raw_spin_lock and some other things moved around.

Lucas De Marchi

>
>>
>> >>
>> >> Let us wait for a second opinion on this. Thanks.


More information about the Intel-xe mailing list