[Intel-gfx] [PATCH 5/6] drm/i915/pmu: Prepare for multi-tile non-engine counters

Mon May 15 10:10:56 UTC 2023

On 12/05/2023 21:57, Umesh Nerlige Ramappa wrote:
> On Fri, May 12, 2023 at 11:56:18AM +0100, Tvrtko Ursulin wrote:
>>
>> On 12/05/2023 02:08, Dixit, Ashutosh wrote:
>>> On Fri, 05 May 2023 17:58:15 -0700, Umesh Nerlige Ramappa wrote:
>>>>
>>>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>>
>>>> Reserve some bits in the counter config namespace which will carry the
>>>> tile id and prepare the code to handle this.
>>>>
>>>> No per tile counters have been added yet.
>>>>
>>>> v2:
>>>> - Fix checkpatch issues
>>>> - Use 4 bits for gt id in non-engine counters. Drop FIXME.
>>>> - Set MAX GTs to 4. Drop FIXME.
>>>>
>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa at intel.com>

8<

>>>> +static u64 frequency_enabled_mask(void)
>>>> +{
>>>> +    unsigned int i;
>>>> +    u64 mask = 0;
>>>> +
>>>> +    for (i = 0; i < I915_PMU_MAX_GTS; i++)
>>>> +        mask |= config_mask(__I915_PMU_ACTUAL_FREQUENCY(i)) |
>>>> +            config_mask(__I915_PMU_REQUESTED_FREQUENCY(i));
>>>> +
>>>> +    return mask;
>>>> +}
>>>> +
>>>>  static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active)
>>>>  {
>>>>     struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), 
>>>> pmu);
>>>> @@ -120,9 +147,7 @@ static bool pmu_needs_timer(struct i915_pmu 
>>>> *pmu, bool gpu_active)
>>>>      * Mask out all the ones which do not need the timer, or in
>>>>      * other words keep all the ones that could need the timer.
>>>>      */
>>>> -    enable &= config_mask(I915_PMU_ACTUAL_FREQUENCY) |
>>>> -          config_mask(I915_PMU_REQUESTED_FREQUENCY) |
>>>> -          ENGINE_SAMPLE_MASK;
>>>> +    enable &= frequency_enabled_mask() | ENGINE_SAMPLE_MASK;
>>>
>>> u32 enable & u64 frequency_enabled_mask
>>>
>>> ugly but ok I guess? Or change enable to u64?
> 
> making pmu->enable u64 as well as other places where it is assigned to 
> local variables.
> 
>>
>> Hmm.. yes very ugly. Could have been an accident which happened to 
>> work because there is a single timer (not per tile).
> 
> Happened to work because the frequency mask does not spill over to the 
> upper 32 bits (even for multi tile).
> 
> --------------------- START_SECTION ----------------
>>
>> Similar issue in frequency_sampling_enabled too. Gt_id argument to it 
>> seems pointless.
> 
> Not sure why it's pointless. We need the gt_id to determine the right 
> mask for that specific gt. If it's not enabled, then we just return 
> without pm_get and async put (like you mention later).
> And this piece of code is called within for_each_gt.

I think I got a little confused cross referencing the code and patches 
last week and did not mentally see all the changes.

Because the hunk in other_bit() is correctly adding support for per gt bits.

The layout of pmu->enable ends up like this:

bits  0 -  2: engine events
bits  3 -  5: gt0 other events
bits  6 -  8: gt1 other events
bits  9 - 11: gt2 other events
bits 12 - 14: gt3 other events

>> So I now think whole frequency_enabled_mask() is just pointless and 
>> should be removed. And then pmu_needs_time code can stay as is. 
>> Possibly add a config_mask_32 helper which ensures no bits in upper 32 
>> bits are returned.
>>
>> That is if we are happy for the frequency_sampling_enabled returning 
>> true for all gts, regardless of which ones actually have frequency 
>> sampling enabled.
>>
>> Or if we want to implement it as I probably have intended, we will 
>> need to add some gt bits into pmu->enable. Maybe reserve top four same 
>> as with config counters.
> 
> Nope. What you have here works just fine. pmu->enable should not include 
> any gt id info. gt_id[63:60] is only a concept for pmu config sent by 
> user.  config_mask and pmu->enable are i915 internal bookkeeping (bit 
> masks) just to track what events need to be sampled.  The 'other' bit 
> masks are a function of gt_id because we use gt_id to calculate a 
> contiguous numerical value for these 'other' events. That's all. Once 
> the numerical value is calculated, there is no need for gt_id because 
> config_mask is BIT_ULL(numerical_value). Since the numerical values 
> never exceeded 31 (even for multi-gts), everything worked even with 32 
> bit pmu->enable.

Yep.

So question then is why make pmu->enable u64?

Instead frequency_enabled_mask() should be made u32 since the bitwise or 
composition of config_masks() is guaranteed to fit.

At most it can have an internal u64 for the mask, assert upper_32_bits 
are zero and return lower_32_bits.

Did I get it right this time round? :)

Regards,

Tvrtko