[Intel-gfx] [PATCH 5/6] drm/i915/pmu: Prepare for multi-tile non-engine counters

Mon May 15 22:07:29 UTC 2023

On Mon, 15 May 2023 03:10:56 -0700, Tvrtko Ursulin wrote:
>

Hi Tvrtko,

> On 12/05/2023 21:57, Umesh Nerlige Ramappa wrote:
> > On Fri, May 12, 2023 at 11:56:18AM +0100, Tvrtko Ursulin wrote:
> >>
> >> On 12/05/2023 02:08, Dixit, Ashutosh wrote:
> >>> On Fri, 05 May 2023 17:58:15 -0700, Umesh Nerlige Ramappa wrote:
> >>>>
> >>>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >>>>
> >>>> Reserve some bits in the counter config namespace which will carry the
> >>>> tile id and prepare the code to handle this.
> >>>>
> >>>> No per tile counters have been added yet.
> >>>>
> >>>> v2:
> >>>> - Fix checkpatch issues
> >>>> - Use 4 bits for gt id in non-engine counters. Drop FIXME.
> >>>> - Set MAX GTs to 4. Drop FIXME.
> >>>>
> >>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >>>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa at intel.com>
>
> 8<
>
> >>>> +static u64 frequency_enabled_mask(void)
> >>>> +{
> >>>> +    unsigned int i;
> >>>> +    u64 mask = 0;
> >>>> +
> >>>> +    for (i = 0; i < I915_PMU_MAX_GTS; i++)
> >>>> +        mask |= config_mask(__I915_PMU_ACTUAL_FREQUENCY(i)) |
> >>>> +            config_mask(__I915_PMU_REQUESTED_FREQUENCY(i));
> >>>> +
> >>>> +    return mask;
> >>>> +}
> >>>> +
> >>>>  static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active)
> >>>>  {
> >>>>     struct drm_i915_private *i915 = container_of(pmu, typeof(*i915),
> >>>> pmu);
> >>>> @@ -120,9 +147,7 @@ static bool pmu_needs_timer(struct i915_pmu *pmu,
> >>>> bool gpu_active)
> >>>>      * Mask out all the ones which do not need the timer, or in
> >>>>      * other words keep all the ones that could need the timer.
> >>>>      */
> >>>> -    enable &= config_mask(I915_PMU_ACTUAL_FREQUENCY) |
> >>>> -          config_mask(I915_PMU_REQUESTED_FREQUENCY) |
> >>>> -          ENGINE_SAMPLE_MASK;
> >>>> +    enable &= frequency_enabled_mask() | ENGINE_SAMPLE_MASK;
> >>>
> >>> u32 enable & u64 frequency_enabled_mask
> >>>
> >>> ugly but ok I guess? Or change enable to u64?
> >
> > making pmu->enable u64 as well as other places where it is assigned to
> > local variables.
> >
> >>
> >> Hmm.. yes very ugly. Could have been an accident which happened to work
> >> because there is a single timer (not per tile).
> >
> > Happened to work because the frequency mask does not spill over to the
> > upper 32 bits (even for multi tile).
> >
> > --------------------- START_SECTION ----------------
> >>
> >> Similar issue in frequency_sampling_enabled too. Gt_id argument to it
> >> seems pointless.
> >
> > Not sure why it's pointless. We need the gt_id to determine the right
> > mask for that specific gt. If it's not enabled, then we just return
> > without pm_get and async put (like you mention later).
> > And this piece of code is called within for_each_gt.
>
> I think I got a little confused cross referencing the code and patches last
> week and did not mentally see all the changes.
>
> Because the hunk in other_bit() is correctly adding support for per gt bits.
>
> The layout of pmu->enable ends up like this:
>
> bits  0 -  2: engine events
> bits  3 -  5: gt0 other events
> bits  6 -  8: gt1 other events
> bits  9 - 11: gt2 other events
> bits 12 - 14: gt3 other events

Correct.

>
> >> So I now think whole frequency_enabled_mask() is just pointless and
> >> should be removed. And then pmu_needs_time code can stay as is. Possibly
> >> add a config_mask_32 helper which ensures no bits in upper 32 bits are
> >> returned.
> >>
> >> That is if we are happy for the frequency_sampling_enabled returning
> >> true for all gts, regardless of which ones actually have frequency
> >> sampling enabled.
> >>
> >> Or if we want to implement it as I probably have intended, we will need
> >> to add some gt bits into pmu->enable. Maybe reserve top four same as
> >> with config counters.
> >
> > Nope. What you have here works just fine. pmu->enable should not include
> > any gt id info. gt_id[63:60] is only a concept for pmu config sent by
> > user.  config_mask and pmu->enable are i915 internal bookkeeping (bit
> > masks) just to track what events need to be sampled.  The 'other' bit
> > masks are a function of gt_id because we use gt_id to calculate a
> > contiguous numerical value for these 'other' events. That's all. Once the
> > numerical value is calculated, there is no need for gt_id because
> > config_mask is BIT_ULL(numerical_value). Since the numerical values never
> > exceeded 31 (even for multi-gts), everything worked even with 32 bit
> > pmu->enable.
>
> Yep.
>
> So question then is why make pmu->enable u64?

The only reason was simplicity, since a lot of the existing code already
assumes u64.

E.g. if we keep pmu->enable u32, we should have to do the following:

* Change config_mask() return type to u32 (in frequency_sampling_enabled(),
  we have 'pmu->enable & config_mask()')
* Change frequency_enabled_mask() return type to u32 (again uses
  config_mask() so if we change config_mask() to u32 we change return type
  here too)
* In i915_pmu_enable(), change 'pmu->enable |= BIT_ULL(bit)' to
  'pmu->enable |= BIT(bit)'

So yes, if we think we should be using pmu->enable u32, let's change this
to be consistent everywhere.

> Instead frequency_enabled_mask() should be made u32 since the bitwise or
> composition of config_masks() is guaranteed to fit.
>
> At most it can have an internal u64 for the mask, assert upper_32_bits are
> zero and return lower_32_bits.
>
> Did I get it right this time round? :)

Yes, though we'd have to make the config_mask() type change above to make
frequency_enabled_mask() u32. Or we can just keep everything u64. Let's
decide one way or the other and close this. It seems Tvrtko is leaning
towards making pmu->enable and frequency_enabled_mask() u32?

Thanks.
--
Ashutosh