[RFC] drm/msm: Add UABI to request perfcntr usage
Rob Clark
robdclark at gmail.com
Mon Dec 16 20:51:01 UTC 2024
On Mon, Dec 16, 2024 at 12:25 PM Akhil P Oommen
<quic_akhilpo at quicinc.com> wrote:
>
> On 12/16/2024 10:28 PM, Connor Abbott wrote:
> > On Mon, Dec 16, 2024 at 11:55 AM Akhil P Oommen
> > <quic_akhilpo at quicinc.com> wrote:
> >>
> >> On 12/13/2024 10:40 PM, Antonino Maniscalco wrote:
> >>> On 12/13/24 5:50 PM, Akhil P Oommen wrote:
> >>>> On 12/12/2024 9:44 PM, Antonino Maniscalco wrote:
> >>>>> On 12/12/24 4:58 PM, Akhil P Oommen wrote:
> >>>>>> On 12/5/2024 10:24 PM, Rob Clark wrote:
> >>>>>>> From: Rob Clark <robdclark at chromium.org>
> >>>>>>>
> >>>>>>> Performance counter usage falls into two categories:
> >>>>>>>
> >>>>>>> 1. Local usage, where the counter configuration, start, and end read
> >>>>>>> happen within (locally to) a single SUBMIT. In this case,
> >>>>>>> there is
> >>>>>>> no dependency on counter configuration or values between submits,
> >>>>>>> and
> >>>>>>> in fact counters are normally cleared on context switches,
> >>>>>>> making it
> >>>>>>> impossible to rely on cross-submit state.
> >>>>>>>
> >>>>>>> 2. Global usage, where a single privilaged daemon/process is sampling
> >>>>>>> counter values across all processes for profiling.
> >>>>>>>
> >>>>>>> The two categories are mutually exclusive. While you can have many
> >>>>>>> processes making local counter usage, you cannot combine global and
> >>>>>>> local usage without the two stepping on each others feet (by changing
> >>>>>>> counter configuration).
> >>>>>>>
> >>>>>>> For global counter usage, there is already a SYSPROF param (since
> >>>>>>> global
> >>>>>>> counter usage requires disabling counter clearing on context switch).
> >>>>>>> This patch adds a REQ_CNTRS param to request local counter usage. If
> >>>>>>> one or more processes has requested counter usage, then a SYSPROF
> >>>>>>> request will fail with -EBUSY. And if SYSPROF is active, then
> >>>>>>> REQ_CNTRS
> >>>>>>> will fail with -EBUSY, maintaining the mutual exclusivity.
> >>>>>>>
> >>>>>>> This is purely an advisory interface to help coordinate userspace.
> >>>>>>> There is no real means of enforcement, but the worst that can
> >>>>>>> happen if
> >>>>>>> userspace ignores a REQ_CNTRS failure is that you'll get nonsense
> >>>>>>> profiling data.
> >>>>>>>
> >>>>>>> Signed-off-by: Rob Clark <robdclark at chromium.org>
> >>>>>>> ---
> >>>>>>> kgsl takes a different approach, which involves a lot more UABI for
> >>>>>>> assigning counters to different processes. But I think by taking
> >>>>>>> advantage of the fact that mesa (freedreno+turnip) reconfigure the
> >>>>>>> counters they need in each SUBMIT, for their respective gl/vk perf-
> >>>>>>> counter extensions, we can take this simpler approach.
> >>>>>>
> >>>>>> KGSL's approach is preemption and ifpc safe (also whatever HW changes
> >>>>>> that will come up in future generations). How will we ensure that here?
> >>>>>>
> >>>>>> I have plans to bring up IFPC support in near future. Also, I
> >>>>>> brought up
> >>>>>> this point during preemption series. But from the responses, I felt
> >>>>>> that
> >>>>>> profiling was not considered a serious usecase. Still I wonder how the
> >>>>>> perfcounter extensions work accurately with preemption.
> >>>>>
> >>>>> So back then I implemented the postamble IB to clear perf counters and
> >>>>> that gets disabled when sysprof (so global usage) is happening. The
> >>>>> kernel is oblivious to "Local isage" of profiling but in that case
> >>>>> really what we want to do is disable preemption which in my
> >>>>> understanding can be done from userspace with a PKT. In my understanding
> >>>>> this had us covered for all usecases.
> >>>>
> >>>> I think this wasn't mentioned at that time. Which UMD PKT in a6x+ did
> >>>> you mean?
> >>>
> >>> Ah, I thought it wasmentioned, sorry.
> >>> The packet I was referring to is:
> >>> <doc> Make next dword 1 to disable preemption, 0 to re-enable it. </
> >>> doc>
> >>> <value name="CP_PREEMPT_DISABLE" value="0x6c" variants="A6XX"/>
> >>
> >> Ah! Okay. I think this packet is not used by the downstream blob. IMO,
> >> disabling preemption is still a suboptimal solution.
> >
> > Downstream doesn't expose userspace perfcounters (i.e.
> > VK_KHR_performance_query) at all. My understanding is that Android
> > requires you not to expose them for security reasons, but I could be
> > wrong there. In any case, Qualcomm clearly hasn't really thought
> > through what it would take to make everything work well with userspace
> > perfcounters and hasn't implemented the necessary firmware bits for
> > that, so we're left muddling through and doing what we can.
> >
>
> Honestly, I don't know what you meant by the missing support.
GMU support to save/restore SEL regs on IFPC when SYSPROF is enabled ;-)
If we had that, we wouldn't need ioclts to assign+configure counters,
everything else could be done in userspace (modulo trying to do both
local and global profiling at the same time.. but I'm not convinced
that is a valid use-case, as I mentioned earlier)
BR,
-R
> -Akhil.
>
> > Connor
> >
> >>
> >>>
> >>> BTW you mentioned wanting to look into IFPC. Since I too wanted to look
> >>> into implementing it wonder if you could let me know when you planned on
> >>> working on it.
> >>
> >> I have few patches in progress. Nothing final yet and need verification
> >> on the hw side. Also, I need to do some housekeeping here to debug gmu
> >> issues since IFPC increases the probability of those a lot.
> >>
> >> I will try to send out the patches very soon.
> >>
> >> -Akhil.
> >>
> >>>
> >>>>
> >>>> -Akhil.
> >>>>
> >>>>>
> >>>>> So what would you expect instead we should do kernel side to make
> >>>>> profiling preemption safe?
> >>>>>
> >>>>>>
> >>>>>> -Akhil
> >>>>>>
> >>>>>>>
> >>>>>>> drivers/gpu/drm/msm/adreno/adreno_gpu.c | 2 +
> >>>>>>> drivers/gpu/drm/msm/msm_drv.c | 5 ++-
> >>>>>>> drivers/gpu/drm/msm/msm_gpu.c | 1 +
> >>>>>>> drivers/gpu/drm/msm/msm_gpu.h | 29 +++++++++++++-
> >>>>>>> drivers/gpu/drm/msm/msm_submitqueue.c | 52 +++++++++++++++++++
> >>>>>>> +++++-
> >>>>>>> include/uapi/drm/msm_drm.h | 1 +
> >>>>>>> 6 files changed, 85 insertions(+), 5 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/
> >>>>>>> drm/msm/adreno/adreno_gpu.c
> >>>>>>> index 31bbf2c83de4..f688e37059b8 100644
> >>>>>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> >>>>>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> >>>>>>> @@ -441,6 +441,8 @@ int adreno_set_param(struct msm_gpu *gpu, struct
> >>>>>>> msm_file_private *ctx,
> >>>>>>> if (!capable(CAP_SYS_ADMIN))
> >>>>>>> return UERR(EPERM, drm, "invalid permissions");
> >>>>>>> return msm_file_private_set_sysprof(ctx, gpu, value);
> >>>>>>> + case MSM_PARAM_REQ_CNTRS:
> >>>>>>> + return msm_file_private_request_counters(ctx, gpu, value);
> >>>>>>> default:
> >>>>>>> return UERR(EINVAL, drm, "%s: invalid param: %u", gpu-
> >>>>>>>> name, param);
> >>>>>>> }
> >>>>>>> diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/
> >>>>>>> msm_drv.c
> >>>>>>> index 6416d2cb4efc..bf8314ff4a25 100644
> >>>>>>> --- a/drivers/gpu/drm/msm/msm_drv.c
> >>>>>>> +++ b/drivers/gpu/drm/msm/msm_drv.c
> >>>>>>> @@ -377,9 +377,12 @@ static void msm_postclose(struct drm_device
> >>>>>>> *dev, struct drm_file *file)
> >>>>>>> * It is not possible to set sysprof param to non-zero if gpu
> >>>>>>> * is not initialized:
> >>>>>>> */
> >>>>>>> - if (priv->gpu)
> >>>>>>> + if (ctx->sysprof)
> >>>>>>> msm_file_private_set_sysprof(ctx, priv->gpu, 0);
> >>>>>>> + if (ctx->counters_requested)
> >>>>>>> + msm_file_private_request_counters(ctx, priv->gpu, 0);
> >>>>>>> +
> >>>>>>> context_close(ctx);
> >>>>>>> }
> >>>>>>> diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/
> >>>>>>> msm_gpu.c
> >>>>>>> index 82f204f3bb8f..013b59ca3bb1 100644
> >>>>>>> --- a/drivers/gpu/drm/msm/msm_gpu.c
> >>>>>>> +++ b/drivers/gpu/drm/msm/msm_gpu.c
> >>>>>>> @@ -991,6 +991,7 @@ int msm_gpu_init(struct drm_device *drm, struct
> >>>>>>> platform_device *pdev,
> >>>>>>> gpu->nr_rings = nr_rings;
> >>>>>>> refcount_set(&gpu->sysprof_active, 1);
> >>>>>>> + refcount_set(&gpu->local_counters_active, 1);
> >>>>>>> return 0;
> >>>>>>> diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/
> >>>>>>> msm_gpu.h
> >>>>>>> index e25009150579..83c61e523b1b 100644
> >>>>>>> --- a/drivers/gpu/drm/msm/msm_gpu.h
> >>>>>>> +++ b/drivers/gpu/drm/msm/msm_gpu.h
> >>>>>>> @@ -195,12 +195,28 @@ struct msm_gpu {
> >>>>>>> int nr_rings;
> >>>>>>> /**
> >>>>>>> - * sysprof_active:
> >>>>>>> + * @sysprof_active:
> >>>>>>> *
> >>>>>>> - * The count of contexts that have enabled system profiling.
> >>>>>>> + * The count of contexts that have enabled system profiling plus
> >>>>>>> one.
> >>>>>>> + *
> >>>>>>> + * Note: refcount_t does not like 0->1 transitions.. we want to
> >>>>>>> keep
> >>>>>>> + * the under/overflow checks that refcount_t provides, but allow
> >>>>>>> + * multiple on/off transitions so we track the logical value
> >>>>>>> plus one.)
> >>>>>>> */
> >>>>>>> refcount_t sysprof_active;
> >>>>>>> + /**
> >>>>>>> + * @local_counters_active:
> >>>>>>> + *
> >>>>>>> + * The count of contexts that have requested local (intra-submit)
> >>>>>>> + * performance counter usage plus one.
> >>>>>>> + *
> >>>>>>> + * Note: refcount_t does not like 0->1 transitions.. we want to
> >>>>>>> keep
> >>>>>>> + * the under/overflow checks that refcount_t provides, but allow
> >>>>>>> + * multiple on/off transitions so we track the logical value
> >>>>>>> plus one.)
> >>>>>>> + */
> >>>>>>> + refcount_t local_counters_active;
> >>>>>>> +
> >>>>>>> /**
> >>>>>>> * lock:
> >>>>>>> *
> >>>>>>> @@ -383,6 +399,13 @@ struct msm_file_private {
> >>>>>>> */
> >>>>>>> int sysprof;
> >>>>>>> + /**
> >>>>>>> + * @counters_requested:
> >>>>>>> + *
> >>>>>>> + * Has the context requested local perfcntr usage.
> >>>>>>> + */
> >>>>>>> + bool counters_requested;
> >>>>>>> +
> >>>>>>> /**
> >>>>>>> * comm: Overridden task comm, see MSM_PARAM_COMM
> >>>>>>> *
> >>>>>>> @@ -626,6 +649,8 @@ void msm_submitqueue_destroy(struct kref *kref);
> >>>>>>> int msm_file_private_set_sysprof(struct msm_file_private *ctx,
> >>>>>>> struct msm_gpu *gpu, int sysprof);
> >>>>>>> +int msm_file_private_request_counters(struct msm_file_private *ctx,
> >>>>>>> + struct msm_gpu *gpu, int reqcntrs);
> >>>>>>> void __msm_file_private_destroy(struct kref *kref);
> >>>>>>> static inline void msm_file_private_put(struct msm_file_private
> >>>>>>> *ctx)
> >>>>>>> diff --git a/drivers/gpu/drm/msm/msm_submitqueue.c b/drivers/gpu/drm/
> >>>>>>> msm/msm_submitqueue.c
> >>>>>>> index 7fed1de63b5d..1e1e21e6f7ae 100644
> >>>>>>> --- a/drivers/gpu/drm/msm/msm_submitqueue.c
> >>>>>>> +++ b/drivers/gpu/drm/msm/msm_submitqueue.c
> >>>>>>> @@ -10,6 +10,15 @@
> >>>>>>> int msm_file_private_set_sysprof(struct msm_file_private *ctx,
> >>>>>>> struct msm_gpu *gpu, int sysprof)
> >>>>>>> {
> >>>>>>> + int ret = 0;
> >>>>>>> +
> >>>>>>> + mutex_lock(&gpu->lock);
> >>>>>>> +
> >>>>>>> + if (sysprof && (refcount_read(&gpu->local_counters_active) >
> >>>>>>> 1)) {
> >>>>>>> + ret = UERR(EBUSY, gpu->dev, "Local counter usage active");
> >>>>>>> + goto out_unlock;
> >>>>>>> + }
> >>>>>>> +
> >>>>>>> /*
> >>>>>>> * Since pm_runtime and sysprof_active are both refcounts, we
> >>>>>>> * call apply the new value first, and then unwind the previous
> >>>>>>> @@ -18,7 +27,8 @@ int msm_file_private_set_sysprof(struct
> >>>>>>> msm_file_private *ctx,
> >>>>>>> switch (sysprof) {
> >>>>>>> default:
> >>>>>>> - return UERR(EINVAL, gpu->dev, "Invalid sysprof: %d",
> >>>>>>> sysprof);
> >>>>>>> + ret = UERR(EINVAL, gpu->dev, "Invalid sysprof: %d", sysprof);
> >>>>>>> + goto out_unlock;
> >>>>>>> case 2:
> >>>>>>> pm_runtime_get_sync(&gpu->pdev->dev);
> >>>>>>> fallthrough;
> >>>>>>> @@ -43,7 +53,45 @@ int msm_file_private_set_sysprof(struct
> >>>>>>> msm_file_private *ctx,
> >>>>>>> ctx->sysprof = sysprof;
> >>>>>>> - return 0;
> >>>>>>> +out_unlock:
> >>>>>>> + mutex_unlock(&gpu->lock);
> >>>>>>> +
> >>>>>>> + return ret;
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +int msm_file_private_request_counters(struct msm_file_private *ctx,
> >>>>>>> + struct msm_gpu *gpu, int reqctrs)
> >>>>>>> +{
> >>>>>>> + int ret = 0;
> >>>>>>> +
> >>>>>>> + mutex_lock(&gpu->lock);
> >>>>>>> +
> >>>>>>> + if (reqctrs && (refcount_read(&gpu->sysprof_active) > 1)) {
> >>>>>>> + ret = UERR(EBUSY, gpu->dev, "System profiling active");
> >>>>>>> + goto out_unlock;
> >>>>>>> + }
> >>>>>>> +
> >>>>>>> + if (reqctrs) {
> >>>>>>> + if (ctx->counters_requested) {
> >>>>>>> + ret = UERR(EINVAL, gpu->dev, "Already requested");
> >>>>>>> + goto out_unlock;
> >>>>>>> + }
> >>>>>>> +
> >>>>>>> + ctx->counters_requested = true;
> >>>>>>> + refcount_inc(&gpu->local_counters_active);
> >>>>>>> + } else {
> >>>>>>> + if (!ctx->counters_requested) {
> >>>>>>> + ret = UERR(EINVAL, gpu->dev, "Not requested");
> >>>>>>> + goto out_unlock;
> >>>>>>> + }
> >>>>>>> + refcount_dec(&gpu->local_counters_active);
> >>>>>>> + ctx->counters_requested = false;
> >>>>>>> + }
> >>>>>>> +
> >>>>>>> +out_unlock:
> >>>>>>> + mutex_unlock(&gpu->lock);
> >>>>>>> +
> >>>>>>> + return ret;
> >>>>>>> }
> >>>>>>> void __msm_file_private_destroy(struct kref *kref)
> >>>>>>> diff --git a/include/uapi/drm/msm_drm.h b/include/uapi/drm/msm_drm.h
> >>>>>>> index 2342cb90857e..ae7fb355e4a1 100644
> >>>>>>> --- a/include/uapi/drm/msm_drm.h
> >>>>>>> +++ b/include/uapi/drm/msm_drm.h
> >>>>>>> @@ -91,6 +91,7 @@ struct drm_msm_timespec {
> >>>>>>> #define MSM_PARAM_UBWC_SWIZZLE 0x12 /* RO */
> >>>>>>> #define MSM_PARAM_MACROTILE_MODE 0x13 /* RO */
> >>>>>>> #define MSM_PARAM_UCHE_TRAP_BASE 0x14 /* RO */
> >>>>>>> +#define MSM_PARAM_REQ_CNTRS 0x15 /* WO: request "local" (intra-
> >>>>>>> submit) perfcntr usage */
> >>>>>>> /* For backwards compat. The original support for preemption was
> >>>>>>> based on
> >>>>>>> * a single ring per priority level so # of priority levels equals
> >>>>>>> the #
> >>>>>>
> >>>>>
> >>>>>
> >>>>> Best regards,
> >>>>
> >>>
> >>>
> >>> Best regards,
> >>
>
More information about the Freedreno
mailing list