[RESEND PATCH v2 2/2] drm/panfrost: Expose perf counters through debugfs
Boris Brezillon
boris.brezillon at collabora.com
Tue May 28 13:53:48 UTC 2019
Hi Steve,
On Thu, 16 May 2019 17:21:39 +0100
Steven Price <steven.price at arm.com> wrote:
> >> +static void panfrost_perfcnt_setup(struct panfrost_device *pfdev)
> >> +{
> >> + u32 cfg;
> >> +
> >> + /*
> >> + * Always use address space 0 for now.
> >> + * FIXME: this needs to be updated when we start using different
> >> + * address space.
> >> + */
> >> + cfg = GPU_PERFCNT_CFG_AS(0);
> >> + if (panfrost_model_cmp(pfdev, 0x1000) >= 0)
> >
> > You've got a couple of these. Perhaps we should add either a
> > model_is_bifrost() helper or an is_bifrost variable to use.
> >
> >> + cfg |= GPU_PERFCNT_CFG_SETSEL(1);
>
> mali_kbase only sets this if CONFIG_MALI_PRFCNT_SET_SECONDARY is defined
> - i.e. this selects a different selection of counters. At at the very
> least this should be an optional flag that can be set, but I suspect the
> primary set are the ones you are interested in.
I'll add a param to pass the set of counters to monitor.
>
> >> +
> >> + gpu_write(pfdev, GPU_PERFCNT_CFG,
> >> + cfg | GPU_PERFCNT_CFG_MODE(GPU_PERFCNT_CFG_MODE_OFF));
> >> +
> >> + if (!pfdev->perfcnt->enabled)
> >> + return;
> >> +
> >> + gpu_write(pfdev, GPU_PRFCNT_JM_EN, 0xffffffff);
> >> + gpu_write(pfdev, GPU_PRFCNT_SHADER_EN, 0xffffffff);
> >> + gpu_write(pfdev, GPU_PRFCNT_MMU_L2_EN, 0xffffffff);
> >> +
> >> + /*
> >> + * Due to PRLAM-8186 we need to disable the Tiler before we enable HW
> >> + * counters.
> >
> > Seems like you could just always apply the work-around? It doesn't
> > look like it would be much different.
>
> Technically the workaround causes the driver to perform the operation in
> the wrong order below (write TILER_EN when the mode is MANUAL) - this is
> a workaround for the hardware issue (and therefore works on that
> hardware). But I wouldn't like to say it works on other hardware which
> doesn't have the issue.
Okay, I'll keep the workaround only for HW that are impacted by the bug.
>
> >> + */
> >> + if (panfrost_has_hw_issue(pfdev, HW_ISSUE_8186))
> >> + gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0);
> >> + else
> >> + gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0xffffffff);
> >> +
> >> + gpu_write(pfdev, GPU_PERFCNT_CFG,
> >> + cfg | GPU_PERFCNT_CFG_MODE(GPU_PERFCNT_CFG_MODE_MANUAL));
> >> +
> >> + if (panfrost_has_hw_issue(pfdev, HW_ISSUE_8186))
> >> + gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0xffffffff);
> >> +}
[...]
> >> +int panfrost_perfcnt_init(struct panfrost_device *pfdev)
> >> +{
> >> + struct panfrost_perfcnt *perfcnt;
> >> + struct drm_gem_shmem_object *bo;
> >> + size_t size;
> >> + u32 status;
> >> + int ret;
> >> +
> >> + if (panfrost_has_hw_feature(pfdev, HW_FEATURE_V4)) {
> >> + unsigned int ncoregroups;
> >> +
> >> + ncoregroups = hweight64(pfdev->features.l2_present);
> >> + size = ncoregroups * BLOCKS_PER_COREGROUP *
> >> + COUNTERS_PER_BLOCK * BYTES_PER_COUNTER;
> >> + } else {
> >> + unsigned int nl2c, ncores;
> >> +
> >> + /*
> >> + * TODO: define a macro to extract the number of l2 caches from
> >> + * mem_features.
> >> + */
> >> + nl2c = ((pfdev->features.mem_features >> 8) & GENMASK(3, 0)) + 1;
> >> +
> >> + /*
> >> + * The ARM driver is grouping cores per core group and then
> >> + * only using the number of cores in group 0 to calculate the
> >> + * size. Not sure why this is done like that, but I guess
> >> + * shader_present will only show cores in the first group
> >> + * anyway.
> >> + */
>
> I'm not sure I understand this comment. I'm not sure which version of
> the driver you are looking at, but shader_present should contain all the
> cores (in every core group).
>
> The kbase driver (in panfrost's git tree[1]), contains:
>
> base_gpu_props *props = &kbdev->gpu_props.props;
> u32 nr_l2 = props->l2_props.num_l2_slices;
> u64 core_mask = props->coherency_info.group[0].core_mask;
>
> u32 nr_blocks = fls64(core_mask);
>
> /* JM and tiler counter blocks are always present */
> dump_size = (2 + nr_l2 + nr_blocks) *
> NR_CNT_PER_BLOCK *
> NR_BYTES_PER_CNT;
>
> [1]
> https://gitlab.freedesktop.org/panfrost/mali_kbase/blob/master/driver/product/kernel/drivers/gpu/arm/midgard/mali_kbase_vinstr.c
>
> i.e. the number of cores + number of L2 caches (plus 2 for JM/tiler)
> multiplied by the block size.
Actually, I was referring to what's done in
kbase_gpuprops_construct_coherent_groups() [2].
>
> Also another difference with the below is that fls64 and hweight64 don't
> return the same numbers. If the core mask is non-contiguous this will
> return different values.
Right, I should use fls64() not hweight64().
Regards,
Boris
[2]https://gitlab.freedesktop.org/panfrost/mali_kbase/blob/master/driver/product/kernel/drivers/gpu/arm/midgard/mali_kbase_gpuprops.c#L71
More information about the dri-devel
mailing list