[PATCH 2/3] drm/panfrost: Expose HW counters to userspace

Boris Brezillon boris.brezillon at collabora.com
Thu Apr 4 18:17:17 UTC 2019


On Thu, 4 Apr 2019 08:41:29 -0700
Alyssa Rosenzweig <alyssa at rosenzweig.io> wrote:


> > +/*
> > + * Returns true if the 2 jobs have exactly the same perfcnt context, false
> > + * otherwise.
> > + */
> > +static bool panfrost_perfcnt_job_ctx_cmp(struct panfrost_perfcnt_job_ctx *a,
> > +					 struct panfrost_perfcnt_job_ctx *b)
> > +{
> > +	unsigned int i, j;
> > +
> > +	if (a->perfmon_count != b->perfmon_count)
> > +		return false;
> > +
> > +	for (i = 0; i < a->perfmon_count; i++) {
> > +		for (j = 0; j < b->perfmon_count; j++) {
> > +			if (a->perfmons[i] == b->perfmons[j])
> > +				break;
> > +		}
> > +
> > +		if (j == b->perfmon_count)
> > +			return false;
> > +	}
> > +  
> 
> Would using memcmp() be cleaner here?

memcmp() would treat 2 jobs that contain exactly the same perfmons, but
in a different order, as having different contexts. That said, this is
rather unlikely to happen, so maybe we can accept the perf penalty for
that case.
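
To make the ordering point concrete, here is a minimal userspace sketch
(not the driver code). struct job_ctx and both function names are
made up for illustration, and perfmons are reduced to plain integer
handles; the unordered variant also assumes an array never lists the
same perfmon twice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the perfcnt job context: plain integer handles. */
struct job_ctx {
	const unsigned int *perfmons;
	size_t perfmon_count;
};

/* Order-sensitive: true only when both arrays hold the same bytes. */
static bool job_ctx_equal_memcmp(const struct job_ctx *a,
				 const struct job_ctx *b)
{
	assert(a && b);

	if (a->perfmon_count != b->perfmon_count)
		return false;

	if (a->perfmon_count == 0)
		return true;

	return memcmp(a->perfmons, b->perfmons,
		      a->perfmon_count * sizeof(*a->perfmons)) == 0;
}

/*
 * Order-insensitive: every entry of a must appear somewhere in b.
 * Assumes neither array contains duplicate entries.
 */
static bool job_ctx_equal_unordered(const struct job_ctx *a,
				    const struct job_ctx *b)
{
	size_t i, j;

	assert(a && b);

	if (a->perfmon_count != b->perfmon_count)
		return false;

	for (i = 0; i < a->perfmon_count; i++) {
		for (j = 0; j < b->perfmon_count; j++) {
			if (a->perfmons[i] == b->perfmons[j])
				break;
		}

		if (j == b->perfmon_count)
			return false;
	}

	return true;
}
```

With handles {1, 2, 3} on one job and {3, 1, 2} on the other, the
memcmp() variant reports a mismatch while the unordered variant reports
a match.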

> 
> > +	if (panfrost_model_cmp(pfdev, 0x1000) >= 0)  
> 
> What does 0x1000 refer to here? I'm assuming maybe Bifrost, but it's not
> obvious... probably better to have a #define somewhere and use that (or
> an enum equivalently).

Yes, all numbers above 0xfff are bifrost GPUs. I'll add a macro.
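
Something along these lines, as a rough sketch only. The macro and
helper names are placeholders I picked for this example, and the helper
takes a bare model id instead of going through panfrost_model_cmp():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder name: first model id that is treated as Bifrost. */
#define GPU_MODEL_FIRST_BIFROST	0x1000

/* Every id above 0xfff is Bifrost, so the boundary must sit right above it. */
static_assert(GPU_MODEL_FIRST_BIFROST == 0xfff + 1,
	      "Bifrost ids start right above 0xfff");

/* Hypothetical helper replacing the open-coded comparison against 0x1000. */
static bool gpu_model_is_bifrost(uint32_t model)
{
	return model >= GPU_MODEL_FIRST_BIFROST;
}
```

Callers would then test gpu_model_is_bifrost(model) rather than
comparing against a bare constant.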

> 
> > +	/*
> > +	 * Due to PRLAM-8186 we need to disable the Tiler before we enable HW
> > +	 * counters.
> > +	 */  
> 
> What on earth is PRLAM-8186? :)
> 
> Actually, wait, I can answer that -- old kbase versions had an errata
> list:
> 
>         /* Write of PRFCNT_CONFIG_MODE_MANUAL to PRFCNT_CONFIG causes a instrumentation dump if
>            PRFCNT_TILER_EN is enabled */
>         BASE_HW_ISSUE_8186,
> 
> So that's why. If people want, I'm considering moving these errata
> descriptions back into the kernel where possible, since otherwise code
> like this is opaque.

Will copy the errata.

> 
> > +		unsigned int nl2c, ncores;
> > +
> > +		/*
> > +		 * TODO: define a macro to extract the number of l2 caches from
> > +		 * mem_features.
> > +		 */
> > +		nl2c = ((pfdev->features.mem_features >> 8) & GENMASK(3, 0)) + 1;
> > +
> > +		/*
> > +		 * The ARM driver is grouping cores per core group and then
> > +		 * only using the number of cores in group 0 to calculate the
> > +		 * size. Not sure why this is done like that, but I guess
> > +		 * shader_present will only show cores in the first group
> > +		 * anyway.
> > +		 */
> > +		ncores = hweight64(pfdev->features.shader_present);
> > +  
> 
> Deja vu. Was this copy-pasted, maybe?

Actually, that one is from me, hence the 'not sure why' part :).

> 
> > +		  (panfrost_model_cmp(pfdev, 0x1000) >= 0 ?  
> 
> There's that pesky 0x1000 again.
> 
> > @@ -55,6 +63,15 @@ struct drm_panfrost_submit {
> >  
> >  	/** A combination of PANFROST_JD_REQ_* */
> >  	__u32 requirements;
> > +
> > +	/** Pointer to a u32 array of perfmons that should be attached to the job. */
> > +	__u64 perfmon_handles;
> > +
> > +	/** Number of perfmon handles passed in (size is that times 4). */
> > +	__u32 perfmon_handle_count;
> > +
> > +	/** Unused field, should be set to 0. */
> > +	__u32 padding;  
> 
> Bleep blorp. If we're modifying _submit, we'll need to be swift about
> merging this ahead of the main code to make sure we don't break the
> UABI. Although I guess if we're just adding fields at the end, that's a
> nonissue.

Other drivers rely on the same "if the data passed is smaller than the
expected size, unassigned fields are zeroed" rule. That allows us to
extend a struct without breaking the ABI as long as zero is a valid
value and does not change the behavior compared to when the field was
not present.

This is the case here: perfmon_handle_count = 0 means no perfmon is
attached to the job, so the driver behaves exactly as it did before.

No need to get that part merged in the initial patch series IMO.
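
To illustrate the zero-extension rule, here is a small userspace
simulation, not the actual DRM ioctl path. The two struct layouts are
simplified and hypothetical (only a few fields are kept), and
copy_ioctl_arg() is a made-up helper that mimics "copy what userspace
passed, zero whatever is left":

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical layout known to an older userspace. */
struct submit_old {
	uint64_t jc;
	uint32_t bo_handle_count;
	uint32_t requirements;
};

/* Same layout, extended at the end with the perfmon fields. */
struct submit_new {
	uint64_t jc;
	uint32_t bo_handle_count;
	uint32_t requirements;
	uint64_t perfmon_handles;
	uint32_t perfmon_handle_count;
	uint32_t padding;
};

/*
 * Copy at most ksize bytes from the user buffer into the kernel copy,
 * and leave every byte the user did not provide set to zero.
 */
static void copy_ioctl_arg(void *kdata, size_t ksize,
			   const void *udata, size_t usize)
{
	size_t n = usize < ksize ? usize : ksize;

	assert(kdata && udata);

	memset(kdata, 0, ksize);
	memcpy(kdata, udata, n);
}
```

An old binary that passes a struct submit_old ends up with
perfmon_handles, perfmon_handle_count and padding all equal to 0 in
the kernel-side struct submit_new, which is the "no perfmon attached"
case.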

> 
> > +struct drm_panfrost_block_perfcounters {
> > +	/*
> > +	 * For DRM_IOCTL_PANFROST_GET_PERFCNT_LAYOUT, encodes the available
> > +	 * instances for a specific given block type.
> > +	 * For DRM_IOCTL_PANFROST_CREATE_PERFMON, encodes the instances the
> > +	 * user wants to monitor.
> > +	 * Note: the bitmap might be sparse.
> > +	 */
> > +	__u64 instances;
> > +
> > +	/*
> > +	 * For DRM_IOCTL_PANFROST_GET_PERFCNT_LAYOUT, encodes the available
> > +	 * counters attached to a specific block type.
> > +	 * For DRM_IOCTL_PANFROST_CREATE_PERFMON, encodes the counters the user
> > +	 * wants to monitor.
> > +	 * Note: the bitmap might be sparse.
> > +	 */
> > +	__u64 counters;
> > +};  
> 
> I don't understand this. Aren't there more than 64 counters?
> 
> > +struct drm_panfrost_get_perfcnt_layout {
> > +	struct drm_panfrost_block_perfcounters counters[PANFROST_NUM_BLOCKS];
> > +};  
> 
> --Oh. It's per-block. Got it.
> 
> > + * Used to create a performance monitor. Each perfmonance monitor is assigned an  
> 
> Typo.

Will fix.

> 
> ---
> 
> Overall, this looks really great! Thank you! :)

Thanks a lot for your reviews. That was pretty damn fast!

