[PATCH v6 6/7] drm/xe/uapi: Add a device query to get EU stall sampling information

Harish Chegondi harish.chegondi at intel.com
Wed Dec 18 23:24:18 UTC 2024


On Tue, Dec 17, 2024 at 12:07:49PM -0800, Dixit, Ashutosh wrote:
> On Tue, 17 Dec 2024 01:46:56 -0800, Harish Chegondi wrote:
> >
> 
Hi Ashutosh,

> Hi Harish,
> 
> Only reviewing the uapi, not the implementation.
> 
> > User space can get the EU stall data record size, EU stall capabilities,
> > EU stall sampling rates, and per XeCore buffer size with query IOCTL
> > DRM_IOCTL_XE_DEVICE_QUERY with .query set to DRM_XE_DEVICE_QUERY_EU_STALL.
> > A struct drm_xe_query_eu_stall will be returned to the user space along
> > with an array of supported sampling rates sorted in the fastest sampling
> > rate first order. sampling_rates in struct drm_xe_query_eu_stall will
> > point to the array of sampling rates.
> >
> > Any capabilities in EU stall sampling as of this patch are considered
> > as base capabilities. Any new capabilities added later will need
> > a new capabilities flag.
> 
> s/a new capabilities flag/new capability bits/. Or, better to say something
> like "New capability bits will be added for any new functionality added
> later".
> 
> See OA capability bits in the latest drm-tip.
Will check and change.
> 
> >
> > v6: Include EU stall sampling rates information and
> >     per XeCore buffer size in the query information.
> 
> /snip/
> 
> > diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> > index 4ee3b04a1bb5..40c2d274473e 100644
> > --- a/include/uapi/drm/xe_drm.h
> > +++ b/include/uapi/drm/xe_drm.h
> > @@ -700,6 +700,7 @@ struct drm_xe_device_query {
> >  #define DRM_XE_DEVICE_QUERY_ENGINE_CYCLES	6
> >  #define DRM_XE_DEVICE_QUERY_UC_FW_VERSION	7
> >  #define DRM_XE_DEVICE_QUERY_OA_UNITS		8
> > +#define DRM_XE_DEVICE_QUERY_EU_STALL		9
> >	/** @query: The type of data to query */
> >	__u32 query;
> >
> > @@ -1770,6 +1771,40 @@ enum drm_xe_eu_stall_property_id {
> >	DRM_XE_EU_STALL_PROP_EVENT_REPORT_COUNT,
> >  };
> >
> > +/**
> > + * struct drm_xe_query_eu_stall - Information about EU stall sampling.
> > + *
> > + * If a query is made with a struct @drm_xe_device_query where .query
> > + * is equal to @DRM_XE_DEVICE_QUERY_EU_STALL, then the reply uses
> > + * struct @drm_xe_query_eu_stall in .data.
> > + */
> > +struct drm_xe_query_eu_stall {
> > +	/** @extensions: Pointer to the first extension struct, if any */
> > +	__u64 extensions;
> > +
> > +	/** @capabilities: EU stall capabilities bit-mask */
> > +	__u64 capabilities;
> > +#define DRM_XE_EU_STALL_CAPS_BASE		(1 << 0)
> > +
> > +	/** @record_size: size of each EU stall data record */
> > +	__u64 record_size;
> > +
> > +	/** @per_xecore_buf_size: Per XeCore buffer size */
> > +	__u64 per_xecore_buf_size;
> 
> This new member has appeared. What is its purpose and how will userspace
> use it? Basically how is it relevant?
This is the per XeCore buffer size which will be used to create the per
GT EU stall data buffer. In the earlier patch, user space configured the
per Xecore size with one of the valid values - 128K, 256K or 512K. But
it was removed from the uAPI and instead the driver uses the biggest
size - 512K and creates the EU stall data buffer with size:
512K x number of XeCores. User space requested the per XeCore buffer
sized be exposed through the query IOCTL. In the future, I need to add
support for this buffer size to be able to change it via the debugfs
> 
> > +
> > +	/** @num_sampling_rates: Number of sampling rates supported */
> > +	__u64 num_sampling_rates;
> > +
> > +	/**
> > +	 * @sampling_rates: Pointer to an array of sampling rates
> > +	 * sorted in the fastest sampling rate first order.
> 
> sorted from fastest to slowest.
> 
> > +	 */
> > +	__u64 *sampling_rates;
> 
> This should be written as a flexible array as follows:
I used a pointer instead of a flexible array as I didn't want any field
after reserved fields. But since flexible array fields don't have any
storage space allocated unlike pointer, I will go ahead and change it to
a flexible array and move it to the end of the struct in the next patch
series.
> 
> 	__u64 sampling_rates[];
> 
> So sampling rate array is just present at the end of the struct, there
> should be no separate pointer here in the struct.
> 
> 	https://en.wikipedia.org/wiki/Flexible_array_member
> 
> Also we need to document what units the sampling rates are in. So something
> like "Sampling rates are specified in number of cycles of the reference
> clock".
> 
> So we have decided not to use nanoseconds (so sampling period rather than
> sampling rate)? Any particular reason for that? Though I am ok either way.

I decided to use GPU cycles instead of nanoseconds as using GPU cycles
will be more future proof to the uAPI. I also received feedback that
there are other interfaces in the driver that use GPU cycles instead of
nanoseconds, and therefore to be consistent, using GPU cycles may
better.
> 
> > +
> > +	/** @reserved: Reserved */
> > +	__u64 reserved[5];
> 
> Because flexible arrays must be the last field in a structure, this
> reserved field should be moved before num_sampling_rates field.
Will move to the end of the structure in the next patch series.
> 
> > +};
> > +
> >  #if defined(__cplusplus)
> >  }
> >  #endif
> > --
> > 2.47.0
> >
> 
> Ashutosh
Thanks for the review.
Harish.



More information about the Intel-xe mailing list