[PATCH v6 6/7] drm/xe/uapi: Add a device query to get EU stall sampling information

Harish Chegondi harish.chegondi at intel.com
Thu Dec 19 20:04:40 UTC 2024


On Thu, Dec 19, 2024 at 08:36:16AM -0800, Dixit, Ashutosh wrote:
> On Wed, 18 Dec 2024 15:24:18 -0800, Harish Chegondi wrote:
> >
> > On Tue, Dec 17, 2024 at 12:07:49PM -0800, Dixit, Ashutosh wrote:
> > > On Tue, 17 Dec 2024 01:46:56 -0800, Harish Chegondi wrote:
> > > >
> > >
> > Hi Ashutosh,
> >
> > > Hi Harish,
> > >
> > > Only reviewing the uapi, not the implementation.
> > >
> > > > User space can get the EU stall data record size, EU stall capabilities,
> > > > EU stall sampling rates, and per XeCore buffer size with query IOCTL
> > > > DRM_IOCTL_XE_DEVICE_QUERY with .query set to DRM_XE_DEVICE_QUERY_EU_STALL.
> > > > A struct drm_xe_query_eu_stall will be returned to the user space along
> > > > with an array of supported sampling rates sorted in the fastest sampling
> > > > rate first order. sampling_rates in struct drm_xe_query_eu_stall will
> > > > point to the array of sampling rates.
> > > >
> > > > Any capabilities in EU stall sampling as of this patch are considered
> > > > as base capabilities. Any new capabilities added later will need
> > > > a new capabilities flag.
> > >
> > > s/a new capabilities flag/new capability bits/. Or, better to say something
> > > like "New capability bits will be added for any new functionality added
> > > later".
> > >
> > > See OA capability bits in the latest drm-tip.
> > Will check and change.
> > >
> > > >
> > > > v6: Include EU stall sampling rates information and
> > > >     per XeCore buffer size in the query information.
> > >
> > > /snip/
> > >
> > > > diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> > > > index 4ee3b04a1bb5..40c2d274473e 100644
> > > > --- a/include/uapi/drm/xe_drm.h
> > > > +++ b/include/uapi/drm/xe_drm.h
> > > > @@ -700,6 +700,7 @@ struct drm_xe_device_query {
> > > >  #define DRM_XE_DEVICE_QUERY_ENGINE_CYCLES	6
> > > >  #define DRM_XE_DEVICE_QUERY_UC_FW_VERSION	7
> > > >  #define DRM_XE_DEVICE_QUERY_OA_UNITS		8
> > > > +#define DRM_XE_DEVICE_QUERY_EU_STALL		9
> > > >	/** @query: The type of data to query */
> > > >	__u32 query;
> > > >
> > > > @@ -1770,6 +1771,40 @@ enum drm_xe_eu_stall_property_id {
> > > >	DRM_XE_EU_STALL_PROP_EVENT_REPORT_COUNT,
> > > >  };
> > > >
> > > > +/**
> > > > + * struct drm_xe_query_eu_stall - Information about EU stall sampling.
> > > > + *
> > > > + * If a query is made with a struct @drm_xe_device_query where .query
> > > > + * is equal to @DRM_XE_DEVICE_QUERY_EU_STALL, then the reply uses
> > > > + * struct @drm_xe_query_eu_stall in .data.
> > > > + */
> > > > +struct drm_xe_query_eu_stall {
> > > > +	/** @extensions: Pointer to the first extension struct, if any */
> > > > +	__u64 extensions;
> > > > +
> > > > +	/** @capabilities: EU stall capabilities bit-mask */
> > > > +	__u64 capabilities;
> > > > +#define DRM_XE_EU_STALL_CAPS_BASE		(1 << 0)
> > > > +
> > > > +	/** @record_size: size of each EU stall data record */
> > > > +	__u64 record_size;
> > > > +
> > > > +	/** @per_xecore_buf_size: Per XeCore buffer size */
> > > > +	__u64 per_xecore_buf_size;
> > >
> > > This new member has appeared. What is its purpose and how will userspace
> > > use it? Basically how is it relevant?
> > This is the per XeCore buffer size which will be used to create the per
> > GT EU stall data buffer. In the earlier patch, user space configured the
> > per Xecore size with one of the valid values - 128K, 256K or 512K. But
> > it was removed from the uAPI and instead the driver uses the biggest
> > size - 512K and creates the EU stall data buffer with size:
> > 512K x number of XeCores. User space requested the per XeCore buffer
> > sized be exposed through the query IOCTL. In the future, I need to add
> > support for this buffer size to be able to change it via the debugfs
> 
> In my view, Xe is only a temporary branding name whereas our uapi is
> eternal :) Kidding, but you get the idea. So I would call this by a more
> general name such as per_core_buf_size if it indeed needs to exposed (so I
> prefer removing Xe from the name).
>From an earlier review feedback, I changed all DSS (Dual Sub Slice) to
xecore as xecore is the new term for DSS or sub slice as we used to call
it. So, there are "xecore" strings in several parts of the code in
xe_eu_stall.c. So, I am not sure if I should change xecore to just core
in just this variable as it may lead to confusion about core vs xecore.
> 
> Because I don't see what userspace will do with this. I don't even know if
> we can expose it if we cannot demonstrate that UMD's are consuming it. Also
> we can reintroduce the removed property in case it is useful (which also
> seems questionable).
L0 folks have asked for the buffer size to be exposed. Only then I added
this field to the uAPI.
I have plan to allow the user to change the per subslice buffer size
through debugfs entry in the future. Although the driver uses the
biggest buffer size, in debugging and pre-silicon environments, it helps
to have smaller buffer sizes. So, in the future I will add support for
the user to change the buffer size.
> 
> Anyway, for now I would keep the field but just the name to
> per_core_buf_size. But let's see if anyone else has an opinion on this.
> 
> 
> > >
> > > > +
> > > > +	/** @num_sampling_rates: Number of sampling rates supported */
> > > > +	__u64 num_sampling_rates;
> > > > +
> > > > +	/**
> > > > +	 * @sampling_rates: Pointer to an array of sampling rates
> > > > +	 * sorted in the fastest sampling rate first order.
> > >
> > > sorted from fastest to slowest.
> > >
> > > > +	 */
> > > > +	__u64 *sampling_rates;
> > >
> > > This should be written as a flexible array as follows:
> > I used a pointer instead of a flexible array as I didn't want any field
> > after reserved fields. But since flexible array fields don't have any
> > storage space allocated unlike pointer, I will go ahead and change it to
> > a flexible array and move it to the end of the struct in the next patch
> > series.
> > >
> > >	__u64 sampling_rates[];
> > >
> > > So sampling rate array is just present at the end of the struct, there
> > > should be no separate pointer here in the struct.
> > >
> > >	https://en.wikipedia.org/wiki/Flexible_array_member
> > >
> > > Also we need to document what units the sampling rates are in. So something
> > > like "Sampling rates are specified in number of cycles of the reference
> > > clock".
> > >
> > > So we have decided not to use nanoseconds (so sampling period rather than
> > > sampling rate)? Any particular reason for that? Though I am ok either way.
> >
> > I decided to use GPU cycles instead of nanoseconds as using GPU cycles
> > will be more future proof to the uAPI. I also received feedback that
> > there are other interfaces in the driver that use GPU cycles instead of
> > nanoseconds, and therefore to be consistent, using GPU cycles may
> > better.
> > >
> > > > +
> > > > +	/** @reserved: Reserved */
> > > > +	__u64 reserved[5];
> > >
> > > Because flexible arrays must be the last field in a structure, this
> > > reserved field should be moved before num_sampling_rates field.
> > Will move to the end of the structure in the next patch series.
> > >
> > > > +};
> > > > +
> > > >  #if defined(__cplusplus)
> > > >  }
> > > >  #endif
> > > > --
> > > > 2.47.0
> > > >
> > >
> > > Ashutosh
> > Thanks for the review.
> > Harish.
> >


More information about the Intel-xe mailing list