[PATCH v6 6/7] drm/xe/uapi: Add a device query to get EU stall sampling information

Dixit, Ashutosh ashutosh.dixit at intel.com
Thu Dec 19 20:19:49 UTC 2024


On Thu, 19 Dec 2024 12:15:42 -0800, Dixit, Ashutosh wrote:
>
> On Thu, 19 Dec 2024 12:04:40 -0800, Harish Chegondi wrote:
> >
> > On Thu, Dec 19, 2024 at 08:36:16AM -0800, Dixit, Ashutosh wrote:
> > > On Wed, 18 Dec 2024 15:24:18 -0800, Harish Chegondi wrote:
> > > >
> > > > On Tue, Dec 17, 2024 at 12:07:49PM -0800, Dixit, Ashutosh wrote:
> > > > > On Tue, 17 Dec 2024 01:46:56 -0800, Harish Chegondi wrote:
> > > > > >
> > > > >
> > > > Hi Ashutosh,
> > > >
> > > > > Hi Harish,
> > > > >
> > > > > Only reviewing the uapi, not the implementation.
> > > > >
> > > > > > User space can get the EU stall data record size, EU stall capabilities,
> > > > > > EU stall sampling rates, and per XeCore buffer size with query IOCTL
> > > > > > DRM_IOCTL_XE_DEVICE_QUERY with .query set to DRM_XE_DEVICE_QUERY_EU_STALL.
> > > > > > A struct drm_xe_query_eu_stall will be returned to the user space along
> > > > > > with an array of supported sampling rates sorted in the fastest sampling
> > > > > > rate first order. sampling_rates in struct drm_xe_query_eu_stall will
> > > > > > point to the array of sampling rates.
> > > > > >
> > > > > > Any capabilities in EU stall sampling as of this patch are considered
> > > > > > as base capabilities. Any new capabilities added later will need
> > > > > > a new capabilities flag.
> > > > >
> > > > > s/a new capabilities flag/new capability bits/. Or, better to say something
> > > > > like "New capability bits will be added for any new functionality added
> > > > > later".
> > > > >
> > > > > See OA capability bits in the latest drm-tip.
> > > > Will check and change.
> > > > >
> > > > > >
> > > > > > v6: Include EU stall sampling rates information and
> > > > > >     per XeCore buffer size in the query information.
> > > > >
> > > > > /snip/
> > > > >
> > > > > > diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> > > > > > index 4ee3b04a1bb5..40c2d274473e 100644
> > > > > > --- a/include/uapi/drm/xe_drm.h
> > > > > > +++ b/include/uapi/drm/xe_drm.h
> > > > > > @@ -700,6 +700,7 @@ struct drm_xe_device_query {
> > > > > >  #define DRM_XE_DEVICE_QUERY_ENGINE_CYCLES	6
> > > > > >  #define DRM_XE_DEVICE_QUERY_UC_FW_VERSION	7
> > > > > >  #define DRM_XE_DEVICE_QUERY_OA_UNITS		8
> > > > > > +#define DRM_XE_DEVICE_QUERY_EU_STALL		9
> > > > > >	/** @query: The type of data to query */
> > > > > >	__u32 query;
> > > > > >
> > > > > > @@ -1770,6 +1771,40 @@ enum drm_xe_eu_stall_property_id {
> > > > > >	DRM_XE_EU_STALL_PROP_EVENT_REPORT_COUNT,
> > > > > >  };
> > > > > >
> > > > > > +/**
> > > > > > + * struct drm_xe_query_eu_stall - Information about EU stall sampling.
> > > > > > + *
> > > > > > + * If a query is made with a struct @drm_xe_device_query where .query
> > > > > > + * is equal to @DRM_XE_DEVICE_QUERY_EU_STALL, then the reply uses
> > > > > > + * struct @drm_xe_query_eu_stall in .data.
> > > > > > + */
> > > > > > +struct drm_xe_query_eu_stall {
> > > > > > +	/** @extensions: Pointer to the first extension struct, if any */
> > > > > > +	__u64 extensions;
> > > > > > +
> > > > > > +	/** @capabilities: EU stall capabilities bit-mask */
> > > > > > +	__u64 capabilities;
> > > > > > +#define DRM_XE_EU_STALL_CAPS_BASE		(1 << 0)
> > > > > > +
> > > > > > +	/** @record_size: size of each EU stall data record */
> > > > > > +	__u64 record_size;
> > > > > > +
> > > > > > +	/** @per_xecore_buf_size: Per XeCore buffer size */
> > > > > > +	__u64 per_xecore_buf_size;
> > > > >
> > > > > This new member has appeared. What is its purpose and how will userspace
> > > > > use it? Basically how is it relevant?
> > > > This is the per XeCore buffer size which will be used to create the per
> > > > GT EU stall data buffer. In the earlier patch, user space configured the
> > > > per Xecore size with one of the valid values - 128K, 256K or 512K. But
> > > > it was removed from the uAPI and instead the driver uses the biggest
> > > > size - 512K and creates the EU stall data buffer with size:
> > > > 512K x number of XeCores. User space requested the per XeCore buffer
> > > > sized be exposed through the query IOCTL. In the future, I need to add
> > > > support for this buffer size to be able to change it via the debugfs
> > >
> > > In my view, Xe is only a temporary branding name whereas our uapi is
> > > eternal :) Kidding, but you get the idea. So I would call this by a more
> > > general name such as per_core_buf_size if it indeed needs to exposed (so I
> > > prefer removing Xe from the name).
> > From an earlier review feedback, I changed all DSS (Dual Sub Slice) to
> > xecore as xecore is the new term for DSS or sub slice as we used to call
> > it. So, there are "xecore" strings in several parts of the code in
> > xe_eu_stall.c. So, I am not sure if I should change xecore to just core
> > in just this variable as it may lead to confusion about core vs xecore.
>
> Let's have @Roper, Matthew D take this on. Internally we can do whatever we
> want but in the uapi header we should be careful.
>
> > >
> > > Because I don't see what userspace will do with this. I don't even know if
> > > we can expose it if we cannot demonstrate that UMD's are consuming it. Also
> > > we can reintroduce the removed property in case it is useful (which also
> > > seems questionable).
> > L0 folks have asked for the buffer size to be exposed. Only then I added
> > this field to the uAPI.
> > I have plan to allow the user to change the per subslice buffer size
> > through debugfs entry in the future. Although the driver uses the
> > biggest buffer size, in debugging and pre-silicon environments, it helps
> > to have smaller buffer sizes. So, in the future I will add support for
> > the user to change the buffer size.
>
> So if there's a real use case for it, I would just introduce a property
> later on (with a capability bit), rather than side-channel ways like
> debugfs. So if we are going to introduce a property anyway, we probably
> don't need per_xecore_buf_size in the query.

Sorry, we might still need it since you will likely have a default size if
the property is not specified. So better to sort this out.

>
> > >
> > > Anyway, for now I would keep the field but just the name to
> > > per_core_buf_size. But let's see if anyone else has an opinion on this.
> > >
> > >
> > > > >
> > > > > > +
> > > > > > +	/** @num_sampling_rates: Number of sampling rates supported */
> > > > > > +	__u64 num_sampling_rates;
> > > > > > +
> > > > > > +	/**
> > > > > > +	 * @sampling_rates: Pointer to an array of sampling rates
> > > > > > +	 * sorted in the fastest sampling rate first order.
> > > > >
> > > > > sorted from fastest to slowest.
> > > > >
> > > > > > +	 */
> > > > > > +	__u64 *sampling_rates;
> > > > >
> > > > > This should be written as a flexible array as follows:
> > > > I used a pointer instead of a flexible array as I didn't want any field
> > > > after reserved fields. But since flexible array fields don't have any
> > > > storage space allocated unlike pointer, I will go ahead and change it to
> > > > a flexible array and move it to the end of the struct in the next patch
> > > > series.
> > > > >
> > > > >	__u64 sampling_rates[];
> > > > >
> > > > > So sampling rate array is just present at the end of the struct, there
> > > > > should be no separate pointer here in the struct.
> > > > >
> > > > >	https://en.wikipedia.org/wiki/Flexible_array_member
> > > > >
> > > > > Also we need to document what units the sampling rates are in. So something
> > > > > like "Sampling rates are specified in number of cycles of the reference
> > > > > clock".
> > > > >
> > > > > So we have decided not to use nanoseconds (so sampling period rather than
> > > > > sampling rate)? Any particular reason for that? Though I am ok either way.
> > > >
> > > > I decided to use GPU cycles instead of nanoseconds as using GPU cycles
> > > > will be more future proof to the uAPI. I also received feedback that
> > > > there are other interfaces in the driver that use GPU cycles instead of
> > > > nanoseconds, and therefore to be consistent, using GPU cycles may
> > > > better.
> > > > >
> > > > > > +
> > > > > > +	/** @reserved: Reserved */
> > > > > > +	__u64 reserved[5];
> > > > >
> > > > > Because flexible arrays must be the last field in a structure, this
> > > > > reserved field should be moved before num_sampling_rates field.
> > > > Will move to the end of the structure in the next patch series.
> > > > >
> > > > > > +};
> > > > > > +
> > > > > >  #if defined(__cplusplus)
> > > > > >  }
> > > > > >  #endif
> > > > > > --
> > > > > > 2.47.0


More information about the Intel-xe mailing list