[PATCH v2 1/1] drm/xe/eustall: Add support for EU stall sampling

Harish Chegondi harish.chegondi at intel.com
Fri Aug 30 20:31:14 UTC 2024


On Fri, Aug 30, 2024 at 08:58:17AM -0700, Cabral, Matias A wrote:
> Hi Santhosh, 
> 
> > Just to confirm, in case of buffer overflow 2 read() calls are expected to be done to read out the stall data ?
> Yes, this will be handled internally by the UMD. The user calling L0 won't know this happened under the hood. 
> 
> Hi Harish, 
> 
> >4. User space doesn't seem to be interested to know which subslices have dropped data. So, the driver would not provide any STATUS IOCTL to get this info.
> During the discussion, the question came up whether this can actually happen in reality, since both slices sample at the same frequency and active thread states do produce reports. You mentioned you would check with the HW team. 
Hi Matias, I did check on this and was told that different threads
have different cache hit/miss profiles and therefore generate different EU
stall data. So, in theory, the subslices with EU stall data need not all
drop data at the same time, though in practice they may.
This would be a good thing to check during testing.
Would user space be interested in knowing which subslices have dropped
data?
> 
> 
> ETA for these changes being pushed upstream? 
My plan is to make the uAPI changes and address other review comments
and push the next version of the patch by the end of next week.

Thanks
Harish.
> 
> Thanks, 
> _MAC
> 
> -----Original Message-----
> From: Ranjan, Joshua Santhosh <joshua.santosh.ranjan at intel.com> 
> Sent: Friday, August 30, 2024 1:24 AM
> To: Chegondi, Harish <harish.chegondi at intel.com>; Cabral, Matias A <matias.a.cabral at intel.com>
> Cc: Dixit, Ashutosh <ashutosh.dixit at intel.com>; Souza, Jose <jose.souza at intel.com>; intel-xe at lists.freedesktop.org; Degrood, Felix J <felix.j.degrood at intel.com>; Nerlige Ramappa, Umesh <umesh.nerlige.ramappa at intel.com>; Kumar, Shubham <shubham.kumar at intel.com>; Ausmus, James <james.ausmus at intel.com>
> Subject: RE: [PATCH v2 1/1] drm/xe/eustall: Add support for EU stall sampling
> 
> Hi Harish, 
> 
> One clarification:
> > the driver would return an error during a read() if *any* subslice in the tile has dropped data.
> >Any EU stall data present in the kernel buffer would NOT be read.
> > The subsequent read() would return EU stall data for all subslices on the tile and also clear the drop bit in the HW registers for all subslices that dropped data.
> 
> Just to confirm, in case of buffer overflow 2 read() calls are expected to be done to read out the stall data ?
> 
> Thanks,
> Joshua Santhosh
> 
> -----Original Message-----
> From: Chegondi, Harish <harish.chegondi at intel.com>
> Sent: Friday, August 30, 2024 11:51 AM
> To: Cabral, Matias A <matias.a.cabral at intel.com>
> Cc: Dixit, Ashutosh <ashutosh.dixit at intel.com>; Souza, Jose <jose.souza at intel.com>; intel-xe at lists.freedesktop.org; Degrood, Felix J <felix.j.degrood at intel.com>; Nerlige Ramappa, Umesh <umesh.nerlige.ramappa at intel.com>; Ranjan, Joshua Santhosh <joshua.santosh.ranjan at intel.com>; Kumar, Shubham <shubham.kumar at intel.com>; Ausmus, James <james.ausmus at intel.com>
> Subject: Re: [PATCH v2 1/1] drm/xe/eustall: Add support for EU stall sampling
> 
> Here is the summary of the discussion regarding the uAPI
> 
> 1. Eliminate the data header from the data copied by the driver to the user space.
> 
> 2. Subslice information in the header is NOT used by the user space since the data is collected at the tile granularity.
> 
> 3. The only flags bit(0) in the header currently used, is to indicate if the HW has dropped any EU stall data due to insufficient space in the kernel buffer. Instead of a flag in the header, the driver would return an error during a read() if *any* subslice in the tile has dropped data.
> Any EU stall data present in the kernel buffer would NOT be read.
> The subsequent read() would return EU stall data for all subslices on the tile and also clear the drop bit in the HW registers for all subslices that dropped data.
> 
> 4. User space doesn't seem to be interested to know which subslices have dropped data. So, the driver would not provide any STATUS IOCTL to get this info.
> 
> 5. Record size in the header is a static info which can be queried through an INFO IOCTL after a file descriptor is opened. Based on the GPU, user space can determine this as well.
> 
> Thanks
> Harish.
> 
> On Mon, Aug 26, 2024 at 10:31:04AM -0700, Cabral, Matias A wrote:
> > > Matias: could you please explain what L0 does with this dropped flag?
> > 
> > During the processing of the data, L0 returns a warning message. VTune (I think) also warns the user that results were collected but will be inaccurate because the draining/reading of data was not done fast enough. By moving the warning so it is returned at the earlier reading step, VTune may a) increase the reading frequency on the fly, reducing the amount of data lost, or b) cancel the collection immediately, saving time for a user who may collect data on one node and process it on a different one. 
> > 
> > Thanks,
> > _MAC
> > 
> > -----Original Message-----
> > From: Dixit, Ashutosh <ashutosh.dixit at intel.com>
> > Sent: Monday, August 26, 2024 9:48 AM
> > To: Souza, Jose <jose.souza at intel.com>
> > Cc: Cabral, Matias A <matias.a.cabral at intel.com>; 
> > intel-xe at lists.freedesktop.org; Degrood, Felix J 
> > <felix.j.degrood at intel.com>; Nerlige Ramappa, Umesh 
> > <umesh.nerlige.ramappa at intel.com>; Ranjan, Joshua Santhosh 
> > <joshua.santosh.ranjan at intel.com>; Chegondi, Harish 
> > <harish.chegondi at intel.com>; Kumar, Shubham <shubham.kumar at intel.com>; 
> > Ausmus, James <james.ausmus at intel.com>
> > Subject: Re: [PATCH v2 1/1] drm/xe/eustall: Add support for EU stall 
> > sampling
> > 
> > On Fri, 23 Aug 2024 14:22:19 -0700, Souza, Jose wrote:
> > >
> > > Hi
> > 
> > Thanks Jose. One question for Matias/L0 below.
> > 
> > > On Thu, 2024-08-22 at 15:53 -0700, Dixit, Ashutosh wrote:
> > > > On Wed, 21 Aug 2024 12:35:51 -0700, Cabral, Matias A wrote:
> > > >
> > > > Hi Matias,
> > > >
> > > > Thanks for responding, the input is _very_ helpful.
> > > >
> > > > Mesa folks: would it be possible for you to provide similar input too?
> > >
> > > > Felix's MR[1] only uses record_size and num_records. If
> > > > drm_xe_eu_stall_data_xe2 were the same size as the sample, we would
> > > > not need the header at all. Inline replies below.
> > >
> > > [1] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30142
> > >
> > > >
> > > > Thanks.
> > > > --
> > > > Ashutosh
> > > >
> > > >
> > > > >
> > > > > Hi Ashutosh,
> > > > >
> > > > > Some inline questions below [MAC]
> > > > >
> > > > > Thanks,
> > > > > _MAC
> > > > >
> > > > > -----Original Message-----
> > > > > From: Dixit, Ashutosh <ashutosh.dixit at intel.com>
> > > > > Sent: Friday, August 16, 2024 3:38 PM
> > > > > To: intel-xe at lists.freedesktop.org
> > > > > Cc: Chegondi, Harish <harish.chegondi at intel.com>; Nerlige 
> > > > > Ramappa, Umesh <umesh.nerlige.ramappa at intel.com>; Degrood, Felix 
> > > > > J <felix.j.degrood at intel.com>; Souza, Jose 
> > > > > <jose.souza at intel.com>; Cabral, Matias A 
> > > > > <matias.a.cabral at intel.com>
> > > > > Subject: Re: [PATCH v2 1/1] drm/xe/eustall: Add support for EU 
> > > > > stall sampling
> > > > >
> > > > > On Sun, 07 Jul 2024 15:41:41 -0700, Ashutosh Dixit wrote:
> > > > >
> > > > > Hi Harish,
> > > > >
> > > > > Some comments below on just the uapi first, towards finalizing 
> > > > > the uapi with the UMD's who consume this data. And also 
> > > > > comparing the uapi with what we did in OA.
> > > > >
> > > > > >
> > > > > > diff --git a/include/uapi/drm/xe_drm.h 
> > > > > > b/include/uapi/drm/xe_drm.h index 19619d4952a8..343de700d10d
> > > > > > 100644
> > > > >
> > > > > /snip/
> > > > >
> > > > > > +/**
> > > > > > + * struct drm_xe_eu_stall_data_header - EU stall data header.
> > > > > > + * Header with additional information that the driver adds
> > > > > > + * before EU stall data of each subslice during read().
> > > > >
> > > > > One question to resolve is if we really need this header and if 
> > > > > UMD's are actually using the information in this header. In OA 
> > > > > we dropped the header and are providing information in the 
> > > > > header via different means (see below).
> > > > >
> > > > > Another option is to actually add a property for the header. So 
> > > > > headers are added only when user space requests headers.
> > > > >
> > > > > > + */
> > > > > > +struct drm_xe_eu_stall_data_header {
> > > > > > +	/** @subslice: subslice number from which the following data
> > > > > > +	 * has been captured.
> > > > > > +	 */
> > > > > > +	__u16 subslice;
> > > > >
> > > > > Do UMD's use this subslice information? We should check with L0 and Mesa about this.
> > > > >
> > > > > [MAC] L0 does not currently use this.
> > >
> > > > No usage for subslice at the moment in Mesa
> > >
> > > > >
> > > > > Also about whether UMD's need or want the header itself. For OA, 
> > > > > UMD's were happy not having to parse the header.
> > > > >
> > > > > > +	/** @flags: flags */
> > > > > > +	__u16 flags;
> > > > > > +/* EU stall data dropped by the HW due to memory buffer being full */
> > > > > > +#define XE_EU_STALL_FLAG_OVERFLOW_DROP	(1 << 0)
> > > > >
> > > > > In OA such information is returned via 
> > > > > DRM_XE_OBSERVATION_IOCTL_STATUS. For EU stall, e.g. we could 
> > > > > > return a bit mask of subslices which are reporting drops. So similar 
> > > > > to OA, we could return -EIO when HW reports drops and userspace 
> > > > > optionally issues DRM_XE_OBSERVATION_IOCTL_STATUS to retrieve 
> > > > > which subslices are reporting drops.
> > > > >
> > > > > [MAC] having a return code to notify of reports drops would be 
> > > > > much preferable. This would allow the UMD detecting this 
> > > > > condition during the read phase without needing to process/parse each report.
> > 
> > Matias: could you please explain what L0 does with this dropped flag?
> > 
> > Harish: do we know what is the reason HW sets this dropped flag? Is it because userland is not reading fast enough so HW is forced to drop data?
> > 
> > >
> > > But what can UMD do when that is set?
> > 
> > Mesa can ignore this if they don't need it.
> > 
> > >
> > > > I would rather have a warn-once message printed to dmesg, so the issue
> > > > doesn't go silent but doesn't need to go into the uAPI.
> > 
> > dmesg warn is likely not an option because it will trigger bugs in our CI.
> > 
> > >
> > > > >
> > > > > > +	/** @record_size: size of each EU stall data record */
> > > > > > +	__u16 record_size;
> > > > >
> > > > > This is static information. Does it need to be in each packet header?
> > > > > E.g. it can be returned via DRM_XE_OBSERVATION_IOCTL_INFO after 
> > > > > a EU Stall stream has been opened.
> > > > >
> > > > > [MAC] since the size is constant, it seems an overhead including 
> > > > > the info in every report.
> > >
> > > drm_xe_eu_stall_data_xe2 should be of the same size as record_size so it can also be dropped.
> > >
> > > > >
> > > > > The INFO data struct could also include a capabilities field. So 
> > > > > if new features are added to EU stall in the future, they would 
> > > > > be advertized to user space using the capabilities field.
> > > > >
> > > > > > +	/** @num_records: number of records following the header */
> > > > > > +	__u16 num_records;
> > > > >
> > > > > > This will not be needed if we just return raw EU Stall data without 
> > > > > headers. Or even otherwise it is probably not needed, it is the 
> > > > > total size of returned data minus the size of the header.
> > > > > Provided we return all available data.
> > >
> > > Same as above, would not be needed if drm_xe_eu_stall_data_xe2 matches samples size.
> > >
> > > > >
> > > > > [MAC] the KMD will always return atomic units of reports, right? 
> > > > > Then this is not needed, having UMD the possibility to query 
> > > > > report size when opening the stream, the UMD can know how many reports are in each read.
> > > > >
> > > > > > +	/** @reserved: Reserved */
> > > > > > +	__u16 reserved[4];
> > > > >
> > > > > This can be handled via 'extensions'. And if headers change they 
> > > > > can be advertized in capabilities.
> > > > >
> > > > > > +};
> > > > > > +
> > > > > >  #if defined(__cplusplus)
> > > > > >  }
> > > > > >  #endif
> > > > > > --
> > > > > > 2.41.0
> > > > > >
> > > > >
> > > > > Thanks.
> > > > > --
> > > > > Ashutosh
> > >

