[PATCH v2 1/1] drm/xe/eustall: Add support for EU stall sampling
Harish Chegondi
harish.chegondi at intel.com
Fri Aug 30 06:20:35 UTC 2024
Here is a summary of the discussion regarding the uAPI:
1. Eliminate the data header from the data copied by the
driver to the user space.
2. Subslice information in the header is NOT used by the user space since
the data is collected at the tile granularity.
3. The only flag currently used in the header, bit 0, indicates whether
the HW has dropped any EU stall data due to insufficient space in the
kernel buffer. Instead of a flag in the header, the driver would return
an error during a read() if *any* subslice in the tile has dropped data.
Any EU stall data present in the kernel buffer would NOT be returned by
that read(). The subsequent read() would return EU stall data for all
subslices on the tile and also clear the drop bit in the HW registers
for all subslices that dropped data.
4. User space doesn't seem to be interested in knowing which subslices
have dropped data. So, the driver would not provide any STATUS IOCTL to
get this info.
5. Record size in the header is static information that can be queried
through an INFO IOCTL after a file descriptor is opened. Based on the
GPU, user space can determine this as well.
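
To illustrate points 3 and 5, here is a minimal user space sketch; it is
not the actual L0 or Mesa code. It assumes that the error returned by
read() on dropped data is EIO (as was suggested by analogy with OA
earlier in this thread; the final errno is not settled here), that the
driver only returns whole records, and that record_size has already been
obtained by some means. The names eu_stall_read(), eu_stall_num_records()
and struct eu_stall_read_result are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper type; not part of any uAPI. */
struct eu_stall_read_result {
	ssize_t bytes;   /* bytes copied into buf, or -1 on failure */
	int drops_seen;  /* non-zero if a read() reported dropped data */
};

/*
 * Read EU stall data from fd. ASSUMPTION: read() fails with EIO when any
 * subslice dropped data, and the next read() returns the buffered data.
 * The drop is remembered and the read is retried once.
 */
static struct eu_stall_read_result
eu_stall_read(int fd, void *buf, size_t len)
{
	struct eu_stall_read_result res = { .bytes = -1, .drops_seen = 0 };

	for (;;) {
		ssize_t n = read(fd, buf, len);

		if (n >= 0) {
			res.bytes = n;
			return res;
		}
		if (errno == EINTR)
			continue;               /* interrupted, try again */
		if (errno == EIO && !res.drops_seen) {
			res.drops_seen = 1;     /* note the drop, retry once */
			continue;
		}
		return res;                     /* any other error: give up */
	}
}

/*
 * Without a per-subslice header, the record count follows from the
 * number of bytes read, provided only whole records are returned.
 */
static size_t eu_stall_num_records(size_t bytes, size_t record_size)
{
	assert(record_size > 0);
	return bytes / record_size;
}
```

A caller could treat drops_seen != 0 as a warning that the collected
data is incomplete, while still processing the records that were read.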
Thanks
Harish.
On Mon, Aug 26, 2024 at 10:31:04AM -0700, Cabral, Matias A wrote:
> > Matias: could you please explain what L0 does with this dropped flag?
>
> During the processing of the data, L0 returns a warning message. VTune (I think) also warns the user that results were collected but will be inaccurate because the draining/reading of data was not done fast enough. By moving the warning to be returned at the earlier reading step, VTune may a) increase the reading frequency on the fly, reducing the amount of data lost, or b) cancel the collection immediately, saving time for the user, who may collect data on one node and process it on a different one.
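
Option a) above could look roughly like the sketch below. This is only
an illustration of the idea, not what VTune or L0 does; the function
name next_poll_interval_us() and the halving policy are assumptions.

```c
#include <assert.h>

/*
 * Hypothetical policy: when a read reported dropped data, halve the
 * interval between reads, but never go below min_us. Otherwise keep
 * the current interval unchanged.
 */
static unsigned int next_poll_interval_us(unsigned int cur_us, int dropped,
					  unsigned int min_us)
{
	unsigned int next;

	assert(min_us > 0);
	if (!dropped)
		return cur_us;

	next = cur_us / 2;
	return next < min_us ? min_us : next;
}
```

The lower bound keeps the reader from spinning if drops persist; at
that point cancelling the collection (option b) may be the better choice.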
>
> Thanks,
> _MAC
>
> -----Original Message-----
> From: Dixit, Ashutosh <ashutosh.dixit at intel.com>
> Sent: Monday, August 26, 2024 9:48 AM
> To: Souza, Jose <jose.souza at intel.com>
> Cc: Cabral, Matias A <matias.a.cabral at intel.com>; intel-xe at lists.freedesktop.org; Degrood, Felix J <felix.j.degrood at intel.com>; Nerlige Ramappa, Umesh <umesh.nerlige.ramappa at intel.com>; Ranjan, Joshua Santhosh <joshua.santosh.ranjan at intel.com>; Chegondi, Harish <harish.chegondi at intel.com>; Kumar, Shubham <shubham.kumar at intel.com>; Ausmus, James <james.ausmus at intel.com>
> Subject: Re: [PATCH v2 1/1] drm/xe/eustall: Add support for EU stall sampling
>
> On Fri, 23 Aug 2024 14:22:19 -0700, Souza, Jose wrote:
> >
> > Hi
>
> Thanks Jose. One question for Matias/L0 below.
>
> > On Thu, 2024-08-22 at 15:53 -0700, Dixit, Ashutosh wrote:
> > > On Wed, 21 Aug 2024 12:35:51 -0700, Cabral, Matias A wrote:
> > >
> > > Hi Matias,
> > >
> > > Thanks for responding, the input is _very_ helpful.
> > >
> > > Mesa folks: would it be possible for you to provide similar input too?
> >
> > Felix's MR[1] is only using record_size and num_records. If
> > drm_xe_eu_stall_data_xe2 were the same size as the sample, we would
> > not need the header at all. Inline replies below.
> >
> > [1] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30142
> >
> > >
> > > Thanks.
> > > --
> > > Ashutosh
> > >
> > >
> > > >
> > > > Hi Ashutosh,
> > > >
> > > > Some inline questions below [MAC]
> > > >
> > > > Thanks,
> > > > _MAC
> > > >
> > > > -----Original Message-----
> > > > From: Dixit, Ashutosh <ashutosh.dixit at intel.com>
> > > > Sent: Friday, August 16, 2024 3:38 PM
> > > > To: intel-xe at lists.freedesktop.org
> > > > Cc: Chegondi, Harish <harish.chegondi at intel.com>; Nerlige Ramappa,
> > > > Umesh <umesh.nerlige.ramappa at intel.com>; Degrood, Felix J
> > > > <felix.j.degrood at intel.com>; Souza, Jose <jose.souza at intel.com>;
> > > > Cabral, Matias A <matias.a.cabral at intel.com>
> > > > Subject: Re: [PATCH v2 1/1] drm/xe/eustall: Add support for EU
> > > > stall sampling
> > > >
> > > > On Sun, 07 Jul 2024 15:41:41 -0700, Ashutosh Dixit wrote:
> > > >
> > > > Hi Harish,
> > > >
> > > > Some comments below on just the uapi first, towards finalizing the
> > > > uapi with the UMD's who consume this data. And also comparing the
> > > > uapi with what we did in OA.
> > > >
> > > > >
> > > > > diff --git a/include/uapi/drm/xe_drm.h
> > > > > b/include/uapi/drm/xe_drm.h index 19619d4952a8..343de700d10d
> > > > > 100644
> > > >
> > > > /snip/
> > > >
> > > > > +/**
> > > > > + * struct drm_xe_eu_stall_data_header - EU stall data header.
> > > > > + * Header with additional information that the driver adds
> > > > > + * before EU stall data of each subslice during read().
> > > >
> > > > One question to resolve is if we really need this header and if
> > > > UMD's are actually using the information in this header. In OA we
> > > > dropped the header and are providing information in the header via
> > > > different means (see below).
> > > >
> > > > Another option is to actually add a property for the header. So
> > > > headers are added only when user space requests headers.
> > > >
> > > > > + */
> > > > > +struct drm_xe_eu_stall_data_header {
> > > > > + /** @subslice: subslice number from which the following data
> > > > > + * has been captured.
> > > > > + */
> > > > > + __u16 subslice;
> > > >
> > > > Do UMD's use this subslice information? We should check with L0 and Mesa about this.
> > > >
> > > > [MAC] L0 does not currently use this.
> >
> > No usage for subslice at the moment in Mesa
> >
> > > >
> > > > Also about whether UMD's need or want the header itself. For OA,
> > > > UMD's were happy not having to parse the header.
> > > >
> > > > > + /** @flags: flags */
> > > > > + __u16 flags;
> > > > > +/* EU stall data dropped by the HW due to memory buffer being full */
> > > > > +#define XE_EU_STALL_FLAG_OVERFLOW_DROP (1 << 0)
> > > >
> > > > In OA such information is returned via
> > > > DRM_XE_OBSERVATION_IOCTL_STATUS. For EU stall, e.g. we could
> > > > return a bit mask of subslices which are reporting drops. So similar
> > > > to OA, we could return -EIO when HW reports drops and userspace
> > > > optionally issues DRM_XE_OBSERVATION_IOCTL_STATUS to retrieve
> > > > which subslices are reporting drops.
> > > >
> > > > [MAC] having a return code to notify of report drops would be
> > > > much preferable. This would allow the UMD detecting this condition
> > > > during the read phase without needing to process/parse each report.
>
> Matias: could you please explain what L0 does with this dropped flag?
>
> Harish: do we know what is the reason HW sets this dropped flag? Is it because userland is not reading fast enough so HW is forced to drop data?
>
> >
> > But what can UMD do when that is set?
>
> Mesa can ignore this if they don't need it.
>
> >
> > I would rather have a warn-once printed to dmesg, so the issues don't
> > go silent, but it doesn't need to go into the uAPI.
>
> dmesg warn is likely not an option because it will trigger bugs in our CI.
>
> >
> > > >
> > > > > + /** @record_size: size of each EU stall data record */
> > > > > + __u16 record_size;
> > > >
> > > > This is static information. Does it need to be in each packet header?
> > > > E.g. it can be returned via DRM_XE_OBSERVATION_IOCTL_INFO after a
> > > > EU Stall stream has been opened.
> > > >
> > > > [MAC] since the size is constant, it seems an overhead including
> > > > the info in every report.
> >
> > drm_xe_eu_stall_data_xe2 should be of the same size as record_size so it can also be dropped.
> >
> > > >
> > > > The INFO data struct could also include a capabilities field. So
> > > > if new features are added to EU stall in the future, they would be
> > > > advertized to user space using the capabilities field.
> > > >
> > > > > + /** @num_records: number of records following the header */
> > > > > + __u16 num_records;
> > > >
> > > > This will not be needed if we just return raw EU Stall data without
> > > > headers. Or even otherwise it is probably not needed, it is the
> > > > total size of returned data minus the size of the header. Provided
> > > > we return all available data.
> >
> > Same as above, would not be needed if drm_xe_eu_stall_data_xe2 matches samples size.
> >
> > > >
> > > > [MAC] the KMD will always return atomic units of reports, right?
> > > > Then this is not needed, having UMD the possibility to query
> > > > report size when opening the stream, the UMD can know how many reports are in each read.
> > > >
> > > > > + /** @reserved: Reserved */
> > > > > + __u16 reserved[4];
> > > >
> > > > This can be handled via 'extensions'. And if headers change they
> > > > can be advertized in capabilities.
> > > >
> > > > > +};
> > > > > +
> > > > > #if defined(__cplusplus)
> > > > > }
> > > > > #endif
> > > > > --
> > > > > 2.41.0
> > > > >
> > > >
> > > > Thanks.
> > > > --
> > > > Ashutosh
> >