[PATCH v3 1/1] drm/xe/eustall: Add support for EU stall sampling
Harish Chegondi
harish.chegondi at intel.com
Fri Sep 20 18:07:52 UTC 2024
On Fri, Sep 13, 2024 at 12:35:28AM -0700, Ranjan, Joshua Santhosh wrote:
> Hi
>
> Please find responses ([Santhosh]) inline.
>
> -----Original Message-----
> From: Chegondi, Harish <harish.chegondi at intel.com>
> Sent: Friday, September 13, 2024 7:24 AM
> To: Dixit, Ashutosh <ashutosh.dixit at intel.com>
> Cc: Roper, Matthew D <matthew.d.roper at intel.com>; intel-xe at lists.freedesktop.org; Ausmus, James <james.ausmus at intel.com>; Degrood, Felix J <felix.j.degrood at intel.com>; Souza, Jose <jose.souza at intel.com>; Cabral, Matias A <matias.a.cabral at intel.com>; Ranjan, Joshua Santhosh <joshua.santosh.ranjan at intel.com>
> Subject: Re: [PATCH v3 1/1] drm/xe/eustall: Add support for EU stall sampling
>
> On Wed, Sep 11, 2024 at 04:21:12PM -0700, Dixit, Ashutosh wrote:
> > On Mon, 09 Sep 2024 17:09:14 -0700, Matt Roper wrote:
> > >
> >
> > Hi Matt,
> >
> > Some comments about the uapi below.
> >
> > > > @@ -1694,6 +1696,138 @@ struct drm_xe_oa_stream_info {
> > > > __u64 reserved[3];
> > > > };
> > > >
> > > > +/**
> > > > + * enum drm_xe_eu_stall_property_id - EU stall data stream property ids.
> > > > + *
> > > > + * These properties are passed to the driver as a chain of
> > > > + * @drm_xe_ext_set_property structures with @property set to
> > > > +these
> > > > + * properties' enums and @value set to the corresponding values
> > > > +of these
> > > > + * properties. @drm_xe_user_extension base.name should be set to
> > > > + * @DRM_XE_EU_STALL_EXTENSION_SET_PROPERTY.
> > > > + */
> > > > +enum drm_xe_eu_stall_property_id {
> > > > +#define DRM_XE_EU_STALL_EXTENSION_SET_PROPERTY 0
> > > > + /**
> > > > + * @DRM_XE_EU_STALL_PROP_BUF_SZ: Per DSS Memory Buffer Size.
> > > > + * Valid values are 128 KB, 256 KB, and 512 KB.
> > > > + */
> > > > + DRM_XE_EU_STALL_PROP_BUF_SZ = 1,
> > >
> > > Does userspace really care about this being configurable? Even if
> > > we have a platform with XE_MAX_DSS_FUSE_BITS total XeCores, the
> > > difference between the largest and smallest sizes here only saves 48MB of memory.
> > > The hardware makes this configurable, but is there actually an ask
> > > to expose this through the uapi? If not, I'd say we should just
> > > always pick 512KB internally and keep things simple.
> >
> > Agreed.
> >
> > >
> > > > +
> > > > + /**
> > > > + * @DRM_XE_EU_STALL_PROP_SAMPLE_RATE: Sampling rate
> > > > + * in multiples of 251 cycles. Valid values are 1 to 7.
> > > > + * If the value is 1, sampling interval is 251 cycles.
> > > > + * If the value is 7, sampling interval is 7 x 251 cycles.
> > > > + */
> > > > + DRM_XE_EU_STALL_PROP_SAMPLE_RATE,
> > > > +
> > > > + /**
> > > > + * @DRM_XE_EU_STALL_PROP_POLL_PERIOD: EU stall data
> > > > + * poll period in nanoseconds. should be at least 100000 ns.
> > > > + */
> > > > + DRM_XE_EU_STALL_PROP_POLL_PERIOD,
> > > > +
> > > > + /**
> > > > + * @DRM_XE_EU_STALL_PROP_EVENT_REPORT_COUNT: Minimum number of
> > > > + * EU stall data rows to be present in the kernel buffer for
> > > > + * poll() to set POLLIN (data present).
> > > > + */
> > > > + DRM_XE_EU_STALL_PROP_EVENT_REPORT_COUNT,
> > > > +
> > > > + /**
> > > > + * @DRM_XE_EU_STALL_PROP_GT_ID: GT ID of the GT on which
> > > > + * EU stall data will be captured.
> > > > + */
> > > > + DRM_XE_EU_STALL_PROP_GT_ID,
> > >
> > > I mentioned above, but it feels like this should be an intrinsic
> > > part of the API, not something coming in through an optional extension...
> >
> > Not sure what "intrinsic part of the API" should be. EU Stall, OA etc.
> > are now types of "observation streams" and there is no other
> > opportunity in the observation stream interface for this kind of
> > information. Also, gt_id is EU stall specific (e.g. OA uses oa_unit_id
> > not gt_id) so it should come in as an EU Stall property.
> >
> > Though question for Harish: do we really need this? What if we return
> > data from all tiles/gt's in a single read() call? Since we have
> > already determined subslice number is not interesting to UMD's for EU
> > Stall data, maybe gt_id is similarly not interesting?
> User space folks said that they are interested in the EU stall data at the GT level. So if we decide to return stall data from all the GTs, we may have to append the GT ID to the data. I will seek feedback from the user space folks regarding this.
>
> [Santhosh] Yes. L0 allows Metric collection at a tile level granularity.
>
> Thanks
> Harish.
> >
> > >
> > > > +
> > > > + /**
> > > > + * @DRM_XE_EU_STALL_PROP_OPEN_DISABLED: A value of 1 will open
> > > > + * the EU stall data stream without enabling EU stall sampling.
> > > > + */
> > > > + DRM_XE_EU_STALL_PROP_OPEN_DISABLED,
> > >
> > > Is there a reason not to make this one the default behavior? Is
> > > there really a benefit to auto-enabling on open that makes it worth
> > > the extra API complexity to make this configurable?
> >
> > I agree, this would be a good idea. But unfortunately this got in for
> > OA (if I were doing OA now I would have taken Matt's suggestion). So
> > we either
> > (a) keep it the same as OA for uniformity, or (b) document that EU
> > stall streams have to be explicity enabled before read()'ing data.
> >
> > >
> > > > +};
> > > > +
> > > > +/**
> > > > + * struct drm_xe_eu_stall_stream_info - EU stall stream info
> > > > +returned from
> > > > + * @DRM_XE_OBSERVATION_IOCTL_INFO observation stream fd ioctl */
> > > > +struct drm_xe_eu_stall_stream_info {
> > > > + /** @extensions: Pointer to the first extension struct, if any */
> > > > + __u64 extensions;
> > > > +
> > > > + /** @record_size: size of each EU stall data record */
> > > > + __u64 record_size;
> > > > +
> > > > + /** @reserved: reserved for future use */
> > > > + __u64 reserved[3];
> > > > +};
> > > > +
> > > > +/**
> > > > + * struct drm_xe_eu_stall_data_pvc - EU stall data format for PVC
> > >
> > > I thought there was an ask from one of the userspace teams that we
> > > make the layout discoverable? I.e., a runtime-queryable format
> > > description that lists the counter type, size, and offset in the
> > > record of each counter that has usable data in the records? Did we
> > > change directions on that?
>
> [Santhosh] Yes. From L0, there was a request made to have a runtime-queryable format.
Yes, there was a request for a runtime-queryable format. But as per the
request, the driver would parse the data and would send only the
non-zero data to the user space. But as per the most recent discussion,
it was agreed that the driver would send all data it receives from the
HW without parsing/re-formatting the data as one bigger memcpy() would
be more performant than several smaller memcpy()s.
If the driver is sending all the HW data as-is, would it still help user
space to get runtime-queryable format from the driver ?
Thanks
Harish.
>
> >
> > My suggestion is to eliminate these struct's from the KMD uapi header.
> > See my earlier response to Harish's patch.
> >
> > Thanks.
> > --
> > Ashutosh
> >
> >
> > >
> > > > + *
> > > > + * Bits Field
> > > > + * 0 to 28 IP (addr)
> > > > + * 29 to 36 active count
> > > > + * 37 to 44 other count
> > > > + * 45 to 52 control count
> > > > + * 53 to 60 pipestall count
> > > > + * 61 to 68 send count
> > > > + * 69 to 76 dist_acc count
> > > > + * 77 to 84 sbid count
> > > > + * 85 to 92 sync count
> > > > + * 93 to 100 inst_fetch count
> > > > + */
> > > > +struct drm_xe_eu_stall_data_pvc {
> > > > + __u64 ip_addr:29;
> > > > + __u64 active_count:8;
> > > > + __u64 other_count:8;
> > > > + __u64 control_count:8;
> > > > + __u64 pipestall_count:8;
> > > > + __u64 send_count:8;
> > > > + __u64 dist_acc_count:8;
> > > > + __u64 sbid_count:8;
> > > > + __u64 sync_count:8;
> > > > + __u64 inst_fetch_count:8;
> > > > + __u64 unused_bits:27;
> > > > + __u64 unused[6];
> > > > +} __attribute__((packed));
> > > > +
> > > > +/**
> > > > + * struct drm_xe_eu_stall_data_xe2 - EU stall data format for
> > > > +LNL, BMG
> > > > + *
> > > > + * Bits Field
> > > > + * 0 to 28 IP (addr)
> > > > + * 29 to 36 Tdr count
> > > > + * 37 to 44 other count
> > > > + * 45 to 52 control count
> > > > + * 53 to 60 pipestall count
> > > > + * 61 to 68 send count
> > > > + * 69 to 76 dist_acc count
> > > > + * 77 to 84 sbid count
> > > > + * 85 to 92 sync count
> > > > + * 93 to 100 inst_fetch count
> > > > + * 101 to 108 Active count
> > > > + * 109 to 111 Exid
> > > > + * 112 EndFlag (is always 1)
> > > > + */
> > > > +struct drm_xe_eu_stall_data_xe2 {
> > > > + __u64 ip_addr:29;
> > > > + __u64 tdr_count:8;
> > > > + __u64 other_count:8;
> > > > + __u64 control_count:8;
> > > > + __u64 pipestall_count:8;
> > > > + __u64 send_count:8;
> > > > + __u64 dist_acc_count:8;
> > > > + __u64 sbid_count:8;
> > > > + __u64 sync_count:8;
> > > > + __u64 inst_fetch_count:8;
> > > > + __u64 active_count:8;
> > > > + __u64 ex_id:3;
> > > > + __u64 end_flag:1;
> > > > + __u64 unused_bits:15;
> > > > + __u64 unused[6];
> > > > +} __attribute__((packed));
> > > > +
> > > > #if defined(__cplusplus)
> > > > }
> > > > #endif
> > > > --
> > > > 2.45.1
> > > >
> > >
> > > --
> > > Matt Roper
> > > Graphics Software Engineer
> > > Linux GPU Platform Enablement
> > > Intel Corporation
More information about the Intel-xe
mailing list