[PATCH v3 1/1] drm/xe/eustall: Add support for EU stall sampling

Cabral, Matias A matias.a.cabral at intel.com
Wed Oct 2 17:14:00 UTC 2024


Hi Harish, 

Any updates on the upstream PR? Can we expect xe-internal to be updated with the upstream uAPI during Q4 ? 

Thanks, 
_MAC

-----Original Message-----
From: Chegondi, Harish <harish.chegondi at intel.com> 
Sent: Friday, September 20, 2024 11:08 AM
To: Ranjan, Joshua Santhosh <joshua.santosh.ranjan at intel.com>
Cc: Dixit, Ashutosh <ashutosh.dixit at intel.com>; Roper, Matthew D <matthew.d.roper at intel.com>; intel-xe at lists.freedesktop.org; Ausmus, James <james.ausmus at intel.com>; Degrood, Felix J <felix.j.degrood at intel.com>; Souza, Jose <jose.souza at intel.com>; Cabral, Matias A <matias.a.cabral at intel.com>; Kumar, Shubham <shubham.kumar at intel.com>
Subject: Re: [PATCH v3 1/1] drm/xe/eustall: Add support for EU stall sampling

On Fri, Sep 13, 2024 at 12:35:28AM -0700, Ranjan, Joshua Santhosh wrote:
> Hi
> 
> Please find responses ([Santhosh]) inline.
> 
> -----Original Message-----
> From: Chegondi, Harish <harish.chegondi at intel.com>
> Sent: Friday, September 13, 2024 7:24 AM
> To: Dixit, Ashutosh <ashutosh.dixit at intel.com>
> Cc: Roper, Matthew D <matthew.d.roper at intel.com>; 
> intel-xe at lists.freedesktop.org; Ausmus, James 
> <james.ausmus at intel.com>; Degrood, Felix J 
> <felix.j.degrood at intel.com>; Souza, Jose <jose.souza at intel.com>; 
> Cabral, Matias A <matias.a.cabral at intel.com>; Ranjan, Joshua Santhosh 
> <joshua.santosh.ranjan at intel.com>
> Subject: Re: [PATCH v3 1/1] drm/xe/eustall: Add support for EU stall 
> sampling
> 
> On Wed, Sep 11, 2024 at 04:21:12PM -0700, Dixit, Ashutosh wrote:
> > On Mon, 09 Sep 2024 17:09:14 -0700, Matt Roper wrote:
> > >
> > 
> > Hi Matt,
> > 
> > Some comments about the uapi below.
> > 
> > > > @@ -1694,6 +1696,138 @@ struct drm_xe_oa_stream_info {
> > > >	__u64 reserved[3];
> > > >  };
> > > >
> > > > +/**
> > > > + * enum drm_xe_eu_stall_property_id - EU stall data stream property ids.
> > > > + *
> > > > + * These properties are passed to the driver as a chain of
> > > > + * @drm_xe_ext_set_property structures with @property set to 
> > > > +these
> > > > + * properties' enums and @value set to the corresponding values 
> > > > +of these
> > > > + * properties. @drm_xe_user_extension base.name should be set 
> > > > +to
> > > > + * @DRM_XE_EU_STALL_EXTENSION_SET_PROPERTY.
> > > > + */
> > > > +enum drm_xe_eu_stall_property_id {
> > > > +#define DRM_XE_EU_STALL_EXTENSION_SET_PROPERTY		0
> > > > +	/**
> > > > +	 * @DRM_XE_EU_STALL_PROP_BUF_SZ: Per DSS Memory Buffer Size.
> > > > +	 * Valid values are 128 KB, 256 KB, and 512 KB.
> > > > +	 */
> > > > +	DRM_XE_EU_STALL_PROP_BUF_SZ = 1,
> > >
> > > Does userspace really care about this being configurable?  Even if 
> > > we have a platform with XE_MAX_DSS_FUSE_BITS total XeCores, the 
> > > difference between the largest and smallest sizes here only saves 48MB of memory.
> > > The hardware makes this configurable, but is there actually an ask 
> > > to expose this through the uapi?  If not, I'd say we should just 
> > > always pick 512KB internally and keep things simple.
> > 
> > Agreed.
> > 
> > >
> > > > +
> > > > +	/**
> > > > +	 * @DRM_XE_EU_STALL_PROP_SAMPLE_RATE: Sampling rate
> > > > +	 * in multiples of 251 cycles. Valid values are 1 to 7.
> > > > +	 * If the value is 1, sampling interval is 251 cycles.
> > > > +	 * If the value is 7, sampling interval is 7 x 251 cycles.
> > > > +	 */
> > > > +	DRM_XE_EU_STALL_PROP_SAMPLE_RATE,
> > > > +
> > > > +	/**
> > > > +	 * @DRM_XE_EU_STALL_PROP_POLL_PERIOD: EU stall data
> > > > +	 * poll period in nanoseconds. should be at least 100000 ns.
> > > > +	 */
> > > > +	DRM_XE_EU_STALL_PROP_POLL_PERIOD,
> > > > +
> > > > +	/**
> > > > +	 * @DRM_XE_EU_STALL_PROP_EVENT_REPORT_COUNT: Minimum number of
> > > > +	 * EU stall data rows to be present in the kernel buffer for
> > > > +	 * poll() to set POLLIN (data present).
> > > > +	 */
> > > > +	DRM_XE_EU_STALL_PROP_EVENT_REPORT_COUNT,
> > > > +
> > > > +	/**
> > > > +	 * @DRM_XE_EU_STALL_PROP_GT_ID: GT ID of the GT on which
> > > > +	 * EU stall data will be captured.
> > > > +	 */
> > > > +	DRM_XE_EU_STALL_PROP_GT_ID,
> > >
> > > I mentioned above, but it feels like this should be an intrinsic 
> > > part of the API, not something coming in through an optional extension...
> > 
> > Not sure what "intrinsic part of the API" should be. EU Stall, OA etc. 
> > are now types of "observation streams" and there is no other 
> > opportunity in the observation stream interface for this kind of 
> > information. Also, gt_id is EU stall specific (e.g. OA uses 
> > oa_unit_id not gt_id) so it should come in as an EU Stall property.
> > 
> > Though question for Harish: do we really need this? What if we 
> > return data from all tiles/gt's in a single read() call? Since we 
> > have already determined subslice number is not interesting to UMD's 
> > for EU Stall data, maybe gt_id is similarly not interesting?
> User space folks said that they are interested in the EU stall data at the GT level. So if we decide to return stall data from all the GTs, we may have to append the GT ID to the data. I will seek feedback from the user space folks regarding this.
> 
> [Santhosh]  Yes. L0 allows Metric collection at a tile level granularity. 
> 
> Thanks
> Harish.
> > 
> > >
> > > > +
> > > > +	/**
> > > > +	 * @DRM_XE_EU_STALL_PROP_OPEN_DISABLED: A value of 1 will open
> > > > +	 * the EU stall data stream without enabling EU stall sampling.
> > > > +	 */
> > > > +	DRM_XE_EU_STALL_PROP_OPEN_DISABLED,
> > >
> > > Is there a reason not to make this one the default behavior?  Is 
> > > there really a benefit to auto-enabling on open that makes it 
> > > worth the extra API complexity to make this configurable?
> > 
> > I agree, this would be a good idea. But unfortunately this got in 
> > for OA (if I were doing OA now I would have taken Matt's 
> > suggestion). So we either
> > (a) keep it the same as OA for uniformity, or (b) document that EU 
> > stall streams have to be explicity enabled before read()'ing data.
> > 
> > >
> > > > +};
> > > > +
> > > > +/**
> > > > + * struct drm_xe_eu_stall_stream_info - EU stall stream info 
> > > > +returned from
> > > > + * @DRM_XE_OBSERVATION_IOCTL_INFO observation stream fd ioctl  
> > > > +*/ struct drm_xe_eu_stall_stream_info {
> > > > +	/** @extensions: Pointer to the first extension struct, if any */
> > > > +	__u64 extensions;
> > > > +
> > > > +	/** @record_size: size of each EU stall data record */
> > > > +	__u64 record_size;
> > > > +
> > > > +	/** @reserved: reserved for future use */
> > > > +	__u64 reserved[3];
> > > > +};
> > > > +
> > > > +/**
> > > > + * struct drm_xe_eu_stall_data_pvc - EU stall data format for 
> > > > +PVC
> > >
> > > I thought there was an ask from one of the userspace teams that we 
> > > make the layout discoverable?  I.e., a runtime-queryable format 
> > > description that lists the counter type, size, and offset in the 
> > > record of each counter that has usable data in the records?  Did 
> > > we change directions on that?
> 
> [Santhosh] Yes. From L0, there was a request made to have a runtime-queryable format.
Yes, there was a request for a runtime-queryable format. But as per the request, the driver would parse the data and would send only the non-zero data to the user space. But as per the most recent discussion, it was agreed that the driver would send all data it receives from the HW without parsing/re-formatting the data as one bigger memcpy() would be more performant than several smaller memcpy()s.

If the driver is sending all the HW data as-is, would it still help user space to get runtime-queryable format from the driver ?

Thanks
Harish.

> 
> > 
> > My suggestion is to eliminate these struct's from the KMD uapi header. 
> > See my earlier response to Harish's patch.
> > 
> > Thanks.
> > --
> > Ashutosh
> > 
> > 
> > >
> > > > + *
> > > > + * Bits		Field
> > > > + * 0  to 28	IP (addr)
> > > > + * 29 to 36	active count
> > > > + * 37 to 44	other count
> > > > + * 45 to 52	control count
> > > > + * 53 to 60	pipestall count
> > > > + * 61 to 68	send count
> > > > + * 69 to 76	dist_acc count
> > > > + * 77 to 84	sbid count
> > > > + * 85 to 92	sync count
> > > > + * 93 to 100	inst_fetch count
> > > > + */
> > > > +struct drm_xe_eu_stall_data_pvc {
> > > > +	__u64 ip_addr:29;
> > > > +	__u64 active_count:8;
> > > > +	__u64 other_count:8;
> > > > +	__u64 control_count:8;
> > > > +	__u64 pipestall_count:8;
> > > > +	__u64 send_count:8;
> > > > +	__u64 dist_acc_count:8;
> > > > +	__u64 sbid_count:8;
> > > > +	__u64 sync_count:8;
> > > > +	__u64 inst_fetch_count:8;
> > > > +	__u64 unused_bits:27;
> > > > +	__u64 unused[6];
> > > > +} __attribute__((packed));
> > > > +
> > > > +/**
> > > > + * struct drm_xe_eu_stall_data_xe2 - EU stall data format for 
> > > > +LNL, BMG
> > > > + *
> > > > + * Bits		Field
> > > > + * 0  to 28	IP (addr)
> > > > + * 29 to 36	Tdr count
> > > > + * 37 to 44	other count
> > > > + * 45 to 52	control count
> > > > + * 53 to 60	pipestall count
> > > > + * 61 to 68	send count
> > > > + * 69 to 76	dist_acc count
> > > > + * 77 to 84	sbid count
> > > > + * 85 to 92	sync count
> > > > + * 93 to 100	inst_fetch count
> > > > + * 101 to 108	Active count
> > > > + * 109 to 111	Exid
> > > > + * 112		EndFlag (is always 1)
> > > > + */
> > > > +struct drm_xe_eu_stall_data_xe2 {
> > > > +	__u64 ip_addr:29;
> > > > +	__u64 tdr_count:8;
> > > > +	__u64 other_count:8;
> > > > +	__u64 control_count:8;
> > > > +	__u64 pipestall_count:8;
> > > > +	__u64 send_count:8;
> > > > +	__u64 dist_acc_count:8;
> > > > +	__u64 sbid_count:8;
> > > > +	__u64 sync_count:8;
> > > > +	__u64 inst_fetch_count:8;
> > > > +	__u64 active_count:8;
> > > > +	__u64 ex_id:3;
> > > > +	__u64 end_flag:1;
> > > > +	__u64 unused_bits:15;
> > > > +	__u64 unused[6];
> > > > +} __attribute__((packed));
> > > > +
> > > >  #if defined(__cplusplus)
> > > >  }
> > > >  #endif
> > > > --
> > > > 2.45.1
> > > >
> > >
> > > --
> > > Matt Roper
> > > Graphics Software Engineer
> > > Linux GPU Platform Enablement
> > > Intel Corporation


More information about the Intel-xe mailing list