[Intel-xe] [PATCH V3 2/3] drm/xe/ats-m: Expose uid for ATS-M via sysfs

Thu Nov 30 23:10:02 UTC 2023

On Thu, Nov 30, 2023 at 04:32:38PM -0500, Rodrigo Vivi wrote:
>On Wed, Nov 29, 2023 at 12:00:39AM -0500, Upadhyay, Tejas wrote:
>>
>>
>> > -----Original Message-----
>> > From: Wajdeczko, Michal <Michal.Wajdeczko at intel.com>
>> > Sent: Tuesday, November 28, 2023 10:31 PM
>> > To: Upadhyay, Tejas <tejas.upadhyay at intel.com>; intel-
>> > xe at lists.freedesktop.org; Vivi, Rodrigo <rodrigo.vivi at intel.com>
>> > Subject: Re: [Intel-xe] [PATCH V3 2/3] drm/xe/ats-m: Expose uid for ATS-M via
>> > sysfs
>> >
>> >
>> >
>> > On 28.11.2023 16:13, Tejas Upadhyay wrote:
>> > > On ATS-M CSC firmware calculated unique device id(uid) can be read
>> > > from defined registers. Only ATS-M has this way of reading uid. Also
>> > > ATS-M customer require this solution to locate and name their devices
>> > > effectively.
>> > >
>> > > Consumer of csc uid is level0 sysman and end users using sysman.
>> >
>> > is this generic UID or UID only specific to CSC firmware ?
>> > if former then maybe just name it as "uid" or "uuid" ?
>> >
>> > >
>> > > This patch exports uid to the userspace via sysfs.
>> >
>> > as this defines uabi, shouldn't this series go over dri-devel ?
>> >
>> > + @Rodrigo
>
>The biggest problem that I see here is actually the justification.
>It doesn't matter which wrapper are using our sysfs or not.
>
>A better commit message would be something like:
>
>In case of multiple similar dgfx cards plugged to the same node, the
>user cannot rely on DRM card number nor on PCI address to identify
>which exact part it is operating at since these numbers can change
>upon order enumeration. In the case of FW flashing or other cases

kind of sounds like what we have for network with predictable
network names.

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/networking_guide/ch-consistent_network_device_naming

This would provide the firmware naming, but I wonder if it can't
be solved today entirely in userspace.

>where the admin might know specifically which device it is accessing,
>it needs to have an unique identifier number. To solve this case,
>CSC FW provides an unique number that can be query. Let's expose this
>number in the sysfs to make the device clearly identifiable.

if this is the justification (like you said, it should be in the commit
message so we know what's the use to give a proper review), then it also
answers the question "why is it ok to wake the device on sysfs access
rather than doing on device probe?" asked elsewhere: because clients
will almost never access it. It's not like a userspace client is reading
this value 10 times per second

Lucas De Marchi

>
>> >
>> > > For example, uid can be read by:
>> > > cat /sys/class/drm/cardX/device/csc_uid
>> >
>> > as this is more Xe driver extension then maybe
>> >
>> > /sys/bus/pci/drivers/xe/<BDF>/uuid
>> >
>> > >
>> > > V2(Michal):
>> > >   - Make separate patch for csc uid capability
>> > >   - Use %#llx for 0x
>> > >   - Dump error value when sysfs creation fails
>> > >   - Have consistency in csc uid name
>> > >
>> > > Signed-off-by: Tejas Upadhyay <tejas.upadhyay at intel.com>
>> > > ---
>> > >  drivers/gpu/drm/xe/regs/xe_regs.h    |  3 +++
>> > >  drivers/gpu/drm/xe/xe_device.c       | 11 +++++++++++
>> > >  drivers/gpu/drm/xe/xe_device_sysfs.c | 21 +++++++++++++++++++++
>> > > drivers/gpu/drm/xe/xe_device_types.h |  2 ++
>> > >  4 files changed, 37 insertions(+)
>> > >
>> > > diff --git a/drivers/gpu/drm/xe/regs/xe_regs.h
>> > > b/drivers/gpu/drm/xe/regs/xe_regs.h
>> > > index ec9372aa739f..3599c7b38b61 100644
>> > > --- a/drivers/gpu/drm/xe/regs/xe_regs.h
>> > > +++ b/drivers/gpu/drm/xe/regs/xe_regs.h
>> > > @@ -71,6 +71,9 @@
>> > >  #define XEHP_CLOCK_GATE_DIS			XE_REG(0x101014)
>> > >  #define   SGSI_SIDECLK_DIS			REG_BIT(17)
>> > >
>> > > +#define DEVUID_LWORD				XE_REG(0x102008)
>> > > +#define DEVUID_HWORD				XE_REG(0x10200C)
>> >
>> > Bspec uses different names .. and LWORD/HWORD look strange but since it's
>> > for SW use then maybe at least (assuming ATSM is XEHP):
>> >
>> > #define XEHP_DEVUID_LDW				XE_REG(0x102008)
>> > #define XEHP_DEVUID_UDW				XE_REG(0x10200C)
>> >
>> > > +
>> > >  #define GGC					XE_REG(0x108040)
>> > >  #define   GMS_MASK				REG_GENMASK(15, 8)
>> > >  #define   GGMS_MASK				REG_GENMASK(7, 6)
>> > > diff --git a/drivers/gpu/drm/xe/xe_device.c
>> > > b/drivers/gpu/drm/xe/xe_device.c index d60379d844d2..b54635d512d6
>> > > 100644
>> > > --- a/drivers/gpu/drm/xe/xe_device.c
>> > > +++ b/drivers/gpu/drm/xe/xe_device.c
>> > > @@ -355,6 +355,15 @@ static void xe_device_sanitize(struct drm_device
>> > *drm, void *arg)
>> > >  		xe_gt_sanitize(gt);
>> > >  }
>> > >
>> > > +static void xe_read_csc_uid(struct xe_device *xe) {
>> > > +	if (xe->info.has_csc_uid) {
>> > > +		struct xe_gt *mmio = xe_root_mmio_gt(xe);
>> > > +
>> > > +		xe->info.csc_uid  = xe_mmio_read64_2x32(mmio,
>> > DEVUID_LWORD);
>> > > +	}
>> > > +}
>> > > +
>> > >  int xe_device_probe(struct xe_device *xe)  {
>> > >  	struct xe_tile *tile;
>> > > @@ -379,6 +388,8 @@ int xe_device_probe(struct xe_device *xe)
>> > >  	if (err)
>> > >  		return err;
>> > >
>> > > +	xe_read_csc_uid(xe);
>> > > +
>> > >  	err = drmm_add_action_or_reset(&xe->drm, xe_driver_flr_fini, xe);
>> > >  	if (err)
>> > >  		return err;
>> > > diff --git a/drivers/gpu/drm/xe/xe_device_sysfs.c
>> > > b/drivers/gpu/drm/xe/xe_device_sysfs.c
>> > > index 99113a5a2b84..4fbf0275d3bf 100644
>> > > --- a/drivers/gpu/drm/xe/xe_device_sysfs.c
>> > > +++ b/drivers/gpu/drm/xe/xe_device_sysfs.c
>> > > @@ -65,11 +65,24 @@ vram_d3cold_threshold_store(struct device *dev,
>> > > struct device_attribute *attr,
>> > >
>> > >  static DEVICE_ATTR_RW(vram_d3cold_threshold);
>> > >
>> > > +static ssize_t csc_uid_show(struct device *dev,
>> > > +			    struct device_attribute *attr,
>> > > +			    char *buf)
>> > > +{
>> > > +	struct pci_dev *pdev = to_pci_dev(dev);
>> > > +	struct xe_device *xe = pdev_to_xe_device(pdev);
>> > > +
>> > > +	return sysfs_emit(buf, "%#llx\n", xe->info.csc_uid);
>> >
>> > if there are no plans to use this value in driver then maybe just read it here as
>> > it was already suggested by Lucas
>>
>> Why to wake GT on every sysfs read and read those registers? Does not look correct in my POV.
>
>Why not to wake? it shouldn't be a query that user is doing every time right?
>
>>
>> Thanks,
>> Tejas
>> >
>> > > +}
>> > > +
>> > > +static DEVICE_ATTR_RO(csc_uid);
>> > > +
>> > >  static void xe_device_sysfs_fini(struct drm_device *drm, void *arg)
>> > > {
>> > >  	struct xe_device *xe = arg;
>> > >
>> > >  	sysfs_remove_file(&xe->drm.dev->kobj,
>> > > &dev_attr_vram_d3cold_threshold.attr);
>> > > +	sysfs_remove_file(&xe->drm.dev->kobj, &dev_attr_csc_uid.attr);
>> >
>> > it's sad that we don't have proper groups defines and need to add each
>> > attribute separately (but that's not your fault)
>> >
>> > >  }
>> > >
>> > >  void xe_device_sysfs_init(struct xe_device *xe) @@ -83,6 +96,14 @@
>> > > void xe_device_sysfs_init(struct xe_device *xe)
>> > >  		return;
>> > >  	}
>> > >
>> > > +	if (xe->info.has_csc_uid) {
>> > > +		ret = sysfs_create_file(&dev->kobj, &dev_attr_csc_uid.attr);
>> > > +		if (ret) {
>> > > +			drm_warn(&xe->drm, "UID sysfs setup failed
>> > err=%pe\n", ERR_PTR(ret));
>> > > +			return;
>> >
>> > if you exit here, you will leak dev_attr_vram_d3cold_threshold
>> >
>> > > +		}
>> > > +	}
>> > > +
>> > >  	ret = drmm_add_action_or_reset(&xe->drm, xe_device_sysfs_fini,
>> > xe);
>> > >  	if (ret)
>> > >  		drm_warn(&xe->drm, "Failed to add sysfs fini drm action\n");
>> > diff
>> > > --git a/drivers/gpu/drm/xe/xe_device_types.h
>> > > b/drivers/gpu/drm/xe/xe_device_types.h
>> > > index 268a40639ec5..e88b64cb0eca 100644
>> > > --- a/drivers/gpu/drm/xe/xe_device_types.h
>> > > +++ b/drivers/gpu/drm/xe/xe_device_types.h
>> > > @@ -239,6 +239,8 @@ struct xe_device {
>> > >  		u8 vm_max_level;
>> > >  		/** @va_bits: Maximum bits of a virtual address */
>> > >  		u8 va_bits;
>> > > +		/** @csc_uid: device uid calculated by CSC FW, used for
>> > generating uuid */
>> > > +		u64 csc_uid;
>> > >
>> > >  		/** @is_dgfx: is discrete device */
>> > >  		u8 is_dgfx:1;