[PATCH v3 2/3] drm/xe: Expose PCIe Gen4 downspeed attributes

Raag Jadav raag.jadav at intel.com
Wed Apr 23 15:01:42 UTC 2025


On Wed, Apr 23, 2025 at 08:35:42AM -0500, Lucas De Marchi wrote:
> On Wed, Apr 23, 2025 at 02:02:21PM +0300, Raag Jadav wrote:
> > On Wed, Apr 23, 2025 at 03:29:41PM +0530, Riana Tauro wrote:
> > > On 4/23/2025 2:18 PM, Raag Jadav wrote:
> > > > On Wed, Apr 23, 2025 at 10:55:30AM +0530, Riana Tauro wrote:
> > > > > Hi Raag
> > > > >
> > > > > On 4/17/2025 4:42 PM, Raag Jadav wrote:
> > > > > > Expose sysfs attributes for PCIe Gen4 downspeed capability and status.
> > > > > >
> > > > > > v2: Move from debugfs to sysfs (Lucas, Rodrigo, Badal)
> > > > > >       Rework macros and their naming (Rodrigo)
> > > > > > v3: Use sysfs_create_files() (Riana)
> > > > > >       Fix checkpatch warning (Riana)
> > > > > >
> > > > > > Signed-off-by: Raag Jadav <raag.jadav at intel.com>
> > > > > > ---
> > > > > >    drivers/gpu/drm/xe/xe_device.c       |   5 ++
> > > > > >    drivers/gpu/drm/xe/xe_device_sysfs.c | 101 +++++++++++++++++++++++++++
> > > > > >    drivers/gpu/drm/xe/xe_device_sysfs.h |   1 +
> > > > > >    drivers/gpu/drm/xe/xe_pcode_api.h    |   5 ++
> > > > > >    4 files changed, 112 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> > > > > > index 6c9d3009aa03..79b7b0ecfbae 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_device.c
> > > > > > +++ b/drivers/gpu/drm/xe/xe_device.c
> > > > > > @@ -26,6 +26,7 @@
> > > > > >    #include "xe_bo_evict.h"
> > > > > >    #include "xe_debugfs.h"
> > > > > >    #include "xe_devcoredump.h"
> > > > > > +#include "xe_device_sysfs.h"
> > > > > >    #include "xe_dma_buf.h"
> > > > > >    #include "xe_drm_client.h"
> > > > > >    #include "xe_drv.h"
> > > > > > @@ -916,6 +917,10 @@ int xe_device_probe(struct xe_device *xe)
> > > > > >    	if (err)
> > > > > >    		goto err_unregister_display;
> > > > > > +	err = xe_device_sysfs_init(xe);
> > > > > > +	if (err)
> > > > > > +		goto err_unregister_display;
> > > > > > +
> > > > > >    	xe_debugfs_register(xe);
> > > > > >    	err = xe_hwmon_register(xe);
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_device_sysfs.c b/drivers/gpu/drm/xe/xe_device_sysfs.c
> > > > > > index 2d25e5b5d4bf..923612a0a2e0 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_device_sysfs.c
> > > > > > +++ b/drivers/gpu/drm/xe/xe_device_sysfs.c
> > > > > > @@ -11,6 +11,9 @@
> > > > > >    #include "xe_device.h"
> > > > > >    #include "xe_device_sysfs.h"
> > > > > > +#include "xe_mmio.h"
> > > > > > +#include "xe_pcode_api.h"
> > > > > > +#include "xe_pcode.h"
> > > > > >    #include "xe_pm.h"
> > > > > >    /**
> > > > > > @@ -81,3 +84,101 @@ int xe_pm_sysfs_init(struct xe_device *xe)
> > > > > >    	return devm_add_action_or_reset(dev, xe_pm_sysfs_fini, xe);
> > > > > >    }
> > > > > > +
> > > > > > +/**
> > > > > > + * DOC: PCIe Gen5 Update Limitations
> > > > > > + *
> > > > > > + * Default link speed of discrete GPUs is determined by configuration
> > > > > > + * parameters stored in their flash memory, which are subject to override
> > > > > > + * through user initiated firmware updates. It has been observed that devices
> > > > > > + * configured with PCIe Gen5 as their default speed can come across link
> > > > > > + * quality issues due to host or motherboard limitations and may have to
> > > > > > + * auto-downspeed to PCIe Gen4 when faced with unstable link at Gen5, which
> > > > > > + * makes firmware updates rather risky on such setups. It is required to
> > > > > > + * ensure that the device is capable of auto-downspeeding to PCIe Gen4 link
> > > > > > + * before pushing the image with PCIe Gen5 as default configuration. This
> > > > > > + * can be done by reading ``pcie_gen4_downspeed_capable`` sysfs entry, which
> > > > > > + * will denote PCIe Gen4 downspeed capability of the device with boolean output
> > > > > > + * value of ``0`` or ``1``, meaning `incapable` or `capable` respectively.
> > > > > > + *
> > > > > > + * .. code-block:: shell
> > > > > > + *
> > > > > > + *    $ cat /sys/bus/pci/devices/<bdf>/pcie_gen4_downspeed_capable
> > > > > > + *
> > > > > > + * Pushing PCIe Gen5 update on a downspeed incapable device and facing link
> > > > > > + * instability due to host or motherboard limitations can result in driver
> > > > > > + * failing to bind to the device, making further firmware updates impossible
> > > > > > + * with RMA being the only last resort.
> > > > > > + *
> > > > > > + * PCIe Gen4 downspeed status of downspeed capable devices is available through
> > > > > > + * ``pcie_gen4_downspeed_status`` sysfs entry with boolean output value of
> > > > > > + * ``0`` or ``1``, where ``0`` means no auto-downspeeding was required during
> > > > > > + * link training (which is the optimal scenario) and ``1`` means the device
> > > > > > + * has auto-downsped to PCIe Gen4 due to unstable Gen5 link.
> > > > > > + *
> > > > > > + * .. code-block:: shell
> > > > > > + *
> > > > > > + *    $ cat /sys/bus/pci/devices/<bdf>/pcie_gen4_downspeed_status
> > > > > The code looks good. But i am not sure of the word downspeed.
> > > > > Couldn't find downspeed used in Pcie generation context. For link,
> > > > > it is mentioned as  'link downgrade'
> > > > >
> > > > > Could you share if you found any?
> 
> I'm with Riana here and also found it strange. User is cat'ing a file
> inside  /sys/bus/pci/devices/<bdf>/ - it's odd to use downspeed when
> that word is not used in PCIe spec and...
> 
> > > >
> > > > Since we're describing both firmware and PCI link in the same document,
> > > >
> > > > 1. It helps distinguish between them.
> 
> I also don't understand the distinction you are trying to make. The
> firmware is there, monitoring the communication and and is capable to
> auto downgrade the link to gen4 if it needs to. I imagine that if this
> situation happens, the link speed is re-negotiated and we will end up
> seeing the new one on LnkSta... LnkCap still shows what is possible. The
> only difference I see here from "I'm connecting a gen5 capable device on
> a max gen4 slot" is that initially the gen5 is attempted (since both the
> host and device report they support it), but the firmware may make a decision
> in runtime to "downgrade the link speed".

And all of this will take place when the user is doing a "firmware upgrade",
and this can potentially open the door for "link speed downgrade" to be
confused with "firmware rollback (downgrade) done for the link speed",
which is not the case.

> > > > 2. This information is for the end user and has to be translatable enough
> > > >     regardless of what spec says about it and the distinction reduces the
> > > >     chances of misinterpretation.
> 
> when you introduce a new term that is not known, I'd say the effect is
> pretty much the opposite and it can be even worse if a future pcie spec
> starts to use that term.

Agree, unless we're expecting users to be informed about the spec to be
able to flash their firmwares. But if you insist, sure will _upgrade_ the
document ;)

Raag


More information about the Intel-xe mailing list