[PATCH v1 1/2] drm/xe/debugfs: Expose PCIe Gen5 update telemetry
Rodrigo Vivi
rodrigo.vivi at intel.com
Mon Mar 31 15:23:39 UTC 2025
On Mon, Mar 31, 2025 at 09:52:28AM -0500, Lucas De Marchi wrote:
> On Mon, Mar 31, 2025 at 07:53:35PM +0530, Raag Jadav wrote:
> > Expose debugfs telemetry required for PCIe Gen5 firmware update for
>
> telemetry?? There doesn't seem to be anything related to telemetry here.
Telemetry is definitely not the right word here...
>
> > discrete GPUs.
> >
> > Signed-off-by: Raag Jadav <raag.jadav at intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_debugfs.c | 93 +++++++++++++++++++++++++++++++
> > drivers/gpu/drm/xe/xe_pcode_api.h | 4 ++
> > 2 files changed, 97 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_debugfs.c b/drivers/gpu/drm/xe/xe_debugfs.c
> > index d0503959a8ed..67c941abf4fe 100644
> > --- a/drivers/gpu/drm/xe/xe_debugfs.c
> > +++ b/drivers/gpu/drm/xe/xe_debugfs.c
> > @@ -17,6 +17,9 @@
> > #include "xe_gt_debugfs.h"
> > #include "xe_gt_printk.h"
> > #include "xe_guc_ads.h"
> > +#include "xe_mmio.h"
> > +#include "xe_pcode_api.h"
> > +#include "xe_pcode.h"
> > #include "xe_pm.h"
> > #include "xe_pxp_debugfs.h"
> > #include "xe_sriov.h"
> > @@ -191,6 +194,89 @@ static const struct file_operations wedged_mode_fops = {
> > .write = wedged_mode_set,
> > };
> >
> > +/**
> > + * DOC: PCIe Gen5 Update Limitations
> > + *
> > + * Default link speed of discrete GPUs is determined by FIT parameters stored
> > + * in their flash memory, which are subject to override through user initiated
> > + * firmware updates. It has been observed that devices configured with PCIe
> > + * Gen5 as their default speed can come across link quality issues due to host
> > + * or motherboard limitations and may have to auto-downspeed to PCIe Gen4 when
> > + * faced with unstable link at Gen5. The users are required to ensure that the
> > + * device is capable of auto-downspeeding to PCIe Gen4 before pushing the image
> > + * with Gen5 as default configuration. This can be done by reading
> > + * ``pcie_gen4_downspeed_capable`` debugfs entry, which will denote PCIe Gen4
> > + * auto-downspeed capability of the device with boolean output value of ``0``
> > + * or ``1``, meaning `incapable` or `capable` respectively.
>
> It doesn't seem like something to have in debugfs. If this is for end
> users, they may not even have debugfs mounted or available at all.
I was the one pushing it towards debugfs, but I now believe this belongs in sysfs as is.
The admin needs this information before upgrading the IFWI.
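Wherever the attribute ends up, a userspace update tool only needs to read a single boolean. A minimal sketch, assuming the file holds exactly `0` or `1` plus an optional newline, as the patch's DOC comment describes. The helper name `read_bool_attr()` is hypothetical, and the path is left to the caller because the final sysfs location is not settled in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: read a boolean attribute file containing "0" or "1"
 * (optionally followed by a newline), as described for
 * pcie_gen4_downspeed_capable in the patch's DOC comment.
 *
 * Returns 0 and stores the value in *out on success, or a negative errno
 * value if the file cannot be opened or does not hold a valid boolean.
 */
static int read_bool_attr(const char *path, int *out)
{
	char buf[8];
	FILE *f;

	assert(path != NULL && out != NULL);

	f = fopen(path, "r");
	if (!f)
		return -errno;

	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -EIO; /* empty or unreadable file */
	}
	fclose(f);

	/* Accept exactly "0" or "1", with or without a trailing newline. */
	if ((buf[0] == '0' || buf[0] == '1') &&
	    (buf[1] == '\0' || (buf[1] == '\n' && buf[2] == '\0'))) {
		*out = buf[0] - '0';
		return 0;
	}

	return -EINVAL;
}
```

A return of 0 with `*out == 0` would mean the device cannot fall back to Gen4, so a Gen5-default image should not be pushed to it.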
>
> Please clarify what's being used for this firmware upgrade.
The final goal is to use fwupdtool, building on some work that Tomas had started.
We are still clearing the path for that. In the meantime, there are some igsc
tools that can be used now to flash the firmware, and XPU Manager will likely
plug into that soon.
>
> > + *
> > + * .. code-block:: shell
> > + *
> > + * $ cat /sys/kernel/debug/dri/<N>/pcie_gen4_downspeed_capable
> > + *
> > + * Pushing PCIe Gen5 update on an auto-downspeed incapable device and facing
>
> Isn't the ability to downgrade the link to Gen4 something controlled by
> the firmware? Why would we push a Gen5 firmware that can't downgrade to
> Gen4?
There are host and device combinations out there that work well and safely at
Gen5, with no need to downgrade. But in many cases it is safer to check the Gen4
downgrade capability and status first...
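The check boils down to a small decision. A hedged sketch, assuming a tool knows whether the image it is about to flash defaults to Gen5 and what the downspeed capability is, where the capability may be unknown if the attribute could not be read (for example, debugfs not mounted). The enum and function names are hypothetical, and treating "unknown" as a refusal is a conservative choice of this sketch rather than something stated in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tri-state for the Gen4 auto-downspeed capability. */
enum downspeed_cap {
	DOWNSPEED_UNKNOWN = -1,  /* attribute missing or unreadable */
	DOWNSPEED_INCAPABLE = 0, /* attribute reads "0" */
	DOWNSPEED_CAPABLE = 1,   /* attribute reads "1" */
};

/*
 * Hypothetical policy check: images that do not default to Gen5 are always
 * allowed. A Gen5-default image is allowed only when the device is known to
 * be able to auto-downspeed to Gen4; an unknown capability is treated like
 * "incapable", since an unstable Gen5 link may leave the card unrecoverable.
 */
static bool may_flash_image(bool image_defaults_gen5, enum downspeed_cap cap)
{
	assert(cap >= DOWNSPEED_UNKNOWN && cap <= DOWNSPEED_CAPABLE);

	if (!image_defaults_gen5)
		return true;

	return cap == DOWNSPEED_CAPABLE;
}
```

A real tool would probably also want an explicit override for host and device combinations known to be stable at Gen5, which this sketch leaves out.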
>
> > + * link instability due to host or motherboard limitations can result in driver
> > + * not being able to successfully bind to the device, making further firmware
> > + * updates impossible with RMA being the only last resort.
>
> when starting survivability mode, can't we always force it to gen4 to
> avoid this kind of issues?
No, we cannot choose that from software.
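Software cannot pick the link speed, but it can observe the outcome. One option, assuming the standard PCI sysfs attribute `/sys/bus/pci/devices/<BDF>/current_link_speed` (which reports strings such as `16.0 GT/s PCIe` on recent kernels), is to map the reported per-lane rate to a PCIe generation: 2.5, 5.0, 8.0, 16.0 and 32.0 GT/s correspond to Gen1 through Gen5. The helper name below is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper: map a PCI current_link_speed string such as
 * "16.0 GT/s PCIe" to a PCIe generation (1..5). Only the leading number is
 * parsed, so the presence or absence of a " PCIe" suffix does not matter.
 * Returns 0 if the rate is not recognised (e.g. "Unknown").
 */
static int pcie_gen_from_speed(const char *s)
{
	/* Per-lane transfer rates in GT/s; all are exactly representable. */
	static const struct {
		double gts;
		int gen;
	} rates[] = {
		{ 2.5, 1 }, { 5.0, 2 }, { 8.0, 3 }, { 16.0, 4 }, { 32.0, 5 },
	};
	char *end;
	double gts;
	size_t i;

	assert(s != NULL);

	gts = strtod(s, &end);
	if (end == s)
		return 0; /* no leading number */

	for (i = 0; i < sizeof(rates) / sizeof(rates[0]); i++)
		if (gts == rates[i].gts)
			return rates[i].gen;

	return 0;
}
```

A Gen5-default device that reports Gen4 here may have auto-downspeeded, or may simply sit in a slot that only supports Gen4; the value alone does not distinguish the two.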
>
> Lucas De Marchi