[PATCH v1 1/2] drm/xe/debugfs: Expose PCIe Gen5 update telemetry

Lucas De Marchi lucas.demarchi at intel.com
Mon Mar 31 14:52:28 UTC 2025


On Mon, Mar 31, 2025 at 07:53:35PM +0530, Raag Jadav wrote:
>Expose debugfs telemetry required for PCIe Gen5 firmware update for

Telemetry? Nothing here seems to be related to telemetry.

>discrete GPUs.
>
>Signed-off-by: Raag Jadav <raag.jadav at intel.com>
>---
> drivers/gpu/drm/xe/xe_debugfs.c   | 93 +++++++++++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_pcode_api.h |  4 ++
> 2 files changed, 97 insertions(+)
>
>diff --git a/drivers/gpu/drm/xe/xe_debugfs.c b/drivers/gpu/drm/xe/xe_debugfs.c
>index d0503959a8ed..67c941abf4fe 100644
>--- a/drivers/gpu/drm/xe/xe_debugfs.c
>+++ b/drivers/gpu/drm/xe/xe_debugfs.c
>@@ -17,6 +17,9 @@
> #include "xe_gt_debugfs.h"
> #include "xe_gt_printk.h"
> #include "xe_guc_ads.h"
>+#include "xe_mmio.h"
>+#include "xe_pcode_api.h"
>+#include "xe_pcode.h"
> #include "xe_pm.h"
> #include "xe_pxp_debugfs.h"
> #include "xe_sriov.h"
>@@ -191,6 +194,89 @@ static const struct file_operations wedged_mode_fops = {
> 	.write = wedged_mode_set,
> };
>
>+/**
>+ * DOC: PCIe Gen5 Update Limitations
>+ *
>+ * Default link speed of discrete GPUs is determined by FIT parameters stored
>+ * in their flash memory, which are subject to override through user initiated
>+ * firmware updates. It has been observed that devices configured with PCIe
>+ * Gen5 as their default speed can come across link quality issues due to host
>+ * or motherboard limitations and may have to auto-downspeed to PCIe Gen4 when
>+ * faced with unstable link at Gen5. The users are required to ensure that the
>+ * device is capable of auto-downspeeding to PCIe Gen4 before pushing the image
>+ * with Gen5 as default configuration. This can be done by reading
>+ * ``pcie_gen4_downspeed_capable`` debugfs entry, which will denote PCIe Gen4
>+ * auto-downspeed capability of the device with boolean output value of ``0``
>+ * or ``1``, meaning `incapable` or `capable` respectively.

It doesn't seem like something to have in debugfs. If this is for end
users, they may not even have debugfs mounted or available at all.

Please clarify what's being used for this firmware upgrade.
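
To illustrate the concern: any userspace updater relying on this entry has
to distinguish "debugfs is not there" from "device is incapable". A minimal
sketch in C, assuming the entry prints `0` or `1` as the quoted DOC section
describes. The enum and function names below are hypothetical and are not
part of this patch or of any existing tool:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical result codes, used only for this illustration. */
enum downspeed_state {
	DOWNSPEED_INCAPABLE = 0,
	DOWNSPEED_CAPABLE = 1,
	DOWNSPEED_UNKNOWN = -1,	/* entry missing, unreadable or malformed */
};

/* Parse the "0"/"1" text, with or without a trailing newline. */
enum downspeed_state parse_downspeed(const char *buf)
{
	if (!buf)
		return DOWNSPEED_UNKNOWN;
	if (!strcmp(buf, "1") || !strcmp(buf, "1\n"))
		return DOWNSPEED_CAPABLE;
	if (!strcmp(buf, "0") || !strcmp(buf, "0\n"))
		return DOWNSPEED_INCAPABLE;
	return DOWNSPEED_UNKNOWN;
}

/*
 * Read the entry at @path. Any failure to open or read it (debugfs not
 * mounted, insufficient permissions, older kernel) maps to UNKNOWN, so
 * the caller never mistakes "cannot tell" for "incapable".
 */
enum downspeed_state read_downspeed(const char *path)
{
	char buf[8] = "";
	FILE *f = fopen(path, "r");

	if (!f)
		return DOWNSPEED_UNKNOWN;
	if (!fgets(buf, sizeof(buf), f))
		buf[0] = '\0';
	fclose(f);

	return parse_downspeed(buf);
}
```

A caller would pass the path shown in the DOC section, with `<N>` filled
in, and would have to decide what to do on DOWNSPEED_UNKNOWN. On a system
without debugfs that is the only answer it can ever get, which is why this
does not look like the right interface for end users.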

>+ *
>+ * .. code-block:: shell
>+ *
>+ *    $ cat /sys/kernel/debug/dri/<N>/pcie_gen4_downspeed_capable
>+ *
>+ * Pushing PCIe Gen5 update on an auto-downspeed incapable device and facing

Isn't the ability to downgrade the link to Gen4 something controlled by
the firmware? Why would we push a Gen5 firmware that can't downgrade to
Gen4?

>+ * link instability due to host or motherboard limitations can result in driver
>+ * not being able to successfully bind to the device, making further firmware
>+ * updates impossible with RMA being the only last resort.

When starting survivability mode, can't we always force the link to Gen4
to avoid this kind of issue?

Lucas De Marchi

