[PATCH v4 5/5] drm/xe: disable wa_15015404425 for PTL B0
Lucas De Marchi
lucas.demarchi at intel.com
Thu Jun 26 22:45:41 UTC 2025
On Wed, Jun 25, 2025 at 01:07:50PM -0700, Matt Atwood wrote:
>This workaround only applies to PTL Compute Die A0. The reality of
>modern platforms is we're Multi Chip Packages with logic spread across
>multiple dies. Because this information is not available during PCI
>probe it becomes a bit more complicated.
>
>This workaround needs to be applied on PTL until we prove that we are
>not Compute Die A0 stepping without reading any MMIOs. So use the new
>XE_DEVICE_WA infrastructure to apply early, until we can determine our
>stepping.
>
>There are at least two ways to determine Compute Die stepping. This
>patch uses the Media GT stepping to map to Compute Die stepping, in this
>case Compute and Media dies step synchronously.
>
>Since we're using the Media GT information to determine Compute Die
>stepping, use the XE_WA and the oob infrastructure to come back and
>toggle the workaround off when we know it's safe, and after GT init.
>
>v2: rename SoC to device, avoid null pointer dereference, update commit
>message.
>v3: rebase
>
>Signed-off-by: Matt Atwood <matthew.s.atwood at intel.com>
>---
> drivers/gpu/drm/xe/xe_pci.c | 7 +++++++
> drivers/gpu/drm/xe/xe_wa_oob.rules | 2 ++
> 2 files changed, 9 insertions(+)
>
>diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
>index a9f708608f3e..a294a1b26d8f 100644
>--- a/drivers/gpu/drm/xe/xe_pci.c
>+++ b/drivers/gpu/drm/xe/xe_pci.c
>@@ -34,6 +34,9 @@
> #include "xe_tile.h"
> #include "xe_wa.h"
>
>+#include "generated/xe_wa_oob.h"
>+#include "generated/xe_device_wa_oob.h"
>+
> enum toggle_d3cold {
> 	D3COLD_DISABLE,
> 	D3COLD_ENABLE,
>@@ -896,6 +899,10 @@ static int xe_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> 	drm_dbg(&xe->drm, "d3cold: capable=%s\n",
> 		str_yes_no(xe->d3cold.capable));
>
>+	if (xe->tiles->media_gt != NULL &&
>+	    XE_WA(xe->tiles->media_gt, 15015404425_disable))
>+		xe->oob[XE_DEVICE_WA_OOB_15015404425] = 0;
This is totally untested, right? oob is an array of longs. On both 32-bit
and 64-bit builds it has size 1 up to this patch, and you are supposed to
set the **bit** XE_DEVICE_WA_OOB_15015404425, not index the array with it.
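
To illustrate the bit-versus-index distinction, here is a minimal userspace
sketch. Every demo_* name, DEMO_OOB_LONGS, and the bit number 1 are
hypothetical stand-ins; the real index comes from the generated header and
the real helpers are the kernel's bitops.

```c
#include <limits.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's BITS_PER_LONG. */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical workaround numbering. In the driver these values come from
 * generated/xe_device_wa_oob.h; the numbers here are for illustration only.
 */
enum {
	DEMO_WA_OOB_OTHER_A = 0,
	DEMO_WA_OOB_15015404425 = 1,
	DEMO_WA_OOB_OTHER_B = 2,
	DEMO_WA_OOB_COUNT = 3,
};

/* Number of longs needed to hold DEMO_WA_OOB_COUNT bits (1 here). */
#define DEMO_OOB_LONGS ((DEMO_WA_OOB_COUNT + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Simplified, non-atomic equivalents of set_bit/clear_bit/test_bit. */
static void demo_set_bit(unsigned int nr, unsigned long *map)
{
	map[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
}

static void demo_clear_bit(unsigned int nr, unsigned long *map)
{
	map[nr / BITS_PER_LONG] &= ~(1UL << (nr % BITS_PER_LONG));
}

static int demo_test_bit(unsigned int nr, const unsigned long *map)
{
	return (map[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL;
}

/*
 * Turn off exactly one workaround. Writing oob[DEMO_WA_OOB_15015404425] = 0
 * instead would use the bit number as an array index: with a one-element
 * array and bit number 1 that is an out-of-bounds store, and with bit
 * number 0 it would wipe every workaround stored in that long.
 */
static void demo_disable_wa(unsigned long *oob)
{
	demo_clear_bit(DEMO_WA_OOB_15015404425, oob);
}
```

In the kernel the same operation would go through clear_bit() and test_bit()
on the oob bitmap rather than these simplified helpers.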
>+
> 	return 0;
>
> err_driver_cleanup:
>diff --git a/drivers/gpu/drm/xe/xe_wa_oob.rules b/drivers/gpu/drm/xe/xe_wa_oob.rules
>index 96cc33da0fb5..255e67113406 100644
>--- a/drivers/gpu/drm/xe/xe_wa_oob.rules
>+++ b/drivers/gpu/drm/xe/xe_wa_oob.rules
>@@ -70,3 +70,5 @@ no_media_l3 MEDIA_VERSION(3000)
> # SoC workaround - currently applies to all platforms with the following
> # primary GT GMDID
> 14022085890 GRAPHICS_VERSION(2001)
>+
>+15015404425_disable PLATFORM(PANTHERLAKE), MEDIA_STEP(B0, FOREVER)
I don't like this disable logic. It would make debugfs always report
15015404425 as disabled. I will need to look at earlier discussions and
think a little bit, as I don't know yet what we are trying to accomplish.
Maybe we could have 2 positive checks, with one being an "_early" variant
(based on the commit message here).
Lucas De Marchi
>--
>2.49.0
>