[PATCH v5 6/6] drm/xe: disable wa_15015404425 for PTL B0

Thu Jul 3 21:21:59 UTC 2025

On Wed, Jul 02, 2025 at 12:30:36PM -0700, Matt Atwood wrote:
>Wa_15015404425 only needs to be applied on PTL platforms with an A step
>compute die. There is no way to map PCI revid to the compute die
>stepping. The easiest way to figure out compute die stepping on our end is
>to map the media IP's stepping to the compute die. For PTL, compute die
>has an A stepping if and only if the media IP's stepping is also A-step
>(This relationship is determined on a per platform basis and just
>happens to be this way on PTL).
>
>In addition, this workaround is a chicken-and-egg problem. Wa_15015404425
>requires that all register reads be preceded by four dummy MMIO writes
>(including during early driver init and even pre-OS firmware). The
>driver needs to perform some MMIO reads during init, which include the
>GMD_ID register that contains the media IP's stepping. To handle this in
>the safest manner, assume the workaround applies to all of PTL during
>driver probe and deactivate the workaround after.
>
>The overall solution becomes a set of two workarounds:
>
>* 15015404425 - a Device OOB workaround that's always active for PTL
>* 15015404425_disable - a GT OOB workaround that applies to PTL
>  platforms with a B0 or later stepping
>
>The first of these workarounds controls the dummy MMIO writes we issue
>when reading registers. The second guards logic that disables the first
>once we have the necessary information later in the probe process.
>
>v2: rename SoC to device, avoid null pointer dereference, update commit
>message.
>v3: rebase
>v5: move disable check into xe_device_probe to avoid linking xe_wa
>into xe_pci, reword commit message
>
>Signed-off-by: Matt Atwood <matthew.s.atwood at intel.com>

I don't want a state where only the previous patch is applied, leaving
this WA enabled when it shouldn't be. Let's squash this patch with the
previous one.

Reviewed-by: Lucas De Marchi <lucas.demarchi at intel.com>

Lucas De Marchi

>---
> drivers/gpu/drm/xe/xe_device.c     | 5 +++++
> drivers/gpu/drm/xe/xe_wa.h         | 5 +++++
> drivers/gpu/drm/xe/xe_wa_oob.rules | 2 ++
> 3 files changed, 12 insertions(+)
>
>diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
>index 53df142c1031..5a771b0f8f19 100644
>--- a/drivers/gpu/drm/xe/xe_device.c
>+++ b/drivers/gpu/drm/xe/xe_device.c
>@@ -68,6 +68,7 @@
> #include "xe_wa.h"
>
> #include <generated/xe_wa_oob.h>
>+#include <generated/xe_device_wa_oob.h>
>
> static int xe_file_open(struct drm_device *dev, struct drm_file *file)
> {
>@@ -863,6 +864,10 @@ int xe_device_probe(struct xe_device *xe)
> 			return err;
> 	}
>
>+	if (xe->tiles->media_gt &&
>+	    XE_WA(xe->tiles->media_gt, 15015404425_disable))
>+		XE_DEVICE_WA_DISABLE(xe, 15015404425);
>+
> 	xe_nvm_init(xe);
>
> 	err = xe_heci_gsc_init(xe);
>diff --git a/drivers/gpu/drm/xe/xe_wa.h b/drivers/gpu/drm/xe/xe_wa.h
>index 3793fcae38a0..907eea020b9d 100644
>--- a/drivers/gpu/drm/xe/xe_wa.h
>+++ b/drivers/gpu/drm/xe/xe_wa.h
>@@ -45,4 +45,9 @@ void xe_wa_dump(struct xe_gt *gt, struct drm_printer *p);
> 	test_bit(XE_DEVICE_WA_OOB_ ## id__, (xe__)->wa_active.oob);		\
> })
>
>+#define XE_DEVICE_WA_DISABLE(xe__, id__) ({				\
>+	xe_assert(xe__, (xe__)->wa_active.oob_initialized);			\
>+	clear_bit(XE_DEVICE_WA_OOB_ ## id__, (xe__)->wa_active.oob);		\
>+})
>+
> #endif
>diff --git a/drivers/gpu/drm/xe/xe_wa_oob.rules b/drivers/gpu/drm/xe/xe_wa_oob.rules
>index 96cc33da0fb5..255e67113406 100644
>--- a/drivers/gpu/drm/xe/xe_wa_oob.rules
>+++ b/drivers/gpu/drm/xe/xe_wa_oob.rules
>@@ -70,3 +70,5 @@ no_media_l3	MEDIA_VERSION(3000)
> # SoC workaround - currently applies to all platforms with the following
> # primary GT GMDID
> 14022085890	GRAPHICS_VERSION(2001)
>+
>+15015404425_disable	PLATFORM(PANTHERLAKE), MEDIA_STEP(B0, FOREVER)
>-- 
>2.49.0
>
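
For anyone following the thread, here is a rough standalone sketch of the
flow the commit message describes: assume the workaround applies during
early probe, read the media stepping through the guarded read path, then
turn the workaround off on B0 or later. Everything in it (struct
fake_device, fake_mmio_read, fake_probe, the enum values) is a made-up
stand-in for illustration and not the xe driver's real code; the actual
patch uses XE_WA()/XE_DEVICE_WA_DISABLE() on a bitmap and also checks that
media_gt is non-NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names exist in the xe driver. */
enum media_step { STEP_A0, STEP_A1, STEP_B0, STEP_C0 };

struct fake_device {
	bool wa_initialized;         /* mirrors wa_active.oob_initialized */
	bool wa_15015404425_active;  /* one flag instead of the real bitmap */
	unsigned int dummy_writes;   /* counts simulated dummy MMIO writes */
	uint32_t gmd_id;             /* simulated register holding the media step */
};

/* Device-level WA processing: assume the WA applies to every PTL part. */
static void fake_wa_init(struct fake_device *dev)
{
	dev->wa_15015404425_active = true;
	dev->wa_initialized = true;
}

/* Every read is preceded by four dummy writes while the WA is active. */
static uint32_t fake_mmio_read(struct fake_device *dev)
{
	if (dev->wa_15015404425_active) {
		for (int i = 0; i < 4; i++)
			dev->dummy_writes++;
	}
	return dev->gmd_id;
}

/* Probe: read the stepping with the WA on, then disable it if B0+. */
static void fake_probe(struct fake_device *dev)
{
	fake_wa_init(dev);

	/* Early read still goes through the guarded path. */
	enum media_step step = (enum media_step)fake_mmio_read(dev);

	if (step >= STEP_B0) {
		assert(dev->wa_initialized);
		dev->wa_15015404425_active = false;
	}
}
```

On a simulated A-step part every read keeps paying for the four dummy
writes after probe; on B0 or later only the reads issued before the
disable point do.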