[PATCH] drm/xe/xe2: Extend performance tuning to media GT

Matt Roper matthew.d.roper at intel.com
Tue Sep 17 22:42:43 UTC 2024


On Tue, Sep 17, 2024 at 02:02:38PM -0300, Gustavo Sousa wrote:
> Quoting Gustavo Sousa (2024-09-17 13:53:54-03:00)
> >With the exception of "Tuning: L3 cache - media", we are currently applying
> >recommended performance tuning settings only for the primary GT. Let's
> >also apply them to the media GT when applicable.
> >
> >According to our spec, media GT registers CCCHKNREG1 and L3SQCREG* exist
> >only in Xe2_LPM and their offsets do not match their primary GT
> >counterparts. As such, we need to have Xe2_LPM-specific definitions for
> >them and apply the setting only for that specific IP.
> >
> >Both Xe2_HPM and Xe2_LPM contain STATELESS_COMPRESSION_CTRL and the
> >offset on the media GT matches the one on the primary one, so we can use
> >the common definition and apply the setting to both IPs.
> >
> >Bspec: 72161
> >Signed-off-by: Gustavo Sousa <gustavo.sousa at intel.com>
> >---
> > drivers/gpu/drm/xe/regs/xe_gt_regs.h |  6 ++++++
> > drivers/gpu/drm/xe/xe_tuning.c       | 19 +++++++++++++++++++
> > 2 files changed, 25 insertions(+)
> >
> >diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> >index cf21de3adca6..2e655291a84a 100644
> >--- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> >+++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> >@@ -169,6 +169,8 @@
> > #define XEHP_SLICE_COMMON_ECO_CHICKEN1                XE_REG_MCR(0x731c, XE_REG_OPTION_MASKED)
> > #define   MSC_MSAA_REODER_BUF_BYPASS_DISABLE        REG_BIT(14)
> > 
> >+#define XE2LPM_CCCHKNREG1                        XE_REG_MCR(0x82a8)

It looks like the row for this register on bspec page 71186 was never
completely filled in, but I don't believe this is in an MCR range on the
media GT. We're also not treating it as an MCR range in xe_gt_mcr.c, so
defining it as such here will cause a mismatch; it looks like CI already
flagged that.
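
To illustrate the kind of mismatch I mean, here is a rough standalone
sketch. It is not driver code: the struct names, the helper functions and
the range table are all hypothetical, and the ranges are placeholders
rather than the real media GT steering ranges. The only value taken from
the patch is the 0x82a8 offset. The idea is simply that a register's
MCR-ness in its definition has to agree with whether its offset falls
inside a known steering range.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a register definition: offset plus MCR flag. */
struct reg_def {
	uint32_t offset;
	bool mcr;
};

/* Hypothetical stand-in for one steering range (inclusive bounds). */
struct mcr_range {
	uint32_t start;
	uint32_t end;
};

/*
 * Placeholder ranges for illustration only. These are NOT the real
 * media GT steering ranges from xe_gt_mcr.c or the bspec.
 */
static const struct mcr_range example_media_ranges[] = {
	{ 0x1000, 0x1fff },
	{ 0x4000, 0x40ff },
};

/* Return true if offset lies inside any of the n inclusive ranges. */
static bool offset_in_ranges(uint32_t offset,
			     const struct mcr_range *ranges, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (offset >= ranges[i].start && offset <= ranges[i].end)
			return true;
	}
	return false;
}

/*
 * A definition is consistent when its MCR flag agrees with the table:
 * flagged MCR and inside a range, or not flagged and outside all ranges.
 */
static bool reg_def_consistent(const struct reg_def *reg,
			       const struct mcr_range *ranges, size_t n)
{
	return reg->mcr == offset_in_ranges(reg->offset, ranges, n);
}
```

With a table that does not cover 0x82a8, a definition flagged as MCR at
that offset fails the check, while the same offset defined as a plain
register passes.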

> >+
> > #define VF_PREEMPTION                                XE_REG(0x83a4, XE_REG_OPTION_MASKED)
> > #define   PREEMPTION_VERTEX_COUNT                REG_GENMASK(15, 0)
> > 
> >@@ -399,6 +401,10 @@
> > #define SCRATCH1LPFC                                XE_REG(0xb474)
> > #define   EN_L3_RW_CCS_CACHE_FLUSH                REG_BIT(0)
> > 
> >+#define XE2LPM_L3SQCREG2                        XE_REG_MCR(0xb604)
> >+
> >+#define XE2LPM_L3SQCREG3                        XE_REG_MCR(0xb608)
> >+
> > #define XE2LPM_L3SQCREG5                        XE_REG_MCR(0xb658)
> > 
> > #define XE2_TDF_CTRL                                XE_REG(0xb418)
> >diff --git a/drivers/gpu/drm/xe/xe_tuning.c b/drivers/gpu/drm/xe/xe_tuning.c
> >index faa1bf42e50e..ea1444358b4f 100644
> >--- a/drivers/gpu/drm/xe/xe_tuning.c
> >+++ b/drivers/gpu/drm/xe/xe_tuning.c
> >@@ -42,20 +42,39 @@ static const struct xe_rtp_entry_sr gt_tunings[] = {
> >           XE_RTP_ACTIONS(CLR(CCCHKNREG1, ENCOMPPERFFIX),
> >                          SET(CCCHKNREG1, L3CMPCTRL))
> >         },
> >+        { XE_RTP_NAME("Tuning: Compression Overfetch - media"),
> >+          XE_RTP_RULES(MEDIA_VERSION(2000)),
> 
> +Matt
> 
> I used exact match on the media version here because that's what is
> already used for "Tuning: L3 cache - media", but I wonder if we should
> make it MEDIA_VERSION_RANGE(2000, XE_RTP_END_VERSION_UNDEFINED),
> similarly to what is done for the primary GT.

Yeah, based on the way they're documenting these in the bspec now, I
think we should start assuming "undefined" upper bound for all of these
until/unless we know otherwise.  In this case it already looks like
PTL's Xe3 media needs the same setting, so we already know this extends
past 20.00.
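
For anyone following along, the difference between the exact match and
the open-ended range can be modelled with a small standalone sketch. This
is not the xe_rtp implementation: the struct, the helper and the sentinel
value below are hypothetical, chosen only to show the matching behaviour
being discussed.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sentinel meaning "no upper bound". This is an assumption
 * for the sketch, not the value the driver uses for
 * XE_RTP_END_VERSION_UNDEFINED.
 */
#define END_VERSION_UNDEFINED UINT32_MAX

/* Hypothetical version rule with inclusive bounds, e.g. 2000 for 20.00. */
struct version_rule {
	uint32_t start;
	uint32_t end;
};

/* Return true if ver falls inside the rule, treating the sentinel as open. */
static bool version_rule_matches(const struct version_rule *rule, uint32_t ver)
{
	if (ver < rule->start)
		return false;
	if (rule->end == END_VERSION_UNDEFINED)
		return true;
	return ver <= rule->end;
}
```

A rule of { 2000, 2000 } only matches media version 20.00, whereas
{ 2000, END_VERSION_UNDEFINED } also matches any later version, which is
what we want when a newer IP is expected to inherit the tuning.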


Matt

> 
> >+          XE_RTP_ACTIONS(CLR(XE2LPM_CCCHKNREG1, ENCOMPPERFFIX),
> >+                         SET(CCCHKNREG1, L3CMPCTRL))
> >+        },
> >         { XE_RTP_NAME("Tuning: Enable compressible partial write overfetch in L3"),
> >           XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, XE_RTP_END_VERSION_UNDEFINED)),
> >           XE_RTP_ACTIONS(SET(L3SQCREG3, COMPPWOVERFETCHEN))
> >         },
> >+        { XE_RTP_NAME("Tuning: Enable compressible partial write overfetch in L3 - media"),
> >+          XE_RTP_RULES(MEDIA_VERSION(2000)),
> >+          XE_RTP_ACTIONS(SET(XE2LPM_L3SQCREG3, COMPPWOVERFETCHEN))
> >+        },
> >         { XE_RTP_NAME("Tuning: L2 Overfetch Compressible Only"),
> >           XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, XE_RTP_END_VERSION_UNDEFINED)),
> >           XE_RTP_ACTIONS(SET(L3SQCREG2,
> >                              COMPMEMRD256BOVRFETCHEN))
> >         },
> >+        { XE_RTP_NAME("Tuning: L2 Overfetch Compressible Only - media"),
> >+          XE_RTP_RULES(MEDIA_VERSION(2000)),
> >+          XE_RTP_ACTIONS(SET(XE2LPM_L3SQCREG2,
> >+                             COMPMEMRD256BOVRFETCHEN))
> >+        },
> >         { XE_RTP_NAME("Tuning: Stateless compression control"),
> >           XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, XE_RTP_END_VERSION_UNDEFINED)),
> >           XE_RTP_ACTIONS(FIELD_SET(STATELESS_COMPRESSION_CTRL, UNIFIED_COMPRESSION_FORMAT,
> >                                    REG_FIELD_PREP(UNIFIED_COMPRESSION_FORMAT, 0)))
> >         },
> >+        { XE_RTP_NAME("Tuning: Stateless compression control - media"),
> >+          XE_RTP_RULES(MEDIA_VERSION_RANGE(1301, 2000)),
> 
> Also in this case, where we are already using a closed interval.
> 
> --
> Gustavo Sousa
> 
> >+          XE_RTP_ACTIONS(FIELD_SET(STATELESS_COMPRESSION_CTRL, UNIFIED_COMPRESSION_FORMAT,
> >+                                   REG_FIELD_PREP(UNIFIED_COMPRESSION_FORMAT, 0)))
> >+        },
> >         {}
> > };
> > 
> >-- 
> >2.46.1
> >

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

