[PATCH] drm/xe/xe2: Extend performance tuning to media GT
Gustavo Sousa
gustavo.sousa at intel.com
Wed Sep 18 14:21:20 UTC 2024
Quoting Matt Roper (2024-09-17 19:42:43-03:00)
>On Tue, Sep 17, 2024 at 02:02:38PM -0300, Gustavo Sousa wrote:
>> Quoting Gustavo Sousa (2024-09-17 13:53:54-03:00)
>> >With exception of "Tuning: L3 cache - media", we are currently applying
>> >recommended performance tuning settings only for the primary GT. Let's
>> >also apply them to the media GT when applicable.
>> >
>> >According to our spec, media GT registers CCCHKNREG1 and L3SQCREG* exist
>> >only in Xe2_LPM and their offsets do not match their primary GT
>> >counterparts. As such, we need to have Xe2_LPM-specific definitions for
>> >them and apply the setting only for that specific IP.
>> >
>> >Both Xe2_HPM and Xe2_LPM contain STATELESS_COMPRESSION_CTRL and the
>> >offset on the media GT matches the one on the primary one, so we can use
>> >the common definition and apply the setting to both IPs.
>> >
>> >Bspec: 72161
>> >Signed-off-by: Gustavo Sousa <gustavo.sousa at intel.com>
>> >---
>> > drivers/gpu/drm/xe/regs/xe_gt_regs.h | 6 ++++++
>> > drivers/gpu/drm/xe/xe_tuning.c | 19 +++++++++++++++++++
>> > 2 files changed, 25 insertions(+)
>> >
>> >diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>> >index cf21de3adca6..2e655291a84a 100644
>> >--- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>> >+++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>> >@@ -169,6 +169,8 @@
>> > #define XEHP_SLICE_COMMON_ECO_CHICKEN1 XE_REG_MCR(0x731c, XE_REG_OPTION_MASKED)
>> > #define MSC_MSAA_REODER_BUF_BYPASS_DISABLE REG_BIT(14)
>> >
>> >+#define XE2LPM_CCCHKNREG1 XE_REG_MCR(0x82a8)
>
>It looks like they forgot to fill in the complete row of bspec page
>71186, but I don't believe this is in an MCR range on the media GT (and
>we're not considering it an MCR range in xe_gt_mcr.c, so defining it as
>such here will cause a mismatch --- looks like CI already flagged that).
Yeah.
That said, this is also my bad: I simply copy/pasted the original
definition and forgot to check the steering table!
It also looks like we will need a non-MCR definition for Xe2_HPM's
STATELESS_COMPRESSION_CTRL.
I'll revise the definitions of the registers involved in this patch, thanks!
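
To illustrate the kind of mismatch being discussed, here is a minimal, self-contained sketch of a consistency check between a register's MCR flag and a steering table. It is not the driver's code: `struct mmio_range`, `struct reg_def`, `offset_in_ranges()` and `reg_def_is_consistent()` are hypothetical names, and the offsets are made up rather than taken from the bspec.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one steering-table entry (inclusive bounds). */
struct mmio_range {
	uint32_t start;
	uint32_t end;
};

/* Hypothetical model of a register definition and its MCR flag. */
struct reg_def {
	uint32_t offset;
	bool is_mcr;
};

/* Return true if 'offset' falls inside any range of the table. */
static bool offset_in_ranges(const struct mmio_range *ranges, size_t count,
			     uint32_t offset)
{
	for (size_t i = 0; i < count; i++) {
		if (offset >= ranges[i].start && offset <= ranges[i].end)
			return true;
	}
	return false;
}

/*
 * A definition is consistent when its MCR flag agrees with the table:
 * flagged MCR and inside a range, or not flagged and outside all ranges.
 */
static bool reg_def_is_consistent(const struct reg_def *reg,
				  const struct mmio_range *ranges,
				  size_t count)
{
	return reg->is_mcr == offset_in_ranges(ranges, count, reg->offset);
}
```

Under this model, a definition copied with its MCR flag to a GT whose table does not cover the offset fails the check, which is the shape of the problem pointed out above.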
>
>> >+
>> > #define VF_PREEMPTION XE_REG(0x83a4, XE_REG_OPTION_MASKED)
>> > #define PREEMPTION_VERTEX_COUNT REG_GENMASK(15, 0)
>> >
>> >@@ -399,6 +401,10 @@
>> > #define SCRATCH1LPFC XE_REG(0xb474)
>> > #define EN_L3_RW_CCS_CACHE_FLUSH REG_BIT(0)
>> >
>> >+#define XE2LPM_L3SQCREG2 XE_REG_MCR(0xb604)
>> >+
>> >+#define XE2LPM_L3SQCREG3 XE_REG_MCR(0xb608)
>> >+
>> > #define XE2LPM_L3SQCREG5 XE_REG_MCR(0xb658)
>> >
>> > #define XE2_TDF_CTRL XE_REG(0xb418)
>> >diff --git a/drivers/gpu/drm/xe/xe_tuning.c b/drivers/gpu/drm/xe/xe_tuning.c
>> >index faa1bf42e50e..ea1444358b4f 100644
>> >--- a/drivers/gpu/drm/xe/xe_tuning.c
>> >+++ b/drivers/gpu/drm/xe/xe_tuning.c
>> >@@ -42,20 +42,39 @@ static const struct xe_rtp_entry_sr gt_tunings[] = {
>> > XE_RTP_ACTIONS(CLR(CCCHKNREG1, ENCOMPPERFFIX),
>> > SET(CCCHKNREG1, L3CMPCTRL))
>> > },
>> >+ { XE_RTP_NAME("Tuning: Compression Overfetch - media"),
>> >+ XE_RTP_RULES(MEDIA_VERSION(2000)),
>>
>> +Matt
>>
>> I used exact match on the media version here because that's what is
>> already used for "Tuning: L3 cache - media", but I wonder if we should
>> make it MEDIA_VERSION_RANGE(2000, XE_RTP_END_VERSION_UNDEFINED),
>> similarly to what is done for the primary GT.
>
>Yeah, based on the way they're documenting these in the bspec now, I
>think we should start assuming "undefined" upper bound for all of these
>until/unless we know otherwise. In this case it already looks like
>PTL's Xe3 media needs the same setting, so we already know this extends
>past 20.00.
Right. I'll use an open range in the next version of this patch then.
Thanks.
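
For reference, a minimal sketch of how an exact match differs from an open-ended range. This is a standalone model, not the xe RTP implementation: `struct version_rule`, `END_VERSION_UNDEFINED` and `version_rule_matches()` are hypothetical stand-ins, and versions are assumed to be encoded as major * 100 + minor (so 20.00 is 2000).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "no upper bound". */
#define END_VERSION_UNDEFINED UINT32_MAX

/* Hypothetical model of a version rule with inclusive bounds. */
struct version_rule {
	uint32_t start;	/* first matching version, e.g. 2000 for 20.00 */
	uint32_t end;	/* last matching version, or END_VERSION_UNDEFINED */
};

/* Return true if 'ver' is covered by the rule. */
static bool version_rule_matches(const struct version_rule *rule, uint32_t ver)
{
	if (ver < rule->start)
		return false;
	/* An undefined end keeps matching every later version. */
	if (rule->end == END_VERSION_UNDEFINED)
		return true;
	return ver <= rule->end;
}
```

With a rule of { 2000, 2000 } a later media IP would be skipped, while { 2000, END_VERSION_UNDEFINED } keeps applying the tuning until we learn of a version that should not get it.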
--
Gustavo Sousa
>
>
>Matt
>
>>
>> >+ XE_RTP_ACTIONS(CLR(XE2LPM_CCCHKNREG1, ENCOMPPERFFIX),
>> >+ SET(CCCHKNREG1, L3CMPCTRL))
>> >+ },
>> > { XE_RTP_NAME("Tuning: Enable compressible partial write overfetch in L3"),
>> > XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, XE_RTP_END_VERSION_UNDEFINED)),
>> > XE_RTP_ACTIONS(SET(L3SQCREG3, COMPPWOVERFETCHEN))
>> > },
>> >+ { XE_RTP_NAME("Tuning: Enable compressible partial write overfetch in L3 - media"),
>> >+ XE_RTP_RULES(MEDIA_VERSION(2000)),
>> >+ XE_RTP_ACTIONS(SET(XE2LPM_L3SQCREG3, COMPPWOVERFETCHEN))
>> >+ },
>> > { XE_RTP_NAME("Tuning: L2 Overfetch Compressible Only"),
>> > XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, XE_RTP_END_VERSION_UNDEFINED)),
>> > XE_RTP_ACTIONS(SET(L3SQCREG2,
>> > COMPMEMRD256BOVRFETCHEN))
>> > },
>> >+ { XE_RTP_NAME("Tuning: L2 Overfetch Compressible Only - media"),
>> >+ XE_RTP_RULES(MEDIA_VERSION(2000)),
>> >+ XE_RTP_ACTIONS(SET(XE2LPM_L3SQCREG2,
>> >+ COMPMEMRD256BOVRFETCHEN))
>> >+ },
>> > { XE_RTP_NAME("Tuning: Stateless compression control"),
>> > XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, XE_RTP_END_VERSION_UNDEFINED)),
>> > XE_RTP_ACTIONS(FIELD_SET(STATELESS_COMPRESSION_CTRL, UNIFIED_COMPRESSION_FORMAT,
>> > REG_FIELD_PREP(UNIFIED_COMPRESSION_FORMAT, 0)))
>> > },
>> >+ { XE_RTP_NAME("Tuning: Stateless compression control - media"),
>> >+ XE_RTP_RULES(MEDIA_VERSION_RANGE(1301, 2000)),
>>
>> Also in this case, where we are already using a closed interval.
>>
>> --
>> Gustavo Sousa
>>
>> >+ XE_RTP_ACTIONS(FIELD_SET(STATELESS_COMPRESSION_CTRL, UNIFIED_COMPRESSION_FORMAT,
>> >+ REG_FIELD_PREP(UNIFIED_COMPRESSION_FORMAT, 0)))
>> >+ },
>> > {}
>> > };
>> >
>> >--
>> >2.46.1
>> >
>
>--
>Matt Roper
>Graphics Software Engineer
>Linux GPU Platform Enablement
>Intel Corporation