[PATCH 1/3] drm/xe/xe2: Extend performance tuning to media GT
Upadhyay, Tejas
tejas.upadhyay at intel.com
Fri Sep 20 05:42:01 UTC 2024
> -----Original Message-----
> From: Sousa, Gustavo <gustavo.sousa at intel.com>
> Sent: Thursday, September 19, 2024 11:39 PM
> To: Upadhyay, Tejas <tejas.upadhyay at intel.com>;
> intel-xe at lists.freedesktop.org
> Cc: Roper, Matthew D <matthew.d.roper at intel.com>
> Subject: RE: [PATCH 1/3] drm/xe/xe2: Extend performance tuning to media GT
>
> Quoting Upadhyay, Tejas (2024-09-19 05:00:22-03:00)
> >
> >
> >> -----Original Message-----
> >> From: Intel-xe <intel-xe-bounces at lists.freedesktop.org> On Behalf Of
> >> Gustavo Sousa
> >> Sent: Thursday, September 19, 2024 2:17 AM
> >> To: intel-xe at lists.freedesktop.org
> >> Cc: Roper, Matthew D <matthew.d.roper at intel.com>
> >> Subject: [PATCH 1/3] drm/xe/xe2: Extend performance tuning to media
> >> GT
> >>
> >> With the exception of "Tuning: L3 cache - media", we are currently
> >> applying recommended performance tuning settings only for the primary
> >> GT. Let's also implement them for the media GT when applicable.
> >>
> >> According to our spec, media GT registers CCCHKNREG1 and L3SQCREG*
> >> exist only in Xe2_LPM and their offsets do not match their primary GT
> >> counterparts. Furthermore, the range where CCCHKNREG1 belongs is not
> >> listed as a multicast range on the media GT. As such, we need to have
> >> Xe2_LPM-specific definitions for those registers and apply the
> >> setting only for that specific IP.
> >>
> >> Both Xe2_HPM and Xe2_LPM contain STATELESS_COMPRESSION_CTRL and the
> >> offset on the media GT matches the one on the primary GT. However,
> >> the range that contains that register is not listed as a
> >> multicast range, so we need two different entries for media.
> >>
> >> v2:
> >> - Fix implementation with respect to multicast vs non-multicast
> >> registers. (Matt)
> >> - Add missing XE2LPM_CCCHKNREG1 on second action of "Tuning:
> >> Compression Overfetch - media".
> >>
> >> Bspec: 72161
> >> Cc: Matt Roper <matthew.d.roper at intel.com>
> >> Signed-off-by: Gustavo Sousa <gustavo.sousa at intel.com>
> >> ---
> >> drivers/gpu/drm/xe/regs/xe_gt_regs.h | 7 +++++++
> >> drivers/gpu/drm/xe/xe_tuning.c | 24 ++++++++++++++++++++++++
> >> 2 files changed, 31 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> >> b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> >> index cf21de3adca6..6ec2d2c11d77 100644
> >> --- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> >> +++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> >> @@ -80,6 +80,7 @@
> >> #define LE_CACHEABILITY_MASK REG_GENMASK(1, 0)
> >> #define LE_CACHEABILITY(value)
> >> REG_FIELD_PREP(LE_CACHEABILITY_MASK, value)
> >>
> >> +#define XELPMP_STATELESS_COMPRESSION_CTRL XE_REG(0x4148)
> >
> >Were you trying to say XE2LPM_ here? Also, this seems to be an MCR register.
>
> Yeah, you're right on both. I was looking at steering spec for MTL media
> instead of BMG's when adding this and then used XELPMP_ thinking that
> Xe_LPM+ also had that register.
>
> Thanks for catching this. I'll update this on the next version of this series.
>
> It looks like we also need to fix the logic around MCR tables in our driver,
> since we are selecting Xe_LPM+'s table for Xe2_LPM.
>
> >
> >> #define STATELESS_COMPRESSION_CTRL
> >> XE_REG_MCR(0x4148)
> >> #define UNIFIED_COMPRESSION_FORMAT REG_GENMASK(3, 0)
> >>
> >> @@ -169,6 +170,8 @@
> >> #define XEHP_SLICE_COMMON_ECO_CHICKEN1
> >> XE_REG_MCR(0x731c, XE_REG_OPTION_MASKED)
> >> #define MSC_MSAA_REODER_BUF_BYPASS_DISABLE REG_BIT(14)
> >>
> >> +#define XE2LPM_CCCHKNREG1 XE_REG(0x82a8)
> >> +
> >> #define VF_PREEMPTION XE_REG(0x83a4,
> >> XE_REG_OPTION_MASKED)
> >> #define PREEMPTION_VERTEX_COUNT REG_GENMASK(15, 0)
> >>
> >> @@ -399,6 +402,10 @@
> >> #define SCRATCH1LPFC XE_REG(0xb474)
> >> #define EN_L3_RW_CCS_CACHE_FLUSH REG_BIT(0)
> >>
> >> +#define XE2LPM_L3SQCREG2 XE_REG_MCR(0xb604)
> >> +
> >> +#define XE2LPM_L3SQCREG3 XE_REG_MCR(0xb608)
> >> +
> >
> >These are not marked as MCR in bspec. Is there something I missed?
>
> I just checked Bspec 71186 again and range [0x38B600:0x38B8FF] is marked
> as multicast.
OK, as I mentioned in the other comment, I completely missed the media
table when I looked at this. You can add my r-b once you incorporate the
comments above:
Reviewed-by: Tejas Upadhyay <tejas.upadhyay at intel.com>
Tejas
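
For anyone re-checking the multicast question above, here is a minimal
sketch of the lookup. It is not the driver's code: struct mmio_range,
media_mcr_ranges and media_reg_is_multicast() are hypothetical names, the
table holds only the single range quoted from Bspec 71186 in this thread,
and the 0x380000 media GT base is an assumption inferred from that range
(0xb604 mapping into 0x38B600..0x38B8FF).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type; not one of the xe driver's own structures. */
struct mmio_range {
	uint32_t start;
	uint32_t end;	/* inclusive */
};

/* Assumption: media GT registers sit at this base plus the GT-relative offset. */
#define MEDIA_GT_BASE 0x380000u

/*
 * Only the range quoted in this thread. A real steering table has more
 * entries, so a "false" result here says nothing about other registers.
 */
static const struct mmio_range media_mcr_ranges[] = {
	{ 0x38B600, 0x38B8FF },
};

/* Return true if the GT-relative offset lands in a listed multicast range. */
static bool media_reg_is_multicast(uint32_t gt_relative_offset)
{
	uint32_t addr = MEDIA_GT_BASE + gt_relative_offset;
	size_t i;

	for (i = 0; i < sizeof(media_mcr_ranges) / sizeof(media_mcr_ranges[0]); i++) {
		if (addr >= media_mcr_ranges[i].start &&
		    addr <= media_mcr_ranges[i].end)
			return true;
	}

	return false;
}
```

With that table, 0xb604 and 0xb608 (XE2LPM_L3SQCREG2/3) fall inside the
listed range, which matches keeping them as XE_REG_MCR().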
>
> --
> Gustavo Sousa
>
> >
> >> #define XE2LPM_L3SQCREG5 XE_REG_MCR(0xb658)
> >>
> >> #define XE2_TDF_CTRL XE_REG(0xb418)
> >> diff --git a/drivers/gpu/drm/xe/xe_tuning.c
> >> b/drivers/gpu/drm/xe/xe_tuning.c index faa1bf42e50e..7a5b852af8d7
> >> 100644
> >> --- a/drivers/gpu/drm/xe/xe_tuning.c
> >> +++ b/drivers/gpu/drm/xe/xe_tuning.c
> >> @@ -42,20 +42,44 @@ static const struct xe_rtp_entry_sr gt_tunings[] = {
> >> XE_RTP_ACTIONS(CLR(CCCHKNREG1, ENCOMPPERFFIX),
> >> SET(CCCHKNREG1, L3CMPCTRL))
> >> },
> >> + { XE_RTP_NAME("Tuning: Compression Overfetch - media"),
> >> + XE_RTP_RULES(MEDIA_VERSION(2000)),
> >> + XE_RTP_ACTIONS(CLR(XE2LPM_CCCHKNREG1, ENCOMPPERFFIX),
> >> + SET(XE2LPM_CCCHKNREG1, L3CMPCTRL))
> >> + },
> >> { XE_RTP_NAME("Tuning: Enable compressible partial write overfetch in L3"),
> >> XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001,
> >> XE_RTP_END_VERSION_UNDEFINED)),
> >> XE_RTP_ACTIONS(SET(L3SQCREG3, COMPPWOVERFETCHEN))
> >> },
> >> + { XE_RTP_NAME("Tuning: Enable compressible partial write overfetch in L3 - media"),
> >> + XE_RTP_RULES(MEDIA_VERSION(2000)),
> >> + XE_RTP_ACTIONS(SET(XE2LPM_L3SQCREG3,
> >> COMPPWOVERFETCHEN))
> >> + },
> >> { XE_RTP_NAME("Tuning: L2 Overfetch Compressible Only"),
> >> XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001,
> >> XE_RTP_END_VERSION_UNDEFINED)),
> >> XE_RTP_ACTIONS(SET(L3SQCREG2,
> >> COMPMEMRD256BOVRFETCHEN))
> >> },
> >> + { XE_RTP_NAME("Tuning: L2 Overfetch Compressible Only - media"),
> >> + XE_RTP_RULES(MEDIA_VERSION(2000)),
> >> + XE_RTP_ACTIONS(SET(XE2LPM_L3SQCREG2,
> >> + COMPMEMRD256BOVRFETCHEN))
> >> + },
> >> { XE_RTP_NAME("Tuning: Stateless compression control"),
> >> XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001,
> >> XE_RTP_END_VERSION_UNDEFINED)),
> >> XE_RTP_ACTIONS(FIELD_SET(STATELESS_COMPRESSION_CTRL,
> >> UNIFIED_COMPRESSION_FORMAT,
> >>
> >> REG_FIELD_PREP(UNIFIED_COMPRESSION_FORMAT, 0)))
> >> },
> >> + { XE_RTP_NAME("Tuning: Stateless compression control - media"),
> >> + XE_RTP_RULES(MEDIA_VERSION(2000)),
> >> + XE_RTP_ACTIONS(FIELD_SET(STATELESS_COMPRESSION_CTRL,
> >> UNIFIED_COMPRESSION_FORMAT,
> >> +
> >> REG_FIELD_PREP(UNIFIED_COMPRESSION_FORMAT, 0)))
> >> + },
> >> + { XE_RTP_NAME("Tuning: Stateless compression control - media (Xe2_HPM)"),
> >> + XE_RTP_RULES(MEDIA_VERSION(1301)),
> >> +
> >> XE_RTP_ACTIONS(FIELD_SET(XELPMP_STATELESS_COMPRESSION_CTRL,
> >> UNIFIED_COMPRESSION_FORMAT,
> >> +
> >> REG_FIELD_PREP(UNIFIED_COMPRESSION_FORMAT, 0)))
> >> + },
> >> {}
> >> };
> >>
> >> --
> >> 2.46.1
> >