[PATCH] drm/xe/bmg: improve cache flushing behaviour
Nirmoy Das
nirmoy.das at intel.com
Tue Sep 3 13:49:57 UTC 2024
On 9/3/2024 1:08 PM, Matthew Auld wrote:
> On 03/09/2024 11:53, Nirmoy Das wrote:
>>
>> On 9/2/2024 5:37 PM, Matthew Auld wrote:
>>> The BSpec seems to suggest that EN_L3_RW_CCS_CACHE_FLUSH must be
>>> toggled
>>> on for manual global invalidation to take effect
>>
>> I couldn't find this reference, in which bspec is this mentioned ?
>
> BSpec: 71718
>
> For discrete under global invalidation it says: "Device Cache flushed
> if SCRATCH1LPFC bit[0] is set".
I was only looking at the register spec.
If I get this correctly, setting SCRATCH1LPFC would flush device
cache/l2 from the L3 without it I assume global inval isn't useful to
flush device cache.
Can SCRATCH1LPFC be set/reset dynamically, if so then may be set
SCRATCH1LPFC before global inval and reset it back after that ?
> Which is why I originally added this, however this then turns flushing
> on for all kinds of stuff like pipecontrol which is not what we want.
> But from playing around with this a bunch on BMG that doesn't look to
> be true. Also the original WA made no mention of needing to mess with
> SCRATCH1LPFC.
>
> I'm kind of hoping this helps that compute benchmark with not nuking
> entire device cache between submissions.
I tried this one and is indeed improves compute test bandwidth.
Regards,
Nirmoy
>
>>
>>
>> Regards,
>>
>> Nirmoy
>>
>>> and actually flush
>>> device cache, however this also turns on flushing for things like
>>> pipecontrol, which occurs between submissions for compute/render. This
>>> sounds like massive overkill for our needs, where we already have the
>>> manual flushing on the display side with the global invalidation. Some
>>> observations on BMG:
>>>
>>> 1. Disabling l2 caching for host writes and stubbing out the driver
>>> global invalidation but keeping EN_L3_RW_CCS_CACHE_FLUSH
>>> enabled, has
>>> no impact on wb-transient-vs-display IGT, which makes sense
>>> since the
>>> pipecontrol is now flushing the device cache after the render copy.
>>> Without EN_L3_RW_CCS_CACHE_FLUSH the test then fails, which is also
>>> expected since device cache is now dirty and display engine
>>> can't see
>>> the writes.
>>>
>>> 2. Disabling EN_L3_RW_CCS_CACHE_FLUSH, but keeping the driver global
>>> invalidation also has no impact on wb-transient-vs-display. This
>>> suggests that the global invalidation still works as expected
>>> and is
>>> flushing the device cache without EN_L3_RW_CCS_CACHE_FLUSH
>>> turned on.
>>>
>>> With that drop EN_L3_RW_CCS_CACHE_FLUSH.
>>>
>>> Signed-off-by: Matthew Auld <matthew.auld at intel.com>
>>> Cc: Matt Roper <matthew.d.roper at intel.com>
>>> Cc: Nirmoy Das <nirmoy.das at intel.com>
>>> ---
>>> drivers/gpu/drm/xe/regs/xe_gt_regs.h | 3 ---
>>> drivers/gpu/drm/xe/xe_gt.c | 1 -
>>> 2 files changed, 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>>> b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>>> index 0d1a4a9f4e11..88a01970cc5c 100644
>>> --- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>>> +++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>>> @@ -387,9 +387,6 @@
>>> #define XE2_GLOBAL_INVAL XE_REG(0xb404)
>>> -#define SCRATCH1LPFC XE_REG(0xb474)
>>> -#define EN_L3_RW_CCS_CACHE_FLUSH REG_BIT(0)
>>> -
>>> #define XE2LPM_L3SQCREG5 XE_REG_MCR(0xb658)
>>> #define XE2_TDF_CTRL XE_REG(0xb418)
>>> diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
>>> index f82b3e8ac5c8..313cc4242281 100644
>>> --- a/drivers/gpu/drm/xe/xe_gt.c
>>> +++ b/drivers/gpu/drm/xe/xe_gt.c
>>> @@ -110,7 +110,6 @@ static void xe_gt_enable_host_l2_vram(struct
>>> xe_gt *gt)
>>> return;
>>> if (!xe_gt_is_media_type(gt)) {
>>> - xe_mmio_write32(gt, SCRATCH1LPFC, EN_L3_RW_CCS_CACHE_FLUSH);
>>> reg = xe_gt_mcr_unicast_read_any(gt, XE2_GAMREQSTRM_CTRL);
>>> reg |= CG_DIS_CNTLBUS;
>>> xe_gt_mcr_multicast_write(gt, XE2_GAMREQSTRM_CTRL, reg);
More information about the Intel-xe
mailing list