[PATCH] drm/xe/bmg: improve cache flushing behaviour

Matthew Auld matthew.auld at intel.com
Tue Sep 3 14:33:27 UTC 2024


On 03/09/2024 14:49, Nirmoy Das wrote:
> 
> On 9/3/2024 1:08 PM, Matthew Auld wrote:
>> On 03/09/2024 11:53, Nirmoy Das wrote:
>>>
>>> On 9/2/2024 5:37 PM, Matthew Auld wrote:
>>>> The BSpec seems to suggest that EN_L3_RW_CCS_CACHE_FLUSH must be 
>>>> toggled
>>>> on for manual global invalidation to take effect
>>>
>>> I couldn't find this reference, in which bspec is this mentioned ?
>>
>> BSpec: 71718
>>
>> For discrete under global invalidation it says: "Device Cache flushed 
>> if SCRATCH1LPFC bit[0] is set". 
> 
> 
> I was only looking at the register spec.
> 
> If I get this correctly, setting SCRATCH1LPFC would flush device 
> cache/l2 from the L3 without it I assume global inval isn't useful to 
> flush device cache.

 From observation setting SCRATCH1LPFC will result in a flush of the 
device cache as part of the pipecontrol that is emitted at the end of 
each batch for compute/render. Basically exactly like LNL. On BMG this 
is not needed, since SA media is coherent.

 From observation if you take this patch as-is which turns off 
SCRATCH1LPFC, and then stub out the global invalidation in 
xe_device_l2_flush(), then display engine observes corruption when doing 
a render copy onto the display surface prior to display flip, using a 
pat index enabling caching. I assume same thing happens if doing VRAM 
writes from host side, but did not check this. If you add back the 
global invalidation then no display corruption. So seems like 
SCRATCH1LPFC is not needed for global invalidation in order for device 
cache to be flushed, but is needed for pipecontrol etc.

> 
> Can SCRATCH1LPFC be set/reset dynamically, if so then may be set 
> SCRATCH1LPFC  before global inval and reset it back after that ?

I suppose it might be possible, but not sure what happens with 
concurrent submission and trying to mess with SCRATCH1LPFC.

> 
> 
> 
>> Which is why I originally added this, however this then turns flushing 
>> on for all kinds of stuff like pipecontrol which is not what we want. 
>> But from playing around with this a bunch on BMG that doesn't look to 
>> be true. Also the original WA made no mention of needing to mess with 
>> SCRATCH1LPFC.
>>
>> I'm kind of hoping this helps that compute benchmark with not nuking 
>> entire device cache between submissions.
> 
> 
> I tried this one and is indeed improves compute test bandwidth.
> 
> 
> Regards,
> 
> Nirmoy
> 
>>
>>>
>>>
>>> Regards,
>>>
>>> Nirmoy
>>>
>>>>   and actually flush
>>>> device cache, however this also turns on flushing for things like
>>>> pipecontrol, which occurs between submissions for compute/render. This
>>>> sounds like massive overkill for our needs, where we already have the
>>>> manual flushing on the display side with the global invalidation. Some
>>>> observations on BMG:
>>>>
>>>> 1. Disabling l2 caching for host writes and stubbing out the driver
>>>>     global invalidation but keeping EN_L3_RW_CCS_CACHE_FLUSH 
>>>> enabled, has
>>>>     no impact on wb-transient-vs-display IGT, which makes sense 
>>>> since the
>>>>     pipecontrol is now flushing the device cache after the render copy.
>>>>     Without EN_L3_RW_CCS_CACHE_FLUSH the test then fails, which is also
>>>>     expected since device cache is now dirty and display engine 
>>>> can't see
>>>>     the writes.
>>>>
>>>> 2. Disabling EN_L3_RW_CCS_CACHE_FLUSH, but keeping the driver global
>>>>     invalidation also has no impact on wb-transient-vs-display. This
>>>>     suggests that the global invalidation still works as expected 
>>>> and is
>>>>     flushing the device cache without EN_L3_RW_CCS_CACHE_FLUSH 
>>>> turned on.
>>>>
>>>> With that drop EN_L3_RW_CCS_CACHE_FLUSH.
>>>>
>>>> Signed-off-by: Matthew Auld <matthew.auld at intel.com>
>>>> Cc: Matt Roper <matthew.d.roper at intel.com>
>>>> Cc: Nirmoy Das <nirmoy.das at intel.com>
>>>> ---
>>>>   drivers/gpu/drm/xe/regs/xe_gt_regs.h | 3 ---
>>>>   drivers/gpu/drm/xe/xe_gt.c           | 1 -
>>>>   2 files changed, 4 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h 
>>>> b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>>>> index 0d1a4a9f4e11..88a01970cc5c 100644
>>>> --- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>>>> +++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>>>> @@ -387,9 +387,6 @@
>>>>   #define XE2_GLOBAL_INVAL            XE_REG(0xb404)
>>>> -#define SCRATCH1LPFC                XE_REG(0xb474)
>>>> -#define   EN_L3_RW_CCS_CACHE_FLUSH        REG_BIT(0)
>>>> -
>>>>   #define XE2LPM_L3SQCREG5            XE_REG_MCR(0xb658)
>>>>   #define XE2_TDF_CTRL                XE_REG(0xb418)
>>>> diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
>>>> index f82b3e8ac5c8..313cc4242281 100644
>>>> --- a/drivers/gpu/drm/xe/xe_gt.c
>>>> +++ b/drivers/gpu/drm/xe/xe_gt.c
>>>> @@ -110,7 +110,6 @@ static void xe_gt_enable_host_l2_vram(struct 
>>>> xe_gt *gt)
>>>>           return;
>>>>       if (!xe_gt_is_media_type(gt)) {
>>>> -        xe_mmio_write32(gt, SCRATCH1LPFC, EN_L3_RW_CCS_CACHE_FLUSH);
>>>>           reg = xe_gt_mcr_unicast_read_any(gt, XE2_GAMREQSTRM_CTRL);
>>>>           reg |= CG_DIS_CNTLBUS;
>>>>           xe_gt_mcr_multicast_write(gt, XE2_GAMREQSTRM_CTRL, reg);


More information about the Intel-xe mailing list