[PATCH] drm/xe/bmg: improve cache flushing behaviour
Matthew Auld
matthew.auld at intel.com
Tue Sep 3 14:33:27 UTC 2024
On 03/09/2024 14:49, Nirmoy Das wrote:
>
> On 9/3/2024 1:08 PM, Matthew Auld wrote:
>> On 03/09/2024 11:53, Nirmoy Das wrote:
>>>
>>> On 9/2/2024 5:37 PM, Matthew Auld wrote:
>>>> The BSpec seems to suggest that EN_L3_RW_CCS_CACHE_FLUSH must be
>>>> toggled
>>>> on for manual global invalidation to take effect
>>>
>>> I couldn't find this reference, in which bspec is this mentioned ?
>>
>> BSpec: 71718
>>
>> For discrete under global invalidation it says: "Device Cache flushed
>> if SCRATCH1LPFC bit[0] is set".
>
>
> I was only looking at the register spec.
>
> If I get this correctly, setting SCRATCH1LPFC would flush device
> cache/l2 from the L3 without it I assume global inval isn't useful to
> flush device cache.
From observation setting SCRATCH1LPFC will result in a flush of the
device cache as part of the pipecontrol that is emitted at the end of
each batch for compute/render. Basically exactly like LNL. On BMG this
is not needed, since SA media is coherent.
From observation if you take this patch as-is which turns off
SCRATCH1LPFC, and then stub out the global invalidation in
xe_device_l2_flush(), then display engine observes corruption when doing
a render copy onto the display surface prior to display flip, using a
pat index enabling caching. I assume same thing happens if doing VRAM
writes from host side, but did not check this. If you add back the
global invalidation then no display corruption. So seems like
SCRATCH1LPFC is not needed for global invalidation in order for device
cache to be flushed, but is needed for pipecontrol etc.
>
> Can SCRATCH1LPFC be set/reset dynamically, if so then may be set
> SCRATCH1LPFC before global inval and reset it back after that ?
I suppose it might be possible, but not sure what happens with
concurrent submission and trying to mess with SCRATCH1LPFC.
>
>
>
>> Which is why I originally added this, however this then turns flushing
>> on for all kinds of stuff like pipecontrol which is not what we want.
>> But from playing around with this a bunch on BMG that doesn't look to
>> be true. Also the original WA made no mention of needing to mess with
>> SCRATCH1LPFC.
>>
>> I'm kind of hoping this helps that compute benchmark with not nuking
>> entire device cache between submissions.
>
>
> I tried this one and is indeed improves compute test bandwidth.
>
>
> Regards,
>
> Nirmoy
>
>>
>>>
>>>
>>> Regards,
>>>
>>> Nirmoy
>>>
>>>> and actually flush
>>>> device cache, however this also turns on flushing for things like
>>>> pipecontrol, which occurs between submissions for compute/render. This
>>>> sounds like massive overkill for our needs, where we already have the
>>>> manual flushing on the display side with the global invalidation. Some
>>>> observations on BMG:
>>>>
>>>> 1. Disabling l2 caching for host writes and stubbing out the driver
>>>> global invalidation but keeping EN_L3_RW_CCS_CACHE_FLUSH
>>>> enabled, has
>>>> no impact on wb-transient-vs-display IGT, which makes sense
>>>> since the
>>>> pipecontrol is now flushing the device cache after the render copy.
>>>> Without EN_L3_RW_CCS_CACHE_FLUSH the test then fails, which is also
>>>> expected since device cache is now dirty and display engine
>>>> can't see
>>>> the writes.
>>>>
>>>> 2. Disabling EN_L3_RW_CCS_CACHE_FLUSH, but keeping the driver global
>>>> invalidation also has no impact on wb-transient-vs-display. This
>>>> suggests that the global invalidation still works as expected
>>>> and is
>>>> flushing the device cache without EN_L3_RW_CCS_CACHE_FLUSH
>>>> turned on.
>>>>
>>>> With that drop EN_L3_RW_CCS_CACHE_FLUSH.
>>>>
>>>> Signed-off-by: Matthew Auld <matthew.auld at intel.com>
>>>> Cc: Matt Roper <matthew.d.roper at intel.com>
>>>> Cc: Nirmoy Das <nirmoy.das at intel.com>
>>>> ---
>>>> drivers/gpu/drm/xe/regs/xe_gt_regs.h | 3 ---
>>>> drivers/gpu/drm/xe/xe_gt.c | 1 -
>>>> 2 files changed, 4 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>>>> b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>>>> index 0d1a4a9f4e11..88a01970cc5c 100644
>>>> --- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>>>> +++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>>>> @@ -387,9 +387,6 @@
>>>> #define XE2_GLOBAL_INVAL XE_REG(0xb404)
>>>> -#define SCRATCH1LPFC XE_REG(0xb474)
>>>> -#define EN_L3_RW_CCS_CACHE_FLUSH REG_BIT(0)
>>>> -
>>>> #define XE2LPM_L3SQCREG5 XE_REG_MCR(0xb658)
>>>> #define XE2_TDF_CTRL XE_REG(0xb418)
>>>> diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
>>>> index f82b3e8ac5c8..313cc4242281 100644
>>>> --- a/drivers/gpu/drm/xe/xe_gt.c
>>>> +++ b/drivers/gpu/drm/xe/xe_gt.c
>>>> @@ -110,7 +110,6 @@ static void xe_gt_enable_host_l2_vram(struct
>>>> xe_gt *gt)
>>>> return;
>>>> if (!xe_gt_is_media_type(gt)) {
>>>> - xe_mmio_write32(gt, SCRATCH1LPFC, EN_L3_RW_CCS_CACHE_FLUSH);
>>>> reg = xe_gt_mcr_unicast_read_any(gt, XE2_GAMREQSTRM_CTRL);
>>>> reg |= CG_DIS_CNTLBUS;
>>>> xe_gt_mcr_multicast_write(gt, XE2_GAMREQSTRM_CTRL, reg);
More information about the Intel-xe
mailing list