[PATCH v7 11/24] drm/xe: Flush L3 when flushing render cache
Tvrtko Ursulin
tvrtko.ursulin at igalia.com
Mon Jun 30 12:43:10 UTC 2025
On 27/06/2025 19:57, Souza, Jose wrote:
> On Fri, 2025-06-27 at 11:23 -0700, José Roberto de Souza wrote:
>> On Fri, 2025-06-27 at 14:33 +0100, Tvrtko Ursulin wrote:
>>> I915 sets PIPE_CONTROL_FLUSH_L3 (bit 27) when flushing render caches but
>>> interesting thing is Tigerlake PRM lists that bit as reserved.
>>>
>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>>> ---
>>> Is xe missing this? Or has this been wrong for so long in i915? Or is this
>>> an undocumented bit?
>>> ---
>>> drivers/gpu/drm/xe/instructions/xe_gpu_commands.h | 1 +
>>> drivers/gpu/drm/xe/xe_ring_ops.c | 10 ++++++++++
>>> 2 files changed, 11 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h b/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
>>> index 78c0e87dbd37..27892984403c 100644
>>> --- a/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
>>> +++ b/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
>>> @@ -47,6 +47,7 @@
>>>
>>> #define PIPE_CONTROL_COMMAND_CACHE_INVALIDATE (1<<29)
>>> #define PIPE_CONTROL_TILE_CACHE_FLUSH (1<<28)
>>> +#define PIPE_CONTROL_FLUSH_L3 (1<<27)
>>
>> On spec this bit is Protected Memory Disable. I think what you want is bit 30.
>
> That is also wrong on i915 oO.
Yeah, most of the series is me copying "reference" implementation from i915.
But you are right, bit 27 appears wrong, and the history of the whole
area is a bit confusing.
The existing PIPE_CONTROL_FLUSH_L3 definition was added back in 2015 in:
0160f055393f ("drm/i915/gen8: Add WaClearSlmSpaceAtContextSwitch
workaround")
But if I look at the public PRMs for BDW and CHV bit 27 is reserved MBZ.
Bit 30 is the same. Was it something undocumented? I don't know.
Furthermore, the public PRM has no mention of
WaClearSlmSpaceAtContextSwitch. It has this
WaFlushCoherentL3CacheLinesAtContextSwitch which is not quite the thing:
"""
Coherent L3 cache lines are not getting flushed during context switch
which is causing
issues like corruption. Need to set bit 21 of MMIO b118, then send PC
with DC flush and
then reset bit 21 of MMIO b118. This programming sequence needs to be
part of the indirect
context WA BB.
"""
This WA was later extnded to KBL:
066d46288851 ("drm/i915/kbl: Add WaClearSlmSpaceAtContextSwitch")
But KBL public PRM is the same - I don't see any WA which mention SLM or
bit 30. And bits 27 and 30 are still reserved.
There is a new bit 26 (Flush LLC) on SKL and KBL but we do not use it or
define it.
That also feels odd.
And finally L3 flushing was added to the plain ring programming in:
0c7c0c8e6f09 ("drm/i915/gen12: Flush L3")
As you discovered this is most likely wrong since bspec say bit 30 on
TGL is L3 Fabric Flush.
So more questions than answers and it looks access to internal docs will
be required to get to the bottom of it all.
> Will give some testing here and submit a i915 patch.
I at least tried changing it to bit 30 on ADL but that on it's own
appeared to make no positive improvement to the transient cache dirt in
some AuxCCS IGTs. It still requires the large hammer of "drm/xe: Force
flush system memory AuxCCS framebuffers before scan out".
I am curious to what your testing will show with the use cases you have
in mind.
Regards,
Tvrtko
>>
>>> #define PIPE_CONTROL_AMFS_FLUSH (1<<25)
>>> #define PIPE_CONTROL_GLOBAL_GTT_IVB (1<<24)
>>> #define PIPE_CONTROL_LRI_POST_SYNC BIT(23)
>>> diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
>>> index a1289f086191..8f655b6fe913 100644
>>> --- a/drivers/gpu/drm/xe/xe_ring_ops.c
>>> +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
>>> @@ -197,6 +197,16 @@ static int emit_render_cache_flush(struct xe_sched_job *job, u32 *dw, int i)
>>> if (XE_WA(gt, 1409600907))
>>> flags |= PIPE_CONTROL_DEPTH_STALL;
>>>
>>> + /*
>>> + * L3 fabric flush is needed for AUX CCS invalidation
>>> + * which happens as part of pipe-control so we can
>>> + * ignore PIPE_CONTROL_FLUSH_L3. Also PIPE_CONTROL_FLUSH_L3
>>> + * deals with Protected Memory which is not needed for
>>> + * AUX CCS invalidation and lead to unwanted side effects.
>>> + */
>>> + if (GRAPHICS_VERx100(xe) < 1270)
>>> + flags |= PIPE_CONTROL_FLUSH_L3;
>>> +
>>> if (lacks_render)
>>> flags &= ~PIPE_CONTROL_3D_ARCH_FLAGS;
>>> else if (job->q->class == XE_ENGINE_CLASS_COMPUTE)
More information about the Intel-xe
mailing list