[PATCH] drm/xe: Invalidate L3 read-only cachelines for geometry streams too
Dong, Zhanjun
zhanjun.dong at intel.com
Thu Mar 27 22:39:09 UTC 2025
On 2025-03-20 6:11 a.m., Kenneth Graunke wrote:
> Historically, the Vertex Fetcher unit has not been an L3 client. That
> meant that, when a buffer containing vertex data was written to, it was
> necessary to issue a PIPE_CONTROL::VF Cache Invalidate to invalidate any
> VF L2 cachelines associated with that buffer, so the new value would be
> properly read from memory.
>
> Since Tigerlake and later, VERTEX_BUFFER_STATE and 3DSTATE_INDEX_BUFFER
> have included an "L3 Bypass Disable" bit which userspace drivers can set
> to request that the vertex fetcher unit snoop L3. However, unlike most
> true L3 clients, the "VF Cache Invalidate" bit continues to only
> invalidate the VF L2 cache - and not any associated L3 lines.
>
> To handle that, PIPE_CONTROL has a new "L3 Read Only Cache Invalidation
> Bit", which according to the docs, "controls the invalidation of the
> Geometry streams cached in L3 cache at the top of the pipe." In other
> words, the vertex and index buffer data that gets cached in L3 when
> "L3 Bypass Disable" is set.
>
> Mesa always sets L3 Bypass Disable so that the VF unit snoops L3, and
> whenever it issues a VF Cache Invalidate, it also issues an L3 Read Only
> Cache Invalidate so that both L2 and L3 vertex data is invalidated.
>
> xe is issuing VF cache invalidates too (which handles cases like CPU
> writes to a buffer between GPU batches). Because userspace may enable
> L3 snooping, it needs to issue an L3 Read Only Cache Invalidate as well.
>
> Fixes significant flickering in Firefox on Meteorlake, which was writing
> to vertex buffers via the CPU between batches; the missing L3 Read Only
> invalidates were causing the vertex fetcher to read stale data from L3.
>
> References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4460
> Cc: stable at vger.kernel.org # v6.13+
> ---
> drivers/gpu/drm/xe/instructions/xe_gpu_commands.h | 1 +
> drivers/gpu/drm/xe/xe_ring_ops.c | 13 +++++++++----
> 2 files changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h b/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
> index a255946b6f77e..8cfcd3360896c 100644
> --- a/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
> +++ b/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
> @@ -41,6 +41,7 @@
>
> #define GFX_OP_PIPE_CONTROL(len) ((0x3<<29)|(0x3<<27)|(0x2<<24)|((len)-2))
>
> +#define PIPE_CONTROL0_L3_READ_ONLY_CACHE_INVALIDATE BIT(10) /* gen12 */
> #define PIPE_CONTROL0_HDC_PIPELINE_FLUSH BIT(9) /* gen12 */
>
> #define PIPE_CONTROL_COMMAND_CACHE_INVALIDATE (1<<29)
> diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
> index 0c230ee53bba5..9d8901a33205a 100644
> --- a/drivers/gpu/drm/xe/xe_ring_ops.c
> +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
> @@ -141,7 +141,8 @@ emit_pipe_control(u32 *dw, int i, u32 bit_group_0, u32 bit_group_1, u32 offset,
> static int emit_pipe_invalidate(u32 mask_flags, bool invalidate_tlb, u32 *dw,
> 				int i)
> {
> -	u32 flags = PIPE_CONTROL_CS_STALL |
> +	u32 flags0 = 0;
> +	u32 flags1 = PIPE_CONTROL_CS_STALL |
> 		PIPE_CONTROL_COMMAND_CACHE_INVALIDATE |
> 		PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE |
> 		PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
> @@ -152,11 +153,15 @@ static int emit_pipe_invalidate(u32 mask_flags, bool invalidate_tlb, u32 *dw,
> 		PIPE_CONTROL_STORE_DATA_INDEX;
>
> 	if (invalidate_tlb)
> -		flags |= PIPE_CONTROL_TLB_INVALIDATE;
> +		flags1 |= PIPE_CONTROL_TLB_INVALIDATE;
>
> -	flags &= ~mask_flags;
> +	flags1 &= ~mask_flags;
>
> -	return emit_pipe_control(dw, i, 0, flags, LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR, 0);
> +	if (flags1 & PIPE_CONTROL_VF_CACHE_INVALIDATE)
> +		flags0 |= PIPE_CONTROL0_L3_READ_ONLY_CACHE_INVALIDATE;
> +
> +	return emit_pipe_control(dw, i, flags0, flags1,
> +				 LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR, 0);
The new PIPE_CONTROL0_L3_READ_ONLY_CACHE_INVALIDATE is defined as documented
in the spec. The new flags0/flags1 handling looks good to me.
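
For anyone following along, here is a minimal standalone sketch of the
pairing as I read it from the diff. It is not the driver code: the struct,
the function name build_invalidate_flags() and the PC0_/PC1_ macro names are
made up for illustration. Only the BIT(10) and (1<<29) values come from the
patch; the other bit positions are assumptions, and the base flag set is
trimmed to a few entries.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (1u << (n))

/* Values marked "from the patch" are quoted in this thread.
 * Values marked "assumed" are illustrative and may not match the header. */
#define PC0_L3_READ_ONLY_CACHE_INVALIDATE BIT(10) /* bit_group_0, from the patch */
#define PC1_COMMAND_CACHE_INVALIDATE      BIT(29) /* bit_group_1, from the patch */
#define PC1_CS_STALL                      BIT(20) /* assumed */
#define PC1_TLB_INVALIDATE                BIT(18) /* assumed */
#define PC1_VF_CACHE_INVALIDATE           BIT(4)  /* assumed */

/* Hypothetical container for the two flag groups passed to emit_pipe_control(). */
struct pipe_flags {
	uint32_t flags0; /* corresponds to bit_group_0 */
	uint32_t flags1; /* corresponds to bit_group_1 */
};

/* Mirrors the order of operations in the patched emit_pipe_invalidate():
 * build flags1, apply the caller's mask, then derive flags0 from the result. */
struct pipe_flags build_invalidate_flags(uint32_t mask_flags, bool invalidate_tlb)
{
	struct pipe_flags f = {
		.flags0 = 0,
		.flags1 = PC1_CS_STALL |
			  PC1_COMMAND_CACHE_INVALIDATE |
			  PC1_VF_CACHE_INVALIDATE,
	};

	if (invalidate_tlb)
		f.flags1 |= PC1_TLB_INVALIDATE;

	/* Drop whatever the caller asked to mask out. */
	f.flags1 &= ~mask_flags;

	/* The L3 read-only invalidate is only requested when the VF cache
	 * invalidate survived the mask, so the two always travel together. */
	if (f.flags1 & PC1_VF_CACHE_INVALIDATE)
		f.flags0 |= PC0_L3_READ_ONLY_CACHE_INVALIDATE;

	return f;
}
```

With an empty mask the VF invalidate stays in flags1, so flags0 carries the
L3 read-only invalidate. If the caller masks out the VF invalidate, both are
dropped, which is why the check comes after the mask is applied.
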
For some reason this patch did not trigger an automatic CI run:
Address 'kenneth at whitecape.org' is not on the allowlist!
Exception occurred during validation, bailing out!
Let me check what we can do. A CI run result is required before this can
move forward.
> }
>
> static int emit_store_imm_ppgtt_posted(u64 addr, u64 value,