[PATCH] drm/xe: Invalidate L3 read-only cachelines for geometry streams too

Dong, Zhanjun zhanjun.dong at intel.com
Thu Mar 27 22:39:09 UTC 2025


On 2025-03-20 6:11 a.m., Kenneth Graunke wrote:
> Historically, the Vertex Fetcher unit has not been an L3 client.  That
> meant that, when a buffer containing vertex data was written to, it was
> necessary to issue a PIPE_CONTROL::VF Cache Invalidate to invalidate any
> VF L2 cachelines associated with that buffer, so the new value would be
> properly read from memory.
> 
> Since Tigerlake and later, VERTEX_BUFFER_STATE and 3DSTATE_INDEX_BUFFER
> have included an "L3 Bypass Disable" bit which userspace drivers can set
> to request that the vertex fetcher unit snoop L3.  However, unlike most
> true L3 clients, the "VF Cache Invalidate" bit continues to only
> invalidate the VF L2 cache - and not any associated L3 lines.
> 
> To handle that, PIPE_CONTROL has a new "L3 Read Only Cache Invalidation
> Bit", which according to the docs, "controls the invalidation of the
> Geometry streams cached in L3 cache at the top of the pipe."  In other
> words, the vertex and index buffer data that gets cached in L3 when
> "L3 Bypass Disable" is set.
> 
> Mesa always sets L3 Bypass Disable so that the VF unit snoops L3, and
> whenever it issues a VF Cache Invalidate, it also issues an L3 Read Only
> Cache Invalidate so that both L2 and L3 vertex data is invalidated.
> 
> xe is issuing VF cache invalidates too (which handles cases like CPU
> writes to a buffer between GPU batches).  Because userspace may enable
> L3 snooping, it needs to issue an L3 Read Only Cache Invalidate as well.
> 
> Fixes significant flickering in Firefox on Meteorlake, which was writing
> to vertex buffers via the CPU between batches; the missing L3 Read Only
> invalidates were causing the vertex fetcher to read stale data from L3.
> 
> References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4460
> Cc: stable at vger.kernel.org # v6.13+
> ---
>   drivers/gpu/drm/xe/instructions/xe_gpu_commands.h |  1 +
>   drivers/gpu/drm/xe/xe_ring_ops.c                  | 13 +++++++++----
>   2 files changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h b/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
> index a255946b6f77e..8cfcd3360896c 100644
> --- a/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
> +++ b/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
> @@ -41,6 +41,7 @@
>   
>   #define GFX_OP_PIPE_CONTROL(len)	((0x3<<29)|(0x3<<27)|(0x2<<24)|((len)-2))
>   
> +#define	  PIPE_CONTROL0_L3_READ_ONLY_CACHE_INVALIDATE	BIT(10)	/* gen12 */
>   #define	  PIPE_CONTROL0_HDC_PIPELINE_FLUSH		BIT(9)	/* gen12 */
>   
>   #define   PIPE_CONTROL_COMMAND_CACHE_INVALIDATE		(1<<29)
> diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
> index 0c230ee53bba5..9d8901a33205a 100644
> --- a/drivers/gpu/drm/xe/xe_ring_ops.c
> +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
> @@ -141,7 +141,8 @@ emit_pipe_control(u32 *dw, int i, u32 bit_group_0, u32 bit_group_1, u32 offset,
>   static int emit_pipe_invalidate(u32 mask_flags, bool invalidate_tlb, u32 *dw,
>   				int i)
>   {
> -	u32 flags = PIPE_CONTROL_CS_STALL |
> +	u32 flags0 = 0;
> +	u32 flags1 = PIPE_CONTROL_CS_STALL |
>   		PIPE_CONTROL_COMMAND_CACHE_INVALIDATE |
>   		PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE |
>   		PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
> @@ -152,11 +153,15 @@ static int emit_pipe_invalidate(u32 mask_flags, bool invalidate_tlb, u32 *dw,
>   		PIPE_CONTROL_STORE_DATA_INDEX;
>   
>   	if (invalidate_tlb)
> -		flags |= PIPE_CONTROL_TLB_INVALIDATE;
> +		flags1 |= PIPE_CONTROL_TLB_INVALIDATE;
>   
> -	flags &= ~mask_flags;
> +	flags1 &= ~mask_flags;
>   
> -	return emit_pipe_control(dw, i, 0, flags, LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR, 0);
> +	if (flags1 & PIPE_CONTROL_VF_CACHE_INVALIDATE)
> +		flags0 |= PIPE_CONTROL0_L3_READ_ONLY_CACHE_INVALIDATE;
> +
> +	return emit_pipe_control(dw, i, flags0, flags1,
> +				 LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR, 0);
The new PIPE_CONTROL0_L3_READ_ONLY_CACHE_INVALIDATE is defined as documented in the spec.
The new flags0/flags1 handling looks good to me.
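
For anyone who wants to check the flags0/flags1 logic in isolation, here is a minimal userspace sketch of it. It is not the driver code: the SKETCH_* bit positions, the sketch_pc_flags struct and sketch_invalidate_flags() are hypothetical names made up for illustration, and only the BIT(10) value for the L3 read-only invalidate is taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* From the patch: DW0 bit that invalidates geometry streams cached in L3. */
#define PIPE_CONTROL0_L3_READ_ONLY_CACHE_INVALIDATE	(1u << 10)

/* ASSUMPTION: illustrative DW1 bit positions, not copied from the xe headers. */
#define SKETCH_VF_CACHE_INVALIDATE	(1u << 4)
#define SKETCH_TLB_INVALIDATE		(1u << 18)
#define SKETCH_CS_STALL			(1u << 20)

/* Hypothetical container for the two PIPE_CONTROL flag groups. */
struct sketch_pc_flags {
	uint32_t dw0;	/* bit_group_0 (flags0 in the patch) */
	uint32_t dw1;	/* bit_group_1 (flags1 in the patch) */
};

/* Mirrors the order of operations in the patched emit_pipe_invalidate(). */
static struct sketch_pc_flags
sketch_invalidate_flags(uint32_t mask_flags, bool invalidate_tlb)
{
	struct sketch_pc_flags f = {
		.dw0 = 0,
		.dw1 = SKETCH_CS_STALL | SKETCH_VF_CACHE_INVALIDATE,
	};

	if (invalidate_tlb)
		f.dw1 |= SKETCH_TLB_INVALIDATE;

	/* Masking happens first, so a masked-out VF invalidate ... */
	f.dw1 &= ~mask_flags;

	/* ... also suppresses the paired L3 read-only invalidate. */
	if (f.dw1 & SKETCH_VF_CACHE_INVALIDATE)
		f.dw0 |= PIPE_CONTROL0_L3_READ_ONLY_CACHE_INVALIDATE;

	return f;
}
```

The point of the ordering is that the L3 read-only invalidate is derived from flags1 after mask_flags is applied, so the two invalidates are always emitted together or not at all.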

For some reason this patch did not trigger an automatic CI run:

Address 'kenneth at whitecape.org' is not on the allowlist!
Exception occurred during validation, bailing out!

Let me check what we can do. A CI run result is required before moving
forward.

>   }
>   
>   static int emit_store_imm_ppgtt_posted(u64 addr, u64 value,


