[PATCH v3 0/8] AuxCCS handling and render compression modifiers

Tvrtko Ursulin tvrtko.ursulin at igalia.com
Thu Mar 27 13:25:32 UTC 2025


Hi,

On 25/03/2025 17:39, Juha-Pekka Heikkilä wrote:
> First patch that freezes mtl for me is
> 
> Author: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
> Date:   Tue Mar 18 16:22:16 2025 +0000
> 
>      drm/xe: Add ring buffer handling for AuxCCS
> 
>      Align the ring buffer handling of required AuxCCS flushes and
>      invalidations with the reference implementation from i915.
> 
>      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
> 
> 
> If it's of any use, last messages I saw on dmesg are
> 
> [  +0,004882] xe 0000:00:02.0: [drm:xe_guc_capture_steered_list_init
> [xe]] GT0: capture found 48 ext-regs.
> [  +0,021150] xe 0000:00:02.0: [drm:xe_guc_ads_populate [xe]] GT0: ADS
> capture alloc size changed from 45056 to 20480
> [  +0,000765] xe 0000:00:02.0: [drm:__xe_guc_upload [xe]] GT0: load
> still in progress, timeouts = 0, freq = 2250MHz (req 2250MHz), status
> = 0x00000072 [0x39/00]
> [  +0,005246] xe 0000:00:02.0: [drm:__xe_guc_upload [xe]] GT0: load
> still in progress, timeouts = 0, freq = 2250MHz (req 2250MHz), status
> = 0x80000534 [0x1A/05]
> [  +0,001414] xe 0000:00:02.0: [drm:__xe_guc_upload [xe]] GT0: init
> took 6ms, freq = 2250MHz (req = 2250MHz), before = 2250MHz, status =
> 0x8002F034, timeouts = 0
> [  +0,000282] xe 0000:00:02.0: [drm:xe_guc_ct_enable [xe]] GT0: GuC CT
> communication channel enabled
> [  +0,000973] xe 0000:00:02.0: [drm:xe_gt_record_default_lrcs [xe]]
> GT0: LRC WA rcs0 save-restore batch
> [  +0,000075] xe 0000:00:02.0: [drm:xe_gt_record_default_lrcs [xe]]
> GT0: REG[0x7004] = 0x08000800
> [  +0,000070] xe 0000:00:02.0: [drm:xe_gt_record_default_lrcs [xe]]
> GT0: REG[0x7044] = 0x00200020

The freeze could be happening on the very first job submission.

I had another round of cross checking things and found two more 
differences against what i915 does on Meteorlake.

The first is that i915 applies Wa 14016712196 before both the pipe control 
invalidate and the flush. Xe only applies it before the flush.

The other is that i915 sets PIPE_CONTROL_CCS_FLUSH on flushes.
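
As a standalone illustration of the second difference, the sketch below models how the first dword of a flush PIPE_CONTROL would change. The bit values and the GFX_OP_PIPE_CONTROL() encoding are copied from the patch in this message. The helper names flush_bit_group_0() and pipe_control_dw0() are hypothetical and do not exist in the driver, and it is an assumption here that the first dword is simply the header ORed with bit_group_0.

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n) (1u << (n))

/* Encodings copied from the patch in this message. */
#define GFX_OP_PIPE_CONTROL(len) \
	((0x3u << 29) | (0x3u << 27) | (0x2u << 24) | ((len) - 2))
#define PIPE_CONTROL0_HDC_PIPELINE_FLUSH	BIT(9)
#define PIPE_CONTROL0_CCS_FLUSH			BIT(13)

/*
 * Hypothetical helper: pick the DW0 flag bits for a render cache flush.
 * Mirrors the GRAPHICS_VERx100(xe) >= 1270 check added by the patch.
 */
static uint32_t flush_bit_group_0(unsigned int graphics_verx100)
{
	uint32_t bits = PIPE_CONTROL0_HDC_PIPELINE_FLUSH;

	if (graphics_verx100 >= 1270)
		bits |= PIPE_CONTROL0_CCS_FLUSH;

	return bits;
}

/*
 * Hypothetical helper: build the first dword of a PIPE_CONTROL.
 * Assumption: DW0 is the command header ORed with bit_group_0.
 */
static uint32_t pipe_control_dw0(uint32_t bit_group_0, unsigned int len)
{
	assert(len >= 2);	/* the header encodes len - 2 */

	return GFX_OP_PIPE_CONTROL(len) | bit_group_0;
}
```

With a six dword command, pipe_control_dw0(flush_bit_group_0(1270), 6) evaluates to 0x7A002204, whereas any version value below 1270 leaves bit 13 clear and gives 0x7A000204.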

I cannot test it but if you could, the patch below at least compiles.

Regards,

Tvrtko

diff --git a/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h b/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
index 93e4687feb71..38d723e47a04 100644
--- a/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
+++ b/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
@@ -42,6 +42,7 @@
  #define GFX_OP_PIPE_CONTROL(len)	((0x3<<29)|(0x3<<27)|(0x2<<24)|((len)-2))

  #define	  PIPE_CONTROL0_HDC_PIPELINE_FLUSH		BIT(9)	/* gen12 */
+#define   PIPE_CONTROL0_CCS_FLUSH                       BIT(13) /* MTL+ */

  #define   PIPE_CONTROL_COMMAND_CACHE_INVALIDATE		(1<<29)
  #define   PIPE_CONTROL_TILE_CACHE_FLUSH			(1<<28)
diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
index a380964f3166..02b09826f831 100644
--- a/drivers/gpu/drm/xe/xe_ring_ops.c
+++ b/drivers/gpu/drm/xe/xe_ring_ops.c
@@ -141,8 +141,9 @@ emit_pipe_control(u32 *dw, int i, u32 bit_group_0, u32 bit_group_1, u32 offset,
  	return i;
  }

-static int emit_pipe_invalidate(u32 mask_flags, bool invalidate_tlb, u32 *dw,
-				int i)
+static int
+emit_pipe_invalidate(struct xe_gt *gt, u32 mask_flags, bool invalidate_tlb,
+		     u32 *dw, int i)
  {
  	u32 flags = PIPE_CONTROL_CS_STALL |
  		PIPE_CONTROL_COMMAND_CACHE_INVALIDATE |
@@ -159,6 +160,10 @@ static int emit_pipe_invalidate(u32 mask_flags, bool invalidate_tlb, u32 *dw,

  	flags &= ~mask_flags;

+	if (XE_WA(gt, 14016712196))
+		i = emit_pipe_control(dw, i, 0, PIPE_CONTROL_DEPTH_CACHE_FLUSH,
+				      LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR, 0);
+
  	return emit_pipe_control(dw, i, 0, flags, LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR, 0);
  }

@@ -180,12 +185,16 @@ static int emit_render_cache_flush(struct xe_sched_job *job, bool flush_l3,
  	struct xe_gt *gt = job->q->gt;
  	struct xe_device *xe = gt_to_xe(gt);
  	bool lacks_render = !(gt->info.engine_mask & XE_HW_ENGINE_RCS_MASK);
+	u32 bit_group_0 = PIPE_CONTROL0_HDC_PIPELINE_FLUSH;
  	u32 flags;

  	if (XE_WA(gt, 14016712196))
  		i = emit_pipe_control(dw, i, 0, PIPE_CONTROL_DEPTH_CACHE_FLUSH,
  				      LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR, 0);

+	if (GRAPHICS_VERx100(xe) >= 1270)
+		bit_group_0 |= PIPE_CONTROL0_CCS_FLUSH;
+
  	flags = (PIPE_CONTROL_CS_STALL |
  		 PIPE_CONTROL_TILE_CACHE_FLUSH |
  		 PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH |
@@ -211,7 +220,7 @@ static int emit_render_cache_flush(struct xe_sched_job *job, bool flush_l3,
  	else if (job->q->class == XE_ENGINE_CLASS_COMPUTE)
  		flags &= ~PIPE_CONTROL_3D_ENGINE_FLAGS;

-	return emit_pipe_control(dw, i, PIPE_CONTROL0_HDC_PIPELINE_FLUSH, flags, 0, 0);
+	return emit_pipe_control(dw, i, bit_group_0, flags, 0, 0);
  }

  static int emit_pipe_control_to_ring_end(struct xe_hw_engine *hwe, u32 *dw, int i)
@@ -363,7 +372,7 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
  		mask_flags = PIPE_CONTROL_3D_ENGINE_FLAGS;

  	/* See __xe_pt_bind_vma() for a discussion on TLB invalidations. */
-	i = emit_pipe_invalidate(mask_flags, invalidate_tlb, dw, i);
+	i = emit_pipe_invalidate(gt, mask_flags, invalidate_tlb, dw, i);

  	/* hsdes: 1809175790 */
  	if (aux_ccs)


> 
> /Juha-Pekka
> 
> On Thu, Mar 20, 2025 at 7:11 PM Juha-Pekka Heikkilä
> <juhapekka.heikkila at gmail.com> wrote:
>>
>> I'll try to find some moment to do bisecting, probably will be next week when I get to do this.
>>
>> /Juha-Pekka
>>
>> On Thu, 20 Mar 2025 at 10:25, Tvrtko Ursulin <tvrtko.ursulin at igalia.com> wrote:
>>>
>>>
>>> Hi,
>>>
>>> On 19/03/2025 13:41, Juha-Pekka Heikkilä wrote:
>>>> Hi Tvrtko,
>>>>
>>>> I did a quick run with these patches. With these changes on top of
>>>> today's drm-tip I got a complete system freeze on mtl and its variants
>>>> when running modprobe. I had kgdb enabled but I wasn't even thrown there,
>>>> the machine went completely unresponsive. On 3/3 tries modprobe xe
>>>> completely froze the box.
>>>
>>> I don't have MTL to try and neither apparently does CI, which otherwise
>>> seems happy, as is my ADL-P laptop.
>>>
>>> Would you have time to bisect? Or maybe netconsole to see what explodes?
>>>
>>> Not much comes to mind looking at the patches. Maybe something runs too
>>> early, before something else is initialised. Guessing only.
>>>
>>> Regards,
>>>
>>> Tvrtko
>>>
>>>> On Tue, Mar 18, 2025 at 6:22 PM Tvrtko Ursulin
>>>> <tvrtko.ursulin at igalia.com> wrote:
>>>>>
>>>>> A series to fix and add xe support for AuxCCS framebuffers via DPT.
>>>>>
>>>>> Currently the auxiliary buffer data isn't mapped into the page tables at all so
>>>>> cf48bddd31de ("drm/i915/display: Disable AuxCCS framebuffers if built for Xe")
>>>>> had to disable the support.
>>>>>
>>>>> On top of that there are missing flushes and invalidations both from the ring
>>>>> buffer side and from the CPU side.
>>>>>
>>>>> Tested with KDE Wayland, on Lenovo Carbon X1 ADL-P:
>>>>>
>>>>>     [PLANE:32:plane 1A]: type=PRI
>>>>>             uapi: [FB:242] AR30 little-endian (0x30335241),0x100000000000008,2880x1800, visible=visible, src=2880.000000x1800.000000+0.000000+0.000000, dst=2880x1800+0+0, rotation=0 (0x00000001)
>>>>>             hw: [FB:242] AR30 little-endian (0x30335241),0x100000000000008,2880x1800, visible=yes, src=2880.000000x1800.000000+0.000000+0.000000, dst=2880x1800+0+0, rotation=0 (0x00000001)
>>>>>
>>>>> Display seems to be working fine - no artefacts, no DMAR/PIPE faults. CI also
>>>>> appears to be happy with v2.
>>>>>
>>>>> v2:
>>>>>    * More patches added to fix kms_flip_tiling.
>>>>>
>>>>> v3:
>>>>>    * Rebased after some cleanup patches from v2 were merged.
>>>>>    * Added people to Cc as suggested by Rodrigo.
>>>>>    * Adjusted last patch title. (Rodrigo)
>>>>>    * Apply GGTT flushing only to iomapped system memory buffers.
>>>>>
>>>>> Cc: José Roberto de Souza <jose.souza at intel.com>
>>>>> Cc: Juha-Pekka Heikkila <juhapekka.heikkila at gmail.com>
>>>>> Cc: Michael J. Ruhl <michael.j.ruhl at intel.com>
>>>>> Cc: Ville Syrjälä <ville.syrjala at linux.intel.com>
>>>>>
>>>>> Tvrtko Ursulin (8):
>>>>>     drm/xe: Add ring buffer handling for AuxCCS
>>>>>     drm/xe: Use fb cached min alignment
>>>>>     drm/xe: Reduce DPT table alignment as in i915
>>>>>     drm/xe: Flush GGTT writes after populating DPT
>>>>>     drm/xe: Handle DPT in system memory
>>>>>     drm/xe: Force flush system memory AuxCCS framebuffers before scan out
>>>>>     drm/xe/display: Add support for AuxCCS
>>>>>     drm/i915/display: Expose AuxCCS frame buffer modifiers for Xe
>>>>>
>>>>>    .../drm/i915/display/skl_universal_plane.c    |   6 -
>>>>>    drivers/gpu/drm/xe/display/xe_fb_pin.c        | 181 ++++++++++++++----
>>>>>    .../gpu/drm/xe/instructions/xe_gpu_commands.h |   1 +
>>>>>    .../gpu/drm/xe/instructions/xe_mi_commands.h  |   6 +
>>>>>    drivers/gpu/drm/xe/regs/xe_gt_regs.h          |   1 +
>>>>>    drivers/gpu/drm/xe/xe_bo_types.h              |  14 +-
>>>>>    drivers/gpu/drm/xe/xe_ring_ops.c              | 173 +++++++++--------
>>>>>    drivers/gpu/drm/xe/xe_ring_ops_types.h        |   2 +-
>>>>>    8 files changed, 261 insertions(+), 123 deletions(-)
>>>>>
>>>>> --
>>>>> 2.48.0
>>>>>
>>>


