[PATCH v3 0/8] AuxCCS handling and render compression modifiers

Tvrtko Ursulin tvrtko.ursulin at igalia.com
Fri Mar 28 16:19:53 UTC 2025


On 27/03/2025 13:25, Tvrtko Ursulin wrote:
> 
> Hi,
> 
> On 25/03/2025 17:39, Juha-Pekka Heikkilä wrote:
>> The first patch that freezes MTL for me is:
>>
>> Author: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>> Date:   Tue Mar 18 16:22:16 2025 +0000
>>
>>      drm/xe: Add ring buffer handling for AuxCCS
>>
>>      Align the ring buffer handling of required AuxCCS flushes and
>>      invalidations with the reference implementation from i915.
>>
>>      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>>
>>
>> If it's of any use, the last messages I saw in dmesg are:
>>
>> [  +0,004882] xe 0000:00:02.0: [drm:xe_guc_capture_steered_list_init [xe]] GT0: capture found 48 ext-regs.
>> [  +0,021150] xe 0000:00:02.0: [drm:xe_guc_ads_populate [xe]] GT0: ADS capture alloc size changed from 45056 to 20480
>> [  +0,000765] xe 0000:00:02.0: [drm:__xe_guc_upload [xe]] GT0: load still in progress, timeouts = 0, freq = 2250MHz (req 2250MHz), status = 0x00000072 [0x39/00]
>> [  +0,005246] xe 0000:00:02.0: [drm:__xe_guc_upload [xe]] GT0: load still in progress, timeouts = 0, freq = 2250MHz (req 2250MHz), status = 0x80000534 [0x1A/05]
>> [  +0,001414] xe 0000:00:02.0: [drm:__xe_guc_upload [xe]] GT0: init took 6ms, freq = 2250MHz (req = 2250MHz), before = 2250MHz, status = 0x8002F034, timeouts = 0
>> [  +0,000282] xe 0000:00:02.0: [drm:xe_guc_ct_enable [xe]] GT0: GuC CT communication channel enabled
>> [  +0,000973] xe 0000:00:02.0: [drm:xe_gt_record_default_lrcs [xe]] GT0: LRC WA rcs0 save-restore batch
>> [  +0,000075] xe 0000:00:02.0: [drm:xe_gt_record_default_lrcs [xe]] GT0: REG[0x7004] = 0x08000800
>> [  +0,000070] xe 0000:00:02.0: [drm:xe_gt_record_default_lrcs [xe]] GT0: REG[0x7044] = 0x00200020
> 
> The hang could be at the very first job submission.
> 
> I had another round of cross-checking and found two more
> differences from what i915 does on Meteorlake.
> 
> The first is that i915 applies Wa_14016712196 before both the pipe
> control invalidate and the flush, whereas Xe only applies it before the flush.
> 
> The other is that i915 sets PIPE_CONTROL_CCS_FLUSH on flushes.
> 
> I cannot test it myself, but if you can, the patch below at least compiles.

Hm, I think I simply miscalculated MAX_JOB_SIZE_DW. I bumped it by 10,
but should have bumped it by 16 for MTL: I add two pipe controls (one of
them implicitly, via Wa_14016712196) and a semaphore wait in the
invalidation phase.

So, if you have time to test with "#define MAX_JOB_SIZE_DW 64", that
would be very helpful.
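As a quick check of that arithmetic, here is a minimal C sketch. It assumes that each PIPE_CONTROL written by emit_pipe_control() is 6 dwords and that MAX_JOB_SIZE_DW was 48 before this series; neither figure is stated above, and the 4-dword semaphore wait is simply what remains of the 16 once two pipe controls are counted. The function names are hypothetical.

```c
/*
 * Dword budget sketch. All three sizes are assumptions for illustration,
 * not values read from the xe headers.
 */
enum {
	PIPE_CONTROL_DW     = 6,  /* assumed size of one emit_pipe_control() */
	SEMAPHORE_WAIT_DW   = 4,  /* what remains of 16 after two PIPE_CONTROLs */
	OLD_MAX_JOB_SIZE_DW = 48, /* assumed value before this series */
};

/* Extra dwords for the MTL invalidation phase: two PIPE_CONTROLs (one of
 * them for Wa_14016712196) plus one semaphore wait. */
static int mtl_extra_invalidate_dw(void)
{
	return 2 * PIPE_CONTROL_DW + SEMAPHORE_WAIT_DW;
}

/* Job size limit after adding the extra invalidation dwords. */
static int new_max_job_size_dw(void)
{
	return OLD_MAX_JOB_SIZE_DW + mtl_extra_invalidate_dw();
}
```

Under these assumptions the earlier bump of 10 leaves the budget 6 dwords short, which is exactly one PIPE_CONTROL.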

Regards,

Tvrtko

> diff --git a/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h b/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
> index 93e4687feb71..38d723e47a04 100644
> --- a/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
> +++ b/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
> @@ -42,6 +42,7 @@
>   #define GFX_OP_PIPE_CONTROL(len)    ((0x3<<29)|(0x3<<27)|(0x2<<24)|((len)-2))
> 
>   #define      PIPE_CONTROL0_HDC_PIPELINE_FLUSH        BIT(9)    /* gen12 */
> +#define   PIPE_CONTROL0_CCS_FLUSH                       BIT(13) /* MTL+ */
> 
>   #define   PIPE_CONTROL_COMMAND_CACHE_INVALIDATE        (1<<29)
>   #define   PIPE_CONTROL_TILE_CACHE_FLUSH            (1<<28)
> diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
> index a380964f3166..02b09826f831 100644
> --- a/drivers/gpu/drm/xe/xe_ring_ops.c
> +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
> @@ -141,8 +141,9 @@ emit_pipe_control(u32 *dw, int i, u32 bit_group_0, u32 bit_group_1, u32 offset,
>       return i;
>   }
> 
> -static int emit_pipe_invalidate(u32 mask_flags, bool invalidate_tlb, u32 *dw,
> -                int i)
> +static int
> +emit_pipe_invalidate(struct xe_gt *gt, u32 mask_flags, bool invalidate_tlb,
> +             u32 *dw, int i)
>   {
>       u32 flags = PIPE_CONTROL_CS_STALL |
>           PIPE_CONTROL_COMMAND_CACHE_INVALIDATE |
> @@ -159,6 +160,10 @@ static int emit_pipe_invalidate(u32 mask_flags, bool invalidate_tlb, u32 *dw,
> 
>       flags &= ~mask_flags;
> 
> +    if (XE_WA(gt, 14016712196))
> +        i = emit_pipe_control(dw, i, 0, PIPE_CONTROL_DEPTH_CACHE_FLUSH,
> +                      LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR, 0);
> +
>       return emit_pipe_control(dw, i, 0, flags, LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR, 0);
>   }
> 
> @@ -180,12 +185,16 @@ static int emit_render_cache_flush(struct xe_sched_job *job, bool flush_l3,
>       struct xe_gt *gt = job->q->gt;
>       struct xe_device *xe = gt_to_xe(gt);
>       bool lacks_render = !(gt->info.engine_mask & XE_HW_ENGINE_RCS_MASK);
> +    u32 bit_group_0 = PIPE_CONTROL0_HDC_PIPELINE_FLUSH;
>       u32 flags;
> 
>       if (XE_WA(gt, 14016712196))
>           i = emit_pipe_control(dw, i, 0, PIPE_CONTROL_DEPTH_CACHE_FLUSH,
>                         LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR, 0);
> 
> +    if (GRAPHICS_VERx100(xe) >= 1270)
> +        bit_group_0 |= PIPE_CONTROL0_CCS_FLUSH;
> +
>       flags = (PIPE_CONTROL_CS_STALL |
>            PIPE_CONTROL_TILE_CACHE_FLUSH |
>            PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH |
> @@ -211,7 +220,7 @@ static int emit_render_cache_flush(struct xe_sched_job *job, bool flush_l3,
>       else if (job->q->class == XE_ENGINE_CLASS_COMPUTE)
>           flags &= ~PIPE_CONTROL_3D_ENGINE_FLAGS;
> 
> -    return emit_pipe_control(dw, i, PIPE_CONTROL0_HDC_PIPELINE_FLUSH, flags, 0, 0);
> +    return emit_pipe_control(dw, i, bit_group_0, flags, 0, 0);
>   }
> 
>   static int emit_pipe_control_to_ring_end(struct xe_hw_engine *hwe, u32 *dw, int i)
> @@ -363,7 +372,7 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
>           mask_flags = PIPE_CONTROL_3D_ENGINE_FLAGS;
> 
>       /* See __xe_pt_bind_vma() for a discussion on TLB invalidations. */
> -    i = emit_pipe_invalidate(mask_flags, invalidate_tlb, dw, i);
> +    i = emit_pipe_invalidate(gt, mask_flags, invalidate_tlb, dw, i);
> 
>       /* hsdes: 1809175790 */
>       if (aux_ccs)
> 
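To make the two changes in this patch concrete, here is a hedged, standalone C model of them. The GFX_OP_PIPE_CONTROL macro and the two PIPE_CONTROL0_* bits come from the hunks above; the model_* and render_flush_bit_group_0 names are hypothetical, the 6-dword layout is an assumption about emit_pipe_control(), and the depth-flush and invalidation flag values are passed in as parameters because their real bit values are not shown here.

```c
#include <stdint.h>

typedef uint32_t u32;
#define BIT(n) (1u << (n))

/* Adapted from the xe_gpu_commands.h hunk above (unsigned shifts). */
#define GFX_OP_PIPE_CONTROL(len) \
	((0x3u << 29) | (0x3u << 27) | (0x2u << 24) | ((len) - 2))
#define PIPE_CONTROL0_HDC_PIPELINE_FLUSH BIT(9)  /* gen12 */
#define PIPE_CONTROL0_CCS_FLUSH          BIT(13) /* MTL+ */

/* Hypothetical model of a 6-dword PIPE_CONTROL; the dword layout is an
 * assumption, not copied from xe's emit_pipe_control(). */
static int model_emit_pipe_control(u32 *dw, int i, u32 bit_group_0,
				   u32 bit_group_1, u32 offset, u32 value)
{
	dw[i++] = GFX_OP_PIPE_CONTROL(6) | bit_group_0; /* header */
	dw[i++] = bit_group_1;                          /* flush/inval flags */
	dw[i++] = offset;                               /* address, low */
	dw[i++] = 0;                                    /* address, high */
	dw[i++] = value;                                /* data, low */
	dw[i++] = 0;                                    /* data, high */
	return i;
}

/* Invalidate path as patched: with Wa_14016712196 active, a depth-cache
 * flushing PIPE_CONTROL is emitted before the invalidating one. */
static int model_emit_pipe_invalidate(u32 *dw, int i, int wa_14016712196,
				      u32 depth_flush, u32 inval_flags,
				      u32 scratch)
{
	if (wa_14016712196)
		i = model_emit_pipe_control(dw, i, 0, depth_flush, scratch, 0);
	return model_emit_pipe_control(dw, i, 0, inval_flags, scratch, 0);
}

/* bit_group_0 selection added to emit_render_cache_flush(). */
static u32 render_flush_bit_group_0(int graphics_verx100)
{
	u32 bits = PIPE_CONTROL0_HDC_PIPELINE_FLUSH;

	if (graphics_verx100 >= 1270) /* MTL and later */
		bits |= PIPE_CONTROL0_CCS_FLUSH;
	return bits;
}
```

Under this model, the workaround grows the invalidate path from 6 to 12 dwords, and on MTL the flush header changes from 0x7A000204 to 0x7A002204 because BIT(13) is now set.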
> 
>>
>> /Juha-Pekka
>>
>> On Thu, Mar 20, 2025 at 7:11 PM Juha-Pekka Heikkilä
>> <juhapekka.heikkila at gmail.com> wrote:
>>>
>>> I'll try to find a moment to bisect; probably next week, when I get
>>> the chance.
>>>
>>> /Juha-Pekka
>>>
>>> On Thu, 20 Mar 2025 at 10:25, Tvrtko Ursulin
>>> <tvrtko.ursulin at igalia.com> wrote:
>>>>
>>>>
>>>> Hi,
>>>>
>>>> On 19/03/2025 13:41, Juha-Pekka Heikkilä wrote:
>>>>> Hi Tvrtko,
>>>>>
>>>>> I did a quick run with these patches. With these changes on top of
>>>>> today's drm-tip I got a complete system freeze on MTL and its variants
>>>>> when doing modprobe. I had kgdb enabled but wasn't even dropped into it;
>>>>> the machine went completely unresponsive. On 3 out of 3 tries, modprobe xe
>>>>> completely froze the box.
>>>>
>>>> I don't have an MTL to try on, and apparently neither does CI, which
>>>> otherwise seems happy, as is my ADL-P laptop.
>>>>
>>>> Would you have time to bisect? Or maybe use netconsole to see what explodes?
>>>>
>>>> Not much comes to mind looking at the patches. Maybe something runs too
>>>> early, before something else is initialised. Only guessing.
>>>>
>>>> Regards,
>>>>
>>>> Tvrtko
>>>>
>>>>> On Tue, Mar 18, 2025 at 6:22 PM Tvrtko Ursulin
>>>>> <tvrtko.ursulin at igalia.com> wrote:
>>>>>>
>>>>>> A series to fix and add Xe support for AuxCCS framebuffers via DPT.
>>>>>>
>>>>>> Currently the auxiliary buffer data isn't mapped into the page tables at all, so
>>>>>> cf48bddd31de ("drm/i915/display: Disable AuxCCS framebuffers if built for Xe")
>>>>>> had to disable the support.
>>>>>>
>>>>>> On top of that, there are missing flushes and invalidations on both the ring
>>>>>> buffer side and the CPU side.
>>>>>>
>>>>>> Tested with KDE Wayland, on Lenovo Carbon X1 ADL-P:
>>>>>>
>>>>>>     [PLANE:32:plane 1A]: type=PRI
>>>>>>             uapi: [FB:242] AR30 little-endian (0x30335241),0x100000000000008,2880x1800, visible=visible, src=2880.000000x1800.000000+0.000000+0.000000, dst=2880x1800+0+0, rotation=0 (0x00000001)
>>>>>>             hw: [FB:242] AR30 little-endian (0x30335241),0x100000000000008,2880x1800, visible=yes, src=2880.000000x1800.000000+0.000000+0.000000, dst=2880x1800+0+0, rotation=0 (0x00000001)
>>>>>>
>>>>>> The display seems to work fine, with no artefacts and no DMAR/PIPE faults. CI
>>>>>> also appears to be happy with v2.
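The framebuffer line in the plane dump can be decoded with a few lines of C. This sketch assumes the fourcc_mod_code() layout from drm_fourcc.h (vendor ID in the top 8 bits, vendor-specific value in the low 56 bits); under it, 0x100000000000008 is vendor 0x01 (DRM_FORMAT_MOD_VENDOR_INTEL) with value 8, which, if I read drm_fourcc.h correctly, is I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC. The helper names are hypothetical.

```c
#include <stdint.h>

/* Vendor ID lives in bits 63:56 of a DRM format modifier (assumed layout). */
static unsigned mod_vendor(uint64_t modifier)
{
	return (unsigned)(modifier >> 56);
}

/* Vendor-specific value lives in bits 55:0. */
static uint64_t mod_value(uint64_t modifier)
{
	return modifier & 0x00ffffffffffffffULL;
}

/* Unpack a DRM fourcc code, least significant byte first, into a
 * NUL-terminated 4-character string. */
static void fourcc_to_str(uint32_t fourcc, char out[5])
{
	for (int b = 0; b < 4; b++)
		out[b] = (char)((fourcc >> (8 * b)) & 0xff);
	out[4] = '\0';
}
```

For example, fourcc_to_str(0x30335241, s) yields "AR30", matching the format name printed in the dump.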
>>>>>>
>>>>>> v2:
>>>>>>    * More patches added to fix kms_flip_tiling.
>>>>>>
>>>>>> v3:
>>>>>>    * Rebased after some cleanup patches from v2 were merged.
>>>>>>    * Added people to Cc as suggested by Rodrigo.
>>>>>>    * Adjusted last patch title. (Rodrigo)
>>>>>>    * Apply GGTT flushing only to iomapped system memory buffers.
>>>>>>
>>>>>> Cc: José Roberto de Souza <jose.souza at intel.com>
>>>>>> Cc: Juha-Pekka Heikkila <juhapekka.heikkila at gmail.com>
>>>>>> Cc: Michael J. Ruhl <michael.j.ruhl at intel.com>
>>>>>> Cc: Ville Syrjälä <ville.syrjala at linux.intel.com>
>>>>>>
>>>>>> Tvrtko Ursulin (8):
>>>>>>     drm/xe: Add ring buffer handling for AuxCCS
>>>>>>     drm/xe: Use fb cached min alignment
>>>>>>     drm/xe: Reduce DPT table alignment as in i915
>>>>>>     drm/xe: Flush GGTT writes after populating DPT
>>>>>>     drm/xe: Handle DPT in system memory
>>>>>>     drm/xe: Force flush system memory AuxCCS framebuffers before scan out
>>>>>>     drm/xe/display: Add support for AuxCCS
>>>>>>     drm/i915/display: Expose AuxCCS frame buffer modifiers for Xe
>>>>>>
>>>>>>    .../drm/i915/display/skl_universal_plane.c    |   6 -
>>>>>>    drivers/gpu/drm/xe/display/xe_fb_pin.c        | 181 ++++++++++++++----
>>>>>>    .../gpu/drm/xe/instructions/xe_gpu_commands.h |   1 +
>>>>>>    .../gpu/drm/xe/instructions/xe_mi_commands.h  |   6 +
>>>>>>    drivers/gpu/drm/xe/regs/xe_gt_regs.h          |   1 +
>>>>>>    drivers/gpu/drm/xe/xe_bo_types.h              |  14 +-
>>>>>>    drivers/gpu/drm/xe/xe_ring_ops.c              | 173 +++++++++--------
>>>>>>    drivers/gpu/drm/xe/xe_ring_ops_types.h        |   2 +-
>>>>>>    8 files changed, 261 insertions(+), 123 deletions(-)
>>>>>>
>>>>>> -- 
>>>>>> 2.48.0
>>>>>>
>>>>
> 


