[PATCH v3 0/8] AuxCCS handling and render compression modifiers
Tvrtko Ursulin
tvrtko.ursulin at igalia.com
Fri Mar 28 16:19:53 UTC 2025
On 27/03/2025 13:25, Tvrtko Ursulin wrote:
>
> Hi,
>
> On 25/03/2025 17:39, Juha-Pekka Heikkilä wrote:
>> First patch that freezes mtl for me is
>>
>> Author: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>> Date: Tue Mar 18 16:22:16 2025 +0000
>>
>> drm/xe: Add ring buffer handling for AuxCCS
>>
>> Align the ring buffer handling of required AuxCCS flushes and
>> invalidations with the reference implementation from i915.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>>
>>
>> If it's of any use, last messages I saw on dmesg are
>>
>> [ +0,004882] xe 0000:00:02.0: [drm:xe_guc_capture_steered_list_init
>> [xe]] GT0: capture found 48 ext-regs.
>> [ +0,021150] xe 0000:00:02.0: [drm:xe_guc_ads_populate [xe]] GT0: ADS
>> capture alloc size changed from 45056 to 20480
>> [ +0,000765] xe 0000:00:02.0: [drm:__xe_guc_upload [xe]] GT0: load
>> still in progress, timeouts = 0, freq = 2250MHz (req 2250MHz), status
>> = 0x00000072 [0x39/00]
>> [ +0,005246] xe 0000:00:02.0: [drm:__xe_guc_upload [xe]] GT0: load
>> still in progress, timeouts = 0, freq = 2250MHz (req 2250MHz), status
>> = 0x80000534 [0x1A/05]
>> [ +0,001414] xe 0000:00:02.0: [drm:__xe_guc_upload [xe]] GT0: init
>> took 6ms, freq = 2250MHz (req = 2250MHz), before = 2250MHz, status =
>> 0x8002F034, timeouts = 0
>> [ +0,000282] xe 0000:00:02.0: [drm:xe_guc_ct_enable [xe]] GT0: GuC CT
>> communication channel enabled
>> [ +0,000973] xe 0000:00:02.0: [drm:xe_gt_record_default_lrcs [xe]]
>> GT0: LRC WA rcs0 save-restore batch
>> [ +0,000075] xe 0000:00:02.0: [drm:xe_gt_record_default_lrcs [xe]]
>> GT0: REG[0x7004] = 0x08000800
>> [ +0,000070] xe 0000:00:02.0: [drm:xe_gt_record_default_lrcs [xe]]
>> GT0: REG[0x7044] = 0x00200020
>
> It could be the very first job submission.
>
> I had another round of cross checking things and found two more
> differences against what i915 does on Meteorlake.
>
> The first is that i915 applies Wa_14016712196 before both the pipe
> control invalidate and the flush. Xe only applies it before the flush.
>
> The other is that PIPE_CONTROL_CCS_FLUSH is set on flushes.
>
> I cannot test it but if you could, the patch below at least compiles.
Hm, I think I simply miscalculated MAX_JOB_SIZE_DW. I bumped it by 10,
but should have bumped it by 16 for MTL: the invalidation phase adds two
pipe controls (one implicit, for Wa_14016712196) and a semaphore wait.
So "#define MAX_JOB_SIZE_DW 64". If you have time to test with that, it
would be very helpful.
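To spell out the arithmetic, here is a rough sketch only, not driver code. The helper names are hypothetical, the base of 48 is just 64 minus the bump of 16 mentioned above, and it assumes the pre-series worst-case job filled the old limit exactly.

```c
#include <assert.h>
#include <stdbool.h>

/* From the message: the limit should be 64 after a bump of 16. */
#define FIXED_MAX_JOB_SIZE_DW	64
#define REQUIRED_BUMP_DW	16	/* what MTL needs */
#define APPLIED_BUMP_DW		10	/* what the series applied */
/* Derived, not quoted: the limit before the series. */
#define BASE_MAX_JOB_SIZE_DW	(FIXED_MAX_JOB_SIZE_DW - REQUIRED_BUMP_DW)

/* Hypothetical helper: does a job of job_dw dwords fit under limit_dw? */
static bool job_fits(int job_dw, int limit_dw)
{
	return job_dw <= limit_dw;
}

/*
 * Worst-case MTL job size in dwords.
 * ASSUMPTION: the old worst case used the whole old limit.
 */
static int worst_case_job_dw(void)
{
	return BASE_MAX_JOB_SIZE_DW + REQUIRED_BUMP_DW;
}

/* Limit with only the bump of 10 applied. */
static int bumped_by_10_limit_dw(void)
{
	return BASE_MAX_JOB_SIZE_DW + APPLIED_BUMP_DW;
}
```

Under those assumptions a worst-case job is 6 dwords over the limit with the bump of 10, and fits exactly once the limit is 64.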
Regards,
Tvrtko
> diff --git a/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h b/
> drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
> index 93e4687feb71..38d723e47a04 100644
> --- a/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
> +++ b/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
> @@ -42,6 +42,7 @@
> #define GFX_OP_PIPE_CONTROL(len) ((0x3<<29)|(0x3<<27)|(0x2<<24)|
> ((len)-2))
>
> #define PIPE_CONTROL0_HDC_PIPELINE_FLUSH BIT(9) /*
> gen12 */
> +#define PIPE_CONTROL0_CCS_FLUSH BIT(13) /* MTL+ */
>
> #define PIPE_CONTROL_COMMAND_CACHE_INVALIDATE (1<<29)
> #define PIPE_CONTROL_TILE_CACHE_FLUSH (1<<28)
> diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/
> xe_ring_ops.c
> index a380964f3166..02b09826f831 100644
> --- a/drivers/gpu/drm/xe/xe_ring_ops.c
> +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
> @@ -141,8 +141,9 @@ emit_pipe_control(u32 *dw, int i, u32 bit_group_0,
> u32 bit_group_1, u32 offset,
> return i;
> }
>
> -static int emit_pipe_invalidate(u32 mask_flags, bool invalidate_tlb,
> u32 *dw,
> - int i)
> +static int
> +emit_pipe_invalidate(struct xe_gt *gt, u32 mask_flags, bool
> invalidate_tlb,
> + u32 *dw, int i)
> {
> u32 flags = PIPE_CONTROL_CS_STALL |
> PIPE_CONTROL_COMMAND_CACHE_INVALIDATE |
> @@ -159,6 +160,10 @@ static int emit_pipe_invalidate(u32 mask_flags,
> bool invalidate_tlb, u32 *dw,
>
> flags &= ~mask_flags;
>
> + if (XE_WA(gt, 14016712196))
> + i = emit_pipe_control(dw, i, 0, PIPE_CONTROL_DEPTH_CACHE_FLUSH,
> + LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR, 0);
> +
> return emit_pipe_control(dw, i, 0, flags,
> LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR, 0);
> }
>
> @@ -180,12 +185,16 @@ static int emit_render_cache_flush(struct
> xe_sched_job *job, bool flush_l3,
> struct xe_gt *gt = job->q->gt;
> struct xe_device *xe = gt_to_xe(gt);
> bool lacks_render = !(gt->info.engine_mask & XE_HW_ENGINE_RCS_MASK);
> + u32 bit_group_0 = PIPE_CONTROL0_HDC_PIPELINE_FLUSH;
> u32 flags;
>
> if (XE_WA(gt, 14016712196))
> i = emit_pipe_control(dw, i, 0, PIPE_CONTROL_DEPTH_CACHE_FLUSH,
> LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR, 0);
>
> + if (GRAPHICS_VERx100(xe) >= 1270)
> + bit_group_0 |= PIPE_CONTROL0_CCS_FLUSH;
> +
> flags = (PIPE_CONTROL_CS_STALL |
> PIPE_CONTROL_TILE_CACHE_FLUSH |
> PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH |
> @@ -211,7 +220,7 @@ static int emit_render_cache_flush(struct
> xe_sched_job *job, bool flush_l3,
> else if (job->q->class == XE_ENGINE_CLASS_COMPUTE)
> flags &= ~PIPE_CONTROL_3D_ENGINE_FLAGS;
>
> - return emit_pipe_control(dw, i, PIPE_CONTROL0_HDC_PIPELINE_FLUSH,
> flags, 0, 0);
> + return emit_pipe_control(dw, i, bit_group_0, flags, 0, 0);
> }
>
> static int emit_pipe_control_to_ring_end(struct xe_hw_engine *hwe, u32
> *dw, int i)
> @@ -363,7 +372,7 @@ static void __emit_job_gen12_render_compute(struct
> xe_sched_job *job,
> mask_flags = PIPE_CONTROL_3D_ENGINE_FLAGS;
>
> /* See __xe_pt_bind_vma() for a discussion on TLB invalidations. */
> - i = emit_pipe_invalidate(mask_flags, invalidate_tlb, dw, i);
> + i = emit_pipe_invalidate(gt, mask_flags, invalidate_tlb, dw, i);
>
> /* hsdes: 1809175790 */
> if (aux_ccs)
>
>
>>
>> /Juha-Pekka
>>
>> On Thu, Mar 20, 2025 at 7:11 PM Juha-Pekka Heikkilä
>> <juhapekka.heikkila at gmail.com> wrote:
>>>
>>> I'll try to find a moment to bisect; it will probably be next week
>>> before I get to it.
>>>
>>> /Juha-Pekka
>>>
>>> On Thu, Mar 20, 2025 at 10:25 AM Tvrtko Ursulin
>>> <tvrtko.ursulin at igalia.com> wrote:
>>>>
>>>>
>>>> Hi,
>>>>
>>>> On 19/03/2025 13:41, Juha-Pekka Heikkilä wrote:
>>>>> Hi Tvrtko,
>>>>>
>>>>> I did a quick run with these patches. With these changes on top of
>>>>> today's drm-tip I got a complete system freeze on mtl and its variants
>>>>> when doing modprobe. I had kgdb enabled but I wasn't even dropped into
>>>>> it; the machine went completely unresponsive. In 3/3 tries, modprobe xe
>>>>> completely froze the box.
>>>>
>>>> I don't have MTL to try and neither apparently does CI, which otherwise
>>>> seems happy, as is my ADL-P laptop.
>>>>
>>>> Would you have time to bisect? Or maybe netconsole to see what
>>>> explodes?
>>>>
>>>> Not much comes to mind looking at the patches. Maybe something runs
>>>> too early, before something else is initialised. Guessing only.
>>>>
>>>> Regards,
>>>>
>>>> Tvrtko
>>>>
>>>>> On Tue, Mar 18, 2025 at 6:22 PM Tvrtko Ursulin
>>>>> <tvrtko.ursulin at igalia.com> wrote:
>>>>>>
>>>>>> A series to fix and add xe support for AuxCCS framebuffers via DPT.
>>>>>>
>>>>>> Currently the auxiliary buffer data isn't mapped into the page
>>>>>> tables at all so
>>>>>> cf48bddd31de ("drm/i915/display: Disable AuxCCS framebuffers if
>>>>>> built for Xe")
>>>>>> had to disable the support.
>>>>>>
>>>>>> On top of that there are missing flushes and invalidations both
>>>>>> from the ring
>>>>>> buffer side and from the CPU side.
>>>>>>
>>>>>> Tested with KDE Wayland, on Lenovo Carbon X1 ADL-P:
>>>>>>
>>>>>> [PLANE:32:plane 1A]: type=PRI
>>>>>> uapi: [FB:242] AR30 little-endian
>>>>>> (0x30335241),0x100000000000008,2880x1800, visible=visible,
>>>>>> src=2880.000000x1800.000000+0.000000+0.000000, dst=2880x1800+0+0,
>>>>>> rotation=0 (0x00000001)
>>>>>> hw: [FB:242] AR30 little-endian
>>>>>> (0x30335241),0x100000000000008,2880x1800, visible=yes,
>>>>>> src=2880.000000x1800.000000+0.000000+0.000000, dst=2880x1800+0+0,
>>>>>> rotation=0 (0x00000001)
>>>>>>
>>>>>> Display seems to be working fine - no artefacts, no DMAR/PIPE
>>>>>> faults. CI also appears
>>>>>> to be happy with v2.
>>>>>>
>>>>>> v2:
>>>>>> * More patches added to fix kms_flip_tiling.
>>>>>>
>>>>>> v3:
>>>>>> * Rebased after some cleanup patches from v2 were merged.
>>>>>> * Added people to Cc as suggested by Rodrigo.
>>>>>> * Adjusted last patch title. (Rodrigo)
>>>>>> * Apply GGTT flushing only to iomapped system memory buffers.
>>>>>>
>>>>>> Cc: José Roberto de Souza <jose.souza at intel.com>
>>>>>> Cc: Juha-Pekka Heikkila <juhapekka.heikkila at gmail.com>
>>>>>> Cc: Michael J. Ruhl <michael.j.ruhl at intel.com>
>>>>>> Cc: Ville Syrjälä <ville.syrjala at linux.intel.com>
>>>>>>
>>>>>> Tvrtko Ursulin (8):
>>>>>> drm/xe: Add ring buffer handling for AuxCCS
>>>>>> drm/xe: Use fb cached min alignment
>>>>>> drm/xe: Reduce DPT table alignment as in i915
>>>>>> drm/xe: Flush GGTT writes after populating DPT
>>>>>> drm/xe: Handle DPT in system memory
>>>>>> drm/xe: Force flush system memory AuxCCS framebuffers before
>>>>>> scan out
>>>>>> drm/xe/display: Add support for AuxCCS
>>>>>> drm/i915/display: Expose AuxCCS frame buffer modifiers for Xe
>>>>>>
>>>>>> .../drm/i915/display/skl_universal_plane.c | 6 -
>>>>>> drivers/gpu/drm/xe/display/xe_fb_pin.c | 181 +++++++++++
>>>>>> +++----
>>>>>> .../gpu/drm/xe/instructions/xe_gpu_commands.h | 1 +
>>>>>> .../gpu/drm/xe/instructions/xe_mi_commands.h | 6 +
>>>>>> drivers/gpu/drm/xe/regs/xe_gt_regs.h | 1 +
>>>>>> drivers/gpu/drm/xe/xe_bo_types.h | 14 +-
>>>>>> drivers/gpu/drm/xe/xe_ring_ops.c | 173 ++++++++
>>>>>> +--------
>>>>>> drivers/gpu/drm/xe/xe_ring_ops_types.h | 2 +-
>>>>>> 8 files changed, 261 insertions(+), 123 deletions(-)
>>>>>>
>>>>>> --
>>>>>> 2.48.0
>>>>>>
>>>>
>