[Intel-xe] [PATCH 1/6] drm/xe: Make MI_FLUSH_DW immediate size more explicit

Thu Oct 12 19:49:58 UTC 2023

On Wed, Oct 11, 2023 at 04:09:59PM -0700, Matt Roper wrote:
>Despite its name, MI_FLUSH_DW instruction can write an immediate value
>of either dword size or qword size, depending on the 'length' field of
>the instruction.  Since "length" excludes the first two dwords of the
>instruction, a value of 2 in the length field implies a dword write and
>a value of 3 implies a qword write.  Even in cases where the flush
>instruction's post-sync operation is set to "no write" we're still
>expected to size the overall instruction as if we were doing a dword or
>qword write (i.e., a length of 1 shouldn't be used on modern platforms).
>
>Rather than baking a size of "1" into the #define and then adding
>another unexplained "+ 1" at all the spots where the definition gets
>used, lets just create MI_FLUSH_IMM_DW and MI_FLUSH_IMM_QW definitions
>that should be OR'd into the instruction header to make it more explicit
>what behavior we're requesting.
>
>Bspec: 60229
>Signed-off-by: Matt Roper <matthew.d.roper at intel.com>
>---
> drivers/gpu/drm/xe/regs/xe_gpu_commands.h |  5 ++++-
> drivers/gpu/drm/xe/xe_ring_ops.c          | 10 +++++-----
> 2 files changed, 9 insertions(+), 6 deletions(-)
>
>diff --git a/drivers/gpu/drm/xe/regs/xe_gpu_commands.h b/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
>index 21738281bdd0..9432a960346b 100644
>--- a/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
>+++ b/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
>@@ -30,12 +30,15 @@
> #define   MI_LRI_MMIO_REMAP_EN		REG_BIT(17)
> #define   MI_LRI_FORCE_POSTED		(1<<12)
>
>-#define MI_FLUSH_DW		MI_INSTR(0x26, 1)
>+#define MI_FLUSH_DW		MI_INSTR(0x26, 0)
> #define   MI_FLUSH_DW_STORE_INDEX	(1<<21)
> #define   MI_INVALIDATE_TLB		(1<<18)
> #define   MI_FLUSH_DW_CCS		(1<<16)
> #define   MI_FLUSH_DW_OP_STOREDW	(1<<14)
> #define   MI_FLUSH_DW_USE_GTT		(1<<2)
>+#define   MI_FLUSH_LENGTH		GENMASK(5, 0)
>+#define   MI_FLUSH_IMM_DW		REG_FIELD_PREP(MI_FLUSH_LENGTH, 2)
>+#define   MI_FLUSH_IMM_QW		REG_FIELD_PREP(MI_FLUSH_LENGTH, 3)
>
> #define MI_BATCH_BUFFER_START		MI_INSTR(0x31, 1)
>
>diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
>index 6eec7c7e4bc5..b95cc7713ff9 100644
>--- a/drivers/gpu/drm/xe/xe_ring_ops.c
>+++ b/drivers/gpu/drm/xe/xe_ring_ops.c
>@@ -80,7 +80,7 @@ static int emit_store_imm_ggtt(u32 addr, u32 value, u32 *dw, int i)
> static int emit_flush_imm_ggtt(u32 addr, u32 value, bool invalidate_tlb,
> 			       u32 *dw, int i)
> {
>-	dw[i++] = (MI_FLUSH_DW + 1) | MI_FLUSH_DW_OP_STOREDW |
>+	dw[i++] = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_IMM_DW |
> 		(invalidate_tlb ? MI_INVALIDATE_TLB : 0);
> 	dw[i++] = addr | MI_FLUSH_DW_USE_GTT;
> 	dw[i++] = 0;
>@@ -100,9 +100,9 @@ static int emit_bb_start(u64 batch_addr, u32 ppgtt_flag, u32 *dw, int i)
>
> static int emit_flush_invalidate(u32 flag, u32 *dw, int i)
> {
>-	dw[i] = MI_FLUSH_DW + 1;
>+	dw[i] = MI_FLUSH_DW;
> 	dw[i] |= flag;
>-	dw[i++] |= MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW |
>+	dw[i++] |= MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_IMM_DW |

I don't understand the logic/convention you used to add the MI_FLUSH_IMM_DW here
rather than a couple lines above or as the last item to account for the
bit position.

Anyway, it seems correct.

Reviewed-by: Lucas De Marchi <lucas.demarchi at intel.com>

Lucas De Marchi

> 		MI_FLUSH_DW_STORE_INDEX;
>
> 	dw[i++] = LRC_PPHWSP_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT;
>@@ -365,8 +365,8 @@ static void emit_migration_job_gen12(struct xe_sched_job *job,
>
> 	i = emit_bb_start(job->batch_addr[1], BIT(8), dw, i);
>
>-	dw[i++] = (MI_FLUSH_DW | MI_INVALIDATE_TLB | job->migrate_flush_flags |
>-		   MI_FLUSH_DW_OP_STOREDW) + 1;
>+	dw[i++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | job->migrate_flush_flags |
>+		MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_IMM_DW;
> 	dw[i++] = xe_lrc_seqno_ggtt_addr(lrc) | MI_FLUSH_DW_USE_GTT;
> 	dw[i++] = 0;
> 	dw[i++] = seqno; /* value */
>-- 
>2.41.0
>