[PATCH v2] drm/xe: flush gtt before signalling user fence on all engines

Thomas Hellström thomas.hellstrom at linux.intel.com
Mon Jun 3 07:35:55 UTC 2024


On Thu, 2024-05-30 at 20:45 +0000, Matthew Brost wrote:
> On Thu, May 30, 2024 at 01:17:32PM +0200, Thomas Hellström wrote:
> > Hi, All.
> > 
> > I was looking at this patch for drm-xe-fixes but it doesn't look
> > correct to me.
> > 
> > First, AFAICT, "emit flush imm ggtt" means that we're flushing
> > outstanding / posted writes and then writing a DW to a ggtt
> > address, so we're not really "flushing gtt".
> > 
> > 
> 
> So, is this a bad name? I think I agree. It could have been a
> holdover
> from the i915 names. Maybe we should do a cleanup in xe_ring_ops
> soon?
> 
> Or are you saying that the existing emit_flush_imm_ggtt is not
> sufficient to ensure all writes from batches are visible? If this
> were
> true, I would think we'd have all sorts of problems popping up.

It was more that the title of the patch says "flush gtt" when I think
it should say "flush writes" or something similar.


> 
> > Second, I don't think we have anything left that explicitly flushes
> > the
> > posted write of the user-fence value?
> > 
> 
> I think this might be true. So there could be a case where we get an
> IRQ
> and the user fence value is not yet visible? 

Yes, exactly.

> 
> Not an expert in ring programming, but are there instructions to
> store a dword which make it immediately visible? If so, I think that
> is what should be used.

There are various options here, using variants of MI_FLUSH_DW and
PIPE_CONTROL, and I'm not sure which would be the most performant, but
I think the simplest solution would be to revert the patch and just
emit an additional MI_FLUSH_DW as a write barrier before emitting the
posted user-fence value.
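
Roughly, something like this sketch of __emit_job_gen12_video() with
the patch reverted (emit_flush_dw() here is a hypothetical helper that
emits a bare MI_FLUSH_DW write barrier, not an existing xe function,
and the exact flush flags would need checking against the spec):

	i = emit_bb_start(batch_addr, ppgtt_flag, dw, i);

	if (job->user_fence.used) {
		/* Hypothetical write barrier: flush the batch's posted
		 * writes before the user-fence value is written.
		 */
		i = emit_flush_dw(dw, i);
		i = emit_store_imm_ppgtt_posted(job->user_fence.addr,
						job->user_fence.value,
						dw, i);
	}

	/* Reverted order: the seqno flush comes after the user-fence
	 * store again, so it also flushes that posted write before
	 * the user interrupt is raised.
	 */
	i = emit_flush_imm_ggtt(xe_lrc_seqno_ggtt_addr(lrc), seqno,
				false, dw, i);

	i = emit_user_interrupt(dw, i);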

> 
> 
> 
> We should also probably check how downstream i915 did this too.
> 
> > and finally the seqno fence now gets flushed before the user-fence.
> > Perhaps that's not a bad thing, though.
> > 
> 
> I don't think this is an issue; I can't think of a case where this
> reordering would create a problem.
> 
> Matt

/Thomas

>  
> > /Thomas
> > 
> > 
> > On Wed, 2024-05-22 at 09:27 +0200, Andrzej Hajda wrote:
> > > Tests show that user fence signalling requires a kind of write
> > > barrier, otherwise not all writes performed by the workload will
> > > be available to userspace. It is already done for render and
> > > compute; we need it also for the rest: video, gsc, copy.
> > > 
> > > v2: added gsc and copy engines, added fixes and r-b tags
> > > 
> > > Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1488
> > > Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
> > > Signed-off-by: Andrzej Hajda <andrzej.hajda at intel.com>
> > > Reviewed-by: Matthew Brost <matthew.brost at intel.com>
> > > ---
> > > Changes in v2:
> > > - Added fixes and r-b tags
> > > - Link to v1: https://lore.kernel.org/r/20240521-xu_flush_vcs_before_ufence-v1-1-ded38b56c8c9@intel.com
> > > ---
> > > Matthew,
> > > 
> > > I have extended the patch to the copy and gsc engines. I have
> > > kept your r-b since the change is similar; I hope that is OK.
> > > ---
> > >  drivers/gpu/drm/xe/xe_ring_ops.c | 8 ++++----
> > >  1 file changed, 4 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
> > > index a3ca718456f6..a46a1257a24f 100644
> > > --- a/drivers/gpu/drm/xe/xe_ring_ops.c
> > > +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
> > > @@ -234,13 +234,13 @@ static void __emit_job_gen12_simple(struct xe_sched_job *job, struct xe_lrc *lrc
> > >  
> > >  	i = emit_bb_start(batch_addr, ppgtt_flag, dw, i);
> > >  
> > > +	i = emit_flush_imm_ggtt(xe_lrc_seqno_ggtt_addr(lrc), seqno, false, dw, i);
> > > +
> > >  	if (job->user_fence.used)
> > >  		i = emit_store_imm_ppgtt_posted(job->user_fence.addr,
> > >  						job->user_fence.value,
> > >  						dw, i);
> > >  
> > > -	i = emit_flush_imm_ggtt(xe_lrc_seqno_ggtt_addr(lrc), seqno, false, dw, i);
> > > -
> > >  	i = emit_user_interrupt(dw, i);
> > >  
> > >  	xe_gt_assert(gt, i <= MAX_JOB_SIZE_DW);
> > > @@ -293,13 +293,13 @@ static void __emit_job_gen12_video(struct xe_sched_job *job, struct xe_lrc *lrc,
> > >  
> > >  	i = emit_bb_start(batch_addr, ppgtt_flag, dw, i);
> > >  
> > > +	i = emit_flush_imm_ggtt(xe_lrc_seqno_ggtt_addr(lrc), seqno, false, dw, i);
> > > +
> > >  	if (job->user_fence.used)
> > >  		i = emit_store_imm_ppgtt_posted(job->user_fence.addr,
> > >  						job->user_fence.value,
> > >  						dw, i);
> > >  
> > > -	i = emit_flush_imm_ggtt(xe_lrc_seqno_ggtt_addr(lrc), seqno, false, dw, i);
> > > -
> > >  	i = emit_user_interrupt(dw, i);
> > >  
> > >  	xe_gt_assert(gt, i <= MAX_JOB_SIZE_DW);
> > > 
> > > ---
> > > base-commit: 188ced1e0ff892f0948f20480e2e0122380ae46d
> > > change-id: 20240521-xu_flush_vcs_before_ufence-a7b45d94cf33
> > > 
> > > Best regards,
> > 


