[PATCH v2] drm/xe: flush gtt before signalling user fence on all engines
Matthew Brost
matthew.brost at intel.com
Mon Jun 3 17:42:06 UTC 2024
On Mon, Jun 03, 2024 at 10:47:32AM +0200, Thomas Hellström wrote:
> Hi,
>
> On Mon, 2024-06-03 at 10:11 +0200, Andrzej Hajda wrote:
> >
> >
> > On 03.06.2024 09:35, Thomas Hellström wrote:
> > > On Thu, 2024-05-30 at 20:45 +0000, Matthew Brost wrote:
> > > > On Thu, May 30, 2024 at 01:17:32PM +0200, Thomas Hellström wrote:
> > > > > Hi, All.
> > > > >
> > > > > I was looking at this patch for drm-xe-fixes but it doesn't look
> > > > > correct to me.
> > > > >
> > > > > First, AFAICT, the "emit flush imm ggtt" means that we're flushing
> > > > > outstanding / posted writes, and then write a DW to a ggtt address,
> > > > > so we're not really "flushing gtt".
> > > > >
> > > > So, is this a bad name? I think I agree. It could have been a
> > > > holdover from the i915 names. Maybe we should do a cleanup in
> > > > xe_ring_ops soon?
> > > >
> > > > Or are you saying that the existing emit_flush_imm_ggtt is not
> > > > sufficient to ensure all writes from batches are visible? If this
> > > > were true, I would think we'd have all sorts of problems popping up.
> > > It was more the title of the patch that says "flush gtt" when I think
> > > it should say "flush writes" or something similar.
> > >
> > >
> > > > > Second, I don't think we have anything left that explicitly
> > > > > flushes the posted write of the user-fence value?
> > > > >
> > > > I think this might be true. So there could be a case where we get
> > > > an IRQ and the user fence value is not yet visible?
> > > Yes, exactly.
> > >
> > > > Not an expert in ring programming, but are there instructions to
> > > > store a dword which make it immediately visible? If so, I think
> > > > that is what should be used.
> > > There are various options here, using various variants of MI_FLUSH_DW
> > > and pipe_control, and I'm not sure what would be the most performant,
> > > but I think the simplest solution would be to revert the patch and
> > > just emit an additional MI_FLUSH_DW as a write barrier before emitting
> > > the posted user-fence value.
> >
> > As the patch has already landed, I have posted a fix for it:
> > https://patchwork.freedesktop.org/series/134354/
> >
> > Regards
> > Andrzej
>
> I'm still concerned about the user-fence write happening after the
> regular seqno write. Let's say the user requests a user-fence write to
> a bo.
>
I don't think this is a valid concern. User fences should be limited to
LR VMs. We don't enforce this, but I think we should add that
restriction. It should be okay to do so unless we have a UMD using user
fences in a non-LR VM (which we don't).
I'm going to post a patch for this now.
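A minimal sketch of the check I have in mind (where exactly it lands in
the sync parsing path and the errno are assumptions, not the final
patch):

	/*
	 * Hypothetical placement: reject user fences on dma-fence
	 * (non-LR) VMs. xe_vm_in_lr_mode() is the existing helper that
	 * distinguishes long-running VMs.
	 */
	if (sync->type == DRM_XE_SYNC_TYPE_USER_FENCE &&
	    !xe_vm_in_lr_mode(vm))
		return -EOPNOTSUPP;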
> 1) The LRC fence signals.
In LR mode, the hw seqno / LRC is unused. We signal LR job fences
immediately upon scheduling.
> 2) Bo is evicted / userptr invalidated. pages returned to system.
In LR mode, if a BO is evicted or a userptr is invalidated, we either
kick jobs off the HW via preempt fences or, in faulting mode, invalidate
the PPGTT mapping, which causes a recoverable fault.
> 3) The user-fence writes to pages that we no longer own, or causes a
> fault.
Either execution is resumed via the rebind worker with valid mappings,
or we create valid mappings upon page fault.
With this, I believe the original patch along with the upcoming fix [1]
is correct.
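To spell out the ordering I expect once both are in (a sketch only --
[1] may differ in detail, and emit_flush_dw here is a hypothetical
helper standing in for whatever write barrier the fix uses):

	i = emit_bb_start(batch_addr, ppgtt_flag, dw, i);

	/* Flush the batch's posted writes, then post-sync write the hw
	 * seqno DW to its GGTT address (the reordering from this patch). */
	i = emit_flush_imm_ggtt(xe_lrc_seqno_ggtt_addr(lrc), seqno, false, dw, i);

	if (job->user_fence.used) {
		i = emit_store_imm_ppgtt_posted(job->user_fence.addr,
						job->user_fence.value,
						dw, i);
		/* Barrier from [1]: make the posted user-fence store
		 * visible before the interrupt fires. */
		i = emit_flush_dw(dw, i);
	}

	i = emit_user_interrupt(dw, i);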
Matt
[1] https://patchwork.freedesktop.org/series/134354/
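For reference on the naming discussion above, emit_flush_imm_ggtt is
roughly the following (paraphrased from memory, exact flag names may
differ) -- it flushes outstanding posted writes and then does a
post-sync write of an immediate DW to a GGTT address, so "flush writes"
really is the better description:

	static int emit_flush_imm_ggtt(u32 addr, u32 value, bool invalidate_tlb,
				       u32 *dw, int i)
	{
		/* MI_FLUSH_DW: wait for outstanding posted writes to land,
		 * then store an immediate DW to the given GGTT address. */
		dw[i++] = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_IMM_DW |
			  (invalidate_tlb ? MI_INVALIDATE_TLB : 0);
		dw[i++] = addr | MI_FLUSH_DW_USE_GTT;
		dw[i++] = 0;
		dw[i++] = value;

		return i;
	}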
>
> /Thomas
>
> >
> > >
> > > >
> > > >
> > > > We should probably check how downstream i915 did this too.
> > > >
> > > > > and finally the seqno fence now gets flushed before the
> > > > > user-fence. Perhaps that's not a bad thing, though.
> > > > >
> > > > I don't think this is an issue; I can't think of a case where
> > > > this reordering would create a problem.
> > > >
> > > > Matt
> > > /Thomas
> > >
> > > >
> > > > > /Thomas
> > > > >
> > > > >
> > > > > On Wed, 2024-05-22 at 09:27 +0200, Andrzej Hajda wrote:
> > > > > > Tests show that user fence signalling requires a kind of write
> > > > > > barrier, otherwise not all writes performed by the workload
> > > > > > will be available to userspace. This is already done for render
> > > > > > and compute; we need it for the rest as well: video, gsc, copy.
> > > > > >
> > > > > > v2: added gsc and copy engines, added fixes and r-b tags
> > > > > >
> > > > > > Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1488
> > > > > > Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
> > > > > > Signed-off-by: Andrzej Hajda <andrzej.hajda at intel.com>
> > > > > > Reviewed-by: Matthew Brost <matthew.brost at intel.com>
> > > > > > ---
> > > > > > Changes in v2:
> > > > > > - Added fixes and r-b tags
> > > > > > - Link to v1:
> > > > > > https://lore.kernel.org/r/20240521-xu_flush_vcs_before_ufence-v1-1-ded38b56c8c9@intel.com
> > > > > > ---
> > > > > > Matthew,
> > > > > >
> > > > > > I have extended the patch to the copy and gsc engines. I have
> > > > > > kept your r-b since the change is similar; I hope that is OK.
> > > > > > ---
> > > > > > drivers/gpu/drm/xe/xe_ring_ops.c | 8 ++++----
> > > > > > 1 file changed, 4 insertions(+), 4 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
> > > > > > index a3ca718456f6..a46a1257a24f 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_ring_ops.c
> > > > > > +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
> > > > > > @@ -234,13 +234,13 @@ static void __emit_job_gen12_simple(struct xe_sched_job *job, struct xe_lrc *lrc
> > > > > >  
> > > > > >  	i = emit_bb_start(batch_addr, ppgtt_flag, dw, i);
> > > > > >  
> > > > > > +	i = emit_flush_imm_ggtt(xe_lrc_seqno_ggtt_addr(lrc), seqno, false, dw, i);
> > > > > > +
> > > > > >  	if (job->user_fence.used)
> > > > > >  		i = emit_store_imm_ppgtt_posted(job->user_fence.addr,
> > > > > >  						job->user_fence.value,
> > > > > >  						dw, i);
> > > > > >  
> > > > > > -	i = emit_flush_imm_ggtt(xe_lrc_seqno_ggtt_addr(lrc), seqno, false, dw, i);
> > > > > > -
> > > > > >  	i = emit_user_interrupt(dw, i);
> > > > > >  
> > > > > >  	xe_gt_assert(gt, i <= MAX_JOB_SIZE_DW);
> > > > > > @@ -293,13 +293,13 @@ static void __emit_job_gen12_video(struct xe_sched_job *job, struct xe_lrc *lrc,
> > > > > >  
> > > > > >  	i = emit_bb_start(batch_addr, ppgtt_flag, dw, i);
> > > > > >  
> > > > > > +	i = emit_flush_imm_ggtt(xe_lrc_seqno_ggtt_addr(lrc), seqno, false, dw, i);
> > > > > > +
> > > > > >  	if (job->user_fence.used)
> > > > > >  		i = emit_store_imm_ppgtt_posted(job->user_fence.addr,
> > > > > >  						job->user_fence.value,
> > > > > >  						dw, i);
> > > > > >  
> > > > > > -	i = emit_flush_imm_ggtt(xe_lrc_seqno_ggtt_addr(lrc), seqno, false, dw, i);
> > > > > > -
> > > > > >  	i = emit_user_interrupt(dw, i);
> > > > > >  
> > > > > >  	xe_gt_assert(gt, i <= MAX_JOB_SIZE_DW);
> > > > > > 
> > > > > > ---
> > > > > > base-commit: 188ced1e0ff892f0948f20480e2e0122380ae46d
> > > > > > change-id: 20240521-xu_flush_vcs_before_ufence-a7b45d94cf33
> > > > > >
> > > > > > Best regards,
> >
>