[PATCH v2] drm/xe/xe2: Use XE_CACHE_WB pat index

Thu Jan 18 15:58:56 UTC 2024

On Wed, 2024-01-03 at 10:45 -0800, Matt Roper wrote:
> On Tue, Jan 02, 2024 at 07:17:22PM -0800, Ghimiray, Himal Prasad
> wrote:
> > 
> > 
> > > -----Original Message-----
> > > From: Roper, Matthew D <matthew.d.roper at intel.com>
> > > Sent: 03 January 2024 06:46
> > > To: Ghimiray, Himal Prasad <himal.prasad.ghimiray at intel.com>
> > > Cc: intel-xe at lists.freedesktop.org
> > > Subject: Re: [PATCH v2] drm/xe/xe2: Use XE_CACHE_WB pat index
> > > 
> > > On Tue, Jan 02, 2024 at 10:21:42PM +0530, Himal Prasad Ghimiray
> > > wrote:
> > > > Use XE_CACHE_WB pat index by default on xe2 platforms and use
> > > > XE_CACHE_NONE_COMPRESSION pat index only if buffer supports
> > > > compression and indirect access of flat ccs.
> > > > 
> > > > Fixes the broken "xe_migrate_test kunit test" on LNL.
> > > 
> > > How/why does it fix it?  Explaining why this is the right thing
> > > to do is more
> > > important than just stating that a test changes from red to
> > > green.
> > > Does the WB vs UC even matter here or is it actually the
> > > coherency with the
> > > CPU cache that matters (meaning UC+1way would also work, but
> > > WB+NonCoh would fail)?
> > 
> > AFAIU the coherency with CPU cache is main thing that matters here.
> > It shouldn't be really related to 
> > WB vs UC.  The pat table entry associated with XE_CACHE_WB is
> > coherent whereas XE_CACHE_NONE is 
> > NonCoherent.  UC+1way would also work, but WB+NonCoh would fail , I
> > had tested using pat index as 0 and 5 respectively.
> 
> If non-coherent is the problem, then how does the compressed case
> work,
> where we always get a non-coherent PAT?  Is the actual problem just
> that
> we're hardcoding a setting in general?  Don't we already have a PAT
> index (selected by userspace at bind time) that we should be using? 
> Or
> am I misunderstanding what the PTE's we're emitting here are intended
> for?

For this particular code, my understanding is that the flat CCS
metadata that is accessed is never accessed by nor mappable by the CPU,
so the coherency issue doesn't really apply.

For non-compressed buffer objects, where the selftest breaks, the kunit
assumes coherency. But the migrate code would typically not access the
buffer content itself on igfx, except for page-tables which are assumed
to be coherent, (at least for now).

/Thomas

> 
> 
> Matt
> 
> > 
> > BR
> > Himal
> > 
> > > 
> > > 
> > > Matt
> > > 
> > > > 
> > > > v2
> > > > - Fix the argument to emit_pte, pass the bool directly.
> > > > (Thomas)
> > > > 
> > > > Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> > > > Signed-off-by: Himal Prasad Ghimiray
> > > > <himal.prasad.ghimiray at intel.com>
> > > > ---
> > > >  drivers/gpu/drm/xe/xe_migrate.c | 14 +++++++-------
> > > >  1 file changed, 7 insertions(+), 7 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/xe/xe_migrate.c
> > > > b/drivers/gpu/drm/xe/xe_migrate.c index
> > > > adf1dab5eba2..e1a849d74826
> > > > 100644
> > > > --- a/drivers/gpu/drm/xe/xe_migrate.c
> > > > +++ b/drivers/gpu/drm/xe/xe_migrate.c
> > > > @@ -439,7 +439,7 @@ static void emit_pte(struct xe_migrate *m,
> > > >  	/* Indirect access needs compression enabled uncached
> > > > PAT index */
> > > >  	if (GRAPHICS_VERx100(xe) >= 2000)
> > > >  		pat_index = is_comp_pte ? xe-
> > > > pat.idx[XE_CACHE_NONE_COMPRESSION] :
> > > > -					  xe-
> > > > >pat.idx[XE_CACHE_NONE];
> > > > +					  xe-
> > > > >pat.idx[XE_CACHE_WB];
> > > >  	else
> > > >  		pat_index = xe->pat.idx[XE_CACHE_WB];
> > > > 
> > > > @@ -729,14 +729,14 @@ struct dma_fence *xe_migrate_copy(struct
> > > xe_migrate *m,
> > > >  		}
> > > > 
> > > >  		if (!src_is_vram)
> > > > -			emit_pte(m, bb, src_L0_pt,
> > > > src_is_vram, true,
> > > &src_it, src_L0,
> > > > -				 src_bo);
> > > > +			emit_pte(m, bb, src_L0_pt,
> > > > src_is_vram,
> > > copy_system_ccs,
> > > > +				 &src_it, src_L0, src_bo);
> > > >  		else
> > > >  			xe_res_next(&src_it, src_L0);
> > > > 
> > > >  		if (!dst_is_vram)
> > > > -			emit_pte(m, bb, dst_L0_pt,
> > > > dst_is_vram, true,
> > > &dst_it, src_L0,
> > > > -				 dst_bo);
> > > > +			emit_pte(m, bb, dst_L0_pt,
> > > > dst_is_vram,
> > > copy_system_ccs,
> > > > +				 &dst_it, src_L0, dst_bo);
> > > >  		else
> > > >  			xe_res_next(&dst_it, src_L0);
> > > > 
> > > > @@ -978,8 +978,8 @@ struct dma_fence *xe_migrate_clear(struct
> > > xe_migrate *m,
> > > >  		size -= clear_L0;
> > > >  		/* Preemption is enabled again by the ring
> > > > ops. */
> > > >  		if (!clear_vram) {
> > > > -			emit_pte(m, bb, clear_L0_pt,
> > > > clear_vram, true,
> > > &src_it, clear_L0,
> > > > -				 bo);
> > > > +			emit_pte(m, bb, clear_L0_pt,
> > > > clear_vram,
> > > clear_system_ccs,
> > > > +				 &src_it, clear_L0, bo);
> > > >  		} else {
> > > >  			xe_res_next(&src_it, clear_L0);
> > > >  		}
> > > > --
> > > > 2.25.1
> > > > 
> > > 
> > > --
> > > Matt Roper
> > > Graphics Software Engineer
> > > Linux GPU Platform Enablement
> > > Intel Corporation
>