[PATCH v2] drm/xe/xe2: Use XE_CACHE_WB pat index

Ghimiray, Himal Prasad himal.prasad.ghimiray at intel.com
Thu Jan 4 06:16:25 UTC 2024



> -----Original Message-----
> From: Roper, Matthew D <matthew.d.roper at intel.com>
> Sent: 04 January 2024 00:16
> To: Ghimiray, Himal Prasad <himal.prasad.ghimiray at intel.com>
> Cc: intel-xe at lists.freedesktop.org
> Subject: Re: [PATCH v2] drm/xe/xe2: Use XE_CACHE_WB pat index
> 
> On Tue, Jan 02, 2024 at 07:17:22PM -0800, Ghimiray, Himal Prasad wrote:
> >
> >
> > > -----Original Message-----
> > > From: Roper, Matthew D <matthew.d.roper at intel.com>
> > > Sent: 03 January 2024 06:46
> > > To: Ghimiray, Himal Prasad <himal.prasad.ghimiray at intel.com>
> > > Cc: intel-xe at lists.freedesktop.org
> > > Subject: Re: [PATCH v2] drm/xe/xe2: Use XE_CACHE_WB pat index
> > >
> > > On Tue, Jan 02, 2024 at 10:21:42PM +0530, Himal Prasad Ghimiray wrote:
> > > > Use the XE_CACHE_WB pat index by default on xe2 platforms, and use
> > > > the XE_CACHE_NONE_COMPRESSION pat index only if the buffer supports
> > > > compression and indirect access of flat CCS.
> > > >
> > > > Fixes the broken xe_migrate_test kunit test on LNL.
> > >
> > > How/why does it fix it?  Explaining why this is the right thing to
> > > do is more important than just stating that a test changes from
> > > red to green.  Does the WB vs UC even matter here or is it actually
> > > the coherency with the CPU cache that matters (meaning UC+1way
> > > would also work, but WB+NonCoh would fail)?
> >
> > AFAIU, coherency with the CPU cache is the main thing that matters
> > here; it shouldn't really be related to WB vs UC.  The PAT table
> > entry associated with XE_CACHE_WB is coherent, whereas XE_CACHE_NONE
> > is non-coherent.  UC+1way would also work, but WB+NonCoh would fail;
> > I tested this using PAT indices 0 and 5 respectively.
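
Side note: the experiment above amounted to hardcoding pat_index in
emit_pte() and re-running the xe_migrate kunit test. A minimal sketch,
assuming the mapping from my test above (index 0 = UC + 1-way coherent,
index 5 = WB + non-coherent); the exact index values are from that
experiment, not something this patch relies on:

	/* Experiment only, not part of this patch: force a fixed
	 * PAT index in emit_pte() and observe the kunit test result.
	 */
	if (GRAPHICS_VERx100(xe) >= 2000)
		pat_index = 0;	/* UC + 1-way coherent: test passes */
		/* pat_index = 5;     WB + non-coherent: test fails */
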
> 
> If non-coherent is the problem, then how does the compressed case work,
> where we always get a non-coherent PAT?  Is the actual problem just that
> we're hardcoding a setting in general?  Don't we already have a PAT index
> (selected by userspace at bind time) that we should be using?  Or am I
> misunderstanding what the PTEs we're emitting here are intended for?

For reads/writes to the flat CCS region the issue is not related to coherency with the CPU.
The hardware expects the PAT index associated with the GPUVA used for indirect access to have
compression enabled. For igpu, in the BO create and migration paths we need to ensure that only
a CCS clear and a CCS copy are done; no main-memory clear/copy is required (as it is on dgpu).
We are yet to see how this behaves on xe2 dgpu.
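
Concretely, with this patch the selection in emit_pte() becomes the
hunk quoted below: only the compressed/indirect-CCS case gets the
compression PAT index, and everything else defaults to XE_CACHE_WB,
which keeps it coherent with the CPU cache:

	/* Indirect access needs compression enabled uncached PAT index */
	if (GRAPHICS_VERx100(xe) >= 2000)
		pat_index = is_comp_pte ? xe->pat.idx[XE_CACHE_NONE_COMPRESSION] :
					  xe->pat.idx[XE_CACHE_WB];
	else
		pat_index = xe->pat.idx[XE_CACHE_WB];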

As per the SAS, display buffers will not use PAT indexes for compression, hence it cannot be
guaranteed that a compression-enabled BO will always be bound with a compression-supporting
PAT index. Therefore we handle the PAT assignment in migrate_clear and migrate_copy.
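
That is, the callers now thread the per-operation CCS flag through
instead of hardcoding true, so the compression PAT is only requested
when indirect CCS access is actually needed. For the copy path (the
clear path passes clear_system_ccs the same way):

	if (!src_is_vram)
		emit_pte(m, bb, src_L0_pt, src_is_vram, copy_system_ccs,
			 &src_it, src_L0, src_bo);
	else
		xe_res_next(&src_it, src_L0);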

BR
Himal
> 
> 
> Matt
> 
> >
> > BR
> > Himal
> >
> > >
> > >
> > > Matt
> > >
> > > >
> > > > v2
> > > > - Fix the argument to emit_pte, pass the bool directly. (Thomas)
> > > >
> > > > Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> > > > Signed-off-by: Himal Prasad Ghimiray
> > > > <himal.prasad.ghimiray at intel.com>
> > > > ---
> > > >  drivers/gpu/drm/xe/xe_migrate.c | 14 +++++++-------
> > > >  1 file changed, 7 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> > > > index adf1dab5eba2..e1a849d74826 100644
> > > > --- a/drivers/gpu/drm/xe/xe_migrate.c
> > > > +++ b/drivers/gpu/drm/xe/xe_migrate.c
> > > > @@ -439,7 +439,7 @@ static void emit_pte(struct xe_migrate *m,
> > > >  	/* Indirect access needs compression enabled uncached PAT index */
> > > >  	if (GRAPHICS_VERx100(xe) >= 2000)
> > > >  		pat_index = is_comp_pte ? xe->pat.idx[XE_CACHE_NONE_COMPRESSION] :
> > > > -					  xe->pat.idx[XE_CACHE_NONE];
> > > > +					  xe->pat.idx[XE_CACHE_WB];
> > > >  	else
> > > >  		pat_index = xe->pat.idx[XE_CACHE_WB];
> > > >
> > > > @@ -729,14 +729,14 @@ struct dma_fence *xe_migrate_copy(struct xe_migrate *m,
> > > >  		}
> > > >
> > > >  		if (!src_is_vram)
> > > > -			emit_pte(m, bb, src_L0_pt, src_is_vram, true, &src_it, src_L0,
> > > > -				 src_bo);
> > > > +			emit_pte(m, bb, src_L0_pt, src_is_vram, copy_system_ccs,
> > > > +				 &src_it, src_L0, src_bo);
> > > >  		else
> > > >  			xe_res_next(&src_it, src_L0);
> > > >
> > > >  		if (!dst_is_vram)
> > > > -			emit_pte(m, bb, dst_L0_pt, dst_is_vram, true, &dst_it, src_L0,
> > > > -				 dst_bo);
> > > > +			emit_pte(m, bb, dst_L0_pt, dst_is_vram, copy_system_ccs,
> > > > +				 &dst_it, src_L0, dst_bo);
> > > >  		else
> > > >  			xe_res_next(&dst_it, src_L0);
> > > >
> > > > @@ -978,8 +978,8 @@ struct dma_fence *xe_migrate_clear(struct xe_migrate *m,
> > > >  		size -= clear_L0;
> > > >  		/* Preemption is enabled again by the ring ops. */
> > > >  		if (!clear_vram) {
> > > > -			emit_pte(m, bb, clear_L0_pt, clear_vram, true, &src_it, clear_L0,
> > > > -				 bo);
> > > > +			emit_pte(m, bb, clear_L0_pt, clear_vram, clear_system_ccs,
> > > > +				 &src_it, clear_L0, bo);
> > > >  		} else {
> > > >  			xe_res_next(&src_it, clear_L0);
> > > >  		}
> > > > --
> > > > 2.25.1
> > > >
> > >
> > > --
> > > Matt Roper
> > > Graphics Software Engineer
> > > Linux GPU Platform Enablement
> > > Intel Corporation
> 
> --
> Matt Roper
> Graphics Software Engineer
> Linux GPU Platform Enablement
> Intel Corporation

