[PATCH] drm/xe: Use ttm_uncached for BO with NEEDS_UC flag

Tue Jun 18 16:43:23 UTC 2024

On Tue, Jun 18, 2024 at 02:38:01PM +0200, Thomas Hellström wrote:
> Hi,
> 
> On Mon, 2024-06-17 at 13:28 -0700, Matt Roper wrote:
> > On Wed, Jun 12, 2024 at 08:03:24PM +0200, Michal Wajdeczko wrote:
> > > Hi Thomas,
> > > 
> > > On 11.06.2024 14:47, Thomas Hellström wrote:
> > > > Hi, Michal,
> > > > 
> > > > On Thu, 2024-06-06 at 21:56 +0200, Michal Wajdeczko wrote:
> > > > > We should honor requested uncached mode also at the TTM layer.
> > > > > Otherwise, we risk losing updates to the memory based
> > > > > interrupts
> > > > > source or status vectors, as those require uncached memory.
> > > > > 
> > > > > Signed-off-by: Michal Wajdeczko <michal.wajdeczko at intel.com>
> > > > > Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> > > > > Cc: Matt Roper <matthew.d.roper at intel.com>
> > > > > ---
> > > > >  drivers/gpu/drm/xe/xe_bo.c | 3 +++
> > > > >  1 file changed, 3 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/xe/xe_bo.c
> > > > > b/drivers/gpu/drm/xe/xe_bo.c
> > > > > index 2bae01ce4e5b..2573cc118f29 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_bo.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > > > > @@ -378,6 +378,9 @@ static struct ttm_tt
> > > > > *xe_ttm_tt_create(struct
> > > > > ttm_buffer_object *ttm_bo,
> > > > >  	    (xe->info.graphics_verx100 >= 1270 && bo->flags &
> > > > > XE_BO_FLAG_PAGETABLE))
> > > > >  		caching = ttm_write_combined;
> > > > >  
> > > > > +	if (bo->flags & XE_BO_FLAG_NEEDS_UC)
> > > > > +		caching = ttm_uncached;
> > > > > +
> > > > >  	err = ttm_tt_init(&tt->ttm, &bo->ttm, page_flags,
> > > > > caching,
> > > > > extra_pages);
> > > > >  	if (err) {
> > > > >  		kfree(tt);
> > > > 
> > > > To me the preferred method is to teach bo->cpu_caching about the
> > > > uncached mode and then include it in the switch statement above.
> > > > 
> > > 
> > > but bo->cpu_caching is currently documented as:
> > > 
> > > /**
> > >  * @cpu_caching: CPU caching mode. Currently only used for
> > > userspace
> > >  * objects.
> > >  */
> > > 
> > > and value 0 is implicitly reserved as kind of default, so
> > > 'teaching'
> > > would likely mean either extending uapi with something like:
> > > 
> > >   #define DRM_XE_GEM_CPU_CACHING_WB                      1
> > >   #define DRM_XE_GEM_CPU_CACHING_WC                      2
> > > + #define DRM_XE_GEM_CPU_CACHING_UC                      3
> > > 
> > > which will introduce lot of undesired right now code changes, or we
> > > will
> > > introduce internal only flag:
> > > 
> > > + #define XE_CPU_CACHING_UC                      ((u16)~0)
> > > 
> > > but that doesn't look like a clean solution.
> > > 
> > > 
> > > OTOH, just above this new diff chunk, there is already a code that
> > > updates caching mode outside the "switch statement above":
> > > 
> > > 	if ((!bo->cpu_caching && bo->flags & XE_BO_FLAG_SCANOUT)
> > > ||
> > > 	    (xe->info.graphics_verx100 >= 1270 &&
> > > 	     bo->flags & XE_BO_FLAG_PAGETABLE))
> > > 		caching = ttm_write_combined;
> > > 
> > > so maybe as a short term solution we can keep this patch as it's
> > > doing
> > > similar last resort stuff and return to 'preferred way' later:
> > > 
> > > 	if (!bo->cpu_caching && bo->flags & XE_BO_FLAG_NEEDS_UC)
> > > 		caching = ttm_uncached;
> > 
> > Yeah, cpu_caching is a "uapi only" thing at the moment (and even then
> > is
> > only set in some situations).  Given the current design and
> > assumptions
> > of the code, maybe it would be more clear to add an assertion like
> > this
> > to help document why this is special?
> > 
> >         if (bo->flags & XE_BO_FLAG_NEEDS_UC) {
> >                 /*
> >                  * Valid only for internally-created buffers only,
> > for
> >                  * which cpu_caching is never initialized.
> >                  */
> >                 xe_assert(xe, bo->cpu_caching == 0);
> >                 caching = ttm_uncached;
> >         }
> > 
> > If we decide we want a more general redesign of cpu_caching behavior,
> > that would probably be a separate change from the direct functional
> > fix
> > here.
> 
> I do think the change should have actually been done before the scanout
> caching hack. We shouldn't be building special cases like this, but
> rather fix what's missing.

I think things happened the other way around.  The scanout caching
adjustment pre-dates the existence of bo->cpu_caching in the driver.
When bo->cpu_caching got added in commit 622f709ca629 ("drm/xe/uapi: Add
support for CPU caching mode"), it intentionally left kernel objects set
to 0 by design.

We can certainly change the design now if everyone agrees that it makes
the code cleaner, but I think that the general refactoring and
repurposing of bo->cpu_caching is orthogonal to the functional fix that
Michal is providing here.

+Cc Pallavi, Matt Auld as the authors of the original design in case
they have any thoughts on extending the usage of bo->cpu_caching.

Matt

> 
> Can't we make bo->cpu_caching valid also for kernel bos with a new enum
> and do the translation in the ioctl?
> 
> /Thomas
> 
> 
> > 
> > 
> > Matt
> > 
> > > 
> > > Michal
> > 
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation