[PATCH] drm/xe: Use ttm_uncached for BO with NEEDS_UC flag

Tue Jun 18 18:54:58 UTC 2024

On Tue, Jun 18, 2024 at 08:29:24PM +0200, Thomas Hellström wrote:
> Hi, Matt
> 
> On Tue, 2024-06-18 at 09:43 -0700, Matt Roper wrote:
> > On Tue, Jun 18, 2024 at 02:38:01PM +0200, Thomas Hellström wrote:
> > > Hi,
> > > 
> > > On Mon, 2024-06-17 at 13:28 -0700, Matt Roper wrote:
> > > > On Wed, Jun 12, 2024 at 08:03:24PM +0200, Michal Wajdeczko wrote:
> > > > > Hi Thomas,
> > > > > 
> > > > > On 11.06.2024 14:47, Thomas Hellström wrote:
> > > > > > Hi, Michal,
> > > > > > 
> > > > > > On Thu, 2024-06-06 at 21:56 +0200, Michal Wajdeczko wrote:
> > > > > > > We should honor requested uncached mode also at the TTM
> > > > > > > layer.
> > > > > > > Otherwise, we risk losing updates to the memory based
> > > > > > > interrupts
> > > > > > > source or status vectors, as those require uncached memory.
> > > > > > > 
> > > > > > > Signed-off-by: Michal Wajdeczko
> > > > > > > <michal.wajdeczko at intel.com>
> > > > > > > Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> > > > > > > Cc: Matt Roper <matthew.d.roper at intel.com>
> > > > > > > ---
> > > > > > >  drivers/gpu/drm/xe/xe_bo.c | 3 +++
> > > > > > >  1 file changed, 3 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/drivers/gpu/drm/xe/xe_bo.c
> > > > > > > b/drivers/gpu/drm/xe/xe_bo.c
> > > > > > > index 2bae01ce4e5b..2573cc118f29 100644
> > > > > > > --- a/drivers/gpu/drm/xe/xe_bo.c
> > > > > > > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > > > > > > @@ -378,6 +378,9 @@ static struct ttm_tt
> > > > > > > *xe_ttm_tt_create(struct
> > > > > > > ttm_buffer_object *ttm_bo,
> > > > > > >  	    (xe->info.graphics_verx100 >= 1270 && bo-
> > > > > > > >flags &
> > > > > > > XE_BO_FLAG_PAGETABLE))
> > > > > > >  		caching = ttm_write_combined;
> > > > > > >  
> > > > > > > +	if (bo->flags & XE_BO_FLAG_NEEDS_UC)
> > > > > > > +		caching = ttm_uncached;
> > > > > > > +
> > > > > > >  	err = ttm_tt_init(&tt->ttm, &bo->ttm, page_flags,
> > > > > > > caching,
> > > > > > > extra_pages);
> > > > > > >  	if (err) {
> > > > > > >  		kfree(tt);
> > > > > > 
> > > > > > To me the preferred method is to teach bo->cpu_caching about
> > > > > > the
> > > > > > uncached mode and then include it in the switch statement
> > > > > > above.
> > > > > > 
> > > > > 
> > > > > but bo->cpu_caching is currently documented as:
> > > > > 
> > > > > /**
> > > > >  * @cpu_caching: CPU caching mode. Currently only used for
> > > > > userspace
> > > > >  * objects.
> > > > >  */
> > > > > 
> > > > > and value 0 is implicitly reserved as kind of default, so
> > > > > 'teaching'
> > > > > would likely mean either extending uapi with something like:
> > > > > 
> > > > >   #define DRM_XE_GEM_CPU_CACHING_WB                      1
> > > > >   #define DRM_XE_GEM_CPU_CACHING_WC                      2
> > > > > + #define DRM_XE_GEM_CPU_CACHING_UC                      3
> > > > > 
> > > > > which will introduce lot of undesired right now code changes,
> > > > > or we
> > > > > will
> > > > > introduce internal only flag:
> > > > > 
> > > > > + #define XE_CPU_CACHING_UC                      ((u16)~0)
> > > > > 
> > > > > but that doesn't look like a clean solution.
> > > > > 
> > > > > 
> > > > > OTOH, just above this new diff chunk, there is already a code
> > > > > that
> > > > > updates caching mode outside the "switch statement above":
> > > > > 
> > > > > 	if ((!bo->cpu_caching && bo->flags &
> > > > > XE_BO_FLAG_SCANOUT)
> > > > > > > 
> > > > > 	    (xe->info.graphics_verx100 >= 1270 &&
> > > > > 	     bo->flags & XE_BO_FLAG_PAGETABLE))
> > > > > 		caching = ttm_write_combined;
> > > > > 
> > > > > so maybe as a short term solution we can keep this patch as
> > > > > it's
> > > > > doing
> > > > > similar last resort stuff and return to 'preferred way' later:
> > > > > 
> > > > > 	if (!bo->cpu_caching && bo->flags &
> > > > > XE_BO_FLAG_NEEDS_UC)
> > > > > 		caching = ttm_uncached;
> > > > 
> > > > Yeah, cpu_caching is a "uapi only" thing at the moment (and even
> > > > then
> > > > is
> > > > only set in some situations).  Given the current design and
> > > > assumptions
> > > > of the code, maybe it would be more clear to add an assertion
> > > > like
> > > > this
> > > > to help document why this is special?
> > > > 
> > > >         if (bo->flags & XE_BO_FLAG_NEEDS_UC) {
> > > >                 /*
> > > >                  * Valid only for internally-created buffers
> > > > only,
> > > > for
> > > >                  * which cpu_caching is never initialized.
> > > >                  */
> > > >                 xe_assert(xe, bo->cpu_caching == 0);
> > > >                 caching = ttm_uncached;
> > > >         }
> > > > 
> > > > If we decide we want a more general redesign of cpu_caching
> > > > behavior,
> > > > that would probably be a separate change from the direct
> > > > functional
> > > > fix
> > > > here.
> > > 
> > > I do think the change should have actually been done before the
> > > scanout
> > > caching hack. We shouldn't be building special cases like this, but
> > > rather fix what's missing.
> > 
> > I think things happened the other way around.  The scanout caching
> > adjustment pre-dates the existence of bo->cpu_caching in the driver.
> > When bo->cpu_caching got added in commit 622f709ca629 ("drm/xe/uapi:
> > Add
> > support for CPU caching mode"), it intentionally left kernel objects
> > set
> > to 0 by design.
> 
> I don't really see a discussion around the kernel objects in the commit
> message, it mentions "currently" in the caching mode doc, but that
> sounds more like a documentation of current status than a guideline.
> Ofc, it might be in the review discussion but I haven't looked too
> closely TBH.
> 
> Hmm. This commit raises some questions. Do you remember the reason for
> leaving out the kernel bos? Was it because kernel bos relies on
> implicit caching mode selection whereas the user-space bo caching mode
> selection is now explicit? Otherwise it's pretty standard in the driver
> to map the DRM_XE user-space flags / enums to XE_ internal ones.

Unfortunately I don't remember the details here very well myself; I know
this patch went through a lot of revisions and morphed a fair bit during
the review cycles.  Adding a couple extra Cc's of people who were more
actively involved in the review and may have a clearer memory of how we
initially settled on the userspace-only design.

+cc Jose, Oak

Matt

> 
> (As a complete side note it looks like the system pages for VRAM-only
> user-bo eviction are now forced to be write-combined by the uAPI?)
> 
> > 
> > We can certainly change the design now if everyone agrees that it
> > makes
> > the code cleaner,
> 
> This is probably a bigger change than I originally though and requires
> some additional consideration of the above.
> 
> >  but I think that the general refactoring and
> > repurposing of bo->cpu_caching is orthogonal to the functional fix
> > that
> > Michal is providing here.
> 
> Generally I think that if something is missing to be able to cleanly
> add a fix, then we should definitely try our best to add that something
> _first_. In this case it turns out, though, that it requires some
> additional afterthought.
> 
> So for the patch
> 
> Acked-by: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> 
> 
> 
> > 
> > +Cc Pallavi, Matt Auld as the authors of the original design in case
> > they have any thoughts on extending the usage of bo->cpu_caching.
> > 
> > 
> > Matt
> > 
> > > 
> > > Can't we make bo->cpu_caching valid also for kernel bos with a new
> > > enum
> > > and do the translation in the ioctl?
> > > 
> > > /Thomas
> > > 
> > > 
> > > > 
> > > > 
> > > > Matt
> > > > 
> > > > > 
> > > > > Michal
> > > > 
> > > 
> > 
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation