[PATCH v3] drm/i915: Refactor PAT/object cache handling

Thu Jul 20 00:22:08 UTC 2023

On Wed, Jul 19, 2023 at 05:07:15PM -0700, Yang, Fei wrote:
> [snip]
> >> @@ -27,15 +28,8 @@ static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
> >
> > The code change here looks accurate, but while we're here, I have a side
> > question about this function in general...it was originally introduced
> > in commit 48004881f693 ("drm/i915: Mark CPU cache as dirty when used for
> > rendering") which states that GPU rendering ends up in the CPU cache
> > (and thus needs a clflush later to make sure it lands in memory).  That
> > makes sense to me for LLC platforms, but is it really true for non-LLC
> > snooping platforms (like MTL) as the commit states?
> 
> For non-LLC platforms objects can be set to 1-way coherent which means
> GPU rendering ending up in CPU cache as well, so for non-LLC platform
> the logic here should be checking 1-way coherent flag.

That's the part that I'm questioning (and not just for MTL, but for all
of our other non-LLC platforms too).  Just because there's coherency
doesn't mean that device writes landed in the CPU cache.  Coherency is
also achieved if device writes invalidate the contents of the CPU cache.
I thought our non-LLC snooping platforms were coherent due to
write-invalidate rather than write-update, but I can't find it
specifically documented anywhere at the moment.  If write-invalidate was
used, then there shouldn't be a need for a later clflush either.

> 
> > My understanding
> > was that snooping platforms just invalidated the CPU cache to prevent
> > future CPU reads from seeing stale data but didn't actually stick any
> > new data in there?  Am I off track or is the original logic of this
> > function not quite right?
> >
> > Anyway, even if the logic of this function is wrong, it's a mistake that
> > would only hurt performance
> 
> Yes, this logic will introduce performance impact because it's missing the
> checking for obj->pat_set_by_user. For objects with pat_set_by_user==true,
> even if the object is snooping or 1-way coherent, we don't want to enforce
> a clflush here since the coherency is supposed to be handled by user space.
> 
> > (flushing more often than we truly need to)
> > rather than functionality, so not something we really need to dig into
> > right now as part of this patch.
> >
> >>      if (IS_DGFX(i915))
> >>              return false;
> >>
> >> -    /*
> >> -     * For objects created by userspace through GEM_CREATE with pat_index
> >> -     * set by set_pat extension, i915_gem_object_has_cache_level() will
> >> -     * always return true, because the coherency of such object is managed
> >> -     * by userspace. Othereise the call here would fall back to checking
> >> -     * whether the object is un-cached or write-through.
> >> -     */
> >> -    return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
> >> -             i915_gem_object_has_cache_level(obj, I915_CACHE_WT));
> >> +    return i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC) != 1 &&
> >> +           i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WT) != 1;
> >>  }
> 
> [snip]
> >> @@ -640,15 +640,9 @@ static inline int use_cpu_reloc(const struct reloc_cache *cache,
> >>      if (DBG_FORCE_RELOC == FORCE_GTT_RELOC)
> >>              return false;
> >>
> >> -    /*
> >> -     * For objects created by userspace through GEM_CREATE with pat_index
> >> -     * set by set_pat extension, i915_gem_object_has_cache_level() always
> >> -     * return true, otherwise the call would fall back to checking whether
> >> -     * the object is un-cached.
> >> -     */
> >>      return (cache->has_llc ||
> >>              obj->cache_dirty ||
> >> -            !i915_gem_object_has_cache_level(obj, I915_CACHE_NONE));
> >> +            i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC) != 1);
> >
> > Platforms with relocations and platforms with user-specified PAT have no
> > overlap, right?  So a -1 return should be impossible here and this is
> > one case where we could just treat the return value as a boolean, right?
> 
> My understanding is that the condition here means to say that, if GPU
> access is uncached, don't use CPU reloc because the CPU cache might
> contain stale data. This condition is sufficient for snooping platforms.
> But from MTL onward, the condition show be whether the GPU access is
> coherent with CPU. So, we should be checking 1-way coherent flag instead
> of UC mode, because even if the GPU access is WB, it's still non-coherent,
> thus CPU cache could be out-dated.

My point is that this is relocation code --- it should be impossible to
get here on MTL and beyond, right?  So user-provided PAT isn't a
consideration.

> 
> [snip]
> >> @@ -208,12 +230,6 @@ bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
> >>      if (!(obj->flags & I915_BO_ALLOC_USER))
> >>              return false;
> >>
> >> -    /*
> >> -     * Always flush cache for UMD objects at creation time.
> >> -     */
> >> -    if (obj->pat_set_by_user)
> >> -            return true;
> >> -
> 
> I'm still worried that the removal of these lines would cause the
> MESA failure seen before. I know you are checking pat index below, but
> that is only about GPU access. It doesn't tell you how CPU is going to
> access the memory. If user space is setting an uncached PAT, then use
> copy engine to zero out the memory, but on the CPU side the mapping is
> cacheable, you could still seeing garbage data.
> 
> I agree the lines don't belong here because it doesn't have anything
> to do with LLC, but they need to be moved to the right location instead
> of being removed.

These lines got replaced with a check for the specific PAT indices that
are problematic rather than just assuming any user-provided PAT might
cause problems.  But I had some concerns about the specific logic there
in my review as well.

Matt

> 
> >>      /*
> >>       * EHL and JSL add the 'Bypass LLC' MOCS entry, which should make it
> >>       * possible for userspace to bypass the GTT caching bits set by the
> >> @@ -226,7 +242,21 @@ bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
> >>       * it, but since i915 takes the stance of always zeroing memory before
> >>       * handing it to userspace, we need to prevent this.
> >>       */
> >> -    return IS_JSL_EHL(i915);
> >> +    if (IS_JSL_EHL(i915))
> >> +            return true;
> >> +

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation