[Intel-gfx] [PATCH v3] drm/i915: Refactor PAT/object cache handling

Thu Jul 20 00:07:15 UTC 2023

[snip]
>> @@ -27,15 +28,8 @@ static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
>
> The code change here looks accurate, but while we're here, I have a side
> question about this function in general...it was originally introduced
> in commit 48004881f693 ("drm/i915: Mark CPU cache as dirty when used for
> rendering") which states that GPU rendering ends up in the CPU cache
> (and thus needs a clflush later to make sure it lands in memory).  That
> makes sense to me for LLC platforms, but is it really true for non-LLC
> snooping platforms (like MTL) as the commit states?

For non-LLC platforms objects can be set to 1-way coherent which means
GPU rendering ending up in CPU cache as well, so for non-LLC platform
the logic here should be checking 1-way coherent flag.

> My understanding
> was that snooping platforms just invalidated the CPU cache to prevent
> future CPU reads from seeing stale data but didn't actually stick any
> new data in there?  Am I off track or is the original logic of this
> function not quite right?
>
> Anyway, even if the logic of this function is wrong, it's a mistake that
> would only hurt performance

Yes, this logic will introduce performance impact because it's missing the
checking for obj->pat_set_by_user. For objects with pat_set_by_user==true,
even if the object is snooping or 1-way coherent, we don't want to enforce
a clflush here since the coherency is supposed to be handled by user space.

> (flushing more often than we truly need to)
> rather than functionality, so not something we really need to dig into
> right now as part of this patch.
>
>>      if (IS_DGFX(i915))
>>              return false;
>>
>> -    /*
>> -     * For objects created by userspace through GEM_CREATE with pat_index
>> -     * set by set_pat extension, i915_gem_object_has_cache_level() will
>> -     * always return true, because the coherency of such object is managed
>> -     * by userspace. Othereise the call here would fall back to checking
>> -     * whether the object is un-cached or write-through.
>> -     */
>> -    return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
>> -             i915_gem_object_has_cache_level(obj, I915_CACHE_WT));
>> +    return i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC) != 1 &&
>> +           i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WT) != 1;
>>  }

[snip]
>> @@ -640,15 +640,9 @@ static inline int use_cpu_reloc(const struct reloc_cache *cache,
>>      if (DBG_FORCE_RELOC == FORCE_GTT_RELOC)
>>              return false;
>>
>> -    /*
>> -     * For objects created by userspace through GEM_CREATE with pat_index
>> -     * set by set_pat extension, i915_gem_object_has_cache_level() always
>> -     * return true, otherwise the call would fall back to checking whether
>> -     * the object is un-cached.
>> -     */
>>      return (cache->has_llc ||
>>              obj->cache_dirty ||
>> -            !i915_gem_object_has_cache_level(obj, I915_CACHE_NONE));
>> +            i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC) != 1);
>
> Platforms with relocations and platforms with user-specified PAT have no
> overlap, right?  So a -1 return should be impossible here and this is
> one case where we could just treat the return value as a boolean, right?

My understanding is that the condition here means to say that, if GPU
access is uncached, don't use CPU reloc because the CPU cache might
contain stale data. This condition is sufficient for snooping platforms.
But from MTL onward, the condition show be whether the GPU access is
coherent with CPU. So, we should be checking 1-way coherent flag instead
of UC mode, because even if the GPU access is WB, it's still non-coherent,
thus CPU cache could be out-dated.

[snip]
>> @@ -208,12 +230,6 @@ bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
>>      if (!(obj->flags & I915_BO_ALLOC_USER))
>>              return false;
>>
>> -    /*
>> -     * Always flush cache for UMD objects at creation time.
>> -     */
>> -    if (obj->pat_set_by_user)
>> -            return true;
>> -

I'm still worried that the removal of these lines would cause the
MESA failure seen before. I know you are checking pat index below, but
that is only about GPU access. It doesn't tell you how CPU is going to
access the memory. If user space is setting an uncached PAT, then use
copy engine to zero out the memory, but on the CPU side the mapping is
cacheable, you could still seeing garbage data.

I agree the lines don't belong here because it doesn't have anything
to do with LLC, but they need to be moved to the right location instead
of being removed.

>>      /*
>>       * EHL and JSL add the 'Bypass LLC' MOCS entry, which should make it
>>       * possible for userspace to bypass the GTT caching bits set by the
>> @@ -226,7 +242,21 @@ bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
>>       * it, but since i915 takes the stance of always zeroing memory before
>>       * handing it to userspace, we need to prevent this.
>>       */
>> -    return IS_JSL_EHL(i915);
>> +    if (IS_JSL_EHL(i915))
>> +            return true;
>> +