[Intel-gfx] [PATCH v3] drm/i915: Refactor PAT/object cache handling
Yang, Fei
fei.yang at intel.com
Fri Jul 21 04:28:56 UTC 2023
>>> [snip]
>>>>> @@ -27,15 +28,8 @@ static bool gpu_write_needs_clflush(struct
>>>>> drm_i915_gem_object *obj)
>>>>
>>>> The code change here looks accurate, but while we're here, I have a
>>>> side question about this function in general...it was originally
>>>> introduced in commit 48004881f693 ("drm/i915: Mark CPU cache as
>>>> dirty when used for
>>>> rendering") which states that GPU rendering ends up in the CPU cache
>>>> (and thus needs a clflush later to make sure it lands in memory).
>>>> That makes sense to me for LLC platforms, but is it really true for
>>>> non-LLC snooping platforms (like MTL) as the commit states?
>>>
>>> For non-LLC platforms objects can be set to 1-way coherent which
>>> means GPU rendering ending up in CPU cache as well, so for non-LLC
>>> platform the logic here should be checking 1-way coherent flag.
>>
>> That's the part that I'm questioning (and not just for MTL, but for
>> all of our other non-LLC platforms too). Just because there's
>> coherency doesn't mean that device writes landed in the CPU cache.
>> Coherency is also achieved if device writes invalidate the contents of the CPU cache.
>> I thought our non-LLC snooping platforms were coherent due to
>> write-invalidate rather than write-update, but I can't find it
>> specifically documented anywhere at the moment. If write-invalidate
>> was used, then there shouldn't be a need for a later clflush either.
>
> [Trying to consolidate by doing a combined reply to the discussion so far.]
>
> On the write-invalidate vs write-update I don't know. If you did not
> find it in bspec then I doubt I would. I can have a browse still.
Matt was correct. Quote Ron Silvas from SW ARCH, "MTL GPU doesn't write to
CPU cache, it simply snoop CPU cache on its way to RAM."
>>>> My understanding
>>>> was that snooping platforms just invalidated the CPU cache to
>>>> prevent future CPU reads from seeing stale data but didn't actually
>>>> stick any new data in there? Am I off track or is the original
>>>> logic of this function not quite right?
>>>>
>>>> Anyway, even if the logic of this function is wrong, it's a mistake
>>>> that would only hurt performance
>>>
>>> Yes, this logic will introduce performance impact because it's
>>> missing the checking for obj->pat_set_by_user. For objects with
>>> pat_set_by_user==true, even if the object is snooping or 1-way
>>> coherent, we don't want to enforce a clflush here since the
>>> coherency is supposed to be handled by user space.
>
> What should I add you think to fix it?
I think the simplest would be
if (obj->pat_set_by_user)
return false;
because even checking for incoherent WB is unnecessary, simply no
need for the KMD to initiate a flush if PAT is set by user.
> Add a check for non-coherent WB in gpu_write_needs_clflush as an additional condition for returning false?
>
> And then if Matt is correct write-invalidate is used also !HAS_LLC should just return false?
>
>>>> (flushing more often than we truly need to) rather than
>>>> functionality, so not something we really need to dig into right now
>>>> as part of this patch.
>>>>
>>>>> if (IS_DGFX(i915))
>>>>> return false;
>>>>>
>>>>> - /*
>>>>> - * For objects created by userspace through GEM_CREATE with pat_index
>>>>> - * set by set_pat extension, i915_gem_object_has_cache_level() will
>>>>> - * always return true, because the coherency of such object is managed
>>>>> - * by userspace. Othereise the call here would fall back to checking
>>>>> - * whether the object is un-cached or write-through.
>>>>> - */
>>>>> - return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
>>>>> - i915_gem_object_has_cache_level(obj, I915_CACHE_WT));
>>>>> + return i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC) != 1 &&
>>>>> + i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WT) != 1;
>>>>> }
>>>
>>> [snip]
>>>>> @@ -640,15 +640,9 @@ static inline int use_cpu_reloc(const struct reloc_cache *cache,
>>>>> if (DBG_FORCE_RELOC == FORCE_GTT_RELOC)
>>>>> return false;
>>>>>
>>>>> - /*
>>>>> - * For objects created by userspace through GEM_CREATE with pat_index
>>>>> - * set by set_pat extension, i915_gem_object_has_cache_level() always
>>>>> - * return true, otherwise the call would fall back to checking whether
>>>>> - * the object is un-cached.
>>>>> - */
>>>>> return (cache->has_llc ||
>>>>> obj->cache_dirty ||
>>>>> - !i915_gem_object_has_cache_level(obj, I915_CACHE_NONE));
>>>>> + i915_gem_object_has_cache_mode(obj,
>>>>> + I915_CACHE_MODE_UC) != 1);
>>>>
>>>> Platforms with relocations and platforms with user-specified PAT
>>>> have no overlap, right? So a -1 return should be impossible here
>>>> and this is one case where we could just treat the return value as
>>>> a boolean, right?
>>>
>
> Hm no, or maybe. My thinking behind tri-state is to allow a safe option
> for "don't know". In case PAT index to cache mode table is not fully
> populated on some future platform.
That would be a problem in the cache mode table. At least max_pat_index
should have guaranteed the PAT index is sane.
-Fei
More information about the Intel-gfx
mailing list