[Intel-gfx] [RFC 4/8] drm/i915: Refactor PAT/object cache handling

Fri Jul 28 12:55:52 UTC 2023

On 28/07/2023 08:14, Yang, Fei wrote:
> [snip]
>> @@ -41,14 +42,17 @@ static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
>>                return false;
>>
>>        /*
>> -      * For objects created by userspace through GEM_CREATE with pat_index
>> -      * set by set_pat extension, i915_gem_object_has_cache_level() will
>> -      * always return true, because the coherency of such object is managed
> 
> i915_gem_object_has_cache_level() always return true means this function
> always return false.
> 
>> -      * by userspace. Othereise the call here would fall back to checking
>> -      * whether the object is un-cached or write-through.
>> +      * Always flush cache for UMD objects with PAT index set.
> 
> (obj->pat_set_by_user == true) indicates UMD knows how to handle the coherency,
> forcing clflush in KMD would be redundant.

For Meteorlake I made gpu_write_needs_clflush() always return false anyway.

Could you please submit a patch with kerneldoc for i915_drm.h explaining 
what the set domain ioctl is expected to do when set pat extension is 
used? With the focus on the use cases of how userspace is managing 
coherency using it, or it isn't, or what.

>>         */
>> -     return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
>> -              i915_gem_object_has_cache_level(obj, I915_CACHE_WT));
>> +     if (obj->pat_set_by_user)
>> +             return true;
> 
> return false;

Oops, thank you! I did warn in the cover letter I was getting confused 
by boolean logic conversions, cross-referencing three versions, and 
extracting the pat_set_by_user to call sites. :)

>> +
>> +     /*
>> +      * Fully coherent cached access may end up with data in the CPU cache
>> +      * which hasn't hit memory yet.
>> +      */
>> +     return i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
>> +            i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W);
> 
> Why checking COH2W here? The logic was, if UC or WT return false, otherwise
> return true. So, as long as cache_mode is WB, it's sufficient to say true
> here, right?

I was trying to penetrate the reason behind the check.

Original code was:

        return !(obj->cache_level == I915_CACHE_NONE ||
                 obj->cache_level == I915_CACHE_WT);

Which is equivalent to "is it WB", right? (Since it matches on both old 
LLC flavours.)

Which I thought, in the context of this function, is supposed to answer 
the question of "can there be data in the shared cache written by the 
GPU but not committed to RAM yet".

And then I thought that can only ever happen with 2-way coherency. 
Otherwise GPU writes never end up in the CPU cache.

Did I get that wrong? Maybe I have..

Regards,

Tvrtko