[Intel-gfx] [PATCH v2 0/5] drm/i915: Allow user to set cache at BO creation

Fri Apr 28 19:51:10 UTC 2023

> On 27/04/2023 17:07, Yang, Fei wrote:
>>> On 26/04/2023 16:41, Yang, Fei wrote:
>>>>> On 26/04/2023 07:24, fei.yang at intel.com wrote:
>>>>>> From: Fei Yang <fei.yang at intel.com>
>>>>>>
>>>>>> The first three patches in this series are taken from
>>>>>> https://patchwork.freedesktop.org/series/116868/
>>>>>> These patches are included here because the last patch
>>>>>> has dependency on the pat_index refactor.
>>>>>>
>>>>>> This series is focusing on uAPI changes,
>>>>>> 1. end support for set caching ioctl [PATCH 4/5]
>>>>>> 2. add set_pat extension for gem_create [PATCH 5/5]
>>>>>>
>>>>>> v2: drop one patch that was merged separately
>>>>>>      341ad0e8e254 drm/i915/mtl: Add PTE encode function
>>>>>
>>>>> Are the re-sends for stabilizing the series, or focusing on merge?
>>>>
>>>> v2 was sent just to drop one of patches that has already been merged.
>>>>
>>>>> If the latter then opens are:
>>>>>
>>>>> 1) Link to Mesa MR reviewed and ready to merge.
>>>>>
>>>>> 2) IGT reviewed.
>>>>>
>>>>> 3) I raised an open that get/set_caching should not "lie" but return an
>>>>> error if set pat extension has been used. I don't see a good reason not
>>>>> to do that.
>>>>
>>>> I don't think it's "lying" to the userspace. the comparison is only valid
>>>> for objects created by KMD because only such object uses the pat_index
>>>> from the cachelevel_to_pat table.
>>>
>>> Lets double check my understanding is correct. Userspace sequence of
>>> operations:
>>> 1)
>>> obj = gem_create()+set_pat(PAT_UC)
>>>
>>> 2a)
>>> get_caching(obj)
>>> What gets reported?
>>
>> I see your point here. nice catch.
>> That should be handled by,
>> if (obj->cachel_level == I915_CACHE_INVAL) /* indicated pat-index is set by userspace */
>>       return -EOPNOTSUPP;
>> Will update the patch.
>>
>>>
>>> 2b)
>>> set_caching(obj, I915_CACHE_LLC)
>>> What is the return code?
>>
>> This will either return -EOPNOTSUPP, or ignored because set_pat
>> extension was called.
>
> I see that you made it fail instead of fake success in the latest respin
> and I definitely agree with that.

Thanks for your picky eyes. :)

>>>
>>> If answer to 2a is I915_CACHING_CACHED and to 2b) success, then please
>>> state a good reason why both shouldn't return an error.
>>>
>>>>
>>>>> + Joonas on this one.
>>>>>
>>>>> 4) Refactoring as done is not very pretty and I proposed an idea for a
>>>>> nicer approach. Feasible or not, open for discussion.
>>>>
>>>> Still digesting your proposal. but not sure how would you define things
>>>> like PAT_UC as that is platform dependent, not a constant.
>>>
>>> "PAT_UC" in my outline was purely a driver specific value, similarly as
>>> I915_CACHE_... are.
>>
>> Okay. Then you were suggesting to add a translation table for
>> pat_index-to-(PAT-UC/WB/WT)?
>> It's going to be a N:1 mapping.
>
> PAT index to a value, probably a bitmask, built out of bits which define
> caching modes. Like "PAT_WB | PAT_1WAY_COHERENT", or just PAT_WB. Would
> that approach be enough to express everything?

I was thinking about dumping the PAT tables from Bspec into struct intel_device_info {}.
But that would be only useful to unify all those *_setup_private_ppat() functions in
intel_gtt.c. Kernel doesn't really care much about PAT index except making sure the bits
are programmed correctly in PTE.

>>
>>> With the whole point that driver can ask if
>>> something is uncached, WT or whatever. Using the platform specific
>>> mapping table which converts platform specific obj->pat_index to driver
>>> representation of caching modes (PAT_UC etc).
>>
>> Are you suggesting completely remove i915_cache_level and use
>> (PAT-UC/WB/WT) instead?
>
> Not completely but throughout the most internal code paths, which would
> all just work on PAT index. Basically object always has PAT index set,
> with a separate boolean saying whether it came from gem_create_ext or
> set_cache_level.

What's in my mind as an improvement is to completely remove i915_cache_level.
KMD is supposed to abstract the hardware, but in this case, since the desgin
is that UMDs understand PAT index (and MOCS as well), having an abstraction
in between would only introduce ambiguity and complexity.

Also kernel doesn't need to play with all available PAT indices, so it should
be okay to add i915->pat_{uc|wb|wt} and initialize them with proper indices
respectively. That takes care of the PAT checking as well wherever it's needed,
like "if (pat_index == i915->pat_uc)"

>> Let's say a KMD object wants to be set as WB, how you get the exact pat_index to use?
>> Note that there are multiple PAT indices having caching mod WB, but they are different,
>> e.g. do you want just WB or WB with 1-way-coherency or WB with 2-way
>> coherency?
>
> Just use the cache_level to pat_index mapping you added in the series?
>
>> Also, in case a checking of pat_index is needed, do we say WB with
>> 1-way-coherency is equal to WB or not?
>
> You mean the call sites where i915 is checking the object caching mode?
> We have two call sites which check for !I915_CACHE_NONE. Those would
> just check if PAT_UC is not set.
>
> Then the one in gpu_write_needs_clflush is checking for neither UC nor
> WT, which also directly translates.
>
> For the WB case there aren't any callers but if we just checked for
> "base" PAT_WB bit being set that would work.
>
> So in all cases helper which does "return bits_required | bits_set"
> seems would work fine.
>
>> BTW, isn't PAT-UC/WB/WT the same kind of abstraction as enum
>> i915_cache_level, I'm not
>> sure how that would simplify anything.
>
> As I wrote before, I *think* it provides a way of not needing to
> sprinkle around i915_gem_get_pat_index and a simpler "has_cache_level".
> Conceptually cache_level->pat_index is done only in gem_create_ext and
> set_cache_level. Lower level code does not have to consult cache_level
> at all. And existence of tables simplifies the pretty printing code to a
> platform agnostic loop.
>
> I *think* at least.. We can leave it all for later. My main concern was
> that UAPI needs to be clear and solid which it now seems to be.

I'm hesitant to do all that I said above in one shot. But I think completely
removing enum i915_cache_level is what to do next.

-Fei

> Regards,
>
> Tvrtko
>
>>
>>> Quite possible I missed some detail, but if I haven't then it could be
>>> a neat and lightweight solution.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/intel-gfx/attachments/20230428/d61ea54f/attachment-0001.htm>