[PATCH v5 2/5] drm/i915: use pat_index instead of cache_level

Sat May 6 08:03:00 UTC 2023

>>>>
>>>>   static u64 mtl_pte_encode(dma_addr_t addr,
>>>> -                       enum i915_cache_level level,
>>>> +                       unsigned int pat_index,
>>>>                         u32 flags)
>>> Prototype and implementation changed here for mtl_pte_encode.
>>>
>>> And we have:
>>>
>>>        if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70))
>>> ppgtt->vm.pte_encode = mtl_pte_encode;
>>>        else
>>> ppgtt->vm.pte_encode = gen8_pte_encode;
>>> So should be same prototype. And:
>>>
>>>         u64 (*pte_encode)(dma_addr_t addr,
>>>-                         enum i915_cache_level level,
>>>+                         unsigned int pat_index,
>>>                          u32 flags);
>>>/* Create a valid PTE */
>>> Patch relies on the compiler considering enum equal to unsigned int?
>>
>> yes, caller is passing in unsigned int and gets used as enum.
>>
>>> But the implementation of gen8_pte_encode and most ggtt
>>> counterparts is looking at the passed in pat index and thinks it is cache level.
>>>
>>> How is that supposed to work?! Or I am blind and am missing something?
>>
>> For legacy platforms translation through LEGACY_CACHELEVEL would not
>> change the value of cache_level. The cache_level and pat_index are
>> basically the same for these platforms.
>
> Oh that is nasty little trick! And I did not spot it being described
> anywhere in the commit message or code comments.

Will add.

> So you are saying for legacy cache_level equals pat_index for what
> caching behaviour is concerned. Ie:
>
> +#define LEGACY_CACHELEVEL \
> +     .cachelevel_to_pat = { \
> +             [I915_CACHE_NONE]   = 0, \
> +             [I915_CACHE_LLC]    = 1, \
> +             [I915_CACHE_L3_LLC] = 2, \
> +             [I915_CACHE_WT]     = 3, \
> +     }
>
> And because:
>
> enum i915_cache_level {
>       I915_CACHE_NONE = 0,
>       I915_CACHE_LLC,
>       I915_CACHE_L3_LLC,
>       I915_CACHE_WT,
> };
>
> This indeed ends up a 1:1 reversible mapping.
>
> But it is hidden and fragile. What prevents someone from changing the
> enum i915_cache_level? There is no explicit linkage with hardware PAT
> values anywhere. Or at least I don't see it.
>
> I would say all pte_encode signatures have to be changed to pat index.
>
> Which means all pte encode implementations have to understand what pat indices mean.
>
> Which brings us back to that idea of a 2nd table, I paraphrase:
>
> .legacy_pat_to_cache = {
>       [0] = I915_PAT_UC,
>       [1] = I915_PAT_WB,
>       [2] = I915_PAT_WB | I915_PAT_LLC /* not sure on this one */
>       [3] = I915_PAT_WT,
> };
>
> Pat_encode implementations then instead:
>
>       switch (level) {
>       case I915_CACHE_NONE:
>               pte |= PPAT_UNCACHED;
>       ...
>
> Do:
>
>       if (i915->pat_to_cache_flags[pat_index] & I915_PAT_UC)
>               pte |= PPAT_UNCACHED;
>       else if
>       ...
>
>
> But it would require i915 to be passed in which is admittedly a
> noisy diff. Hm.. benefit of hardware agnostic enum i915_cache_level..

That's was one of the problem I ran into when creating this patch.

> Maybe convert pat_index to I915_PAT_.. flags in the callers? Like this:
>
> gen8_ppgtt_insert_pte(...)
> ...
>       const u32 pat_flags = i915->pat_to_cache_flags[pat_index];
>       const gen8_pte_t pte_encode = ppgtt->vm.pte_encode(0, pat_flags, flags);
>
> Etc. That would be smaller churn on the pte_encode signature.
>
> Maybe helper for i915->pat_to_cache_flags lookup so it can check the array bounds?
>
> If this all sounds too much to you maybe we can do it as a followup.

It's getting more complicated...
But I believe it should be doable to define a PAT table (same in Bspec) and drop
the cache_level. Together with the PPAT bit masks (such as MTL_PPAT_L4_3_UC) defined
in intel_gtt.h, the code can be simpler without translation.

> Or perhaps it is actually pointing towards that obj->pat_index is not the most
> elegant choice to be used as a single point of truth.. perhaps obj->cache_flags
> would be better. It would be set at same entry points and it would be hw agnostic
> so could end up more elegant in the driver.
>
> But then I think we need at minimum something like the below in this patch, somewhere:
>
> /*
>   * On pre-Gen12 platforms enum i915_cache_level happens to align
>   * with caching modes as specified in hardware PAT indices. Our
>   * implementation relies on that due tricks played (explain the
>   * tricks) in the pte_encode vfuncs.
>   * Ensure this trick keeps working until the driver can be fully
>   * refactored to support pat indices better.
>   */
> BUILD_BUG_ON(I915_CACHE_NONE != 0);
> ... etc for all enums ...

Will add

> if (gen < 12) {
>       GEM_WARN_ON(i915_gem_get_pat_index(i915, I915_CACHE_NONE) != 0);
>       ... etc for all enums ...

This should not be necessary as long as the enum is verified.

> }
>
>> It is broken for gen12 here. I was asked to separate the
>> gen12_pte_encode change to another patch in the series, but that
>> breaks bisect. Should I squash 2/5 and 3/5?
>
> This patch breaks gen12? Yes that should definitely be avoided.

Will fix that by squashing 2/5 and 3/5 with explanation in commit message.

> Regards,
>
> Tvrtko