[Intel-gfx] [PATCH v5 2/5] drm/i915: use pat_index instead of cache_level
Yang, Fei
fei.yang at intel.com
Sat May 6 08:03:00 UTC 2023
>>>>
>>>> static u64 mtl_pte_encode(dma_addr_t addr,
>>>> - enum i915_cache_level level,
>>>> + unsigned int pat_index,
>>>> u32 flags)
>>> Prototype and implementation changed here for mtl_pte_encode.
>>>
>>> And we have:
>>>
>>> if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70))
>>> ppgtt->vm.pte_encode = mtl_pte_encode;
>>> else
>>> ppgtt->vm.pte_encode = gen8_pte_encode;
>>> So should be same prototype. And:
>>>
>>> u64 (*pte_encode)(dma_addr_t addr,
>>>- enum i915_cache_level level,
>>>+ unsigned int pat_index,
>>> u32 flags);
>>>/* Create a valid PTE */
>>> Patch relies on the compiler considering enum equal to unsigned int?
>>
>> yes, caller is passing in unsigned int and gets used as enum.
>>
>>> But the implementation of gen8_pte_encode and most ggtt
>>> counterparts is looking at the passed in pat index and thinks it is cache level.
>>>
>>> How is that supposed to work?! Or I am blind and am missing something?
>>
>> For legacy platforms translation through LEGACY_CACHELEVEL would not
>> change the value of cache_level. The cache_level and pat_index are
>> basically the same for these platforms.
>
> Oh that is nasty little trick! And I did not spot it being described
> anywhere in the commit message or code comments.
Will add.
> So you are saying for legacy cache_level equals pat_index for what
> caching behaviour is concerned. Ie:
>
> +#define LEGACY_CACHELEVEL \
> + .cachelevel_to_pat = { \
> + [I915_CACHE_NONE] = 0, \
> + [I915_CACHE_LLC] = 1, \
> + [I915_CACHE_L3_LLC] = 2, \
> + [I915_CACHE_WT] = 3, \
> + }
>
> And because:
>
> enum i915_cache_level {
> I915_CACHE_NONE = 0,
> I915_CACHE_LLC,
> I915_CACHE_L3_LLC,
> I915_CACHE_WT,
> };
>
> This indeed ends up a 1:1 reversible mapping.
>
> But it is hidden and fragile. What prevents someone from changing the
> enum i915_cache_level? There is no explicit linkage with hardware PAT
> values anywhere. Or at least I don't see it.
>
> I would say all pte_encode signatures have to be changed to pat index.
>
> Which means all pte encode implementations have to understand what pat indices mean.
>
> Which brings us back to that idea of a 2nd table, I paraphrase:
>
> .legacy_pat_to_cache = {
> [0] = I915_PAT_UC,
> [1] = I915_PAT_WB,
> [2] = I915_PAT_WB | I915_PAT_LLC /* not sure on this one */
> [3] = I915_PAT_WT,
> };
>
> Pat_encode implementations then instead:
>
> switch (level) {
> case I915_CACHE_NONE:
> pte |= PPAT_UNCACHED;
> ...
>
> Do:
>
> if (i915->pat_to_cache_flags[pat_index] & I915_PAT_UC)
> pte |= PPAT_UNCACHED;
> else if
> ...
>
>
> But it would require i915 to be passed in which is admittedly a
> noisy diff. Hm.. benefit of hardware agnostic enum i915_cache_level..
That's was one of the problem I ran into when creating this patch.
> Maybe convert pat_index to I915_PAT_.. flags in the callers? Like this:
>
> gen8_ppgtt_insert_pte(...)
> ...
> const u32 pat_flags = i915->pat_to_cache_flags[pat_index];
> const gen8_pte_t pte_encode = ppgtt->vm.pte_encode(0, pat_flags, flags);
>
> Etc. That would be smaller churn on the pte_encode signature.
>
> Maybe helper for i915->pat_to_cache_flags lookup so it can check the array bounds?
>
> If this all sounds too much to you maybe we can do it as a followup.
It's getting more complicated...
But I believe it should be doable to define a PAT table (same in Bspec) and drop
the cache_level. Together with the PPAT bit masks (such as MTL_PPAT_L4_3_UC) defined
in intel_gtt.h, the code can be simpler without translation.
> Or perhaps it is actually pointing towards that obj->pat_index is not the most
> elegant choice to be used as a single point of truth.. perhaps obj->cache_flags
> would be better. It would be set at same entry points and it would be hw agnostic
> so could end up more elegant in the driver.
>
> But then I think we need at minimum something like the below in this patch, somewhere:
>
> /*
> * On pre-Gen12 platforms enum i915_cache_level happens to align
> * with caching modes as specified in hardware PAT indices. Our
> * implementation relies on that due tricks played (explain the
> * tricks) in the pte_encode vfuncs.
> * Ensure this trick keeps working until the driver can be fully
> * refactored to support pat indices better.
> */
> BUILD_BUG_ON(I915_CACHE_NONE != 0);
> ... etc for all enums ...
Will add
> if (gen < 12) {
> GEM_WARN_ON(i915_gem_get_pat_index(i915, I915_CACHE_NONE) != 0);
> ... etc for all enums ...
This should not be necessary as long as the enum is verified.
> }
>
>> It is broken for gen12 here. I was asked to separate the
>> gen12_pte_encode change to another patch in the series, but that
>> breaks bisect. Should I squash 2/5 and 3/5?
>
> This patch breaks gen12? Yes that should definitely be avoided.
Will fix that by squashing 2/5 and 3/5 with explanation in commit message.
> Regards,
>
> Tvrtko
More information about the Intel-gfx
mailing list