[Intel-gfx] [PATCH v5 2/5] drm/i915: use pat_index instead of cache_level
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Fri May 5 09:50:40 UTC 2023
On 04/05/2023 17:06, Yang, Fei wrote:
> > On 04/05/2023 00:02, fei.yang at intel.com wrote:
> >> From: Fei Yang <fei.yang at intel.com>
> >>
> >> Currently the KMD is using enum i915_cache_level to set caching policy for
> >> buffer objects. This is flaky because the PAT index which really controls
> >> the caching behavior in PTE has far more levels than what's defined in the
> >> enum. In addition, the PAT index is platform dependent, having to translate
> >> between i915_cache_level and PAT index is not reliable, and makes the code
> >> more complicated.
> >>
> >> From UMD's perspective there is also a necessity to set caching policy for
> >> performance fine tuning. It's much easier for the UMD to directly use PAT
> >> index because the behavior of each PAT index is clearly defined in Bspec.
> >> Having the abstracted i915_cache_level sitting in between would only cause
> >> more ambiguity.
> >>
> >> For these reasons this patch replaces i915_cache_level with PAT index. Also
> >> note, the cache_level is not completely removed yet, because the KMD still
> >> has the need of creating buffer objects with simple cache settings such as
> >> cached, uncached, or writethrough. For such simple cases, using cache_level
> >> would help simplify the code.
> >>
> >> Cc: Chris Wilson <chris.p.wilson at linux.intel.com>
> >> Cc: Matt Roper <matthew.d.roper at intel.com>
> >> Signed-off-by: Fei Yang <fei.yang at intel.com>
> >> Reviewed-by: Andi Shyti <andi.shyti at linux.intel.com>
> >
> > [snip]
> >
> >> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> >> index bb6998d67133..f2334a713c4e 100644
> >> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> >> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> >> @@ -56,7 +56,7 @@ static u64 gen8_pte_encode(dma_addr_t addr,
> >> }
> >
> >^^^
> >
> > How come there are no changes to gen8_pte_encode?
>
> For legacy platforms cache_level is equal to pat_index, so I was thinking
> more about reducing number of lines changed.
>
> >vvv
> >
> >>
> >> static u64 mtl_pte_encode(dma_addr_t addr,
> >> - enum i915_cache_level level,
> >> + unsigned int pat_index,
> >> u32 flags)
> >
> > Prototype and implementation changed here for mtl_pte_encode.
> >
> > And we have:
> >
> > if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70))
> > ppgtt->vm.pte_encode = mtl_pte_encode;
> > else
> > ppgtt->vm.pte_encode = gen8_pte_encode;
> >
> > So should be same prototype. And:
> >
> > u64 (*pte_encode)(dma_addr_t addr,
> >- enum i915_cache_level level,
> >+ unsigned int pat_index,
> > u32 flags); /* Create a valid PTE */
> >
> > Patch relies on the compiler considering enum equal to unsigned int?
>
> yes, caller is passing in unsigned int and gets used as enum.
>
> > But the implementation of gen8_pte_encode and most ggtt counterparts is
> > looking at the passed in pat index and thinks it is cache level.
> >
> > How is that supposed to work?! Or I am blind and am missing something?
>
> For legacy platforms translation through LEGACY_CACHELEVEL would not
> change the value of cache_level. The cache_level and pat_index are basically
> the same for these platforms.
Oh, that is a nasty little trick! And I did not spot it being described anywhere in the commit message or code comments.
So you are saying that on legacy platforms cache_level equals pat_index as far as caching behaviour is concerned. I.e.:
+#define LEGACY_CACHELEVEL \
+ .cachelevel_to_pat = { \
+ [I915_CACHE_NONE] = 0, \
+ [I915_CACHE_LLC] = 1, \
+ [I915_CACHE_L3_LLC] = 2, \
+ [I915_CACHE_WT] = 3, \
+ }
And because:
enum i915_cache_level {
I915_CACHE_NONE = 0,
I915_CACHE_LLC,
I915_CACHE_L3_LLC,
I915_CACHE_WT,
};
This indeed ends up a 1:1 reversible mapping.
But it is hidden and fragile. What prevents someone from changing the enum i915_cache_level? There is no explicit linkage with hardware PAT values anywhere. Or at least I don't see it.
I would say all pte_encode signatures have to be changed to take the pat index.
Which means all pte encode implementations have to understand what pat indices mean.
Which brings us back to that idea of a 2nd table, I paraphrase:
.legacy_pat_to_cache = {
[0] = I915_PAT_UC,
[1] = I915_PAT_WB,
[2] = I915_PAT_WB | I915_PAT_LLC, /* not sure on this one */
[3] = I915_PAT_WT,
};
The pte_encode implementations would then, instead of:
switch (level) {
case I915_CACHE_NONE:
pte |= PPAT_UNCACHED;
...
Do:
if (i915->pat_to_cache_flags[pat_index] & I915_PAT_UC)
pte |= PPAT_UNCACHED;
else if
...
But that would require i915 to be passed in, which is admittedly a noisy diff. Hm.. that is the benefit of the hardware agnostic enum i915_cache_level.. Maybe convert pat_index to I915_PAT_.. flags in the callers? Like this:
gen8_ppgtt_insert_pte(...)
...
const u32 pat_flags = i915->pat_to_cache_flags[pat_index];
const gen8_pte_t pte_encode = ppgtt->vm.pte_encode(0, pat_flags, flags);
Etc. That would be smaller churn on the pte_encode signature.
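To make the idea concrete, here is a minimal, self-contained sketch of an encoder that consumes hardware agnostic flags rather than a cache level. It is only an illustration: the I915_PAT_* values, the SKETCH_PTE_* bits and sketch_pte_encode() are all made up for this example and are not the driver's real PPAT_* encodings or vfunc.

```c
#include <stdint.h>

/* Hypothetical hardware agnostic cache flags (values invented here). */
#define I915_PAT_UC  (1u << 0)
#define I915_PAT_WB  (1u << 1)
#define I915_PAT_WT  (1u << 2)
#define I915_PAT_LLC (1u << 3)

/* Placeholder PTE bits for the sketch; NOT the real hardware encodings. */
#define SKETCH_PTE_PRESENT  (1ull << 0)
#define SKETCH_PTE_UNCACHED (1ull << 3)
#define SKETCH_PTE_WT       (1ull << 4)
#define SKETCH_PTE_CACHED   (1ull << 7)

/*
 * The encoder only looks at flags, so it neither needs the device
 * pointer nor has to know what a given pat index means on a platform.
 */
static uint64_t sketch_pte_encode(uint64_t addr, uint32_t pat_flags)
{
	uint64_t pte = addr | SKETCH_PTE_PRESENT;

	if (pat_flags & I915_PAT_UC)
		pte |= SKETCH_PTE_UNCACHED;
	else if (pat_flags & I915_PAT_WT)
		pte |= SKETCH_PTE_WT;
	else
		pte |= SKETCH_PTE_CACHED;

	return pte;
}
```

The caller does the pat_index to flags translation once and the per-platform encoders stay free of any table lookups.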
Maybe add a helper for the i915->pat_to_cache_flags lookup so it can check the array bounds?
If this all sounds like too much, maybe we can do it as a follow-up.
Or perhaps it is actually pointing towards obj->pat_index not being the most elegant choice for a single point of truth.. perhaps obj->cache_flags would be better. It would be set at the same entry points and it would be hw agnostic, so it could end up more elegant in the driver.
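A rough sketch of such a bounds-checked lookup helper follows. Everything in it is an assumption made for illustration: struct sketch_device stands in for the i915 device, num_pat and sketch_pat_flags() are invented names, and falling back to uncached for an out-of-range index is just one conservative choice (in the driver one would presumably also warn).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hardware agnostic cache flags (values invented here). */
#define I915_PAT_UC  (1u << 0)
#define I915_PAT_WB  (1u << 1)
#define I915_PAT_WT  (1u << 2)
#define I915_PAT_LLC (1u << 3)

/* Stand-in for the per-device table; not the real i915 structure. */
struct sketch_device {
	uint32_t pat_to_cache_flags[4];
	size_t num_pat; /* number of valid entries in the table */
};

/*
 * Translate a pat index to cache flags, refusing to read past the end
 * of the table. Out-of-range indices fall back to uncached, which is
 * an assumption of this sketch rather than established driver policy.
 */
static uint32_t sketch_pat_flags(const struct sketch_device *dev,
				 unsigned int pat_index)
{
	if (pat_index >= dev->num_pat)
		return I915_PAT_UC;

	return dev->pat_to_cache_flags[pat_index];
}
```

With a helper like this every caller goes through one place, so adding a warning or changing the fallback later does not touch the insert paths.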
But then I think we need at minimum something like the below in this patch, somewhere:
/*
* On pre-Gen12 platforms enum i915_cache_level happens to align
* with caching modes as specified in hardware PAT indices. Our
* implementation relies on that due to tricks played (explain the
* tricks) in the pte_encode vfuncs.
* Ensure this trick keeps working until the driver can be fully
* refactored to support pat indices better.
*/
*/
BUILD_BUG_ON(I915_CACHE_NONE != 0);
... etc for all enums ...
if (gen < 12) {
GEM_WARN_ON(i915_gem_get_pat_index(i915, I915_CACHE_NONE) != 0);
... etc for all enums ...
}
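Spelled out as a standalone userspace sketch, the checks could look roughly like the following. Here _Static_assert stands in for BUILD_BUG_ON, and legacy_cachelevel_to_pat[] and legacy_mapping_is_identity() are hypothetical names standing in for the LEGACY_CACHELEVEL table and the runtime GEM_WARN_ON checks; only the enum and the table values are taken from the patch.

```c
#include <stddef.h>

enum i915_cache_level {
	I915_CACHE_NONE = 0,
	I915_CACHE_LLC,
	I915_CACHE_L3_LLC,
	I915_CACHE_WT,
};

/* Same values as the LEGACY_CACHELEVEL table quoted above. */
static const unsigned int legacy_cachelevel_to_pat[] = {
	[I915_CACHE_NONE]   = 0,
	[I915_CACHE_LLC]    = 1,
	[I915_CACHE_L3_LLC] = 2,
	[I915_CACHE_WT]     = 3,
};

/* Compile-time part: pin the enum values the legacy encoders rely on. */
_Static_assert(I915_CACHE_NONE == 0, "legacy pte_encode relies on this");
_Static_assert(I915_CACHE_LLC == 1, "legacy pte_encode relies on this");
_Static_assert(I915_CACHE_L3_LLC == 2, "legacy pte_encode relies on this");
_Static_assert(I915_CACHE_WT == 3, "legacy pte_encode relies on this");

/* Runtime part: returns 1 if every cache level maps to its own value. */
static int legacy_mapping_is_identity(void)
{
	size_t n = sizeof(legacy_cachelevel_to_pat) /
		   sizeof(legacy_cachelevel_to_pat[0]);

	for (size_t i = 0; i < n; i++)
		if (legacy_cachelevel_to_pat[i] != i)
			return 0;

	return 1;
}
```

If someone reorders the enum the build fails, and if someone edits the table the runtime check catches it, which is the explicit linkage that is currently missing.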
> It is broken for gen12 here. I was asked to separate the gen12_pte_encode
> change to another patch in the series, but that breaks bisect. Should I
> squash 2/5 and 3/5?
This patch breaks gen12? Yes, that should definitely be avoided.
Regards,
Tvrtko