[Intel-gfx] [RFCv5 2/2] drm/i915: Introduce private PAT management
Wang, Zhi A
zhi.a.wang at intel.com
Tue Aug 29 11:18:47 UTC 2017
Thanks Joonas! :)
-----Original Message-----
From: Joonas Lahtinen [mailto:joonas.lahtinen at linux.intel.com]
Sent: Tuesday, August 29, 2017 12:37 PM
To: Wang, Zhi A <zhi.a.wang at intel.com>; intel-gfx at lists.freedesktop.org; intel-gvt-dev at lists.freedesktop.org
Cc: chris at chris-wilson.co.uk; zhenyuw at linux.intel.com; Widawsky, Benjamin <benjamin.widawsky at intel.com>; Vivi, Rodrigo <rodrigo.vivi at intel.com>
Subject: Re: [RFCv5 2/2] drm/i915: Introduce private PAT management
On Tue, 2017-08-29 at 16:00 +0800, Zhi Wang wrote:
> The private PAT management is to support PPAT entry manipulation. Two
> APIs are introduced for dynamically managing PPAT entries:
> intel_ppat_get and intel_ppat_put.
>
> intel_ppat_get will search for an existing PPAT entry which perfectly
> matches the required PPAT value. If not, it will try to allocate or
> return a partially matched PPAT entry if there is any available PPAT
> indexes or not.
>
> intel_ppat_put will put back the PPAT entry which comes from
> intel_ppat_get. If it's dynamically allocated, the reference count
> will be decreased. If the reference count turns into zero, the PPAT
> index is freed again.
>
> Besides, another two callbacks are introduced to support the private
> PAT management framework. One is ppat->update(), which writes the PPAT
> configurations in ppat->entries into HW. Another one is ppat->match,
> which will return a score to show how two PPAT values match with each other.
>
> v5:
>
> - Add check and warnnings for those platforms which don't have PPAT.
>
> v3:
>
> - Introduce dirty bitmap for PPAT registers. (Chris)
> - Change the name of the pointer "dev_priv" to "i915". (Chris)
> - intel_ppat_{get, put} returns/takes a const intel_ppat_entry *.
> (Chris)
>
> v2:
>
> - API re-design. (Chris)
>
> Cc: Ben Widawsky <benjamin.widawsky at intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> Signed-off-by: Zhi Wang <zhi.a.wang at intel.com>
<SNIP>
> -static void cnl_setup_private_ppat(struct drm_i915_private *dev_priv)
> +static struct intel_ppat_entry *alloc_ppat_entry(struct intel_ppat *ppat,
> + unsigned int index,
> + u8 value)
> {
> + struct intel_ppat_entry *entry = &ppat->entries[index];
> +
> + entry->ppat = ppat;
> + entry->value = value;
> + kref_init(&entry->ref_count);
> + set_bit(index, ppat->used);
> + set_bit(index, ppat->dirty);
> +
> + return entry;
> +}
> +
> +static void free_ppat_entry(struct intel_ppat_entry *entry) {
> + struct intel_ppat *ppat = entry->ppat;
> + int index = entry - ppat->entries;
> +
> + entry->value = ppat->dummy_value;
> + clear_bit(index, ppat->used);
> + set_bit(index, ppat->dirty);
> +}
Above functions should be __ prefixed as they do no checking if they override existing data. The suitable names might be __ppat_get and __ppat_put.
> +
> +/**
> + * intel_ppat_get - get a usable PPAT entry
> + * @i915: i915 device instance
> + * @value: the PPAT value required by the caller
> + *
> + * The function tries to search if there is an existing PPAT entry
> +which
> + * matches with the required value. If perfectly matched, the
> +existing PPAT
> + * entry will be used. If only partially matched, it will try to
> +check if
> + * there is any available PPAT index. If yes, it will allocate a new
> +PPAT
> + * index for the required entry and update the HW. If not, the
> +partially
> + * matched entry will be used.
> + */
> +const struct intel_ppat_entry *intel_ppat_get(struct drm_i915_private
> +*i915,
Maybe split the function type and name to avoid exceeding 80 chars on next line.
> + u8 value)
> +{
> + struct intel_ppat *ppat = &i915->ppat;
> + struct intel_ppat_entry *entry;
> + int i, used;
> + unsigned int score, best_score;
> +
> + if (WARN_ON(!ppat->max_entries))
> + return ERR_PTR(-ENODEV);
No need for extra check like this, this will just lead to ENOSPC.
> +
> + score = best_score = 0;
> + used = 0;
This variable behaves more like scanned.
> +
> + /* First, find a suitable value from available entries */
The next two lines repeat this information, no need to document "what".
> + for_each_set_bit(i, ppat->used, ppat->max_entries) {
score could be declared in this scope.
> + score = ppat->match(ppat->entries[i].value, value);
> + /* Perfect match */
This comment is literally repeating what code says.
> + if (score == INTEL_PPAT_PERFECT_MATCH) {
> + entry = &ppat->entries[i];
> + kref_get(&entry->ref_count);
> + return entry;
> + }
> +
> + if (score > best_score) {
> + entry = &ppat->entries[i];
Above could be simplified:
if (score == INTEL_PPAT_PERFECT_MATCH) {
kref_get(&entry->ref);
return entry;
}
> + best_score = score;
> + }
> + used++;
> + }
> +
> + /* No matched entry and we can't allocate a new entry. */
DRM_ERROR replicates this comment's information.
> + if (!best_score && used == ppat->max_entries) {
> + DRM_ERROR("Fail to get PPAT entry\n");
DRM_DEBUG_DRIVER at most.
> + return ERR_PTR(-ENOSPC);
> + }
> +
> + /*
> + * Found a matched entry which is not perfect,
> + * and we can't allocate a new entry.
> + */
> + if (best_score && used == ppat->max_entries) {
> + kref_get(&entry->ref_count);
> + return entry;
> + }
Above code could be combined:
if (scanned == ppat->max_entries) {
if(!best_score)
return ERR_PTR(-ENOSPC);
kref_get(&entry->ref);
return entry;
}
> +
> + /* Allocate a new entry */
This comment is also just telling "what", which we can see from code.
> + i = find_first_zero_bit(ppat->used, ppat->max_entries);
> + entry = alloc_ppat_entry(ppat, i, value);
> + ppat->update(i915);
> + return entry;
> +}
> +
> +static void put_ppat(struct kref *kref)
ppat_release might cause less confusion, otherwise there will be 3 put functions.
> +{
> + struct intel_ppat_entry *entry =
> + container_of(kref, struct intel_ppat_entry, ref_count);
> + struct drm_i915_private *i915 = entry->ppat->i915;
> +
> + free_ppat_entry(entry);
> + entry->ppat->update(i915);
> +}
> +
> +/**
> + * intel_ppat_put - put back the PPAT entry got from intel_ppat_get()
> + * @entry: an intel PPAT entry
> + *
> + * Put back the PPAT entry got from intel_ppat_get(). If the PPAT
> +index of the
> + * entry is dynamically allocated, its reference count will be
> +decreased. Once
> + * the reference count becomes into zero, the PPAT index becomes free again.
> + */
> +void intel_ppat_put(const struct intel_ppat_entry *entry) {
> + struct intel_ppat *ppat = entry->ppat;
> + int index = entry - ppat->entries;
> +
> + if (WARN_ON(!ppat->max_entries))
> + return;
This is clearly a kernel programmer error (and a serious one, so could be GEM_BUG_ON).
> +
> + kref_put(&ppat->entries[index].ref_count, put_ppat); }
> +
> +static void cnl_private_pat_update(struct drm_i915_private *dev_priv)
> +{
> + struct intel_ppat *ppat = &dev_priv->ppat;
> + int i;
> +
> + for_each_set_bit(i, ppat->dirty, ppat->max_entries) {
> + clear_bit(i, ppat->dirty);
> + I915_WRITE(GEN10_PAT_INDEX(i), ppat->entries[i].value);
Clearing the bit after write is the logical thing to do.
> + }
> +}
> +
> +static void bdw_private_pat_update(struct drm_i915_private *dev_priv)
> +{
> + struct intel_ppat *ppat = &dev_priv->ppat;
> + u64 pat = 0;
> + int i;
> +
> + for (i = 0; i < ppat->max_entries; i++)
> + pat |= GEN8_PPAT(i, ppat->entries[i].value);
> +
> + bitmap_clear(ppat->dirty, 0, ppat->max_entries);
> +
> + I915_WRITE(GEN8_PRIVATE_PAT_LO, pat);
> + I915_WRITE(GEN8_PRIVATE_PAT_HI, pat >> 32);
While moving the code, lower_32_bits and upper_32_bits, it should really warn without them?
> +}
> +
> +#define gen8_pat_ca(v) ((v) & 0x3)
> +#define gen8_pat_tc(v) (((v) >> 2) & 0x3) #define gen8_pat_age(v)
> +(((v) >> 4) & 0x3)
Magic numbers. I'm also not 100% sure if KMD will want partial matches, or if GVT should pass in a scoring function.
> +
> +static unsigned int bdw_private_pat_match(u8 src, u8 dst) {
> + unsigned int score = 0;
> +
> + /* Cache attribute has to be matched. */
> + if (gen8_pat_ca(src) != gen8_pat_ca(dst))
> + return 0;
> +
> + if (gen8_pat_age(src) == gen8_pat_age(dst))
> + score += 1;
> +
> + if (gen8_pat_tc(src) == gen8_pat_tc(dst))
> + score += 2;
I'd lift this check to have them all in importance order.
> +
> + if (score == 3)
> + return INTEL_PPAT_PERFECT_MATCH;
> +
> + return score;
> +}
> +
> +#define chv_get_snoop(v) (((v) >> 6) & 0x1)
Magic numbers, which need to be in i915_reg.h with #define. Should also be chv_pat_snoop();
> +
> +static unsigned int chv_private_pat_match(u8 src, u8 dst) {
> + if (chv_get_snoop(src) == chv_get_snoop(dst))
> + return INTEL_PPAT_PERFECT_MATCH;
> +
> + return 0;
return chv_pat_snoop(src) == chv_pat_snoop(dst) ?
INTEL_PPAT_PERFECT_MATCH : 0;
> +}
> +
> +static void cnl_setup_private_ppat(struct intel_ppat *ppat) {
> + ppat->max_entries = 8;
> + ppat->update = cnl_private_pat_update;
> + ppat->match = bdw_private_pat_match;
> + ppat->dummy_value = GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(3);
"dummy_value" is not a very descriptive name.
> +
> /* XXX: spec is unclear if this is still needed for CNL+ */
> - if (!USES_PPGTT(dev_priv)) {
> - I915_WRITE(GEN10_PAT_INDEX(0), GEN8_PPAT_UC);
> + if (!USES_PPGTT(ppat->i915)) {
> + alloc_ppat_entry(ppat, 0, GEN8_PPAT_UC);
> return;
> }
>
> - I915_WRITE(GEN10_PAT_INDEX(0), GEN8_PPAT_WB | GEN8_PPAT_LLC);
> - I915_WRITE(GEN10_PAT_INDEX(1), GEN8_PPAT_WC | GEN8_PPAT_LLCELLC);
> - I915_WRITE(GEN10_PAT_INDEX(2), GEN8_PPAT_WT | GEN8_PPAT_LLCELLC);
> - I915_WRITE(GEN10_PAT_INDEX(3), GEN8_PPAT_UC);
> - I915_WRITE(GEN10_PAT_INDEX(4), GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(0));
> - I915_WRITE(GEN10_PAT_INDEX(5), GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(1));
> - I915_WRITE(GEN10_PAT_INDEX(6), GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(2));
> - I915_WRITE(GEN10_PAT_INDEX(7), GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(3));
> + alloc_ppat_entry(ppat, 0, GEN8_PPAT_WB | GEN8_PPAT_LLC);
> + alloc_ppat_entry(ppat, 2, GEN8_PPAT_WT | GEN8_PPAT_LLCELLC);
> + alloc_ppat_entry(ppat, 3, GEN8_PPAT_UC);
> + alloc_ppat_entry(ppat, 4, GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(0));
Maybe I'm not following at all, but this seems like quite a change to me? 0-3 used to be WB, WC, WT, UC, now 1 is unset, like is 5-7.
I'd leave all changes to registers to a later patch and leave this patch to do what the title says, "Introduce private PAT management".
> static void setup_private_pat(struct drm_i915_private *dev_priv) {
> + struct intel_ppat *ppat = &dev_priv->ppat;
> + int i;
> +
> + ppat->i915 = dev_priv;
> +
> + /* Load per-platform PPAT configurations */
This comment is again just taking space.
> if (INTEL_GEN(dev_priv) >= 10)
> - cnl_setup_private_ppat(dev_priv);
> + cnl_setup_private_ppat(ppat);
> else if (IS_CHERRYVIEW(dev_priv) || IS_GEN9_LP(dev_priv))
> - chv_setup_private_ppat(dev_priv);
> + chv_setup_private_ppat(ppat);
> else
> - bdw_setup_private_ppat(dev_priv);
> + bdw_setup_private_ppat(ppat);
> +
> + GEM_BUG_ON(ppat->max_entries > INTEL_MAX_PPAT_ENTRIES);
> +
> + if (!ppat->max_entries)
> + return;
I'm not sure why we have "!max_entries" checks going, we should have one GEM_BUG_ON at this point and in the rest you can assume it to be nonzero.
> +
> + /* Fill unused PPAT entries with dummy PPAT value */
> + for_each_clear_bit(i, ppat->used, ppat->max_entries) {
> + ppat->entries[i].value = ppat->dummy_value;
> + ppat->entries[i].ppat = ppat;
> + set_bit(i, ppat->dirty);
> + }
> +
> + /* Write the HW */
If the callback was named update_hw(), this comment would be unnecessary.
> + ppat->update(dev_priv);
> }
<SNIP>
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -536,6 +536,40 @@ i915_vm_to_ggtt(struct i915_address_space *vm)
> return container_of(vm, struct i915_ggtt, base); }
>
> +#define INTEL_MAX_PPAT_ENTRIES 8
> +#define INTEL_PPAT_PERFECT_MATCH (~0U)
> +
> +struct intel_ppat;
> +
> +struct intel_ppat_entry {
> + struct intel_ppat *ppat;
> + struct kref ref_count;
Just "ref", like elsewhere in code.
> + u8 value;
> +};
> +
> +struct intel_ppat {
> + struct intel_ppat_entry entries[INTEL_MAX_PPAT_ENTRIES];
> + DECLARE_BITMAP(used, INTEL_MAX_PPAT_ENTRIES);
> + DECLARE_BITMAP(dirty, INTEL_MAX_PPAT_ENTRIES);
> +
This really goes with the previous group of variable. So no newline.
> + unsigned int max_entries;
> +
> + u8 dummy_value;
Better name, like "clear_value" and short description may be useful.
> + /*
> + * Return a score to show how two PPAT values match,
> + * a INTEL_PPAT_PERFECT_MATCH indicates a perfect match
> + */
> + unsigned int (*match)(u8 src, u8 dst);
> + /* Write the PPAT configuration into HW. */
> + void (*update)(struct drm_i915_private *i915);
> +
> + struct drm_i915_private *i915;
> +};
> +
> +const struct intel_ppat_entry *intel_ppat_get(struct drm_i915_private
> +*i915,
Same here with type vs name.
> + u8 value);
> +void intel_ppat_put(const struct intel_ppat_entry *entry);
Regards, joonas
--
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
More information about the Intel-gfx
mailing list