[Intel-gfx] [RFCv5 2/2] drm/i915: Introduce private PAT management

Wang, Zhi A zhi.a.wang at intel.com
Tue Aug 29 11:18:47 UTC 2017


Thanks Joonas! :)

-----Original Message-----
From: Joonas Lahtinen [mailto:joonas.lahtinen at linux.intel.com] 
Sent: Tuesday, August 29, 2017 12:37 PM
To: Wang, Zhi A <zhi.a.wang at intel.com>; intel-gfx at lists.freedesktop.org; intel-gvt-dev at lists.freedesktop.org
Cc: chris at chris-wilson.co.uk; zhenyuw at linux.intel.com; Widawsky, Benjamin <benjamin.widawsky at intel.com>; Vivi, Rodrigo <rodrigo.vivi at intel.com>
Subject: Re: [RFCv5 2/2] drm/i915: Introduce private PAT management

On Tue, 2017-08-29 at 16:00 +0800, Zhi Wang wrote:
> The private PAT management is to support PPAT entry manipulation. Two 
> APIs are introduced for dynamically managing PPAT entries: 
> intel_ppat_get and intel_ppat_put.
> 
> intel_ppat_get will search for an existing PPAT entry which perfectly 
> matches the required PPAT value. If not, it will try to allocate or 
> return a partially matched PPAT entry if there is any available PPAT 
> indexes or not.
> 
> intel_ppat_put will put back the PPAT entry which comes from 
> intel_ppat_get. If it's dynamically allocated, the reference count 
> will be decreased. If the reference count turns into zero, the PPAT 
> index is freed again.
> 
> Besides, another two callbacks are introduced to support the private 
> PAT management framework. One is ppat->update(), which writes the PPAT 
> configurations in ppat->entries into HW. Another one is ppat->match, 
> which will return a score to show how two PPAT values match with each other.
> 
> v5:
> 
> - Add check and warnnings for those platforms which don't have PPAT.
> 
> v3:
> 
> - Introduce dirty bitmap for PPAT registers. (Chris)
> - Change the name of the pointer "dev_priv" to "i915". (Chris)
> - intel_ppat_{get, put} returns/takes a const intel_ppat_entry *. 
> (Chris)
> 
> v2:
> 
> - API re-design. (Chris)
> 
> Cc: Ben Widawsky <benjamin.widawsky at intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> Signed-off-by: Zhi Wang <zhi.a.wang at intel.com>

<SNIP>

> -static void cnl_setup_private_ppat(struct drm_i915_private *dev_priv)
> +static struct intel_ppat_entry *alloc_ppat_entry(struct intel_ppat *ppat,
> +						 unsigned int index,
> +						 u8 value)
>  {
> +	struct intel_ppat_entry *entry = &ppat->entries[index];
> +
> +	entry->ppat = ppat;
> +	entry->value = value;
> +	kref_init(&entry->ref_count);
> +	set_bit(index, ppat->used);
> +	set_bit(index, ppat->dirty);
> +
> +	return entry;
> +}
> +
> +static void free_ppat_entry(struct intel_ppat_entry *entry) {
> +	struct intel_ppat *ppat = entry->ppat;
> +	int index = entry - ppat->entries;
> +
> +	entry->value = ppat->dummy_value;
> +	clear_bit(index, ppat->used);
> +	set_bit(index, ppat->dirty);
> +}

Above functions should be __ prefixed as they do no checking if they override existing data. The suitable names might be __ppat_get and __ppat_put.

> +
> +/**
> + * intel_ppat_get - get a usable PPAT entry
> + * @i915: i915 device instance
> + * @value: the PPAT value required by the caller
> + *
> + * The function tries to search if there is an existing PPAT entry 
> +which
> + * matches with the required value. If perfectly matched, the 
> +existing PPAT
> + * entry will be used. If only partially matched, it will try to 
> +check if
> + * there is any available PPAT index. If yes, it will allocate a new 
> +PPAT
> + * index for the required entry and update the HW. If not, the 
> +partially
> + * matched entry will be used.
> + */
> +const struct intel_ppat_entry *intel_ppat_get(struct drm_i915_private 
> +*i915,

Maybe split the function type and name to avoid exceeding 80 chars on next line.

> +					      u8 value)
> +{
> +	struct intel_ppat *ppat = &i915->ppat;
> +	struct intel_ppat_entry *entry;
> +	int i, used;
> +	unsigned int score, best_score;
> +
> +	if (WARN_ON(!ppat->max_entries))
> +		return ERR_PTR(-ENODEV);

No need for extra check like this, this will just lead to ENOSPC.

> +
> +	score = best_score = 0;
> +	used = 0;

This variable behaves more like scanned.

> +
> +	/* First, find a suitable value from available entries */

The next two lines repeat this information, no need to document "what".

> +	for_each_set_bit(i, ppat->used, ppat->max_entries) {

		score could be declared in this scope.

> +		score = ppat->match(ppat->entries[i].value, value);
> +		/* Perfect match */

This comment is literally repeating what code says.

> +		if (score == INTEL_PPAT_PERFECT_MATCH) {
> +			entry = &ppat->entries[i];
> +			kref_get(&entry->ref_count);
> +			return entry;
> +		}
> +
> +		if (score > best_score) {
> +			entry = &ppat->entries[i];

Above could be simplified:

			if (score == INTEL_PPAT_PERFECT_MATCH) {
				kref_get(&entry->ref);
				return entry;
			}

> +			best_score = score;
> +		}
> +		used++;
> +	}
> +
> +	/* No matched entry and we can't allocate a new entry. */

DRM_ERROR replicates this comment's information.

> +	if (!best_score && used == ppat->max_entries) {
> +		DRM_ERROR("Fail to get PPAT entry\n");

DRM_DEBUG_DRIVER at most.

> +		return ERR_PTR(-ENOSPC);
> +	}
> +
> +	/*
> +	 * Found a matched entry which is not perfect,
> +	 * and we can't allocate a new entry.
> +	 */
> +	if (best_score && used == ppat->max_entries) {
> +		kref_get(&entry->ref_count);
> +		return entry;
> +	}

Above code could be combined:

	if (scanned == ppat->max_entries) {
		if(!best_score)
			return ERR_PTR(-ENOSPC);
		kref_get(&entry->ref);
		return entry;
	}

> +
> +	/* Allocate a new entry */

This comment is also just telling "what", which we can see from code.

> +	i = find_first_zero_bit(ppat->used, ppat->max_entries);
> +	entry = alloc_ppat_entry(ppat, i, value);
> +	ppat->update(i915);
> +	return entry;
> +}
> +
> +static void put_ppat(struct kref *kref)

ppat_release might cause less confusion, otherwise there will be 3 put functions.

> +{
> +	struct intel_ppat_entry *entry =
> +		container_of(kref, struct intel_ppat_entry, ref_count);
> +	struct drm_i915_private *i915 = entry->ppat->i915;
> +
> +	free_ppat_entry(entry);
> +	entry->ppat->update(i915);
> +}
> +
> +/**
> + * intel_ppat_put - put back the PPAT entry got from intel_ppat_get()
> + * @entry: an intel PPAT entry
> + *
> + * Put back the PPAT entry got from intel_ppat_get(). If the PPAT 
> +index of the
> + * entry is dynamically allocated, its reference count will be 
> +decreased. Once
> + * the reference count becomes into zero, the PPAT index becomes free again.
> + */
> +void intel_ppat_put(const struct intel_ppat_entry *entry) {
> +	struct intel_ppat *ppat = entry->ppat;
> +	int index = entry - ppat->entries;
> +
> +	if (WARN_ON(!ppat->max_entries))
> +		return;

This is clearly a kernel programmer error (and a serious one, so could be GEM_BUG_ON).

> +
> +	kref_put(&ppat->entries[index].ref_count, put_ppat); }
> +
> +static void cnl_private_pat_update(struct drm_i915_private *dev_priv) 
> +{
> +	struct intel_ppat *ppat = &dev_priv->ppat;
> +	int i;
> +
> +	for_each_set_bit(i, ppat->dirty, ppat->max_entries) {
> +		clear_bit(i, ppat->dirty);
> +		I915_WRITE(GEN10_PAT_INDEX(i), ppat->entries[i].value);

Clearing the bit after write is the logical thing to do.

> +	}
> +}
> +
> +static void bdw_private_pat_update(struct drm_i915_private *dev_priv) 
> +{
> +	struct intel_ppat *ppat = &dev_priv->ppat;
> +	u64 pat = 0;
> +	int i;
> +
> +	for (i = 0; i < ppat->max_entries; i++)
> +		pat |= GEN8_PPAT(i, ppat->entries[i].value);
> +
> +	bitmap_clear(ppat->dirty, 0, ppat->max_entries);
> +
> +	I915_WRITE(GEN8_PRIVATE_PAT_LO, pat);
> +	I915_WRITE(GEN8_PRIVATE_PAT_HI, pat >> 32);

While moving the code, lower_32_bits and upper_32_bits, it should really warn without them?

> +}
> +
> +#define gen8_pat_ca(v) ((v) & 0x3)
> +#define gen8_pat_tc(v) (((v) >> 2) & 0x3) #define gen8_pat_age(v) 
> +(((v) >> 4) & 0x3)

Magic numbers. I'm also not 100% sure if KMD will want partial matches, or if GVT should pass in a scoring function.

> +
> +static unsigned int bdw_private_pat_match(u8 src, u8 dst) {
> +	unsigned int score = 0;
> +
> +	/* Cache attribute has to be matched. */
> +	if (gen8_pat_ca(src) != gen8_pat_ca(dst))
> +		return 0;
> +
> +	if (gen8_pat_age(src) == gen8_pat_age(dst))
> +		score += 1;
> +
> +	if (gen8_pat_tc(src) == gen8_pat_tc(dst))
> +		score += 2;

I'd lift this check to have them all in importance order.

> +
> +	if (score == 3)
> +		return INTEL_PPAT_PERFECT_MATCH;
> +
> +	return score;
> +}
> +
> +#define chv_get_snoop(v) (((v) >> 6) & 0x1)

Magic numbers, which need to be in i915_reg.h with #define. Should also be chv_pat_snoop();

> +
> +static unsigned int chv_private_pat_match(u8 src, u8 dst) {
> +	if (chv_get_snoop(src) == chv_get_snoop(dst))
> +		return INTEL_PPAT_PERFECT_MATCH;
> +
> +	return 0;

	return chv_pat_snoop(src) == chv_pat_snoop(dst) ?
	       INTEL_PPAT_PERFECT_MATCH : 0;

> +}
> +
> +static void cnl_setup_private_ppat(struct intel_ppat *ppat) {
> +	ppat->max_entries = 8;
> +	ppat->update = cnl_private_pat_update;
> +	ppat->match = bdw_private_pat_match;
> +	ppat->dummy_value = GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(3);

"dummy_value" is not a very descriptive name.

> +
>  	/* XXX: spec is unclear if this is still needed for CNL+ */
> -	if (!USES_PPGTT(dev_priv)) {
> -		I915_WRITE(GEN10_PAT_INDEX(0), GEN8_PPAT_UC);
> +	if (!USES_PPGTT(ppat->i915)) {
> +		alloc_ppat_entry(ppat, 0, GEN8_PPAT_UC);
>  		return;
>  	}
>  
> -	I915_WRITE(GEN10_PAT_INDEX(0), GEN8_PPAT_WB | GEN8_PPAT_LLC);
> -	I915_WRITE(GEN10_PAT_INDEX(1), GEN8_PPAT_WC | GEN8_PPAT_LLCELLC);
> -	I915_WRITE(GEN10_PAT_INDEX(2), GEN8_PPAT_WT | GEN8_PPAT_LLCELLC);
> -	I915_WRITE(GEN10_PAT_INDEX(3), GEN8_PPAT_UC);
> -	I915_WRITE(GEN10_PAT_INDEX(4), GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(0));
> -	I915_WRITE(GEN10_PAT_INDEX(5), GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(1));
> -	I915_WRITE(GEN10_PAT_INDEX(6), GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(2));
> -	I915_WRITE(GEN10_PAT_INDEX(7), GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(3));
> +	alloc_ppat_entry(ppat, 0, GEN8_PPAT_WB | GEN8_PPAT_LLC);
> +	alloc_ppat_entry(ppat, 2, GEN8_PPAT_WT | GEN8_PPAT_LLCELLC);
> +	alloc_ppat_entry(ppat, 3, GEN8_PPAT_UC);
> +	alloc_ppat_entry(ppat, 4, GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(0));

Maybe I'm not following at all, but this seems like quite a change to me? 0-3 used to be WB, WC, WT, UC, now 1 is unset, like is 5-7.

I'd leave all changes to registers to a later patch and leave this patch to do what the title says, "Introduce private PAT management".

>  static void setup_private_pat(struct drm_i915_private *dev_priv)  {
> +	struct intel_ppat *ppat = &dev_priv->ppat;
> +	int i;
> +
> +	ppat->i915 = dev_priv;
> +
> +	/* Load per-platform PPAT configurations */

This comment is again just taking space.

>  	if (INTEL_GEN(dev_priv) >= 10)
> -		cnl_setup_private_ppat(dev_priv);
> +		cnl_setup_private_ppat(ppat);
>  	else if (IS_CHERRYVIEW(dev_priv) || IS_GEN9_LP(dev_priv))
> -		chv_setup_private_ppat(dev_priv);
> +		chv_setup_private_ppat(ppat);
>  	else
> -		bdw_setup_private_ppat(dev_priv);
> +		bdw_setup_private_ppat(ppat);
> +
> +	GEM_BUG_ON(ppat->max_entries > INTEL_MAX_PPAT_ENTRIES);
> +
> +	if (!ppat->max_entries)
> +		return;

I'm not sure why we have "!max_entries" checks going, we should have one GEM_BUG_ON at this point and in the rest you can assume it to be nonzero.

> +
> +	/* Fill unused PPAT entries with dummy PPAT value */
> +	for_each_clear_bit(i, ppat->used, ppat->max_entries) {
> +		ppat->entries[i].value = ppat->dummy_value;
> +		ppat->entries[i].ppat = ppat;
> +		set_bit(i, ppat->dirty);
> +	}
> +
> +	/* Write the HW */

If the callback was named update_hw(), this comment would be unnecessary.

> +	ppat->update(dev_priv);
>  }

<SNIP>

> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -536,6 +536,40 @@ i915_vm_to_ggtt(struct i915_address_space *vm)
>  	return container_of(vm, struct i915_ggtt, base);  }
>  
> +#define INTEL_MAX_PPAT_ENTRIES 8
> +#define INTEL_PPAT_PERFECT_MATCH (~0U)
> +
> +struct intel_ppat;
> +
> +struct intel_ppat_entry {
> +	struct intel_ppat *ppat;
> +	struct kref ref_count;

	Just "ref", like elsewhere in code.

> +	u8 value;
> +};
> +
> +struct intel_ppat {
> +	struct intel_ppat_entry entries[INTEL_MAX_PPAT_ENTRIES];
> +	DECLARE_BITMAP(used, INTEL_MAX_PPAT_ENTRIES);
> +	DECLARE_BITMAP(dirty, INTEL_MAX_PPAT_ENTRIES);
> +

This really goes with the previous group of variable. So no newline.

> +	unsigned int max_entries;
> +
> +	u8 dummy_value;

Better name, like "clear_value" and short description may be useful.

> +	/*
> +	 * Return a score to show how two PPAT values match,
> +	 * a INTEL_PPAT_PERFECT_MATCH indicates a perfect match
> +	 */
> +	unsigned int (*match)(u8 src, u8 dst);
> +	/* Write the PPAT configuration into HW. */
> +	void (*update)(struct drm_i915_private *i915);
> +
> +	struct drm_i915_private *i915;
> +};
> +
> +const struct intel_ppat_entry *intel_ppat_get(struct drm_i915_private 
> +*i915,

Same here with type vs name.

> +					      u8 value);
> +void intel_ppat_put(const struct intel_ppat_entry *entry);

Regards, joonas
--
Joonas Lahtinen
Open Source Technology Center
Intel Corporation


More information about the Intel-gfx mailing list