[Intel-gfx] [PATCH] drm/i915: Support to enable TRTT on GEN9

Chris Wilson chris at chris-wilson.co.uk
Sun Jan 10 09:39:13 PST 2016


On Sat, Jan 09, 2016 at 05:00:21PM +0530, akash.goel at intel.com wrote:
> From: Akash Goel <akash.goel at intel.com>
> 
> Gen9 has an additional address translation hardware support in form of
> Tiled Resource Translation Table (TR-TT) which provides an extra level
> of abstraction over PPGTT.
> This is useful for mapping Sparse/Tiled texture resources.
> Sparse resources are created as virtual-only allocations. Regions of the
> resource that the application intends to use is bound to the physical memory
> on the fly and can be re-bound to different memory allocations over the
> lifetime of the resource.
> 
> TR-TT is tightly coupled with PPGTT, a new instance of TR-TT will be required
> for a new PPGTT instance, but TR-TT may not enabled for every context.
> 1/16th of the 48bit PPGTT space is earmarked for the translation by TR-TT,
> which such chunk to use is conveyed to HW through a register.
> Any GFX address, which lies in that reserved 44 bit range will be translated
> through TR-TT first and then through PPGTT to get the actual physical address,
> so the output of translation from TR-TT will be a PPGTT offset.
> 
> TRTT is constructed as a 3 level tile Table. Each tile is 64KB is size which
> leaves behind 44-16=28 address bits. 28bits are partitioned as 9+9+10, and
> each level is contained within a 4KB page hence L3 and L2 is composed of
> 512 64b entries and L1 is composed of 1024 32b entries.
> 
> There is a provision to keep TR-TT Tables in virtual space, where the pages of
> TRTT tables will be mapped to PPGTT.
> Currently this is the supported mode, in this mode UMD will have a full control
> on TR-TT management, with bare minimum support from KMD.
> So the entries of L3 table will contain the PPGTT offset of L2 Table pages,
> similarly entries of L2 table will contain the PPGTT offset of L1 Table pages.
> The entries of L1 table will contain the PPGTT offset of BOs actually backing
> the Sparse resources.

> The assumption here is that UMD only will do the complete PPGTT address space
> management and use the Soft Pin API for all the buffer objects associated with
> a given Context.

That is a poor assumption, and not one required for this to work.

> So UMD will also have to allocate the L3/L2/L1 table pages
> as a regular GEM BO only & assign them a PPGTT address through the Soft Pin API.
> UMD would have to emit the MI_STORE_DATA_IMM commands in the batch buffer to
> program the relevant entries of L3/L2/L1 tables.

This only applies to te TR-TT L1-L3 cache, right?

> Any space in TR-TT segment not bound to any Sparse texture, will be handled
> through Invalid tile, User is expected to initialize the entries of a new
> L3/L2/L1 table page with the Invalid tile pattern. The entries corresponding to
> the holes in the Sparse texture resource will be set with the Null tile pattern
> The improper programming of TRTT should only lead to a recoverable GPU hang,
> eventually leading to banning of the culprit context without victimizing others.
> 
> The association of any Sparse resource with the BOs will be known only to UMD,
> and only the Sparse resources shall be assigned an offset from the TR-TT segment
> by UMD. The use of TR-TT segment or mapping of Sparse resources will be
> abstracted from the KMD,

s/abstracted from/transparent to/ s/,/;/

> UMD can do the address assignment from TR-TT segment

s/can/will/

> autonomously and KMD will be oblivious of it.
> The BOs must not be assigned an address from TR-TT segment, they will be mapped

s/The BOs/Any object/

> to PPGTT in a regular way by KMD

s/using the Soft Pin offset provided by UMD// as this is irrelevant.

> This patch provides an interface through which UMD can convey KMD to enable
> TR-TT for a given context. A new I915_CONTEXT_PARAM_ENABLE_TRTT param has been
> added to I915_GEM_CONTEXT_SETPARAM ioctl for that purpose.
> UMD will have to pass the GFX address of L3 table page,

+along with the

> pattern value for the
> Null & invalid Tile registers.
> 
> Testcase: igt/gem_trtt
> 
> Signed-off-by: Akash Goel <akash.goel at intel.com>
> ---
>  drivers/gpu/drm/i915/i915_dma.c         |  3 ++
>  drivers/gpu/drm/i915/i915_drv.h         | 12 +++++++
>  drivers/gpu/drm/i915/i915_gem_context.c | 45 ++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_gem_gtt.c     | 57 +++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_gem_gtt.h     |  6 ++++
>  drivers/gpu/drm/i915/i915_reg.h         | 19 +++++++++++
>  drivers/gpu/drm/i915/intel_lrc.c        | 41 ++++++++++++++++++++++++
>  include/uapi/drm/i915_drm.h             |  8 +++++
>  8 files changed, 191 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 988a380..c247c25 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -172,6 +172,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
>  	case I915_PARAM_HAS_EXEC_SOFTPIN:
>  		value = 1;
>  		break;
> +	case I915_PARAM_HAS_TRTT:
> +		value = HAS_TRTT(dev);
> +		break;

Should we do this here, or just query the context? In fact you are
missing the context getparam path any way.

>  	default:
>  		DRM_DEBUG("Unknown parameter %d\n", param->param);
>  		return -EINVAL;
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index c6dd4db..12c612e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -839,6 +839,7 @@ struct i915_ctx_hang_stats {
>  #define DEFAULT_CONTEXT_HANDLE 0
>  
>  #define CONTEXT_NO_ZEROMAP (1<<0)
> +#define CONTEXT_USE_TRTT   (1<<1)

Make flags unsigned whilst you are here, and fix the holes!

>  /**
>   * struct intel_context - as the name implies, represents a context.
>   * @ref: reference count.
> @@ -881,6 +882,15 @@ struct intel_context {
>  		int pin_count;
>  	} engine[I915_NUM_RINGS];
>  
> +	/* TRTT info */
> +	struct {

Give this a name now, we will be thankful in the future.

> +		uint32_t invd_tile_val;
> +		uint32_t null_tile_val;
> +		uint64_t l3_table_address;
> +		struct i915_vma *vma;
> +		bool update_trtt_params;
> +	} trtt_info;
> +
>  	struct list_head link;
>  };
>  
> @@ -2626,6 +2636,8 @@ struct drm_i915_cmd_table {
>  				 !IS_VALLEYVIEW(dev) && !IS_CHERRYVIEW(dev) && \
>  				 !IS_BROXTON(dev))
>  
> +#define HAS_TRTT(dev)		(IS_GEN9(dev))
> +
>  #define INTEL_PCH_DEVICE_ID_MASK		0xff00
>  #define INTEL_PCH_IBX_DEVICE_ID_TYPE		0x3b00
>  #define INTEL_PCH_CPT_DEVICE_ID_TYPE		0x1c00
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 900ffd0..ae9fc34 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -146,6 +146,9 @@ static void i915_gem_context_clean(struct intel_context *ctx)
>  		if (WARN_ON(__i915_vma_unbind_no_wait(vma)))
>  			break;
>  	}
> +
> +	if (ctx->flags & CONTEXT_USE_TRTT)
> +		i915_gem_destroy_trtt_vma(ctx->trtt_info.vma);

Sould be in context free.

>  }
>  
>  void i915_gem_context_free(struct kref *ctx_ref)
> @@ -512,6 +515,35 @@ i915_gem_context_get(struct drm_i915_file_private *file_priv, u32 id)
>  	return ctx;
>  }
>  
> +static int
> +i915_setup_trtt_ctx(struct intel_context *ctx,
> +		    struct drm_i915_gem_context_trtt_param *trtt_params)
> +{
> +	if (ctx->flags & CONTEXT_USE_TRTT)
> +		return -EEXIST;
> +
> +	/* basic sanity checks for the l3 table pointer */
> +	if ((ctx->trtt_info.l3_table_address >= GEN9_TRTT_SEGMENT_START) &&
> +	    (ctx->trtt_info.l3_table_address <
> +			(GEN9_TRTT_SEGMENT_START + GEN9_TRTT_SEGMENT_SIZE)))

Presumably l3_table has an actual size and you want to do a range
overlap test, not just the start address.

> +		return -EINVAL;
> +
> +	if (ctx->trtt_info.l3_table_address & ~GEN9_TRTT_L3_GFXADDR_MASK)
> +		return -EINVAL;

These are worth adding DRM_DEBUG() or even better start using dev_debug()
so that we can debug userspace startup issues.

> @@ -952,6 +984,7 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>  {
>  	struct drm_i915_file_private *file_priv = file->driver_priv;
>  	struct drm_i915_gem_context_param *args = data;
> +	struct drm_i915_gem_context_trtt_param trtt_params;
>  	struct intel_context *ctx;
>  	int ret;
>  
> @@ -983,6 +1016,18 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>  			ctx->flags |= args->value ? CONTEXT_NO_ZEROMAP : 0;
>  		}
>  		break;
> +	case I915_CONTEXT_PARAM_ENABLE_TRTT:

Bump this case to i915_setup_trtt_ctx.
i.e. just have
	ret = i915_setup_trtt_ctx(ctx, args);
	break;

Otherwise this function will become very unwieldly very quickly.

>  int i915_ppgtt_init_hw(struct drm_device *dev)
>  {
> +	if (HAS_TRTT(dev) && USES_FULL_48BIT_PPGTT(dev)) {
> +		struct drm_i915_private *dev_priv = dev->dev_private;
> +
> +		I915_WRITE(GEN9_TR_CHICKEN_BIT_VECTOR,
> +			   GEN9_TRTT_BYPASS_DISABLE);

Shouldn't this be a context specific register? In which case you need to
set it in the context image instead.

Hmm. given you already do the context image tweaks, how does work with
non-trtt contexts?

> +struct i915_vma *
> +i915_gem_setup_trtt_vma(struct i915_address_space *vm)
> +{
> +	struct i915_vma *vma;
> +	int ret;
> +
> +	vma = kmem_cache_zalloc(to_i915(vm->dev)->vmas, GFP_KERNEL);
> +	if (vma == NULL)
> +		return ERR_PTR(-ENOMEM);
> +
> +	INIT_LIST_HEAD(&vma->vma_link);
> +	INIT_LIST_HEAD(&vma->mm_list);
> +	INIT_LIST_HEAD(&vma->exec_list);
> +	vma->vm = vm;
> +	i915_ppgtt_get(i915_vm_to_ppgtt(vm));

Tempted to write a patch to allow

	vma->vm = i915_ppggtt_get(i915_vm_to_ppgtt(vm));
?

> +	/* Mark the vma as perennially pinned */

s/perennially/permanently/

We don't want to lose the reservation as opposed to having it grow back
next year.

> +	vma->pin_count = 1;
> +
> +	/* Reserve from the 48 bit PPGTT space */
> +	vma->node.start = GEN9_TRTT_SEGMENT_START;
> +	vma->node.size = GEN9_TRTT_SEGMENT_SIZE;
> +	ret = drm_mm_reserve_node(&vm->mm, &vma->node);
> +	if (ret) {
> +		ret = i915_gem_evict_for_vma(vma);
> +		if (ret == 0)
> +			ret = drm_mm_reserve_node(&vm->mm, &vma->node);

Good. I think we want i915_vm_reserve_node(vm, START, SIZE, &vma) - but
have a look at the other callsites to see if we have a common interface.
Looks like this would improve i915_vgpu.

> +struct drm_i915_gem_context_trtt_param {
> +	__u64 l3_table_address;
> +	__u32 invd_tile_val;
> +	__u32 null_tile_val;
> +};

Passes the ABI structure sanity checks.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


More information about the Intel-gfx mailing list