[Intel-gfx] [PATCH v9] drm/i915: Support to enable TRTT on GEN9

Gore, Tim tim.gore at intel.com
Thu Mar 24 16:29:20 UTC 2016


Tim Gore
Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ

> -----Original Message-----
> From: Intel-gfx [mailto:intel-gfx-bounces at lists.freedesktop.org] On Behalf
> Of akash.goel at intel.com
> Sent: Tuesday, March 22, 2016 8:43 AM
> To: intel-gfx at lists.freedesktop.org
> Cc: Goel, Akash
> Subject: [Intel-gfx] [PATCH v9] drm/i915: Support to enable TRTT on GEN9
> 
> From: Akash Goel <akash.goel at intel.com>
> 
> Gen9 has additional address translation hardware support in the form of the
> Tiled Resource Translation Table (TR-TT), which provides an extra level of
> abstraction over PPGTT.
> This is useful for mapping Sparse/Tiled texture resources.
> Sparse resources are created as virtual-only allocations. Regions of the
> resource that the application intends to use are bound to physical memory
> on the fly and can be re-bound to different memory allocations over the
> lifetime of the resource.
> 
> TR-TT is tightly coupled with PPGTT: a new instance of TR-TT will be required
> for a new PPGTT instance, but TR-TT may not be enabled for every context.
> 1/16th of the 48-bit PPGTT space is earmarked for translation by TR-TT;
> which such chunk to use is conveyed to HW through a register.
> Any GFX address that lies in that reserved 44-bit range will be translated
> through TR-TT first and then through PPGTT to get the actual physical
> address, so the output of translation from TR-TT will be a PPGTT offset.
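
A minimal sketch of the segment selection described above, assuming (as the register programming later in this patch suggests) that the 4-bit index of the chosen 1/16th chunk is simply bits 47:44 of the GFX address. The helper names are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* From the patch: the TR-TT segment is a fixed 1ULL << 44 bytes. */
#define TRTT_SEG_SHIFT	44
#define TRTT_SEG_SIZE	(1ULL << TRTT_SEG_SHIFT)
#define TRTT_SEG_MASK	0xFULL	/* 16 chunks in a 48-bit space */

/* Hypothetical helper: which of the 16 chunks does this address fall in? */
static inline unsigned int trtt_chunk_index(uint64_t gfx_addr)
{
	return (unsigned int)((gfx_addr >> TRTT_SEG_SHIFT) & TRTT_SEG_MASK);
}

/* Hypothetical helper: would this address be routed through TR-TT first? */
static inline bool trtt_addr_in_segment(uint64_t gfx_addr,
					uint64_t segment_base)
{
	return trtt_chunk_index(gfx_addr) == trtt_chunk_index(segment_base);
}
```

With a segment base of 0xE00000000000, for instance, only addresses whose top nibble (bits 47:44) is 0xE would take the TR-TT path; everything else goes straight to PPGTT.
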
> 
> TR-TT is constructed as a 3-level tile table. Each tile is 64KB in size, which
> leaves behind 44-16=28 address bits. The 28 bits are partitioned as 9+9+10, and
> each level is contained within a 4KB page; hence L3 and L2 are each composed of
> 512 64-bit entries and L1 is composed of 1024 32-bit entries.
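
The 9+9+10 split can be sketched as below. This is only an illustration under the assumption that the indices are taken from the top of the 44-bit segment offset downwards (L3, then L2, then L1, then the 16-bit offset within the 64KB tile); the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TRTT_SEG_SHIFT	44	/* segment covers 1ULL << 44 bytes */
#define TRTT_TILE_SHIFT	16	/* 64KB tile */
#define TRTT_L1_BITS	10	/* 1024 32-bit entries in a 4KB page */
#define TRTT_L2_BITS	9	/* 512 64-bit entries in a 4KB page */
#define TRTT_L3_BITS	9	/* 512 64-bit entries in a 4KB page */

/* Hypothetical container for the result of splitting an address. */
struct trtt_walk {
	unsigned int l3;	/* index into the L3 table */
	unsigned int l2;	/* index into the L2 table */
	unsigned int l1;	/* index into the L1 table */
	uint32_t tile_offset;	/* byte offset within the 64KB tile */
};

/* Split a GFX address lying in the TR-TT segment into table indices. */
static struct trtt_walk trtt_split(uint64_t gfx_addr)
{
	/* Drop the segment-select bits, keep the 44-bit offset. */
	uint64_t off = gfx_addr & ((1ULL << TRTT_SEG_SHIFT) - 1);
	struct trtt_walk w;

	w.tile_offset = (uint32_t)(off & ((1ULL << TRTT_TILE_SHIFT) - 1));
	off >>= TRTT_TILE_SHIFT;
	w.l1 = (unsigned int)(off & ((1ULL << TRTT_L1_BITS) - 1));
	off >>= TRTT_L1_BITS;
	w.l2 = (unsigned int)(off & ((1ULL << TRTT_L2_BITS) - 1));
	off >>= TRTT_L2_BITS;
	w.l3 = (unsigned int)(off & ((1ULL << TRTT_L3_BITS) - 1));
	return w;
}
```

The entry counts fall out of the page size: 4096 / 8 = 512 entries for the 64-bit L3/L2 tables and 4096 / 4 = 1024 entries for the 32-bit L1 table.
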
> 
> There is a provision to keep the TR-TT tables in virtual space, where the pages
> of the TR-TT tables are mapped to PPGTT.
> Currently this is the supported mode; in this mode UMD will have full
> control over TR-TT management, with bare minimum support from KMD.
> So the entries of the L3 table will contain the PPGTT offsets of L2 table pages;
> similarly, entries of the L2 table will contain the PPGTT offsets of L1 table
> pages. The entries of the L1 table will contain the PPGTT offsets of the BOs
> actually backing the Sparse resources.
> UMD will have to allocate the L3/L2/L1 table pages as regular BOs and
> assign them a PPGTT address through the Soft Pin API (for example, use soft
> pin to assign l3_table_address to the L3 table BO, when used).
> UMD will also program the entries in the TR-TT page tables using regular
> batch commands (MI_STORE_DATA_IMM), or via mmapping of the page
> table BOs.
> UMD may do the complete PPGTT address space management, on the
> pretext that it could help minimize the conflicts.
> 
> Any space in the TR-TT segment not bound to any Sparse texture will be handled
> through the Invalid tile. The user is expected to initialize the entries of a
> new L3/L2/L1 table page with the Invalid tile pattern. The entries
> corresponding to the holes in the Sparse texture resource will be set with the
> Null tile pattern.
> Improper programming of TR-TT should only lead to a recoverable GPU
> hang, eventually leading to banning of the culprit context without victimizing
> others.
> 
> The association of any Sparse resource with the BOs will be known only to
> UMD, and only the Sparse resources shall be assigned an offset from the
> TR-TT segment by UMD. The use of the TR-TT segment and the mapping of Sparse
> resources will be transparent to the KMD: UMD will do the address
> assignment from the TR-TT segment autonomously and KMD will be oblivious of
> it.
> No other object may be assigned an address from the TR-TT segment; such
> objects will be mapped to PPGTT in the regular way by KMD.
> 
> This patch provides an interface through which UMD can ask KMD to
> enable TR-TT for a given context. A new I915_CONTEXT_PARAM_TRTT param
> has been added to the I915_GEM_CONTEXT_SETPARAM ioctl for that purpose.
> UMD will have to pass the GFX address of the L3 table page and the start
> location of the TR-TT segment, along with the pattern values for the Null &
> Invalid tile registers.
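
A rough, self-contained sketch of the sanity checks a caller could run on these four parameters before issuing the ioctl. The struct layout is copied from the uapi addition in this patch, but the struct and function names here are hypothetical, and the overlap test is a plain half-open interval check rather than the exact expression used in intel_context_set_trtt().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define TRTT_SEGMENT_SIZE	(1ULL << 44)		/* GEN9_TRTT_SEGMENT_SIZE */
#define TRTT_L3_GFXADDR_MASK	0xFFFFFFFF0000ULL	/* GEN9_TRTT_L3_GFXADDR_MASK */
#define TRTT_TABLE_PAGE_SIZE	4096ULL			/* each table level is one 4KB page */

/* Same fields as drm_i915_gem_context_trtt_param in the patch. */
struct trtt_param {
	uint64_t segment_base_addr;
	uint64_t l3_table_address;
	uint32_t invd_tile_val;
	uint32_t null_tile_val;
};

/* Hypothetical pre-flight check; returns 0 or -EINVAL. */
static int trtt_param_check(const struct trtt_param *p)
{
	uint64_t seg_end = p->segment_base_addr + TRTT_SEGMENT_SIZE;

	/* Segment base must be aligned to the fixed segment size. */
	if (p->segment_base_addr & (TRTT_SEGMENT_SIZE - 1))
		return -EINVAL;

	/* L3 table address may only use the bits the register accepts. */
	if (p->l3_table_address & ~TRTT_L3_GFXADDR_MASK)
		return -EINVAL;

	/* The L3 table page itself must not live inside the segment. */
	if (p->l3_table_address < seg_end &&
	    p->l3_table_address + TRTT_TABLE_PAGE_SIZE > p->segment_base_addr)
		return -EINVAL;

	/* Null and Invalid tile patterns must be distinguishable. */
	if (p->null_tile_val == p->invd_tile_val)
		return -EINVAL;

	return 0;
}
```

The kernel still performs its own validation; a check like this only lets userspace fail early with a clearer message than a bare -EINVAL from the ioctl.
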
> 
> v2:
>  - Support context_getparam for TRTT also and dispense with a separate
>    GETPARAM case for TRTT (Chris).
>  - Use i915_dbg to log errors for the invalid TRTT ABI parameters passed
>    from user space (Chris).
>  - Move all the argument checking for TRTT in context_setparam to the
>    set_trtt function (Chris).
>  - Change the type of 'flags' field inside 'intel_context' to unsigned (Chris)
>  - Rename certain functions to rightly reflect their purpose, rename
>    the new param for TRTT in gem_context_param to I915_CONTEXT_PARAM_TRTT,
>    rephrase a few lines in the commit message body, add more comments (Chris).
>  - Extend ABI to allow the user to specify the TRTT segment location also.
>  - Fix for selective enabling of TRTT on per context basis, explicitly
>    disable TR-TT at the start of a new context.
> 
> v3:
>  - Check the return value of gen9_emit_trtt_regs (Chris)
>  - Update the kernel doc for intel_context structure.
>  - Rebased.
> 
> v4:
>  - Fix the warnings reported by 'checkpatch.pl --strict' (Michel)
>  - Fix the context_getparam implementation avoiding the reset of size field,
>    affecting the TRTT case.
> 
> v5:
>  - Update the TR-TT params right away in context_setparam, by constructing
>    & submitting a request emitting LRIs, instead of deferring it and
>    conflating with the next batch submission (Chris)
>  - Follow the struct_mutex handling related prescribed rules, while accessing
>    User space buffer, both in context_setparam & getparam functions (Chris).
> 
> v6:
>  - Fix the warning caused due to removal of un-allocated trtt vma node.
> 
> v7:
>  - Move context ref/unref to context_setparam_ioctl from set_trtt() & remove
>    that from get_trtt() as not really needed there (Chris).
>  - Add a check for improper values for Null & Invalid Tiles.
>  - Remove superfluous DRM_ERROR from trtt_context_allocate_vma (Chris).
>  - Rebased.
> 
> v8:
>  - Add context ref/unref to context_getparam_ioctl also so as to be consistent
>    and ease the extension of ioctl in future (Chris)
> 
> v9:
>  - Fix the handling of return value from trtt_context_allocate_vma() function,
>    causing kernel panic at the time of destroying context, in case of
>    unsuccessful allocation of trtt vma.
>  - Rebased.
> 
> Testcase: igt/gem_trtt
> 
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Michel Thierry <michel.thierry at intel.com>
> Signed-off-by: Akash Goel <akash.goel at intel.com>
> Reviewed-by: Chris Wilson <chris at chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_drv.h         |  16 +++-
>  drivers/gpu/drm/i915/i915_gem_context.c | 157 +++++++++++++++++++++++++++++++-
>  drivers/gpu/drm/i915/i915_gem_gtt.c     |  65 +++++++++++++
>  drivers/gpu/drm/i915/i915_gem_gtt.h     |   8 ++
>  drivers/gpu/drm/i915/i915_reg.h         |  19 ++++
>  drivers/gpu/drm/i915/intel_lrc.c        | 124 ++++++++++++++++++++++++-
>  drivers/gpu/drm/i915/intel_lrc.h        |   1 +
>  include/uapi/drm/i915_drm.h             |   8 ++
>  8 files changed, 393 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index ecbd418..272d1f8 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -804,6 +804,7 @@ struct i915_ctx_hang_stats {
>  #define DEFAULT_CONTEXT_HANDLE 0
> 
>  #define CONTEXT_NO_ZEROMAP (1<<0)
> +#define CONTEXT_USE_TRTT   (1 << 1)
>  /**
>   * struct intel_context - as the name implies, represents a context.
>   * @ref: reference count.
> @@ -818,6 +819,8 @@ struct i915_ctx_hang_stats {
>   * @ppgtt: virtual memory space used by this context.
>   * @legacy_hw_ctx: render context backing object and whether it is correctly
>   *                initialized (legacy ring submission mechanism only).
> + * @trtt_info: Programming parameters for tr-tt (redirection tables for
> + *             userspace, for sparse resource management)
>   * @link: link in the global list of contexts.
>   *
>   * Contexts are memory images used by the hardware to store copies of their
> @@ -828,7 +831,7 @@ struct intel_context {
>  	int user_handle;
>  	uint8_t remap_slice;
>  	struct drm_i915_private *i915;
> -	int flags;
> +	unsigned int flags;
>  	struct drm_i915_file_private *file_priv;
>  	struct i915_ctx_hang_stats hang_stats;
>  	struct i915_hw_ppgtt *ppgtt;
> @@ -849,6 +852,15 @@ struct intel_context {
>  		uint32_t *lrc_reg_state;
>  	} engine[I915_NUM_ENGINES];
> 
> +	/* TRTT info */
> +	struct intel_context_trtt {
> +		u32 invd_tile_val;
> +		u32 null_tile_val;
> +		u64 l3_table_address;
> +		u64 segment_base_addr;
> +		struct i915_vma *vma;
> +	} trtt_info;
> +
>  	struct list_head link;
>  };
> 
> @@ -2657,6 +2669,8 @@ struct drm_i915_cmd_table {
>  				 !IS_VALLEYVIEW(dev) && !IS_CHERRYVIEW(dev) && \
>  				 !IS_BROXTON(dev))
> 
> +#define HAS_TRTT(dev)		(IS_GEN9(dev))
> +

A very minor point, but there is a w/a to disable TRTT for BXT_REVID_A0/1. I realise this
is basically obsolete now, but I'm still using one!

>  #define INTEL_PCH_DEVICE_ID_MASK		0xff00
>  #define INTEL_PCH_IBX_DEVICE_ID_TYPE		0x3b00
>  #define INTEL_PCH_CPT_DEVICE_ID_TYPE		0x1c00
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 394e525..5f28c23 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -133,6 +133,14 @@ static int get_context_size(struct drm_device *dev)
>  	return ret;
>  }
> 
> +static void intel_context_free_trtt(struct intel_context *ctx)
> +{
> +	if (!ctx->trtt_info.vma)
> +		return;
> +
> +	intel_trtt_context_destroy_vma(ctx->trtt_info.vma);
> +}
> +
>  static void i915_gem_context_clean(struct intel_context *ctx)
>  {
>  	struct i915_hw_ppgtt *ppgtt = ctx->ppgtt;
> @@ -164,6 +172,8 @@ void i915_gem_context_free(struct kref *ctx_ref)
>  	 */
>  	i915_gem_context_clean(ctx);
> 
> +	intel_context_free_trtt(ctx);
> +
>  	i915_ppgtt_put(ctx->ppgtt);
> 
>  	if (ctx->legacy_hw_ctx.rcs_state)
> @@ -507,6 +517,129 @@ i915_gem_context_get(struct drm_i915_file_private *file_priv, u32 id)
>  	return ctx;
>  }
> 
> +static int
> +intel_context_get_trtt(struct intel_context *ctx,
> +		       struct drm_i915_gem_context_param *args)
> +{
> +	struct drm_i915_gem_context_trtt_param trtt_params;
> +	struct drm_device *dev = ctx->i915->dev;
> +
> +	if (!HAS_TRTT(dev) || !USES_FULL_48BIT_PPGTT(dev)) {
> +		return -ENODEV;
> +	} else if (args->size < sizeof(trtt_params)) {
> +		args->size = sizeof(trtt_params);
> +	} else {
> +		trtt_params.segment_base_addr =
> +			ctx->trtt_info.segment_base_addr;
> +		trtt_params.l3_table_address =
> +			ctx->trtt_info.l3_table_address;
> +		trtt_params.null_tile_val =
> +			ctx->trtt_info.null_tile_val;
> +		trtt_params.invd_tile_val =
> +			ctx->trtt_info.invd_tile_val;
> +
> +		mutex_unlock(&dev->struct_mutex);
> +
> +		if (__copy_to_user(to_user_ptr(args->value),
> +				   &trtt_params,
> +				   sizeof(trtt_params))) {
> +			mutex_lock(&dev->struct_mutex);
> +			return -EFAULT;
> +		}
> +
> +		args->size = sizeof(trtt_params);
> +		mutex_lock(&dev->struct_mutex);
> +	}
> +
> +	return 0;
> +}
> +
> +static int
> +intel_context_set_trtt(struct intel_context *ctx,
> +		       struct drm_i915_gem_context_param *args)
> +{
> +	struct drm_i915_gem_context_trtt_param trtt_params;
> +	struct i915_vma *vma;
> +	struct drm_device *dev = ctx->i915->dev;
> +	int ret;
> +
> +	if (!HAS_TRTT(dev) || !USES_FULL_48BIT_PPGTT(dev))
> +		return -ENODEV;
> +	else if (ctx->flags & CONTEXT_USE_TRTT)
> +		return -EEXIST;
> +	else if (args->size < sizeof(trtt_params))
> +		return -EINVAL;
> +
> +	mutex_unlock(&dev->struct_mutex);
> +
> +	if (copy_from_user(&trtt_params,
> +			   to_user_ptr(args->value),
> +			   sizeof(trtt_params))) {
> +		mutex_lock(&dev->struct_mutex);
> +		ret = -EFAULT;
> +		goto exit;
> +	}
> +
> +	mutex_lock(&dev->struct_mutex);
> +
> +	/* Check if the setup happened from another path */
> +	if (ctx->flags & CONTEXT_USE_TRTT) {
> +		ret = -EEXIST;
> +		goto exit;
> +	}
> +
> +	/* basic sanity checks for the segment location & l3 table pointer */
> +	if (trtt_params.segment_base_addr & (GEN9_TRTT_SEGMENT_SIZE - 1)) {
> +		i915_dbg(dev, "segment base address not correctly aligned\n");
> +		ret = -EINVAL;
> +		goto exit;
> +	}
> +
> +	if (((trtt_params.l3_table_address + PAGE_SIZE) >=
> +	     trtt_params.segment_base_addr) &&
> +	    (trtt_params.l3_table_address <
> +		    (trtt_params.segment_base_addr + GEN9_TRTT_SEGMENT_SIZE))) {
> +		i915_dbg(dev, "l3 table address conflicts with trtt segment\n");
> +		ret = -EINVAL;
> +		goto exit;
> +	}
> +
> +	if (trtt_params.l3_table_address & ~GEN9_TRTT_L3_GFXADDR_MASK) {
> +		i915_dbg(dev, "invalid l3 table address\n");
> +		ret = -EINVAL;
> +		goto exit;
> +	}
> +
> +	if (trtt_params.null_tile_val == trtt_params.invd_tile_val) {
> +		i915_dbg(dev, "incorrect values for null & invalid tiles\n");
> +		return -EINVAL;
> +	}
> +
> +	vma = intel_trtt_context_allocate_vma(&ctx->ppgtt->base,
> +					trtt_params.segment_base_addr);
> +	if (IS_ERR(vma)) {
> +		ret = PTR_ERR(vma);
> +		goto exit;
> +	}
> +
> +	ctx->trtt_info.vma = vma;
> +	ctx->trtt_info.null_tile_val = trtt_params.null_tile_val;
> +	ctx->trtt_info.invd_tile_val = trtt_params.invd_tile_val;
> +	ctx->trtt_info.l3_table_address = trtt_params.l3_table_address;
> +	ctx->trtt_info.segment_base_addr = trtt_params.segment_base_addr;
> +
> +	ret = intel_lr_rcs_context_setup_trtt(ctx);
> +	if (ret) {
> +		intel_trtt_context_destroy_vma(ctx->trtt_info.vma);
> +		goto exit;
> +	}
> +
> +	ctx->flags |= CONTEXT_USE_TRTT;
> +
> +exit:
> +	return ret;
> +}
> +
>  static inline int
>  mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
>  {
> @@ -931,7 +1064,14 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>  		return PTR_ERR(ctx);
>  	}
> 
> -	args->size = 0;
> +	/*
> +	 * Take a reference also, as in certain cases we have to release &
> +	 * reacquire the struct_mutex and we don't want the context to
> +	 * go away.
> +	 */
> +	i915_gem_context_reference(ctx);
> +
> +	args->size = (args->param != I915_CONTEXT_PARAM_TRTT) ? 0 : args->size;
>  	switch (args->param) {
>  	case I915_CONTEXT_PARAM_BAN_PERIOD:
>  		args->value = ctx->hang_stats.ban_period_seconds;
> @@ -947,10 +1087,14 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>  		else
>  			args->value = to_i915(dev)->ggtt.base.total;
>  		break;
> +	case I915_CONTEXT_PARAM_TRTT:
> +		ret = intel_context_get_trtt(ctx, args);
> +		break;
>  	default:
>  		ret = -EINVAL;
>  		break;
>  	}
> +	i915_gem_context_unreference(ctx);
>  	mutex_unlock(&dev->struct_mutex);
> 
>  	return ret;
> @@ -974,6 +1118,13 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>  		return PTR_ERR(ctx);
>  	}
> 
> +	/*
> +	 * Take a reference also, as in certain cases we have to release &
> +	 * reacquire the struct_mutex and we don't want the context to
> +	 * go away.
> +	 */
> +	i915_gem_context_reference(ctx);
> +
>  	switch (args->param) {
>  	case I915_CONTEXT_PARAM_BAN_PERIOD:
>  		if (args->size)
> @@ -992,10 +1143,14 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>  			ctx->flags |= args->value ? CONTEXT_NO_ZEROMAP : 0;
>  		}
>  		break;
> +	case I915_CONTEXT_PARAM_TRTT:
> +		ret = intel_context_set_trtt(ctx, args);
> +		break;
>  	default:
>  		ret = -EINVAL;
>  		break;
>  	}
> +	i915_gem_context_unreference(ctx);
>  	mutex_unlock(&dev->struct_mutex);
> 
>  	return ret;
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 0715bb7..cbf8a03 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -2169,6 +2169,17 @@ int i915_ppgtt_init_hw(struct drm_device *dev)
>  {
>  	gtt_write_workarounds(dev);
> 
> +	if (HAS_TRTT(dev) && USES_FULL_48BIT_PPGTT(dev)) {
> +		struct drm_i915_private *dev_priv = dev->dev_private;
> +		/*
> +		 * Globally enable TR-TT support in Hw.
> +		 * Still TR-TT enabling on per context basis is required.
> +		 * Non-trtt contexts are not affected by this setting.
> +		 */
> +		I915_WRITE(GEN9_TR_CHICKEN_BIT_VECTOR,
> +			   GEN9_TRTT_BYPASS_DISABLE);
> +	}
> +
>  	/* In the case of execlists, PPGTT is enabled by the context descriptor
>  	 * and the PDPs are contained within the context itself.  We don't
>  	 * need to do anything here. */
> @@ -3362,6 +3373,60 @@ i915_gem_obj_lookup_or_create_ggtt_vma(struct drm_i915_gem_object *obj,
> 
>  }
> 
> +void intel_trtt_context_destroy_vma(struct i915_vma *vma)
> +{
> +	struct i915_address_space *vm = vma->vm;
> +
> +	WARN_ON(!list_empty(&vma->obj_link));
> +	WARN_ON(!list_empty(&vma->vm_link));
> +	WARN_ON(!list_empty(&vma->exec_list));
> +
> +	WARN_ON(!vma->pin_count);
> +
> +	if (drm_mm_node_allocated(&vma->node))
> +		drm_mm_remove_node(&vma->node);
> +
> +	i915_ppgtt_put(i915_vm_to_ppgtt(vm));
> +	kmem_cache_free(to_i915(vm->dev)->vmas, vma);
> +}
> +
> +struct i915_vma *
> +intel_trtt_context_allocate_vma(struct i915_address_space *vm,
> +				uint64_t segment_base_addr)
> +{
> +	struct i915_vma *vma;
> +	int ret;
> +
> +	vma = kmem_cache_zalloc(to_i915(vm->dev)->vmas, GFP_KERNEL);
> +	if (!vma)
> +		return ERR_PTR(-ENOMEM);
> +
> +	INIT_LIST_HEAD(&vma->obj_link);
> +	INIT_LIST_HEAD(&vma->vm_link);
> +	INIT_LIST_HEAD(&vma->exec_list);
> +	vma->vm = vm;
> +	i915_ppgtt_get(i915_vm_to_ppgtt(vm));
> +
> +	/* Mark the vma as permanently pinned */
> +	vma->pin_count = 1;
> +
> +	/* Reserve from the 48 bit PPGTT space */
> +	vma->node.start = segment_base_addr;
> +	vma->node.size = GEN9_TRTT_SEGMENT_SIZE;
> +	ret = drm_mm_reserve_node(&vm->mm, &vma->node);
> +	if (ret) {
> +		ret = i915_gem_evict_for_vma(vma);
> +		if (ret == 0)
> +			ret = drm_mm_reserve_node(&vm->mm, &vma->node);
> +	}
> +	if (ret) {
> +		intel_trtt_context_destroy_vma(vma);
> +		return ERR_PTR(ret);
> +	}
> +
> +	return vma;
> +}
> +
>  static struct scatterlist *
>  rotate_pages(const dma_addr_t *in, unsigned int offset,
>  	     unsigned int width, unsigned int height,
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index d804be0..8cbaca2 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -128,6 +128,10 @@ typedef uint64_t gen8_ppgtt_pml4e_t;
>  #define GEN8_PPAT_ELLC_OVERRIDE		(0<<2)
>  #define GEN8_PPAT(i, x)			((uint64_t) (x) << ((i) * 8))
> 
> +/* Fixed size segment */
> +#define GEN9_TRTT_SEG_SIZE_SHIFT	44
> +#define GEN9_TRTT_SEGMENT_SIZE		(1ULL << GEN9_TRTT_SEG_SIZE_SHIFT)
> +
>  enum i915_ggtt_view_type {
>  	I915_GGTT_VIEW_NORMAL = 0,
>  	I915_GGTT_VIEW_ROTATED,
> @@ -560,4 +564,8 @@ size_t
>  i915_ggtt_view_size(struct drm_i915_gem_object *obj,
>  		    const struct i915_ggtt_view *view);
> 
> +struct i915_vma *
> +intel_trtt_context_allocate_vma(struct i915_address_space *vm,
> +				uint64_t segment_base_addr);
> +void intel_trtt_context_destroy_vma(struct i915_vma *vma);
>  #endif
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 264885f..07936b6 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -188,6 +188,25 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
>  #define   GEN8_RPCS_EU_MIN_SHIFT	0
>  #define   GEN8_RPCS_EU_MIN_MASK		(0xf << GEN8_RPCS_EU_MIN_SHIFT)
> 
> +#define GEN9_TR_CHICKEN_BIT_VECTOR	_MMIO(0x4DFC)
> +#define   GEN9_TRTT_BYPASS_DISABLE	(1 << 0)
> +
> +/* TRTT registers in the H/W Context */
> +#define GEN9_TRTT_L3_POINTER_DW0	_MMIO(0x4DE0)
> +#define GEN9_TRTT_L3_POINTER_DW1	_MMIO(0x4DE4)
> +#define   GEN9_TRTT_L3_GFXADDR_MASK	0xFFFFFFFF0000
> +
> +#define GEN9_TRTT_NULL_TILE_REG		_MMIO(0x4DE8)
> +#define GEN9_TRTT_INVD_TILE_REG		_MMIO(0x4DEC)
> +
> +#define GEN9_TRTT_VA_MASKDATA		_MMIO(0x4DF0)
> +#define   GEN9_TRVA_MASK_VALUE		0xF0
> +#define   GEN9_TRVA_DATA_MASK		0xF
> +
> +#define GEN9_TRTT_TABLE_CONTROL		_MMIO(0x4DF4)
> +#define   GEN9_TRTT_IN_GFX_VA_SPACE	(1 << 1)
> +#define   GEN9_TRTT_ENABLE		(1 << 0)
> +
>  #define GAM_ECOCHK			_MMIO(0x4090)
>  #define   BDW_DISABLE_HDC_INVALIDATION	(1<<25)
>  #define   ECOCHK_SNB_BIT		(1<<10)
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 3a23b95..8af480b 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1645,6 +1645,76 @@ static int gen9_init_render_ring(struct intel_engine_cs *engine)
>  	return init_workarounds_ring(engine);
>  }
> 
> +static int gen9_init_rcs_context_trtt(struct drm_i915_gem_request *req)
> +{
> +	struct intel_ringbuffer *ringbuf = req->ringbuf;
> +	int ret;
> +
> +	ret = intel_logical_ring_begin(req, 2 + 2);
> +	if (ret)
> +		return ret;
> +
> +	intel_logical_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(1));
> +
> +	intel_logical_ring_emit_reg(ringbuf, GEN9_TRTT_TABLE_CONTROL);
> +	intel_logical_ring_emit(ringbuf, 0);
> +
> +	intel_logical_ring_emit(ringbuf, MI_NOOP);
> +	intel_logical_ring_advance(ringbuf);
> +
> +	return 0;
> +}
> +
> +static int gen9_emit_trtt_regs(struct drm_i915_gem_request *req)
> +{
> +	struct intel_context *ctx = req->ctx;
> +	struct intel_ringbuffer *ringbuf = req->ringbuf;
> +	u64 masked_l3_gfx_address =
> +		ctx->trtt_info.l3_table_address & GEN9_TRTT_L3_GFXADDR_MASK;
> +	u32 trva_data_value =
> +		(ctx->trtt_info.segment_base_addr >> GEN9_TRTT_SEG_SIZE_SHIFT) &
> +		GEN9_TRVA_DATA_MASK;
> +	const int num_lri_cmds = 6;
> +	int ret;
> +
> +	/*
> +	 * Emitting LRIs to update the TRTT registers is most reliable, instead
> +	 * of directly updating the context image, as this will ensure that
> +	 * update happens in a serialized manner for the context and also
> +	 * lite-restore scenario will get handled.
> +	 */
> +	ret = intel_logical_ring_begin(req, num_lri_cmds * 2 + 2);
> +	if (ret)
> +		return ret;
> +
> +	intel_logical_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(num_lri_cmds));
> +
> +	intel_logical_ring_emit_reg(ringbuf, GEN9_TRTT_L3_POINTER_DW0);
> +	intel_logical_ring_emit(ringbuf, lower_32_bits(masked_l3_gfx_address));
> +
> +	intel_logical_ring_emit_reg(ringbuf, GEN9_TRTT_L3_POINTER_DW1);
> +	intel_logical_ring_emit(ringbuf, upper_32_bits(masked_l3_gfx_address));
> +
> +	intel_logical_ring_emit_reg(ringbuf, GEN9_TRTT_NULL_TILE_REG);
> +	intel_logical_ring_emit(ringbuf, ctx->trtt_info.null_tile_val);
> +
> +	intel_logical_ring_emit_reg(ringbuf, GEN9_TRTT_INVD_TILE_REG);
> +	intel_logical_ring_emit(ringbuf, ctx->trtt_info.invd_tile_val);
> +
> +	intel_logical_ring_emit_reg(ringbuf, GEN9_TRTT_VA_MASKDATA);
> +	intel_logical_ring_emit(ringbuf,
> +				GEN9_TRVA_MASK_VALUE | trva_data_value);
> +
> +	intel_logical_ring_emit_reg(ringbuf, GEN9_TRTT_TABLE_CONTROL);
> +	intel_logical_ring_emit(ringbuf,
> +				GEN9_TRTT_IN_GFX_VA_SPACE | GEN9_TRTT_ENABLE);
> +
> +	intel_logical_ring_emit(ringbuf, MI_NOOP);
> +	intel_logical_ring_advance(ringbuf);
> +
> +	return 0;
> +}
> +
>  static int intel_logical_ring_emit_pdps(struct drm_i915_gem_request *req)
>  {
>  	struct i915_hw_ppgtt *ppgtt = req->ctx->ppgtt;
> @@ -2003,6 +2073,25 @@ static int gen8_init_rcs_context(struct drm_i915_gem_request *req)
>  	return intel_lr_context_render_state_init(req);
>  }
> 
> +static int gen9_init_rcs_context(struct drm_i915_gem_request *req)
> +{
> +	int ret;
> +
> +	/*
> +	 * Explicitly disable TR-TT at the start of a new context.
> +	 * Otherwise on switching from a TR-TT context to a new Non TR-TT
> +	 * context the TR-TT settings of the outgoing context could get
> +	 * spilled on to the new incoming context as only the Ring Context
> +	 * part is loaded on the first submission of a new context, due to
> +	 * the setting of ENGINE_CTX_RESTORE_INHIBIT bit.
> +	 */
> +	ret = gen9_init_rcs_context_trtt(req);
> +	if (ret)
> +		return ret;
> +
> +	return gen8_init_rcs_context(req);
> +}
> +
>  /**
>   * intel_logical_ring_cleanup() - deallocate the Engine Command Streamer
>   *
> @@ -2134,11 +2223,14 @@ static int logical_render_ring_init(struct drm_device *dev)
>  	logical_ring_default_vfuncs(dev, engine);
> 
>  	/* Override some for render ring. */
> -	if (INTEL_INFO(dev)->gen >= 9)
> +	if (INTEL_INFO(dev)->gen >= 9) {
>  		engine->init_hw = gen9_init_render_ring;
> -	else
> +		engine->init_context = gen9_init_rcs_context;
> +	} else {
>  		engine->init_hw = gen8_init_render_ring;
> -	engine->init_context = gen8_init_rcs_context;
> +		engine->init_context = gen8_init_rcs_context;
> +	}
> +
>  	engine->cleanup = intel_fini_pipe_control;
>  	engine->emit_flush = gen8_emit_flush_render;
>  	engine->emit_request = gen8_emit_request_render;
> @@ -2702,3 +2794,29 @@ void intel_lr_context_reset(struct drm_device *dev,
>  		ringbuf->tail = 0;
>  	}
>  }
> +
> +int intel_lr_rcs_context_setup_trtt(struct intel_context *ctx)
> +{
> +	struct intel_engine_cs *engine = &(ctx->i915->engine[RCS]);
> +	struct drm_i915_gem_request *req;
> +	int ret;
> +
> +	if (!ctx->engine[RCS].state) {
> +		ret = intel_lr_context_deferred_alloc(ctx, engine);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	req = i915_gem_request_alloc(engine, ctx);
> +	if (IS_ERR(req))
> +		return PTR_ERR(req);
> +
> +	ret = gen9_emit_trtt_regs(req);
> +	if (ret) {
> +		i915_gem_request_cancel(req);
> +		return ret;
> +	}
> +
> +	i915_add_request(req);
> +	return 0;
> +}
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index a17cb12..f3600b2 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -107,6 +107,7 @@ void intel_lr_context_reset(struct drm_device *dev,
>  			struct intel_context *ctx);
>  uint64_t intel_lr_context_descriptor(struct intel_context *ctx,
>  				     struct intel_engine_cs *engine);
> +int intel_lr_rcs_context_setup_trtt(struct intel_context *ctx);
> 
>  u32 intel_execlists_ctx_id(struct intel_context *ctx,
>  			   struct intel_engine_cs *engine);
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index a5524cc..604da23 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1167,7 +1167,15 @@ struct drm_i915_gem_context_param {
>  #define I915_CONTEXT_PARAM_BAN_PERIOD	0x1
>  #define I915_CONTEXT_PARAM_NO_ZEROMAP	0x2
>  #define I915_CONTEXT_PARAM_GTT_SIZE	0x3
> +#define I915_CONTEXT_PARAM_TRTT		0x4
>  	__u64 value;
>  };
> 
> +struct drm_i915_gem_context_trtt_param {
> +	__u64 segment_base_addr;
> +	__u64 l3_table_address;
> +	__u32 invd_tile_val;
> +	__u32 null_tile_val;
> +};
> +
>  #endif /* _UAPI_I915_DRM_H_ */
> --
> 1.9.2
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

