[Intel-gfx] [PATCH v9] drm/i915: Support to enable TRTT on GEN9
Gore, Tim
tim.gore at intel.com
Thu Mar 24 16:29:20 UTC 2016
Tim Gore
Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ
> -----Original Message-----
> From: Intel-gfx [mailto:intel-gfx-bounces at lists.freedesktop.org] On Behalf
> Of akash.goel at intel.com
> Sent: Tuesday, March 22, 2016 8:43 AM
> To: intel-gfx at lists.freedesktop.org
> Cc: Goel, Akash
> Subject: [Intel-gfx] [PATCH v9] drm/i915: Support to enable TRTT on GEN9
>
> From: Akash Goel <akash.goel at intel.com>
>
> Gen9 has an additional address translation hardware support in form of Tiled
> Resource Translation Table (TR-TT) which provides an extra level of
> abstraction over PPGTT.
> This is useful for mapping Sparse/Tiled texture resources.
> Sparse resources are created as virtual-only allocations. Regions of the
> resource that the application intends to use are bound to the physical memory
> on the fly and can be re-bound to different memory allocations over the
> lifetime of the resource.
>
> TR-TT is tightly coupled with PPGTT; a new instance of TR-TT will be required
> for a new PPGTT instance, but TR-TT need not be enabled for every context.
> 1/16th of the 48bit PPGTT space is earmarked for translation by TR-TT, and
> which chunk to use is conveyed to HW through a register.
> Any GFX address which lies in that reserved 44 bit range will be translated
> through TR-TT first and then through PPGTT to get the actual physical
> address, so the output of translation from TR-TT will be a PPGTT offset.
>
> TRTT is constructed as a 3 level tile table. Each tile is 64KB in size, which
> leaves behind 44-16=28 address bits. The 28 bits are partitioned as 9+9+10,
> and each level is contained within a 4KB page, hence L3 and L2 are composed
> of 512 64b entries and L1 is composed of 1024 32b entries.
>
> There is a provision to keep TR-TT Tables in virtual space, where the pages of
> TRTT tables will be mapped to PPGTT.
> Currently this is the only supported mode; in this mode UMD will have full
> control of TR-TT management, with bare minimum support from KMD.
> So the entries of L3 table will contain the PPGTT offset of L2 Table pages,
> similarly entries of L2 table will contain the PPGTT offset of L1 Table pages.
> The entries of L1 table will contain the PPGTT offset of BOs actually backing
> the Sparse resources.
> UMD will have to allocate the L3/L2/L1 table pages as a regular BO only &
> assign them a PPGTT address through the Soft Pin API (for example, use soft
> pin to assign l3_table_address to the L3 table BO, when used).
> UMD will also program the entries in the TR-TT page tables using regular
> batch commands (MI_STORE_DATA_IMM), or via mmapping of the page
> table BOs.
> UMD may do the complete PPGTT address space management, on the
> pretext that it could help minimize the conflicts.
>
> Any space in the TR-TT segment not bound to any Sparse texture will be
> handled through the Invalid tile. User is expected to initialize the entries
> of a new L3/L2/L1 table page with the Invalid tile pattern. The entries
> corresponding to the holes in the Sparse texture resource will be set with
> the Null tile pattern.
> Improper programming of TRTT should only lead to a recoverable GPU
> hang, eventually leading to banning of the culprit context without victimizing
> others.
>
> The association of any Sparse resource with the BOs will be known only to
> UMD, and only the Sparse resources shall be assigned an offset from the TR-
> TT segment by UMD. The use of TR-TT segment or mapping of Sparse
> resources will be transparent to the KMD, UMD will do the address
> assignment from TR-TT segment autonomously and KMD will be oblivious of
> it.
> No other object should be assigned an address from the TR-TT segment; such
> objects will be mapped to PPGTT in the regular way by KMD.
>
> This patch provides an interface through which UMD can request KMD to
> enable TR-TT for a given context. A new I915_CONTEXT_PARAM_TRTT param
> has been added to the I915_GEM_CONTEXT_SETPARAM ioctl for that purpose.
> UMD will have to pass the GFX address of the L3 table page and the start
> location of the TR-TT segment, along with the pattern values for the Null &
> Invalid Tile registers.
>
> v2:
> - Support context_getparam for TRTT also and dispense with a separate
> GETPARAM case for TRTT (Chris).
> - Use i915_dbg to log errors for the invalid TRTT ABI parameters passed
> from user space (Chris).
> - Move all the argument checking for TRTT in context_setparam to the
> set_trtt function (Chris).
> - Change the type of 'flags' field inside 'intel_context' to unsigned (Chris)
> - Rename certain functions to rightly reflect their purpose, rename
> the new param for TRTT in gem_context_param to
> I915_CONTEXT_PARAM_TRTT,
> rephrase few lines in the commit message body, add more comments
> (Chris).
> - Extend ABI to allow User to specify the TRTT segment location also.
> - Fix for selective enabling of TRTT on per context basis, explicitly
> disable TR-TT at the start of a new context.
>
> v3:
> - Check the return value of gen9_emit_trtt_regs (Chris)
> - Update the kernel doc for intel_context structure.
> - Rebased.
>
> v4:
> - Fix the warnings reported by 'checkpatch.pl --strict' (Michel)
> - Fix the context_getparam implementation avoiding the reset of size field,
> affecting the TRTT case.
>
> v5:
> - Update the TR-TT params right away in context_setparam, by constructing
> & submitting a request emitting LRIs, instead of deferring it and
> conflating with the next batch submission (Chris)
> - Follow the struct_mutex handling related prescribed rules, while accessing
> User space buffer, both in context_setparam & getparam functions (Chris).
>
> v6:
> - Fix the warning caused due to removal of un-allocated trtt vma node.
>
> v7:
> - Move context ref/unref to context_setparam_ioctl from set_trtt() &
> remove
> that from get_trtt() as not really needed there (Chris).
> - Add a check for improper values for Null & Invalid Tiles.
> - Remove superfluous DRM_ERROR from trtt_context_allocate_vma (Chris).
> - Rebased.
>
> v8:
> - Add context ref/unref to context_getparam_ioctl also so as to be
> consistent
> and ease the extension of ioctl in future (Chris)
>
> v9:
> - Fix the handling of return value from trtt_context_allocate_vma() function,
> causing kernel panic at the time of destroying context, in case of
> unsuccessful allocation of trtt vma.
> - Rebased.
>
> Testcase: igt/gem_trtt
>
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Michel Thierry <michel.thierry at intel.com>
> Signed-off-by: Akash Goel <akash.goel at intel.com>
> Reviewed-by: Chris Wilson <chris at chris-wilson.co.uk>
> ---
> drivers/gpu/drm/i915/i915_drv.h | 16 +++-
> drivers/gpu/drm/i915/i915_gem_context.c | 157 +++++++++++++++++++++++++++++++-
> drivers/gpu/drm/i915/i915_gem_gtt.c | 65 +++++++++++++
> drivers/gpu/drm/i915/i915_gem_gtt.h | 8 ++
> drivers/gpu/drm/i915/i915_reg.h | 19 ++++
> drivers/gpu/drm/i915/intel_lrc.c | 124 ++++++++++++++++++++++++-
> drivers/gpu/drm/i915/intel_lrc.h | 1 +
> include/uapi/drm/i915_drm.h | 8 ++
> 8 files changed, 393 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index ecbd418..272d1f8 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -804,6 +804,7 @@ struct i915_ctx_hang_stats {
> #define DEFAULT_CONTEXT_HANDLE 0
>
> #define CONTEXT_NO_ZEROMAP (1<<0)
> +#define CONTEXT_USE_TRTT (1 << 1)
> /**
> * struct intel_context - as the name implies, represents a context.
> * @ref: reference count.
> @@ -818,6 +819,8 @@ struct i915_ctx_hang_stats {
> * @ppgtt: virtual memory space used by this context.
> * @legacy_hw_ctx: render context backing object and whether it is
> correctly
> * initialized (legacy ring submission mechanism only).
> + * @trtt_info: Programming parameters for tr-tt (redirection tables for
> + * userspace, for sparse resource management)
> * @link: link in the global list of contexts.
> *
> * Contexts are memory images used by the hardware to store copies of
> their @@ -828,7 +831,7 @@ struct intel_context {
> int user_handle;
> uint8_t remap_slice;
> struct drm_i915_private *i915;
> - int flags;
> + unsigned int flags;
> struct drm_i915_file_private *file_priv;
> struct i915_ctx_hang_stats hang_stats;
> struct i915_hw_ppgtt *ppgtt;
> @@ -849,6 +852,15 @@ struct intel_context {
> uint32_t *lrc_reg_state;
> } engine[I915_NUM_ENGINES];
>
> + /* TRTT info */
> + struct intel_context_trtt {
> + u32 invd_tile_val;
> + u32 null_tile_val;
> + u64 l3_table_address;
> + u64 segment_base_addr;
> + struct i915_vma *vma;
> + } trtt_info;
> +
> struct list_head link;
> };
>
> @@ -2657,6 +2669,8 @@ struct drm_i915_cmd_table {
> !IS_VALLEYVIEW(dev) &&
> !IS_CHERRYVIEW(dev) && \
> !IS_BROXTON(dev))
>
> +#define HAS_TRTT(dev) (IS_GEN9(dev))
> +
A very minor point, but there is a w/a to disable TRTT for BXT_REVID_A0/1. I realise this
is basically obsolete now, but I'm still using one!
> #define INTEL_PCH_DEVICE_ID_MASK 0xff00
> #define INTEL_PCH_IBX_DEVICE_ID_TYPE 0x3b00
> #define INTEL_PCH_CPT_DEVICE_ID_TYPE 0x1c00
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c
> b/drivers/gpu/drm/i915/i915_gem_context.c
> index 394e525..5f28c23 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -133,6 +133,14 @@ static int get_context_size(struct drm_device *dev)
> return ret;
> }
>
> +static void intel_context_free_trtt(struct intel_context *ctx) {
> + if (!ctx->trtt_info.vma)
> + return;
> +
> + intel_trtt_context_destroy_vma(ctx->trtt_info.vma);
> +}
> +
> static void i915_gem_context_clean(struct intel_context *ctx) {
> struct i915_hw_ppgtt *ppgtt = ctx->ppgtt; @@ -164,6 +172,8 @@
> void i915_gem_context_free(struct kref *ctx_ref)
> */
> i915_gem_context_clean(ctx);
>
> + intel_context_free_trtt(ctx);
> +
> i915_ppgtt_put(ctx->ppgtt);
>
> if (ctx->legacy_hw_ctx.rcs_state)
> @@ -507,6 +517,129 @@ i915_gem_context_get(struct
> drm_i915_file_private *file_priv, u32 id)
> return ctx;
> }
>
> +static int
> +intel_context_get_trtt(struct intel_context *ctx,
> + struct drm_i915_gem_context_param *args) {
> + struct drm_i915_gem_context_trtt_param trtt_params;
> + struct drm_device *dev = ctx->i915->dev;
> +
> + if (!HAS_TRTT(dev) || !USES_FULL_48BIT_PPGTT(dev)) {
> + return -ENODEV;
> + } else if (args->size < sizeof(trtt_params)) {
> + args->size = sizeof(trtt_params);
> + } else {
> + trtt_params.segment_base_addr =
> + ctx->trtt_info.segment_base_addr;
> + trtt_params.l3_table_address =
> + ctx->trtt_info.l3_table_address;
> + trtt_params.null_tile_val =
> + ctx->trtt_info.null_tile_val;
> + trtt_params.invd_tile_val =
> + ctx->trtt_info.invd_tile_val;
> +
> + mutex_unlock(&dev->struct_mutex);
> +
> + if (__copy_to_user(to_user_ptr(args->value),
> + &trtt_params,
> + sizeof(trtt_params))) {
> + mutex_lock(&dev->struct_mutex);
> + return -EFAULT;
> + }
> +
> + args->size = sizeof(trtt_params);
> + mutex_lock(&dev->struct_mutex);
> + }
> +
> + return 0;
> +}
> +
> +static int
> +intel_context_set_trtt(struct intel_context *ctx,
> + struct drm_i915_gem_context_param *args) {
> + struct drm_i915_gem_context_trtt_param trtt_params;
> + struct i915_vma *vma;
> + struct drm_device *dev = ctx->i915->dev;
> + int ret;
> +
> + if (!HAS_TRTT(dev) || !USES_FULL_48BIT_PPGTT(dev))
> + return -ENODEV;
> + else if (ctx->flags & CONTEXT_USE_TRTT)
> + return -EEXIST;
> + else if (args->size < sizeof(trtt_params))
> + return -EINVAL;
> +
> + mutex_unlock(&dev->struct_mutex);
> +
> + if (copy_from_user(&trtt_params,
> + to_user_ptr(args->value),
> + sizeof(trtt_params))) {
> + mutex_lock(&dev->struct_mutex);
> + ret = -EFAULT;
> + goto exit;
> + }
> +
> + mutex_lock(&dev->struct_mutex);
> +
> + /* Check if the setup happened from another path */
> + if (ctx->flags & CONTEXT_USE_TRTT) {
> + ret = -EEXIST;
> + goto exit;
> + }
> +
> + /* basic sanity checks for the segment location & l3 table pointer */
> + if (trtt_params.segment_base_addr & (GEN9_TRTT_SEGMENT_SIZE -
> 1)) {
> + i915_dbg(dev, "segment base address not correctly
> aligned\n");
> + ret = -EINVAL;
> + goto exit;
> + }
> +
> + if (((trtt_params.l3_table_address + PAGE_SIZE) >=
> + trtt_params.segment_base_addr) &&
> + (trtt_params.l3_table_address <
> + (trtt_params.segment_base_addr +
> GEN9_TRTT_SEGMENT_SIZE))) {
> + i915_dbg(dev, "l3 table address conflicts with trtt
> segment\n");
> + ret = -EINVAL;
> + goto exit;
> + }
> +
> + if (trtt_params.l3_table_address &
> ~GEN9_TRTT_L3_GFXADDR_MASK) {
> + i915_dbg(dev, "invalid l3 table address\n");
> + ret = -EINVAL;
> + goto exit;
> + }
> +
> + if (trtt_params.null_tile_val == trtt_params.invd_tile_val) {
> + i915_dbg(dev, "incorrect values for null & invalid tiles\n");
> + return -EINVAL;
> + }
> +
> + vma = intel_trtt_context_allocate_vma(&ctx->ppgtt->base,
> + trtt_params.segment_base_addr);
> + if (IS_ERR(vma)) {
> + ret = PTR_ERR(vma);
> + goto exit;
> + }
> +
> + ctx->trtt_info.vma = vma;
> + ctx->trtt_info.null_tile_val = trtt_params.null_tile_val;
> + ctx->trtt_info.invd_tile_val = trtt_params.invd_tile_val;
> + ctx->trtt_info.l3_table_address = trtt_params.l3_table_address;
> + ctx->trtt_info.segment_base_addr =
> trtt_params.segment_base_addr;
> +
> + ret = intel_lr_rcs_context_setup_trtt(ctx);
> + if (ret) {
> + intel_trtt_context_destroy_vma(ctx->trtt_info.vma);
> + goto exit;
> + }
> +
> + ctx->flags |= CONTEXT_USE_TRTT;
> +
> +exit:
> + return ret;
> +}
> +
> static inline int
> mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
> {
> @@ -931,7 +1064,14 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
> return PTR_ERR(ctx);
> }
>
> - args->size = 0;
> + /*
> + * Take a reference also, as in certain cases we have to release &
> + * reacquire the struct_mutex and we don't want the context to
> + * go away.
> + */
> + i915_gem_context_reference(ctx);
> +
> + args->size = (args->param != I915_CONTEXT_PARAM_TRTT) ? 0 : args->size;
> switch (args->param) {
> case I915_CONTEXT_PARAM_BAN_PERIOD:
> args->value = ctx->hang_stats.ban_period_seconds;
> @@ -947,10 +1087,14 @@ int i915_gem_context_getparam_ioctl(struct
> drm_device *dev, void *data,
> else
> args->value = to_i915(dev)->ggtt.base.total;
> break;
> + case I915_CONTEXT_PARAM_TRTT:
> + ret = intel_context_get_trtt(ctx, args);
> + break;
> default:
> ret = -EINVAL;
> break;
> }
> + i915_gem_context_unreference(ctx);
> mutex_unlock(&dev->struct_mutex);
>
> return ret;
> @@ -974,6 +1118,13 @@ int i915_gem_context_setparam_ioctl(struct
> drm_device *dev, void *data,
> return PTR_ERR(ctx);
> }
>
> + /*
> + * Take a reference also, as in certain cases we have to release &
> + * reacquire the struct_mutex and we don't want the context to
> + * go away.
> + */
> + i915_gem_context_reference(ctx);
> +
> switch (args->param) {
> case I915_CONTEXT_PARAM_BAN_PERIOD:
> if (args->size)
> @@ -992,10 +1143,14 @@ int i915_gem_context_setparam_ioctl(struct
> drm_device *dev, void *data,
> ctx->flags |= args->value ? CONTEXT_NO_ZEROMAP :
> 0;
> }
> break;
> + case I915_CONTEXT_PARAM_TRTT:
> + ret = intel_context_set_trtt(ctx, args);
> + break;
> default:
> ret = -EINVAL;
> break;
> }
> + i915_gem_context_unreference(ctx);
> mutex_unlock(&dev->struct_mutex);
>
> return ret;
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c
> b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 0715bb7..cbf8a03 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -2169,6 +2169,17 @@ int i915_ppgtt_init_hw(struct drm_device *dev) {
> gtt_write_workarounds(dev);
>
> + if (HAS_TRTT(dev) && USES_FULL_48BIT_PPGTT(dev)) {
> + struct drm_i915_private *dev_priv = dev->dev_private;
> + /*
> + * Globally enable TR-TT support in Hw.
> + * Still TR-TT enabling on per context basis is required.
> + * Non-trtt contexts are not affected by this setting.
> + */
> + I915_WRITE(GEN9_TR_CHICKEN_BIT_VECTOR,
> + GEN9_TRTT_BYPASS_DISABLE);
> + }
> +
> /* In the case of execlists, PPGTT is enabled by the context
> descriptor
> * and the PDPs are contained within the context itself. We don't
> * need to do anything here. */
> @@ -3362,6 +3373,60 @@
> i915_gem_obj_lookup_or_create_ggtt_vma(struct drm_i915_gem_object
> *obj,
>
> }
>
> +void intel_trtt_context_destroy_vma(struct i915_vma *vma) {
> + struct i915_address_space *vm = vma->vm;
> +
> + WARN_ON(!list_empty(&vma->obj_link));
> + WARN_ON(!list_empty(&vma->vm_link));
> + WARN_ON(!list_empty(&vma->exec_list));
> +
> + WARN_ON(!vma->pin_count);
> +
> + if (drm_mm_node_allocated(&vma->node))
> + drm_mm_remove_node(&vma->node);
> +
> + i915_ppgtt_put(i915_vm_to_ppgtt(vm));
> + kmem_cache_free(to_i915(vm->dev)->vmas, vma); }
> +
> +struct i915_vma *
> +intel_trtt_context_allocate_vma(struct i915_address_space *vm,
> + uint64_t segment_base_addr)
> +{
> + struct i915_vma *vma;
> + int ret;
> +
> + vma = kmem_cache_zalloc(to_i915(vm->dev)->vmas, GFP_KERNEL);
> + if (!vma)
> + return ERR_PTR(-ENOMEM);
> +
> + INIT_LIST_HEAD(&vma->obj_link);
> + INIT_LIST_HEAD(&vma->vm_link);
> + INIT_LIST_HEAD(&vma->exec_list);
> + vma->vm = vm;
> + i915_ppgtt_get(i915_vm_to_ppgtt(vm));
> +
> + /* Mark the vma as permanently pinned */
> + vma->pin_count = 1;
> +
> + /* Reserve from the 48 bit PPGTT space */
> + vma->node.start = segment_base_addr;
> + vma->node.size = GEN9_TRTT_SEGMENT_SIZE;
> + ret = drm_mm_reserve_node(&vm->mm, &vma->node);
> + if (ret) {
> + ret = i915_gem_evict_for_vma(vma);
> + if (ret == 0)
> + ret = drm_mm_reserve_node(&vm->mm, &vma->node);
> + }
> + if (ret) {
> + intel_trtt_context_destroy_vma(vma);
> + return ERR_PTR(ret);
> + }
> +
> + return vma;
> +}
> +
> static struct scatterlist *
> rotate_pages(const dma_addr_t *in, unsigned int offset,
> unsigned int width, unsigned int height, diff --git
> a/drivers/gpu/drm/i915/i915_gem_gtt.h
> b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index d804be0..8cbaca2 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -128,6 +128,10 @@ typedef uint64_t gen8_ppgtt_pml4e_t;
> #define GEN8_PPAT_ELLC_OVERRIDE (0<<2)
> #define GEN8_PPAT(i, x) ((uint64_t) (x) << ((i) * 8))
>
> +/* Fixed size segment */
> +#define GEN9_TRTT_SEG_SIZE_SHIFT 44
> +#define GEN9_TRTT_SEGMENT_SIZE (1ULL <<
> GEN9_TRTT_SEG_SIZE_SHIFT)
> +
> enum i915_ggtt_view_type {
> I915_GGTT_VIEW_NORMAL = 0,
> I915_GGTT_VIEW_ROTATED,
> @@ -560,4 +564,8 @@ size_t
> i915_ggtt_view_size(struct drm_i915_gem_object *obj,
> const struct i915_ggtt_view *view);
>
> +struct i915_vma *
> +intel_trtt_context_allocate_vma(struct i915_address_space *vm,
> + uint64_t segment_base_addr);
> +void intel_trtt_context_destroy_vma(struct i915_vma *vma);
> #endif
> diff --git a/drivers/gpu/drm/i915/i915_reg.h
> b/drivers/gpu/drm/i915/i915_reg.h index 264885f..07936b6 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -188,6 +188,25 @@ static inline bool i915_mmio_reg_valid(i915_reg_t
> reg)
> #define GEN8_RPCS_EU_MIN_SHIFT 0
> #define GEN8_RPCS_EU_MIN_MASK (0xf <<
> GEN8_RPCS_EU_MIN_SHIFT)
>
> +#define GEN9_TR_CHICKEN_BIT_VECTOR _MMIO(0x4DFC)
> +#define GEN9_TRTT_BYPASS_DISABLE (1 << 0)
> +
> +/* TRTT registers in the H/W Context */
> +#define GEN9_TRTT_L3_POINTER_DW0 _MMIO(0x4DE0)
> +#define GEN9_TRTT_L3_POINTER_DW1 _MMIO(0x4DE4)
> +#define GEN9_TRTT_L3_GFXADDR_MASK 0xFFFFFFFF0000
> +
> +#define GEN9_TRTT_NULL_TILE_REG _MMIO(0x4DE8)
> +#define GEN9_TRTT_INVD_TILE_REG _MMIO(0x4DEC)
> +
> +#define GEN9_TRTT_VA_MASKDATA _MMIO(0x4DF0)
> +#define GEN9_TRVA_MASK_VALUE 0xF0
> +#define GEN9_TRVA_DATA_MASK 0xF
> +
> +#define GEN9_TRTT_TABLE_CONTROL _MMIO(0x4DF4)
> +#define GEN9_TRTT_IN_GFX_VA_SPACE (1 << 1)
> +#define GEN9_TRTT_ENABLE (1 << 0)
> +
> #define GAM_ECOCHK _MMIO(0x4090)
> #define BDW_DISABLE_HDC_INVALIDATION (1<<25)
> #define ECOCHK_SNB_BIT (1<<10)
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c
> b/drivers/gpu/drm/i915/intel_lrc.c
> index 3a23b95..8af480b 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1645,6 +1645,76 @@ static int gen9_init_render_ring(struct
> intel_engine_cs *engine)
> return init_workarounds_ring(engine);
> }
>
> +static int gen9_init_rcs_context_trtt(struct drm_i915_gem_request *req)
> +{
> + struct intel_ringbuffer *ringbuf = req->ringbuf;
> + int ret;
> +
> + ret = intel_logical_ring_begin(req, 2 + 2);
> + if (ret)
> + return ret;
> +
> + intel_logical_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(1));
> +
> + intel_logical_ring_emit_reg(ringbuf, GEN9_TRTT_TABLE_CONTROL);
> + intel_logical_ring_emit(ringbuf, 0);
> +
> + intel_logical_ring_emit(ringbuf, MI_NOOP);
> + intel_logical_ring_advance(ringbuf);
> +
> + return 0;
> +}
> +
> +static int gen9_emit_trtt_regs(struct drm_i915_gem_request *req) {
> + struct intel_context *ctx = req->ctx;
> + struct intel_ringbuffer *ringbuf = req->ringbuf;
> + u64 masked_l3_gfx_address =
> + ctx->trtt_info.l3_table_address &
> GEN9_TRTT_L3_GFXADDR_MASK;
> + u32 trva_data_value =
> + (ctx->trtt_info.segment_base_addr >>
> GEN9_TRTT_SEG_SIZE_SHIFT) &
> + GEN9_TRVA_DATA_MASK;
> + const int num_lri_cmds = 6;
> + int ret;
> +
> + /*
> + * Emitting LRIs to update the TRTT registers is most reliable, instead
> + * of directly updating the context image, as this will ensure that
> + * update happens in a serialized manner for the context and also
> + * lite-restore scenario will get handled.
> + */
> + ret = intel_logical_ring_begin(req, num_lri_cmds * 2 + 2);
> + if (ret)
> + return ret;
> +
> + intel_logical_ring_emit(ringbuf,
> MI_LOAD_REGISTER_IMM(num_lri_cmds));
> +
> + intel_logical_ring_emit_reg(ringbuf,
> GEN9_TRTT_L3_POINTER_DW0);
> + intel_logical_ring_emit(ringbuf,
> +lower_32_bits(masked_l3_gfx_address));
> +
> + intel_logical_ring_emit_reg(ringbuf,
> GEN9_TRTT_L3_POINTER_DW1);
> + intel_logical_ring_emit(ringbuf,
> +upper_32_bits(masked_l3_gfx_address));
> +
> + intel_logical_ring_emit_reg(ringbuf, GEN9_TRTT_NULL_TILE_REG);
> + intel_logical_ring_emit(ringbuf, ctx->trtt_info.null_tile_val);
> +
> + intel_logical_ring_emit_reg(ringbuf, GEN9_TRTT_INVD_TILE_REG);
> + intel_logical_ring_emit(ringbuf, ctx->trtt_info.invd_tile_val);
> +
> + intel_logical_ring_emit_reg(ringbuf, GEN9_TRTT_VA_MASKDATA);
> + intel_logical_ring_emit(ringbuf,
> + GEN9_TRVA_MASK_VALUE |
> trva_data_value);
> +
> + intel_logical_ring_emit_reg(ringbuf, GEN9_TRTT_TABLE_CONTROL);
> + intel_logical_ring_emit(ringbuf,
> + GEN9_TRTT_IN_GFX_VA_SPACE |
> GEN9_TRTT_ENABLE);
> +
> + intel_logical_ring_emit(ringbuf, MI_NOOP);
> + intel_logical_ring_advance(ringbuf);
> +
> + return 0;
> +}
> +
> static int intel_logical_ring_emit_pdps(struct drm_i915_gem_request *req)
> {
> struct i915_hw_ppgtt *ppgtt = req->ctx->ppgtt; @@ -2003,6
> +2073,25 @@ static int gen8_init_rcs_context(struct drm_i915_gem_request
> *req)
> return intel_lr_context_render_state_init(req);
> }
>
> +static int gen9_init_rcs_context(struct drm_i915_gem_request *req) {
> + int ret;
> +
> + /*
> + * Explicitly disable TR-TT at the start of a new context.
> + * Otherwise on switching from a TR-TT context to a new Non TR-TT
> + * context the TR-TT settings of the outgoing context could get
> + * spilled on to the new incoming context as only the Ring Context
> + * part is loaded on the first submission of a new context, due to
> + * the setting of ENGINE_CTX_RESTORE_INHIBIT bit.
> + */
> + ret = gen9_init_rcs_context_trtt(req);
> + if (ret)
> + return ret;
> +
> + return gen8_init_rcs_context(req);
> +}
> +
> /**
> * intel_logical_ring_cleanup() - deallocate the Engine Command Streamer
> *
> @@ -2134,11 +2223,14 @@ static int logical_render_ring_init(struct
> drm_device *dev)
> logical_ring_default_vfuncs(dev, engine);
>
> /* Override some for render ring. */
> - if (INTEL_INFO(dev)->gen >= 9)
> + if (INTEL_INFO(dev)->gen >= 9) {
> engine->init_hw = gen9_init_render_ring;
> - else
> + engine->init_context = gen9_init_rcs_context;
> + } else {
> engine->init_hw = gen8_init_render_ring;
> - engine->init_context = gen8_init_rcs_context;
> + engine->init_context = gen8_init_rcs_context;
> + }
> +
> engine->cleanup = intel_fini_pipe_control;
> engine->emit_flush = gen8_emit_flush_render;
> engine->emit_request = gen8_emit_request_render; @@ -2702,3
> +2794,29 @@ void intel_lr_context_reset(struct drm_device *dev,
> ringbuf->tail = 0;
> }
> }
> +
> +int intel_lr_rcs_context_setup_trtt(struct intel_context *ctx) {
> + struct intel_engine_cs *engine = &(ctx->i915->engine[RCS]);
> + struct drm_i915_gem_request *req;
> + int ret;
> +
> + if (!ctx->engine[RCS].state) {
> + ret = intel_lr_context_deferred_alloc(ctx, engine);
> + if (ret)
> + return ret;
> + }
> +
> + req = i915_gem_request_alloc(engine, ctx);
> + if (IS_ERR(req))
> + return PTR_ERR(req);
> +
> + ret = gen9_emit_trtt_regs(req);
> + if (ret) {
> + i915_gem_request_cancel(req);
> + return ret;
> + }
> +
> + i915_add_request(req);
> + return 0;
> +}
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h
> b/drivers/gpu/drm/i915/intel_lrc.h
> index a17cb12..f3600b2 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -107,6 +107,7 @@ void intel_lr_context_reset(struct drm_device *dev,
> struct intel_context *ctx);
> uint64_t intel_lr_context_descriptor(struct intel_context *ctx,
> struct intel_engine_cs *engine);
> +int intel_lr_rcs_context_setup_trtt(struct intel_context *ctx);
>
> u32 intel_execlists_ctx_id(struct intel_context *ctx,
> struct intel_engine_cs *engine);
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index a5524cc..604da23 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1167,7 +1167,15 @@ struct drm_i915_gem_context_param {
> #define I915_CONTEXT_PARAM_BAN_PERIOD 0x1
> #define I915_CONTEXT_PARAM_NO_ZEROMAP 0x2
> #define I915_CONTEXT_PARAM_GTT_SIZE 0x3
> +#define I915_CONTEXT_PARAM_TRTT 0x4
> __u64 value;
> };
>
> +struct drm_i915_gem_context_trtt_param {
> + __u64 segment_base_addr;
> + __u64 l3_table_address;
> + __u32 invd_tile_val;
> + __u32 null_tile_val;
> +};
> +
> #endif /* _UAPI_I915_DRM_H_ */
> --
> 1.9.2
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx