[Intel-gfx] [PATCH] drm/i915: Support to enable TRTT on GEN9
Chris Wilson
chris at chris-wilson.co.uk
Sun Jan 10 09:39:13 PST 2016
On Sat, Jan 09, 2016 at 05:00:21PM +0530, akash.goel at intel.com wrote:
> From: Akash Goel <akash.goel at intel.com>
>
> Gen9 has an additional address translation hardware support in form of
> Tiled Resource Translation Table (TR-TT) which provides an extra level
> of abstraction over PPGTT.
> This is useful for mapping Sparse/Tiled texture resources.
> Sparse resources are created as virtual-only allocations. Regions of the
> resource that the application intends to use is bound to the physical memory
> on the fly and can be re-bound to different memory allocations over the
> lifetime of the resource.
>
> TR-TT is tightly coupled with PPGTT, a new instance of TR-TT will be required
> for a new PPGTT instance, but TR-TT may not enabled for every context.
> 1/16th of the 48bit PPGTT space is earmarked for the translation by TR-TT,
> which such chunk to use is conveyed to HW through a register.
> Any GFX address, which lies in that reserved 44 bit range will be translated
> through TR-TT first and then through PPGTT to get the actual physical address,
> so the output of translation from TR-TT will be a PPGTT offset.
>
> TRTT is constructed as a 3 level tile Table. Each tile is 64KB is size which
> leaves behind 44-16=28 address bits. 28bits are partitioned as 9+9+10, and
> each level is contained within a 4KB page hence L3 and L2 is composed of
> 512 64b entries and L1 is composed of 1024 32b entries.
>
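For reference, the 9+9+10 split described above can be sketched as follows. This is a minimal illustration, not code from the patch: the macro and function names are hypothetical, and it assumes L3 takes the topmost 9 bits of the 44-bit segment offset, L2 the next 9, and L1 the 10 bits directly above the 64KB tile offset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the bit layout is assumed from the commit message. */
#define TRTT_TILE_SHIFT 16                              /* 64KB tile */
#define TRTT_L1_BITS    10                              /* 1024 x 32b entries */
#define TRTT_L2_BITS    9                               /* 512 x 64b entries */
#define TRTT_L3_BITS    9                               /* 512 x 64b entries */
#define TRTT_L1_SHIFT   TRTT_TILE_SHIFT                 /* 16 */
#define TRTT_L2_SHIFT   (TRTT_L1_SHIFT + TRTT_L1_BITS)  /* 26 */
#define TRTT_L3_SHIFT   (TRTT_L2_SHIFT + TRTT_L2_BITS)  /* 35 */

/* The three levels plus the tile offset must cover the 44-bit segment. */
static_assert(TRTT_L3_SHIFT + TRTT_L3_BITS == 44, "segment is 44 bits");
/* Each level must fit in exactly one 4KB page. */
static_assert((1u << TRTT_L1_BITS) * sizeof(uint32_t) == 4096, "L1 page");
static_assert((1u << TRTT_L2_BITS) * sizeof(uint64_t) == 4096, "L2 page");
static_assert((1u << TRTT_L3_BITS) * sizeof(uint64_t) == 4096, "L3 page");

/* 'offset' is the address relative to the start of the TR-TT segment. */
static inline unsigned int trtt_l3_index(uint64_t offset)
{
	return (offset >> TRTT_L3_SHIFT) & ((1u << TRTT_L3_BITS) - 1);
}

static inline unsigned int trtt_l2_index(uint64_t offset)
{
	return (offset >> TRTT_L2_SHIFT) & ((1u << TRTT_L2_BITS) - 1);
}

static inline unsigned int trtt_l1_index(uint64_t offset)
{
	return (offset >> TRTT_L1_SHIFT) & ((1u << TRTT_L1_BITS) - 1);
}

/* Byte offset inside the 64KB tile selected by the L1 entry. */
static inline unsigned int trtt_tile_offset(uint64_t offset)
{
	return offset & ((1u << TRTT_TILE_SHIFT) - 1);
}
```

The static assertions only restate the arithmetic from the commit message (44-16=28 bits, one 4KB page per level); the real hardware layout should be confirmed against the spec.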
> There is a provision to keep TR-TT Tables in virtual space, where the pages of
> TRTT tables will be mapped to PPGTT.
> Currently this is the supported mode, in this mode UMD will have a full control
> on TR-TT management, with bare minimum support from KMD.
> So the entries of L3 table will contain the PPGTT offset of L2 Table pages,
> similarly entries of L2 table will contain the PPGTT offset of L1 Table pages.
> The entries of L1 table will contain the PPGTT offset of BOs actually backing
> the Sparse resources.
> The assumption here is that UMD only will do the complete PPGTT address space
> management and use the Soft Pin API for all the buffer objects associated with
> a given Context.
That is a poor assumption, and not one required for this to work.
> So UMD will also have to allocate the L3/L2/L1 table pages
> as a regular GEM BO only & assign them a PPGTT address through the Soft Pin API.
> UMD would have to emit the MI_STORE_DATA_IMM commands in the batch buffer to
> program the relevant entries of L3/L2/L1 tables.
This only applies to the TR-TT L1-L3 cache, right?
> Any space in TR-TT segment not bound to any Sparse texture, will be handled
> through Invalid tile, User is expected to initialize the entries of a new
> L3/L2/L1 table page with the Invalid tile pattern. The entries corresponding to
> the holes in the Sparse texture resource will be set with the Null tile pattern
> The improper programming of TRTT should only lead to a recoverable GPU hang,
> eventually leading to banning of the culprit context without victimizing others.
>
> The association of any Sparse resource with the BOs will be known only to UMD,
> and only the Sparse resources shall be assigned an offset from the TR-TT segment
> by UMD. The use of TR-TT segment or mapping of Sparse resources will be
> abstracted from the KMD,
s/abstracted from/transparent to/ s/,/;/
> UMD can do the address assignment from TR-TT segment
s/can/will/
> autonomously and KMD will be oblivious of it.
> The BOs must not be assigned an address from TR-TT segment, they will be mapped
s/The BOs/Any object/
> to PPGTT in a regular way by KMD
s/using the Soft Pin offset provided by UMD// as this is irrelevant.
> This patch provides an interface through which UMD can convey KMD to enable
> TR-TT for a given context. A new I915_CONTEXT_PARAM_ENABLE_TRTT param has been
> added to I915_GEM_CONTEXT_SETPARAM ioctl for that purpose.
> UMD will have to pass the GFX address of L3 table page,
+along with the
> pattern value for the
> Null & invalid Tile registers.
>
> Testcase: igt/gem_trtt
>
> Signed-off-by: Akash Goel <akash.goel at intel.com>
> ---
> drivers/gpu/drm/i915/i915_dma.c | 3 ++
> drivers/gpu/drm/i915/i915_drv.h | 12 +++++++
> drivers/gpu/drm/i915/i915_gem_context.c | 45 ++++++++++++++++++++++++++
> drivers/gpu/drm/i915/i915_gem_gtt.c | 57 +++++++++++++++++++++++++++++++++
> drivers/gpu/drm/i915/i915_gem_gtt.h | 6 ++++
> drivers/gpu/drm/i915/i915_reg.h | 19 +++++++++++
> drivers/gpu/drm/i915/intel_lrc.c | 41 ++++++++++++++++++++++++
> include/uapi/drm/i915_drm.h | 8 +++++
> 8 files changed, 191 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 988a380..c247c25 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -172,6 +172,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
> case I915_PARAM_HAS_EXEC_SOFTPIN:
> value = 1;
> break;
> + case I915_PARAM_HAS_TRTT:
> + value = HAS_TRTT(dev);
> + break;
Should we do this here, or just query the context? In fact you are
missing the context getparam path anyway.
> default:
> DRM_DEBUG("Unknown parameter %d\n", param->param);
> return -EINVAL;
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index c6dd4db..12c612e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -839,6 +839,7 @@ struct i915_ctx_hang_stats {
> #define DEFAULT_CONTEXT_HANDLE 0
>
> #define CONTEXT_NO_ZEROMAP (1<<0)
> +#define CONTEXT_USE_TRTT (1<<1)
Make flags unsigned whilst you are here, and fix the holes!
> /**
> * struct intel_context - as the name implies, represents a context.
> * @ref: reference count.
> @@ -881,6 +882,15 @@ struct intel_context {
> int pin_count;
> } engine[I915_NUM_RINGS];
>
> + /* TRTT info */
> + struct {
Give this a name now, we will be thankful in the future.
> + uint32_t invd_tile_val;
> + uint32_t null_tile_val;
> + uint64_t l3_table_address;
> + struct i915_vma *vma;
> + bool update_trtt_params;
> + } trtt_info;
> +
> struct list_head link;
> };
>
> @@ -2626,6 +2636,8 @@ struct drm_i915_cmd_table {
> !IS_VALLEYVIEW(dev) && !IS_CHERRYVIEW(dev) && \
> !IS_BROXTON(dev))
>
> +#define HAS_TRTT(dev) (IS_GEN9(dev))
> +
> #define INTEL_PCH_DEVICE_ID_MASK 0xff00
> #define INTEL_PCH_IBX_DEVICE_ID_TYPE 0x3b00
> #define INTEL_PCH_CPT_DEVICE_ID_TYPE 0x1c00
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 900ffd0..ae9fc34 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -146,6 +146,9 @@ static void i915_gem_context_clean(struct intel_context *ctx)
> if (WARN_ON(__i915_vma_unbind_no_wait(vma)))
> break;
> }
> +
> + if (ctx->flags & CONTEXT_USE_TRTT)
> + i915_gem_destroy_trtt_vma(ctx->trtt_info.vma);
Should be in context free.
> }
>
> void i915_gem_context_free(struct kref *ctx_ref)
> @@ -512,6 +515,35 @@ i915_gem_context_get(struct drm_i915_file_private *file_priv, u32 id)
> return ctx;
> }
>
> +static int
> +i915_setup_trtt_ctx(struct intel_context *ctx,
> + struct drm_i915_gem_context_trtt_param *trtt_params)
> +{
> + if (ctx->flags & CONTEXT_USE_TRTT)
> + return -EEXIST;
> +
> + /* basic sanity checks for the l3 table pointer */
> + if ((ctx->trtt_info.l3_table_address >= GEN9_TRTT_SEGMENT_START) &&
> + (ctx->trtt_info.l3_table_address <
> + (GEN9_TRTT_SEGMENT_START + GEN9_TRTT_SEGMENT_SIZE)))
Presumably l3_table has an actual size and you want to do a range
overlap test, not just the start address.
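A minimal sketch of such an overlap test, assuming the L3 table occupies a single 4KB page as the commit message describes. The helper name and the size macro are hypothetical; the segment bounds are passed in rather than hard-coded, since their values are not shown in this hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the L3 table is one 4KB page of 512 64-bit entries. */
#define TRTT_L3_TABLE_SIZE 4096ULL
static_assert(TRTT_L3_TABLE_SIZE == 512 * sizeof(uint64_t), "L3 page size");

/*
 * True if [start, start + size) intersects [seg_start, seg_start + seg_size).
 * Written with subtractions so a user-supplied 'start' near UINT64_MAX
 * cannot wrap around and slip past the check.
 */
static inline bool range_overlaps(uint64_t start, uint64_t size,
				  uint64_t seg_start, uint64_t seg_size)
{
	if (size == 0 || seg_size == 0)
		return false;

	if (start < seg_start)
		return seg_start - start < size;

	return start - seg_start < seg_size;
}
```

With this, a table that begins just below the segment but extends into it is rejected, which a start-address-only check would miss.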
> + return -EINVAL;
> +
> + if (ctx->trtt_info.l3_table_address & ~GEN9_TRTT_L3_GFXADDR_MASK)
> + return -EINVAL;
These are worth adding DRM_DEBUG() or even better start using dev_debug()
so that we can debug userspace startup issues.
> @@ -952,6 +984,7 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
> {
> struct drm_i915_file_private *file_priv = file->driver_priv;
> struct drm_i915_gem_context_param *args = data;
> + struct drm_i915_gem_context_trtt_param trtt_params;
> struct intel_context *ctx;
> int ret;
>
> @@ -983,6 +1016,18 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
> ctx->flags |= args->value ? CONTEXT_NO_ZEROMAP : 0;
> }
> break;
> + case I915_CONTEXT_PARAM_ENABLE_TRTT:
Bump this case to i915_setup_trtt_ctx.
i.e. just have
ret = i915_setup_trtt_ctx(ctx, args);
break;
Otherwise this function will become very unwieldy very quickly.
> int i915_ppgtt_init_hw(struct drm_device *dev)
> {
> + if (HAS_TRTT(dev) && USES_FULL_48BIT_PPGTT(dev)) {
> + struct drm_i915_private *dev_priv = dev->dev_private;
> +
> + I915_WRITE(GEN9_TR_CHICKEN_BIT_VECTOR,
> + GEN9_TRTT_BYPASS_DISABLE);
Shouldn't this be a context specific register? In which case you need to
set it in the context image instead.
Hmm, given you already do the context image tweaks, how does this work with
non-TRTT contexts?
> +struct i915_vma *
> +i915_gem_setup_trtt_vma(struct i915_address_space *vm)
> +{
> + struct i915_vma *vma;
> + int ret;
> +
> + vma = kmem_cache_zalloc(to_i915(vm->dev)->vmas, GFP_KERNEL);
> + if (vma == NULL)
> + return ERR_PTR(-ENOMEM);
> +
> + INIT_LIST_HEAD(&vma->vma_link);
> + INIT_LIST_HEAD(&vma->mm_list);
> + INIT_LIST_HEAD(&vma->exec_list);
> + vma->vm = vm;
> + i915_ppgtt_get(i915_vm_to_ppgtt(vm));
Tempted to write a patch to allow
vma->vm = i915_ppgtt_get(i915_vm_to_ppgtt(vm));
?
> + /* Mark the vma as perennially pinned */
s/perennially/permanently/
We don't want to lose the reservation as opposed to having it grow back
next year.
> + vma->pin_count = 1;
> +
> + /* Reserve from the 48 bit PPGTT space */
> + vma->node.start = GEN9_TRTT_SEGMENT_START;
> + vma->node.size = GEN9_TRTT_SEGMENT_SIZE;
> + ret = drm_mm_reserve_node(&vm->mm, &vma->node);
> + if (ret) {
> + ret = i915_gem_evict_for_vma(vma);
> + if (ret == 0)
> + ret = drm_mm_reserve_node(&vm->mm, &vma->node);
Good. I think we want i915_vm_reserve_node(vm, START, SIZE, &vma) - but
have a look at the other callsites to see if we have a common interface.
Looks like this would improve i915_vgpu.
> +struct drm_i915_gem_context_trtt_param {
> + __u64 l3_table_address;
> + __u32 invd_tile_val;
> + __u32 null_tile_val;
> +};
Passes the ABI structure sanity checks.
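One way to spell out that kind of check in userspace is with compile-time assertions on a mirror of the struct. This is only a sketch: the mirror type name is hypothetical, and __u64/__u32 are replaced by their stdint equivalents so the snippet builds without kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the proposed uapi struct. */
struct trtt_param_mirror {
	uint64_t l3_table_address;
	uint32_t invd_tile_val;
	uint32_t null_tile_val;
};

/* The 64-bit member comes first and the two 32-bit members pack behind it,
 * so there are no holes and no tail padding on 32-bit or 64-bit builds. */
static_assert(sizeof(struct trtt_param_mirror) == 16, "unexpected padding");
static_assert(offsetof(struct trtt_param_mirror, l3_table_address) == 0,
	      "l3_table_address offset");
static_assert(offsetof(struct trtt_param_mirror, invd_tile_val) == 8,
	      "invd_tile_val offset");
static_assert(offsetof(struct trtt_param_mirror, null_tile_val) == 12,
	      "null_tile_val offset");
```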
-Chris
--
Chris Wilson, Intel Open Source Technology Centre