[Intel-gfx] [PATCH] drm/i915: Support to enable TRTT on GEN9

Goel, Akash akash.goel at intel.com
Mon Jan 11 04:29:53 PST 2016

On 1/11/2016 2:19 PM, Chris Wilson wrote:
> On Mon, Jan 11, 2016 at 01:09:50PM +0530, Goel, Akash wrote:
>> On 1/10/2016 11:09 PM, Chris Wilson wrote:
>>> On Sat, Jan 09, 2016 at 05:00:21PM +0530, akash.goel at intel.com wrote:
>>>> From: Akash Goel <akash.goel at intel.com>
>>>> Gen9 has an additional address translation hardware support in form of
>>>> Tiled Resource Translation Table (TR-TT) which provides an extra level
>>>> of abstraction over PPGTT.
>>>> This is useful for mapping Sparse/Tiled texture resources.
>>>> Sparse resources are created as virtual-only allocations. Regions of the
>>>> resource that the application intends to use is bound to the physical memory
>>>> on the fly and can be re-bound to different memory allocations over the
>>>> lifetime of the resource.
>>>> TR-TT is tightly coupled with PPGTT, a new instance of TR-TT will be required
>>>> for a new PPGTT instance, but TR-TT may not enabled for every context.
>>>> 1/16th of the 48bit PPGTT space is earmarked for the translation by TR-TT,
>>>> which such chunk to use is conveyed to HW through a register.
>>>> Any GFX address, which lies in that reserved 44 bit range will be translated
>>>> through TR-TT first and then through PPGTT to get the actual physical address,
>>>> so the output of translation from TR-TT will be a PPGTT offset.
>>>> TRTT is constructed as a 3 level tile Table. Each tile is 64KB is size which
>>>> leaves behind 44-16=28 address bits. 28bits are partitioned as 9+9+10, and
>>>> each level is contained within a 4KB page hence L3 and L2 is composed of
>>>> 512 64b entries and L1 is composed of 1024 32b entries.
>>>> There is a provision to keep TR-TT Tables in virtual space, where the pages of
>>>> TRTT tables will be mapped to PPGTT.
>>>> Currently this is the supported mode, in this mode UMD will have a full control
>>>> on TR-TT management, with bare minimum support from KMD.
>>>> So the entries of L3 table will contain the PPGTT offset of L2 Table pages,
>>>> similarly entries of L2 table will contain the PPGTT offset of L1 Table pages.
>>>> The entries of L1 table will contain the PPGTT offset of BOs actually backing
>>>> the Sparse resources.
>>>> The assumption here is that UMD only will do the complete PPGTT address space
>>>> management and use the Soft Pin API for all the buffer objects associated with
>>>> a given Context.
>>> That is a poor assumption, and not one required for this to work.
>> This is not a strict requirement.
>> But I thought that conflicts will be minimized if UMD itself can do
>> the full address space management.
>> At least UMD has to ensure that PPGTT offset of L3 table remains
>> same throughout.
> Yes, userspace must control that object, and that would require softpin
> to preserve it across execbuffer calls. The kernel does not require that
> all addresses be handled in userspace afterwards, that's the language I
> wish to avoid. (Hence I don't like using "assumption" as that just
> invites userspace to break the kernel.)
Fine will remove the word "assumption", instead can I put it as "UMD may 
do the complete PPGTT address space management, on the pretext that it 
could help to might minimize the conflicts".

>>>> So UMD will also have to allocate the L3/L2/L1 table pages
>>>> as a regular GEM BO only & assign them a PPGTT address through the Soft Pin API.
>>>> UMD would have to emit the MI_STORE_DATA_IMM commands in the batch buffer to
>>>> program the relevant entries of L3/L2/L1 tables.
>>> This only applies to te TR-TT L1-L3 cache, right?
>> Yes applies only to the TR-TT L1-L3 tables.
>> The backing pages of L3/L2/L1 tables shall be allocated as a BO,
>> which should be assigned a PPGTT address.
>> The table entries could be written directly also by UMD by mmapping
>> the table BOs, but adding MI_STORE_DATA_IMM commands in the batch
>> buffer itself would help to achieve serialization (implicitly).
> Can you tighten up the phrasing here? My first read was that you indeed
> for all PTE writes to be in userspace, which is scary.
> "UMD will then allocate the L3/L32/L1 page tables for TR-TT as a regular
> bo, and will use softpin to assign it to the l3_table_address when used.
> UMD will also maintain the entries in the TR-TT page tables using
> regular batch commands (MI_STORE_DATA_IMM), or via mmapping of the page
> table bo."

Yes, UMD will have to use softpin to assign l3_table_address to L3 table BO.
Similarly the softpin will be needed for L2/L1 table BOs.

>>>> autonomously and KMD will be oblivious of it.
>>>> The BOs must not be assigned an address from TR-TT segment, they will be mapped
>>> s/The BOs/Any object/
>> Ok will use 'Any object'
>>>> to PPGTT in a regular way by KMD
>>> s/using the Soft Pin offset provided by UMD// as this is irrelevant.
>> You mean to say that it is needless or inappropriate to state that
>> KMD will use the Soft PIN offset provided by UMD, it doesn't matter
>> that whether the Soft PIN offset is used or KMD itself assigns an
>> address.
> I just want to avoid implying that userspace must use softpin on every
> single bo for this to work. (Mainly because I don't really want
> userspace to have to do full address space management, as we will always
> have to do the double check inside the kernel. Unless there is a real
> need (e.g. svm), I'd rather improve the kernel allocator/verification, rather
> than try and circumvent it.)
>>>> @@ -172,6 +172,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
>>>>   	case I915_PARAM_HAS_EXEC_SOFTPIN:
>>>>   		value = 1;
>>>>   		break;
>>>> +	case I915_PARAM_HAS_TRTT:
>>>> +		value = HAS_TRTT(dev);
>>>> +		break;
>>> Should we do this here, or just query the context? In fact you are
>>> missing the context getparam path any way.
>> Sorry, do you mean to say that with -ENODEV error also, on context
>> setparam, User can make out the TR-TT support, so no need to have an
>> explicit getparam case.
>> Would the context getparam path be really useful for TR-TT?.
>> If its needed, then would be better to rename
> The question I have is do we want:
> or
> the latter seems more symmetric and flexible, and we can use as a double
> check later on that we set the right address etc.
> Indeed, hindsight says ENABLE_TRTT is a bad name :)
> I915_CONTEXT_PARAM_TRTT (let's assume for now  this will be the last,
> any future PARAM_TRTT can think of a good name for its extension).

fine, will rename as I915_CONTEXT_PARAM_TRTT.

Would the following change be fine ?

@@ -974,6 +987,24 @@ int i915_gem_context_getparam_ioctl(struct 
drm_device *dev, void *data,
  			args->value = to_i915(dev)->gtt.base.total;
+		if (!HAS_TRTT(dev) || !USES_FULL_48BIT_PPGTT(dev))
+			return -ENODEV;
+		else if (args->size < sizeof(trtt_params))
+			ret = -EINVAL;
+		else {
+			trtt_params.l3_table_address =
+				ctx->trtt_info.l3_table_address;
+			trtt_params.null_tile_val =
+				ctx->trtt_info.null_tile_val;
+			trtt_params.invd_tile_val =
+				ctx->trtt_info.invd_tile_val;
+			if (__copy_to_user(to_user_ptr(args->value),
+					   &trtt_params,
+					   sizeof(trtt_params)))
+				ret = -EFAULT;
+		}

>>>>   	default:
>>>>   		DRM_DEBUG("Unknown parameter %d\n", param->param);
>>>>   		return -EINVAL;
>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>>> index c6dd4db..12c612e 100644
>>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>>> @@ -839,6 +839,7 @@ struct i915_ctx_hang_stats {
>>>>   #define CONTEXT_NO_ZEROMAP (1<<0)
>>>> +#define CONTEXT_USE_TRTT   (1<<1)
>>> Make flags unsigned whilst you are here, and fix the holes!
>> Ok will change the type of 'flags' field inside 'intel_context' to unsigned.
>> Sorry, but apart from this anything else required here ?
> No, it's just belated anger about the silly "int flags".

Fine will modify the type in this patch only.

>>>>   /**
>>>>    * struct intel_context - as the name implies, represents a context.
>>>>    * @ref: reference count.
>>>> @@ -881,6 +882,15 @@ struct intel_context {
>>>>   		int pin_count;
>>>>   	} engine[I915_NUM_RINGS];
>>>> +	/* TRTT info */
>>>> +	struct {
>>> Give this a name now, we will be thankful in the future.
>> Would ctx_trtt_params be fine ?
> struct intel_context_trtt
> (Avoid using params for internals, let's keep those for uAPI - that
> helps us distinguish pieces of code / context.)
Thanks, intel_context_trtt is more appropriate.

>>>>   void i915_gem_context_free(struct kref *ctx_ref)
>>>> @@ -512,6 +515,35 @@ i915_gem_context_get(struct drm_i915_file_private *file_priv, u32 id)
>>>>   	return ctx;
>>>>   }
>>>> +static int
>>>> +i915_setup_trtt_ctx(struct intel_context *ctx,
>>>> +		    struct drm_i915_gem_context_trtt_param *trtt_params)
>>>> +{
>>>> +	if (ctx->flags & CONTEXT_USE_TRTT)
>>>> +		return -EEXIST;
>>>> +
>>>> +	/* basic sanity checks for the l3 table pointer */
>>>> +	if ((ctx->trtt_info.l3_table_address >= GEN9_TRTT_SEGMENT_START) &&
>>>> +	    (ctx->trtt_info.l3_table_address <
>>> Presumably l3_table has an actual size and you want to do a range
>>> overlap test, not just the start address.
>> Yes intend to do a range overlap test only. But since L3 table size
>> is fixed as 4KB, thought there is no real need to also include the
>> size in the range check, considering the allocations are always in
>> multiple of 4KB.
> Ok. You have a choice of writting that up as a comment, or just doing
> the page overlap test :) Honestly, I would just go for the range test
> since this is a one-off init path and the reader then doesn't even have
> to think.

Fine, will do the page overlap test.

>>>> +		return -EINVAL;
>>>> +
>>>> +	if (ctx->trtt_info.l3_table_address & ~GEN9_TRTT_L3_GFXADDR_MASK)
>>>> +		return -EINVAL;
>>> These are worth adding DRM_DEBUG() or even better start using dev_debug()
>>> so that we can debug userspace startup issues.
>> Fine, I think DRM_DEBUG_DRIVER will be more appropriate compared to
> No, these are userspace errors for which we use DRM_DEBUG.
> DRM_DEBUG_DRIVER is for a driver error :)
>> Or
>> 	dev_dbg(dev->dev, "invalid l3 table address\n");
> Include as much info as you can (without giving away kernel internals),
> since the user gave use the l3_address that we reject, report it. That
> makes it easier to spot if it is the same address as they expected.
> dev_dbg() would be my preference.
> #define i915_dbg(DEV, args...) dev_dbg(__I915__(DEV)->dev->dev, ##args)
> (not the prettiest yet, the pointer dancing is in the wrong direction!)
> and let's get the ball rolling.

This will also go as a separate patch.
One doubt here, by using dev_dbg() we intend to avoid dependency on the 
value of drm.debug parameter and always get certain error messages ?. 
Sorry just want to understand the rationale behind it.

>>>>   int i915_ppgtt_init_hw(struct drm_device *dev)
>>>>   {
>>>> +	if (HAS_TRTT(dev) && USES_FULL_48BIT_PPGTT(dev)) {
>>>> +		struct drm_i915_private *dev_priv = dev->dev_private;
>>>> +
>>> Shouldn't this be a context specific register? In which case you need to
>>> set it in the context image instead.
>>> Hmm. given you already do the context image tweaks, how does work with
>>> non-trtt contexts?
>> GEN9_TR_CHICKEN_BIT_VECTOR is not a context specific register.
>> It globally enables TR-TT support in Hw. Still TR-TT enabling on per
>> context basis is required.
>> Non-trtt contexts are not affected by this setting.
> Please add that as a comment here. What are the downsides, potential
> regressions? It's behind a chicken bit after all...
Fine, will add a comment here for clarity.
So not aware of downsides, this setting should come into picture only 
when TR-TT is enabled for a context.

>> Ok so need to define a new wrapper function,
>> 	i915_vm_reserve_node(vm, START, SIZE, &vma).
>> After looking at the other callsites of drm_mm_reserve_node,
>> including i915_vgpu, I think it would be better to have the
>> prototype as,
>>     i915_vm_reserve_node(vm, &node);
>> However this should be done as a separate patch ?
> Yes, I was just recognising the code duplication and found 3 places
> where we could use it - 3 being the magic number to refactor.
>>>> +struct drm_i915_gem_context_trtt_param {
>>>> +	__u64 l3_table_address;
>>>> +	__u32 invd_tile_val;
>>>> +	__u32 null_tile_val;
>>>> +};
>>> Passes the ABI structure sanity checks.
>> Should we allow User to also choose the location of TR-TT segment
>> (size is anyways fixed as 1<<44).
> The kernel is much more agnostic with your approach than I anticipated,
> so from our pov, allowing the user to shoot themselves in the foot is
> ok.
> There is only one sensible location, but that one location may be
> sensible for a few things.
> i.e. it shouldn't be below 1<<40 so that you can do full aliasing
> between CPU and GPU addresses, and you want to avoid cutting your
> address space in two, so it has to go at the ends, ergo it should be at
> the very top.

So Top most region is the most suitable location, hence should be used 

Best regards

> -Chris

More information about the Intel-gfx mailing list