[Intel-gfx] [PATCH 2/6] drm/i915: Introduce refcounted sg-tables

Thomas Hellström thomas.hellstrom at linux.intel.com
Wed Oct 13 14:55:53 UTC 2021


On 10/13/21 16:41, Daniel Vetter wrote:
> On Fri, Oct 08, 2021 at 03:35:26PM +0200, Thomas Hellström wrote:
>> As we start to introduce asynchronous failsafe object migration,
>> where we update the object state and then submit asynchronous
>> commands we need to record what memory resources are actually used
>> by various part of the command stream. Initially for three purposes:
>>
>> 1) Error capture.
>> 2) Asynchronous migration error recovery.
>> 3) Asynchronous vma bind.
>>
>> At the time where these happens, the object state may have been updated
>> to be several migrations ahead and object sg-tables discarded.
>>
>> In order to make it possible to keep sg-tables with memory resource
>> information for these operations, introduce refcounted sg-tables that
>> aren't freed until the last user is done with them.
>>
>> The alternative would be to reference information sitting on the
>> corresponding ttm_resources which typically have the same lifetime as
>> these refcountes sg_tables, but that leads to other awkward constructs:
>> Due to the design direction chosen for ttm resource managers that would
>> lead to diamond-style inheritance, the LMEM resources may sometimes be
>> prematurely freed, and finally the subclassed struct ttm_resource would
>> have to bleed into the asynchronous vma bind code.
> On the diamon inheritence I was pondering some more whether we shouldn't
> just do the classic C union horrors, i.e.
>
> struct ttm_resource {
> 	/* stuff */
> };
>
> struct ttm_drm_mm_resource {
> 	struct ttm_resource base;
> 	struct drm_mm_node;
> };
>
> struct ttm_buddy_resource {
> 	struct ttm_resource base;
> 	struct drm_buddy_node;
> };
>
> Whatever else we have, maybe also integer resources for guc_id.
>
> And then the horrors:
>
> struct i915_gem_resource {
> 	union {
> 		struct ttm_resource base;
> 		struct ttm_drm_mm_resource drm_mm;
> 		struct ttm_buffer_object buddy;
> 	};
>
> 	/* i915 stuff */
> };
>
> BUILD_BUG_ON(offsetof(struct i915_gem_resource, base) ==
> 	offsetof(struct i915_gem_resource, drmm_mm.base))
> BUILD_BUG_ON(offsetof(struct i915_gem_resource, base) ==
> 	offsetof(struct i915_gem_resource, buddy.base))
>
> This is horrible, but also in official C89 and later unions are the only
> ways to do inheritance. The only reason we can do different in linux is
> because we compile with strict aliasing turned off.
>
> So I think we can shrug this off as officially sanctioned horrors. There's
> a small downside with overhead maybe, but I don't think the amount in
> difference between the various allocators is big enough that we should
> care. Plus a pointer to driver stuff to resolve the diamond inheritance
> through different means isn't free either.

Yes, this is exactly what was meant by "awkward constructs" in the 
commit message,

My thoughts are still that all this could be avoided by a different 
design for struct ttm_resource,
but I agree we can do with refcounted sg-lists for now, to see where 
this ends up when all related resource-on-lru stuff lands in TTM.

/Thomas




More information about the Intel-gfx mailing list