[Intel-gfx] [RFC PATCH] drm/ttm: Add a private member to the struct ttm_resource

Tue Sep 14 08:53:26 UTC 2021

Am 14.09.21 um 10:27 schrieb Thomas Hellström:
> On Tue, 2021-09-14 at 09:40 +0200, Christian König wrote:
>> Am 13.09.21 um 14:41 schrieb Thomas Hellström:
>>> [SNIP]
>>>>>> Let's say you have a struct ttm_object_vram and a struct
>>>>>> ttm_object_gtt, both subclassing drm_gem_object. Then I'd say
>>>>>> a
>>>>>> driver would want to subclass those to attach identical data,
>>>>>> extend functionality and provide a single i915_gem_object to
>>>>>> the
>>>>>> rest of the driver, which couldn't care less whether it's
>>>>>> vram or
>>>>>> gtt? Wouldn't you say having separate struct ttm_object_vram
>>>>>> and a
>>>>>> struct ttm_object_gtt in this case would be awkward?. We
>>>>>> *want* to
>>>>>> allow common handling.
>>>>> Yeah, but that's a bad idea. This is like diamond inheritance
>>>>> in C++.
>>>>>
>>>>> When you need the same functionality in different backends you
>>>>> implement that as separate object and then add a parent class.
>>>>>
>>>>>> It's the exact same situation here. With struct ttm_resource
>>>>>> you
>>>>>> let *different* implementation flavours subclass it, which
>>>>>> makes it
>>>>>> awkward for the driver to extend the functionality in a
>>>>>> common way
>>>>>> by subclassing, unless the driver only uses a single
>>>>>> implementation.
>>>>> Well the driver should use separate implementations for their
>>>>> different domains as much as possible.
>>>>>
>>>> Hmm, Now you lost me a bit. Are you saying that the way we do
>>>> dynamic
>>>> backends in the struct ttm_buffer_object to facilitate driver
>>>> subclassing is a bad idea or that the RFC with backpointer is a
>>>> bad
>>>> idea?
>>>>
>>>>
>>> Or if you mean diamond inheritance is bad, yes that's basically my
>>> point.
>> That diamond inheritance is a bad idea. What I don't understand is
>> why
>> you need that in the first place?
>>
>> Information that you attach to a resource are specific to the domain
>> where the resource is allocated from. So why do you want to attach
>> the
>> same information to a resources from different domains?
> Again, for the same reason that we do that with struct i915_gem_objects
> and struct ttm_tts, to extend the functionality. I mean information
> that we attach when we subclass a struct ttm_buffer_object doesn't
> necessarily care about whether it's a VRAM or a GTT object. In exactly
> the same way, information that we want to attach to a struct
> ttm_resource doesn't necessarily care whether it's a system or a VRAM
> resource, and need not be specific to any of those.
>
> In this particular case, as memory management becomes asynchronous, you
> can't attach things like sg-tables and gpu binding information to the
> gem object anymore, because the object may have a number of migrations
> in the pipeline. Such things need to be attached to the structure that
> abstracts the memory allocation, and which may have a completely
> different lifetime than the object itself.
>
> In our particular case we want to attach information for cached page
> lookup and and sg-table, and moving forward probably the gpu binding
> (vma) information, and that is the same information for any
> ttm_resource regardless where it's allocated from.
>
> Typical example: A pipelined GPU operation happening before an async
> eviction goes wrong. We need to error capture and reset. But if we look
> at the object for error capturing, it's already updated pointing to an
> after-eviction resource, and the resource sits on a ghost object (or in
> the future when ghost objects go away perhaps in limbo somewhere).
>
> We need to capture the memory pointed to by the struct ttm_resource the
> GPU was referencing, and to be able to do that we need to cache driver-
> specific info on the resource. Typically an sg-list and GPU binding
> information.
>
> Anyway, that cached information needs to be destroyed together with the
> resource and thus we need to be able to access that information from
> the resource in some way, regardless whether it's a pointer or whether
> we embed the struct resource.
>
> I think it's pretty important here that we (using the inheritance
> diagram below) recognize the need for D to inherit from A, just like we
> do for objects or ttm_tts.
>
>
>>> Looking at
>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FMultiple_inheritance%23%2Fmedia%2FFile%3ADiamond_inheritance.svg&data=04%7C01%7Cchristian.koenig%40amd.com%7C268bb562db8548b285b408d977598b2c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637672048739103176%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=bPyDqiSF%2FHFZbl74ux0vfwh3uma5hZIUf2xbzb9yZz8%3D&reserved=0
>>>   
>>>
>>>
>>> 1)
>>>
>>> A would be the struct ttm_resource itself,
>>> D would be struct i915_resource,
>>> B would be struct ttm_range_mgr_node,
>>> C would be struct i915_ttm_buddy_resource
>>>
>>> And we need to resolve the ambiguity using the awkward union
>>> construct, iff we need to derive from both B and C.
>>>
>>> Struct ttm_buffer_object and struct ttm_tt instead have B) and C)
>>> being dynamic backends of A) or a single type derived from A) Hence
>>> the problem doesn't exist for these types.
>>>
>>> So the question from last email remains, if ditching this RFC, can
>>> we
>>> have B) and C) implemented by helpers that can be used from D) and
>>> that don't derive from A?
>> Well we already have that in the form of drm_mm. I mean the
>> ttm_range_manager is just a relatively small glue code which
>> implements
>> the TTMs resource interface using the drm_mm object and a spinlock.
>> IIRC
>> that less than 200 lines of code.
>>
>> So you should already have the necessary helpers and just need to
>> implement the resource manager as far as I can see.
>>
>> I mean I reused the ttm_range_manager_node in for amdgpu_gtt_mgr and
>> could potentially reuse a bit more of the ttm_range_manager code. But
>> I
>> don't see that as much of an issue, the extra functionality there is
>> just minimal.
> Sure but that would give up the prereq of having reusable resource
> manager implementations. What happens if someone would like to reuse
> the buddy manager? And to complicate things even more, the information
> we attach to VRAM resources also needs to be attached to system
> resources. Sure we could probably re-implement a combined system-buddy-
> range manager, but that seems like something overly complex.
>
> The other object examples resolve the diamond inheritance with a
> pointer to the specialization (BC) and let D derive from A.
>
> TTM resources do it backwards. If we can just recognize that and ponder
> what's the easiest way to resolve this given the current design, I
> actually think we'd arrive at a backpointer to allow downcasting from A
> to D.

Yeah, but I think you are approaching that from the wrong side.

For use cases like this I think you should probably have the following 
objects and inheritances:

1. Driver specific objects like i915_sg, i915_vma which don't inherit 
anything from TTM.
2. i915_vram_node which inherits from ttm_resource or a potential 
ttm_buddy_allocator.
3. i915_gtt_node which inherits from ttm_range_manger_node.
4. Maybe i915_sys_node which inherits from ttm_resource as well.

The managers for the individual domains then provide the glue code to 
implement both the TTM resource interface as well as a driver specific 
interface to access the driver objects.

Amdgpu just uses a switch/case for now, but you could as well extend the 
ttm_resource_manager_func table and upcast that inside the driver.

Regards,
Christian.

>
> Thanks,
> Thomas
>
>
>
>> Regards,
>> Christian.
>>
>>> Thanks,
>>>
>>> Thomas
>>>
>>>
>>>
>