[Intel-gfx] [PATCH v5] drm/i915: Introduce refcounted sg-tables

Thomas Hellström thomas.hellstrom at linux.intel.com
Mon Nov 1 15:09:13 UTC 2021


Hi,

On 11/1/21 15:50, Tvrtko Ursulin wrote:
>
> On 01/11/2021 13:51, Thomas Hellström wrote:
>> Hi, Tvrtko
>>
>> On Mon, 2021-11-01 at 13:14 +0000, Tvrtko Ursulin wrote:
>>>
>>> On 01/11/2021 12:24, Thomas Hellström wrote:
>>>> As we start to introduce asynchronous failsafe object migration,
>>>> where we update the object state and then submit asynchronous
>>>> commands, we need to record what memory resources are actually used
>>>> by various parts of the command stream. Initially for three
>>>> purposes:
>>>>
>>>> 1) Error capture.
>>>> 2) Asynchronous migration error recovery.
>>>> 3) Asynchronous vma bind.
>>>
>>> FWIW something like this may be interesting to me as well, although I
>>> haven't looked much into details yet, for the purpose of allowing
>>> delayed "put pages" via decoupling from the GEM bo.
>>> Two questions after glancing over:
>>>
>>> 1)
>>> I do wonder if abstracting "sgt" away from the name would make sense?
>>> Like perhaps obj->mm.pages being the location of the new abstraction
>>> so naming it along the lines of i915_obj_pages or something.
>>
>> Well it's not yet clear how this will end up. Really this should
>> develop into something along the lines of "struct i915_async_obj", on
>
> Whole gigantic object struct will be needed for async free or for 
> something more than that?

I guess it depends on how an async free is supposed to work. For the 
async migration, the plan is that when you migrate, for example between 
LMEM and sys, we first unbind async and get a fence that signals when 
unbinding is complete.  The pages sg list will then be updated 
immediately to point to sys, then the old memory in the form of a struct 
ttm_resource will be freed when fences expire. It's on that ttm resource 
we ideally would want the sg-table to sit, but we avoid that ATM due to 
the awkward way those ttm resources were designed. But it's not a 
super-huge object.
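
To make the ordering above concrete, here is a minimal userspace sketch,
not the driver code. All mock_* names are hypothetical stand-ins (they are
not i915, TTM or dma-fence types), and the fence is reduced to a flag plus
one deferred resource. It only shows that the object is repointed
immediately while the old resource is released once the fence signals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are real i915 or TTM types. */
enum mock_region { MOCK_LMEM, MOCK_SYS };

struct mock_resource {
	enum mock_region region;
	bool freed;
};

struct mock_fence {
	bool signaled;
	struct mock_resource *deferred;	/* released when the fence signals */
};

struct mock_object {
	struct mock_resource *cur;	/* what the pages sg list points to */
};

/*
 * Start a migration: the object points at the destination right away,
 * and freeing of the old resource is deferred to the unbind fence.
 */
static void mock_migrate(struct mock_object *obj, struct mock_resource *dst,
			 struct mock_fence *unbind)
{
	unbind->signaled = false;
	unbind->deferred = obj->cur;
	obj->cur = dst;
}

/* Signal the fence and release whatever was deferred to it. */
static void mock_fence_signal(struct mock_fence *fence)
{
	fence->signaled = true;
	if (fence->deferred) {
		fence->deferred->freed = true;
		fence->deferred = NULL;
	}
}
```

Between mock_migrate() and mock_fence_signal() the old resource is still
alive, which is the window where something other than the object has to
keep track of it.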

>
>> which the sg-list is a member only. Depending on how this turns out and
>> if it remains an sg-list I think your suggestion makes sense, but is it
>> something we can postpone for now?
>
> ...
>
>>
>>>
>>> 2)
>>> And how come obj->mm.pages remains? Does it go away later in follow
>>> up work?
>>
>> For the non-ttm backends, it's not yet implemented, so once they are
>> either moved to TTM or updated, we can completely replace
>> obj->mm.pages.
>
> ... sure, it's your project. I assume there is some time pressure then. 

Yes, initially.

> I was just asking since it looked a bit outside of the usual patterns 
> at a glance.
>
> Oh one more question, how will it work for objects which migrate 
> between system and local memory? Depending on current placement either 
> obj->mm.pages or obj->mm.rsgt will be valid?

The contract currently is that obj->mm.pages is *always* valid. 
Sometimes it points to the sg_table embedded in obj->mm.rsgt.

Anything that requires awareness of async migration, like upcoming 
vma resources and error capture, also needs to be aware of 
obj->mm.rsgt and handle refcounting accordingly. If it's NULL, such code 
can safely assume async migration is not happening.
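
As a rough illustration of that contract, the following is a userspace
sketch under stated assumptions, not the patch itself. The mock_* types and
helpers are hypothetical, the refcount is a plain int (kernel code would
use something like struct kref), and the "consumer" is a single function
standing in for error capture or a vma resource.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins, not the kernel definitions. */
struct mock_sg_table {
	unsigned int nents;
};

struct mock_rsgt {
	int refcount;			/* plain int, for illustration only */
	struct mock_sg_table table;	/* embedded table */
};

struct mock_obj_mm {
	struct mock_sg_table *pages;	/* always valid */
	struct mock_rsgt *rsgt;		/* NULL: no async migration */
};

static struct mock_rsgt *mock_rsgt_create(unsigned int nents)
{
	struct mock_rsgt *rsgt = malloc(sizeof(*rsgt));

	if (!rsgt)
		return NULL;
	rsgt->refcount = 1;
	rsgt->table.nents = nents;
	return rsgt;
}

static struct mock_rsgt *mock_rsgt_get(struct mock_rsgt *rsgt)
{
	rsgt->refcount++;
	return rsgt;
}

/* Drop a reference; returns 1 if this was the last one and it was freed. */
static int mock_rsgt_put(struct mock_rsgt *rsgt)
{
	assert(rsgt->refcount > 0);
	if (--rsgt->refcount == 0) {
		free(rsgt);
		return 1;
	}
	return 0;
}

/*
 * A migration-aware consumer: take its own reference when rsgt is set,
 * otherwise return NULL and rely on mm->pages alone.
 */
static struct mock_rsgt *mock_capture_pages(const struct mock_obj_mm *mm)
{
	return mm->rsgt ? mock_rsgt_get(mm->rsgt) : NULL;
}
```

With this shape, mm->pages can simply point at &rsgt->table when the
refcounted table exists, and a consumer that got a non-NULL return from
mock_capture_pages() keeps the table alive until its own mock_rsgt_put().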

/Thomas



>
> Regards,
>
> Tvrtko

