[Intel-xe] ttm_bo and multiple backing store segments

Fri Aug 4 00:19:08 UTC 2023

Finally returning to this, thanks for the replies.

On 7/19/2023 2:02 AM, Christian König wrote:
> Hi guys,
> 
> massive apologies for the delayed response, this mail totally slipped
> past my radar without being noticed.
> 
> On 17.07.23 at 19:24, Rodrigo Vivi wrote:
>> On Thu, Jun 29, 2023 at 02:10:58PM -0700, Welty, Brian wrote:
>>> Hi Christian / Thomas,
>>>
>>> Wanted to ask if you have explored or thought about adding support in TTM
>>> such that a ttm_bo could have more than one underlying backing store segment
>>> (that is, a tree of ttm_resources)?
> 
> We already use something similar on amdgpu where basically the VRAM 
> resources are stitched together from multiple backing pages.
> 
> That is not exactly the same, but it comes close.

I searched for a while for this in amdgpu but wasn't able to find
it. I didn't see any sign of it in amdgpu_vram_mgr.c.
Can you point me to where this code lives?  I wanted to review and 
compare the approach...

> 
>>> We are considering supporting such BOs for the Intel Xe driver.
>> They are indeed the best ones to give an opinion here.
>> I just have some naive questions and comments below.
>>
>>> Some of the benefits:
>>>   * devices with page fault support can fault (and migrate) backing store
>>>     at finer granularity than the entire BO
> 
> We've considered that once as well and I even started hacking something 
> together, but the problem was that at least at that point it wasn't 
> doable because of limitations in the Linux memory management.
> 
> Basically the extended attributes used to control caching of pages were
> only definable per VMA! So when one piece of the BO was in uncached
> VRAM while another piece was in cached system memory, you
> immediately ran into problems.
> 
> I think that issue is fixed by now, but I'm not 100% sure.

Okay, thanks for mentioning it. I haven't come across such an issue so far...

> 
> In general I think it might be beneficial, but I'm not 100% sure if it's 
> worth the additional complexity.

Agreed. Well, next up is to put a small RFC together then...

> 
> Regards,
> Christian.
> 
>> What advantage does this bring, and to which workloads?
>> Is it a performance gain on huge BOs?

Replying to Rodrigo's comments for the rest here...
Yes, more rationale is needed. I'll see about beefing up
the description in the RFC patches...
Basically, all aspects of working with BO backing store can operate
at a smaller granularity, including the ability to support a BO that is
larger than total VRAM.
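To make the idea concrete, here is a minimal userspace sketch (not TTM code) of a BO split into fixed-size segments, each recording its own placement. The names seg_bo, seg_bo_init(), seg_bo_place(), SEG_SIZE and the 2 MiB granularity are hypothetical assumptions for illustration only. Filling VRAM segment by segment and leaving the remainder in system memory shows how a BO larger than total VRAM could still be partially VRAM-resident:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration only; this is not the TTM API. */
enum seg_region { SEG_REGION_SYSTEM, SEG_REGION_VRAM };

#define SEG_SIZE (2u << 20)	/* assumed 2 MiB segment granularity */
#define MAX_SEGS 64		/* arbitrary cap for this sketch */

struct seg_bo {
	size_t size;				/* total BO size in bytes */
	size_t nr_segs;				/* number of SEG_SIZE segments */
	enum seg_region region[MAX_SEGS];	/* placement of each segment */
};

/*
 * Split a BO into segments, all initially in system memory.
 * Returns 0 on success, or -1 if the size is zero or would need
 * more than MAX_SEGS segments.
 */
static int seg_bo_init(struct seg_bo *bo, size_t size)
{
	size_t n = (size + SEG_SIZE - 1) / SEG_SIZE;

	if (n == 0 || n > MAX_SEGS)
		return -1;
	bo->size = size;
	bo->nr_segs = n;
	for (size_t i = 0; i < n; i++)
		bo->region[i] = SEG_REGION_SYSTEM;
	return 0;
}

/* Size of segment i; the last segment may be partial. */
static size_t seg_bo_seg_size(const struct seg_bo *bo, size_t i)
{
	assert(i < bo->nr_segs);
	size_t rem = bo->size - i * SEG_SIZE;
	return rem < SEG_SIZE ? rem : SEG_SIZE;
}

/*
 * Place each segment in VRAM if it still fits in @vram_free, otherwise
 * leave it in system memory, so the BO as a whole may exceed total
 * VRAM. Returns the number of VRAM bytes consumed.
 */
static size_t seg_bo_place(struct seg_bo *bo, size_t vram_free)
{
	size_t used = 0;

	for (size_t i = 0; i < bo->nr_segs; i++) {
		size_t sz = seg_bo_seg_size(bo, i);

		if (used + sz <= vram_free) {
			bo->region[i] = SEG_REGION_VRAM;
			used += sz;
		} else {
			bo->region[i] = SEG_REGION_SYSTEM;
		}
	}
	return used;
}
```

For example, a 10 MiB BO with 6 MiB of free VRAM ends up with its first three segments in VRAM and the last two in system memory, rather than failing validation outright.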

>>
>>>   * BOs can support having multiple backing store segments, which can be
>>>     in different memory domains/regions
>> what locking challenges would this bring?

The intent would be to keep locking at the BO level and not
attempt to introduce finer-grained locking.

>> Is this targeting more GPU + CPU, or only our multi-tile platforms?
>> And what advantage does this bring to real use cases?

Right, it can be leveraged for both types of usage you mentioned.
With both GPU + CPU accessing a BO, the portion of the BO each one is
accessing can be placed in memory local to it.
And with Xe gt0 + gt1 accessing a BO, we can place segments of it in
the tile local to each gt.
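A rough sketch of the fault path this implies, again in plain userspace C with hypothetical names (seg_bo_fault(), local_region(), and the region and accessor enums are all assumptions, not Xe or TTM code). Only the segment containing the faulting offset is moved to the accessor's local region, while a single BO-level mutex serializes the update, matching the intent of not adding finer-grained locking. The actual data copy is omitted; the sketch only tracks placement:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical types for illustration; not the TTM or Xe API. */
enum region { REGION_SYSTEM, REGION_VRAM_TILE0, REGION_VRAM_TILE1 };
enum accessor { ACCESSOR_CPU, ACCESSOR_GT0, ACCESSOR_GT1 };

#define SEG_SIZE (2u << 20)	/* assumed segment granularity */
#define NR_SEGS 8

struct seg_bo {
	pthread_mutex_t lock;		/* one lock for the whole BO */
	enum region region[NR_SEGS];	/* per-segment placement */
	unsigned int migrations;	/* number of segment moves performed */
};

static void seg_bo_init(struct seg_bo *bo)
{
	pthread_mutex_init(&bo->lock, NULL);
	for (size_t i = 0; i < NR_SEGS; i++)
		bo->region[i] = REGION_SYSTEM;
	bo->migrations = 0;
}

/* Memory region treated as local to each accessor in this sketch. */
static enum region local_region(enum accessor who)
{
	switch (who) {
	case ACCESSOR_GT0:
		return REGION_VRAM_TILE0;
	case ACCESSOR_GT1:
		return REGION_VRAM_TILE1;
	default:
		return REGION_SYSTEM;
	}
}

/*
 * Handle a fault at byte @offset by migrating only the segment that
 * contains it to @who's local region. Returns the segment index, or
 * -1 if the offset is outside the BO.
 */
static int seg_bo_fault(struct seg_bo *bo, size_t offset, enum accessor who)
{
	size_t seg = offset / SEG_SIZE;
	enum region want = local_region(who);

	if (seg >= NR_SEGS)
		return -1;

	pthread_mutex_lock(&bo->lock);	/* BO-level, not per-segment, lock */
	if (bo->region[seg] != want) {
		/* A real driver would copy the segment's data here. */
		bo->region[seg] = want;
		bo->migrations++;
	}
	pthread_mutex_unlock(&bo->lock);
	return (int)seg;
}
```

Keeping one lock per BO trades some concurrency between accessors for much simpler reasoning about the segment table.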

>> (probably the svm/hmm question below answers my questions, but...)
>>
>>>   * BO eviction could operate on smaller granularity than entire BO
>> I believe all the previous doubts apply to this item as well...

I'm not sure what 'all the previous doubts' refers to...
I agree most of the value is lost if eviction is not updated to operate at
a finer granularity. I will make sure to explore this.
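Segment-granular eviction could look roughly like the following sketch, where only as many VRAM segments are moved to system memory as a request needs, instead of evicting the whole BO. The helper seg_bo_evict() and its evict-from-the-end policy are hypothetical simplifications; a real implementation would presumably tie into TTM's LRU-based eviction:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structures; not the TTM eviction API. */
enum region { REGION_SYSTEM, REGION_VRAM };

#define SEG_SIZE (2u << 20)	/* assumed segment granularity */
#define NR_SEGS 8

struct seg_bo {
	enum region region[NR_SEGS];	/* per-segment placement */
};

/*
 * Try to free at least @needed bytes of VRAM by moving VRAM segments of
 * @bo to system memory one at a time. Returns the number of bytes freed,
 * which may be less than @needed if the BO holds too little VRAM.
 */
static size_t seg_bo_evict(struct seg_bo *bo, size_t needed)
{
	size_t freed = 0;

	/* Walk from the last segment down; a real policy would use LRU order. */
	for (size_t i = NR_SEGS; i-- > 0 && freed < needed;) {
		if (bo->region[i] == REGION_VRAM) {
			/* A real driver would copy the data out first. */
			bo->region[i] = REGION_SYSTEM;
			freed += SEG_SIZE;
		}
	}
	return freed;
}
```

With all eight segments in VRAM, a 3 MiB request evicts just two 2 MiB segments and leaves the other six resident.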

>>
>>> Or is the thinking that workloads should use SVM/HMM instead of GEM_CREATE
>>> if they want the above benefits?
>>>
>>> Would you be open to seeing an RFC series that starts, perhaps,
>>> with just extending ttm_bo_validate() to see how this might shape up?
>> IMHO an RFC always helps... a piece of code showing the idea usually draws
>> more attention from devs than asking in text. But more text explaining
>> the reasons behind it is also helpful, even with the RFC.

Will work up a small RFC and see where we go with this...

Thanks,
-Brian

>>
>> Thanks,
>> Rodrigo.
>>
>>> -Brian
>