[Intel-xe] bulk_move in ttm_resource manager

Christian König christian.koenig at amd.com
Wed Oct 4 12:44:56 UTC 2023


On 04.10.23 at 09:17, Thomas Hellström wrote:
> On Wed, 2023-10-04 at 03:52 +0000, Zeng, Oak wrote:
>> Hi Christian,
>>   
>> As a follow up to this thread:
>> https://www.spinics.net/lists/dri-devel/msg410740.html, I started the
>> work of moving the lru out of ttm_resource_manager and making it a
>> common library for both ttm and svm. While looking into the details of
>> the bulk_move in ttm resource manager, I found a potential problem:
>>   
>> For simplicity, let’s say we only have one memory type and one
>> priority, so ttm resource manager only maintains one global lru list.
>> Let’s say this list has 10 nodes, node1 to node10.
>>   
>> But the lru_bulk_move is per vm. Let’s say vm1 has a bulk_move
>> covering node range [node4, node7] and vm2 has a bulk_move covering
>> node range [node6, node9]. Notice that those two ranges overlap.
>> Since two vms can simultaneously add nodes to the lru, I think this
>> scenario can happen.

That can't happen. See what ttm_resource_move_to_lru_tail() does when 
the BO has a bulk move associated with it.
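
To illustrate the idea, here is a minimal user-space sketch, not the
actual TTM code. It assumes a simplified model in which every resource
that belongs to a bulk move is re-inserted directly behind the current
last element of its own VM's range whenever it is added or bumped. All
names (struct node, struct bulk_range, bulk_move_to_range_tail() and so
on) are hypothetical stand-ins chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the TTM structures. */
struct node {
	struct node *prev, *next;
	int vm;				/* id of the owning VM */
};

struct bulk_range {
	struct node *first, *last;	/* both NULL while the range is empty */
};

struct lru {
	struct node head;		/* sentinel of a circular list */
};

static void lru_init(struct lru *l)
{
	l->head.prev = l->head.next = &l->head;
	l->head.vm = -1;		/* the sentinel belongs to no VM */
}

static void node_unlink(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

static void node_insert_after(struct node *n, struct node *pos)
{
	n->prev = pos;
	n->next = pos->next;
	pos->next->prev = n;
	pos->next = n;
}

/* Append a node at the tail of the global LRU. */
static void lru_add_tail(struct lru *l, struct node *n)
{
	node_insert_after(n, l->head.prev);
}

/* Make n the last member of its VM's range, keeping the range contiguous. */
static void bulk_move_to_range_tail(struct bulk_range *r, struct node *n)
{
	if (!r->first) {		/* first member: range is just n */
		r->first = r->last = n;
		return;
	}
	if (r->last == n)		/* already at the range tail */
		return;
	if (r->first == n)		/* range now starts at the next member */
		r->first = n->next;
	node_unlink(n);
	node_insert_after(n, r->last);
	r->last = n;
}

/* Walk first..last; succeed only if exactly 'expected' nodes of 'vm' are seen. */
static int range_is_contiguous(const struct bulk_range *r, int vm, int expected)
{
	const struct node *n = r->first;
	int count = 0;

	if (!n)
		return expected == 0;
	for (;;) {
		if (n->vm != vm)
			return 0;
		count++;
		if (n == r->last)
			break;
		n = n->next;
	}
	return count == expected;
}

/* Two VMs add nodes in strictly alternating order. */
static int demo_interleaved_adds(void)
{
	struct lru lru;
	struct node nodes[8];
	struct bulk_range vm1 = { NULL, NULL }, vm2 = { NULL, NULL };
	int i;

	lru_init(&lru);
	for (i = 0; i < 8; i++) {
		nodes[i].vm = (i % 2) ? 2 : 1;
		lru_add_tail(&lru, &nodes[i]);
		bulk_move_to_range_tail(nodes[i].vm == 1 ? &vm1 : &vm2,
					&nodes[i]);
	}
	return range_is_contiguous(&vm1, 1, 4) &&
	       range_is_contiguous(&vm2, 2, 4);
}
```

In this model demo_interleaved_adds() lets the two VMs add nodes
alternately, yet each node ends up directly behind its own VM's last
node, so the two [first, last] ranges never interleave and moving one
range as a block cannot cut through the other.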

>>   
>> Now suppose we perform a bulk move for vm1, moving [node4, node7] to
>> the tail of the lru list. The lru after this bulk move will be: node1,
>> node2, node3, node8, node9, node10, node4, node5, node6, node7. Now
>> notice that for vm2’s bulk_move, the first pointer (pointing to
>> node6) is actually after the last pointer (pointing to node9), which
>> doesn’t make sense.
>>   
>> Is this a real problem? As I understand it, with this issue, we only
>> mess up the lru list order, but there won’t be any functional
>> problem. If it is a real problem, should we make the bulk_move global
>> instead of per vm based?
>>   
>> Thanks,
>> Oak
>>   
> FWIW I have a patch set that converts the TTM bulk move code to using
> sublists; a list item is either a resource or a sublist, and when
> performing a bulk move essentially the sublist is moved. Bumping
> resource LRU within a VM would touch only the sublist.

That sounds like my very first attempt at bulk moves which we abandoned 
for various reasons.

That was easily more than 5 years ago, but the history of that should
still be on the mailing list if I'm not completely mistaken.

Regards,
Christian.

>
> Currently the functionality and TTM API are essentially the same, but when
> experimenting with LRU traversal for exhaustive WW-locking eviction
> this concept was easier to use. Also hopefully this would reduce
> fragility and improve understanding since a scenario like the above
> could really never happen...
>
> Let me know if I should send it out as an RFC.
>
> Code is here:
> https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/351/commits
>
> /Thomas