[Intel-xe] bulk_move in ttm_resource manager

Wed Oct 4 07:17:39 UTC 2023

On Wed, 2023-10-04 at 03:52 +0000, Zeng, Oak wrote:
> Hi Christian,
>  
> As a follow up to this thread:
> https://www.spinics.net/lists/dri-devel/msg410740.html, I started the
> work of moving the lru out of ttm_resource_manager and making it a
> common library for both ttm and svm. While looking into the details of
> the bulk_move in ttm resource manager, I found a potential problem:
>  
> For simplicity, let’s say we only have one memory type and one
> priority, so ttm resource manager only maintains one global lru list.
> Let’s say this list has 10 nodes, node1 to node10.
>  
> But the lru_bulk_move is per vm. Let’s say vm1 has a bulk_move
> covering node range [node4, node7] and vm2 has a bulk_move covering
> node range [node6, node9]. Notice that those two ranges overlap.
> Since two VMs can simultaneously add nodes to the lru, I think this
> scenario can happen.
>  
> Now suppose we perform a bulk move for vm1, moving [node4, node7] to
> the tail of the lru list. The lru after this bulk move will be: node1,
> node2, node3, node8, node9, node10, node4, node5, node6, node7. Now
> notice that for vm2’s bulk_move, the first pointer (pointing to
> node6) is actually after the last pointer (pointing to node9), which
> doesn’t make sense.
>  
> Is this a real problem? As I understand it, with this issue, we only
> mess up the lru list order, but there won’t be any functional
> problem. If it is a real problem, should we make the bulk_move global
> instead of per-vm?
>  
> Thanks,
> Oak
>  
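
For readers following along, the scenario in the quoted mail can be
reproduced with a toy model. This is only a sketch in Python, not TTM
code: the function name bulk_move_to_tail and the plain-list
representation of the lru are assumptions made for illustration, and it
takes the overlapping ranges as given without checking whether TTM can
actually produce them.

```python
def bulk_move_to_tail(lru, first, last):
    """Move the contiguous span first..last (inclusive) to the tail.

    Toy model of a range-based bulk move: the range is tracked only by
    its two end nodes, as in the scenario described in the quoted mail.
    """
    i, j = lru.index(first), lru.index(last)
    if i > j:
        raise ValueError(f"inverted range: {first} is after {last}")
    return lru[:i] + lru[j + 1:] + lru[i:j + 1]


lru = [f"node{n}" for n in range(1, 11)]
vm1 = ("node4", "node7")  # hypothetical per-vm ranges from the mail
vm2 = ("node6", "node9")

after_vm1 = bulk_move_to_tail(lru, *vm1)
print(after_vm1)

# vm2's "first" node now sits behind its "last" node in the list.
vm2_inverted = after_vm1.index(vm2[0]) > after_vm1.index(vm2[1])
print(vm2_inverted)
```

In this model a later bulk move for vm2 has no well-defined span to
move, which is the inconsistency the question is about.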

FWIW I have a patch set that converts the TTM bulk move code to use
sublists; a list item is either a resource or a sublist, and a bulk
move essentially moves the sublist as a whole. Bumping a resource's
LRU position within a VM would touch only the sublist.
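
A rough sketch of the sublist idea, as a toy model in Python. This is
not the code from the patch set: the SubList class and the helpers
move_sublist_to_tail and flatten are hypothetical names used only to
illustrate "a list item is either a resource or a sublist".

```python
class SubList:
    """Hypothetical per-VM sublist; one single item in the main LRU."""

    def __init__(self, name, resources):
        self.name = name
        self.resources = list(resources)

    def bump(self, res):
        # Bumping a resource within the VM only reorders the sublist.
        self.resources.remove(res)
        self.resources.append(res)


def move_sublist_to_tail(lru, sub):
    # A bulk move relocates the sublist item as a whole.
    lru.remove(sub)
    lru.append(sub)


def flatten(lru):
    # Effective LRU order as seen by a traversal.
    out = []
    for item in lru:
        if isinstance(item, SubList):
            out.extend(item.resources)
        else:
            out.append(item)
    return out


vm1 = SubList("vm1", ["a1", "a2"])
vm2 = SubList("vm2", ["b1", "b2"])
lru = ["node1", vm1, "node2", vm2, "node3"]

move_sublist_to_tail(lru, vm1)
order_after_move = flatten(lru)

vm1.bump("a1")
order_after_bump = flatten(lru)
print(order_after_move, order_after_bump)
```

Since each VM's resources live in their own sublist, two VMs cannot
hold overlapping ranges in this model.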

Currently the functionality and the TTM API are essentially the same,
but when experimenting with LRU traversal for exhaustive WW-locking
eviction this concept was easier to use. Hopefully it would also
reduce fragility and make the code easier to understand, since a
scenario like the above could never really happen...

Let me know if I should send it out as an RFC.

Code is here:
https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/351/commits

/Thomas