unified LRU for ttm and svm

Zeng, Oak oak.zeng at intel.com
Thu Oct 19 16:51:04 UTC 2023


Hello all,

As a follow-up to this thread https://www.spinics.net/lists/dri-devel/msg410740.html, I looked further into the idea of a shared LRU list for both ttm/bo and svm (to achieve mutual eviction between them). I came up with a rough design, which I think is better to align on with you before I go too far.

As illustrated in the diagram below:


  1.  There will be a global drm_lru_manager to maintain the shared LRU lists. Each memory type will have its own list, i.e., system memory has a list and GPU memory has a list. On a system with multiple GPU memory regions, we can have multiple GPU LRU lists.
  2.  Move the LRU operation functions (such as the bulk_move related ones) from ttm_resource_manager to drm_lru_manager.
  3.  drm_lru_manager should be initialized during device initialization. The ttm layer or svm layer can hold a weak reference to it for convenience.
  4.  Abstract a drm_lru_entity: this is supposed to be embedded in the ttm_resource and svm_resource structs, as illustrated. Since ttm_resource and svm_resource are quite different in nature (ttm_resource is coupled with a bo, while svm_resource is struct page/pfn based), we can't provide a unified eviction function for them. So an evict_func pointer is introduced in drm_lru_entity [Note 1].
  5.  lru_lock: currently the lru_lock is in the ttm_device structure. Ideally it could be moved to drm_lru_manager, but besides the LRU list, lru_lock also protects other ttm-specific things such as ttm_device's pinned list. The current plan is to move lru_lock to xe_device/amdgpu_device, and ttm_device or svm can hold a weak reference to it for convenience.

[cid:image001.png at 01DA0285.844FA910]
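
To make points 1 to 4 more concrete, here is a minimal userspace sketch of one possible shape for the manager and the entity. It is only an illustration under assumptions: the list helpers stand in for the kernel's list_head, the memory type enum, the function names (drm_lru_manager_init, drm_lru_entity_init, drm_lru_touch, drm_lru_evict) and the demo_resource owner are hypothetical, and locking (point 5) and bulk moves (point 2) are left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, standing in for the kernel's list_head. */
struct list_node {
    struct list_node *prev, *next;
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_node *n) { n->prev = n->next = n; }

static void list_del_init(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

static void list_add_tail(struct list_node *n, struct list_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Hypothetical memory types: one LRU list per type (point 1). */
enum drm_lru_mem_type { DRM_LRU_SYSTEM, DRM_LRU_VRAM0, DRM_LRU_VRAM1, DRM_LRU_NUM_TYPES };

struct drm_lru_entity;
/* Returns 0 on success, non-zero if the resource cannot be evicted right now. */
typedef int (*drm_lru_evict_func_t)(struct drm_lru_entity *entity);

/* Embedded in the owning resource (point 4). */
struct drm_lru_entity {
    struct list_node lru;
    enum drm_lru_mem_type mem_type;
    drm_lru_evict_func_t evict_func;
};

struct drm_lru_manager {
    struct list_node lru[DRM_LRU_NUM_TYPES];
};

static void drm_lru_manager_init(struct drm_lru_manager *mgr)
{
    for (int i = 0; i < DRM_LRU_NUM_TYPES; i++)
        list_init(&mgr->lru[i]);
}

static void drm_lru_entity_init(struct drm_lru_entity *e, enum drm_lru_mem_type type,
                                drm_lru_evict_func_t evict_func)
{
    list_init(&e->lru);
    e->mem_type = type;
    e->evict_func = evict_func;
}

/* Mark an entity as most recently used: (re)insert it at the tail of its list. */
static void drm_lru_touch(struct drm_lru_manager *mgr, struct drm_lru_entity *e)
{
    list_del_init(&e->lru); /* harmless on a self-linked (unlisted) node */
    list_add_tail(&e->lru, &mgr->lru[e->mem_type]);
}

/* Evict up to max entities of one type, least recently used first.
 * Stops at the first failing callback. Returns the number evicted. */
static int drm_lru_evict(struct drm_lru_manager *mgr, enum drm_lru_mem_type type, int max)
{
    struct list_node *head = &mgr->lru[type];
    int evicted = 0;

    while (evicted < max && head->next != head) {
        struct drm_lru_entity *e = container_of(head->next, struct drm_lru_entity, lru);

        if (e->evict_func(e) != 0)
            break;
        list_del_init(&e->lru);
        evicted++;
    }
    return evicted;
}

/* Hypothetical owner used only for demonstration. */
struct demo_resource {
    struct drm_lru_entity entity;
    int evicted;
};

static int demo_evict(struct drm_lru_entity *e)
{
    container_of(e, struct demo_resource, entity)->evicted = 1;
    return 0;
}
```

In this sketch the manager never needs to know whether an entity belongs to ttm or svm; it only walks the per-type list from the head and calls evict_func. A real implementation would have to hold lru_lock around the list manipulation and decide what to do with entities whose callback fails.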


Note 1: I have been considering a structure like the one below. Each hmm/svm resource page is backed by a struct page, and struct page already has an lru member, so theoretically the LRU list can be built as below. This way we don't need to introduce the drm_lru_entity struct. The difficulty is that, without modifying the Linux struct page, we can't cast an lru node to struct page or struct ttm_resource, since we don't know whether the node is used by ttm or svm. This is why I had to introduce drm_lru_entity to hold an evict_func above. But let me know if you have a better idea.

[cid:image002.png at 01DA0289.9AD5D110]
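
The casting problem in Note 1 can be shown with a small userspace sketch. A walker that only holds a list node cannot tell which container type it sits in, but if the node lives inside a drm_lru_entity, the walker can always recover the entity, and each evict_func then knows its own container type. The struct layouts and names below (ttm_resource_sketch, svm_resource_sketch, evict_node and their fields) are hypothetical stand-ins, not the real ttm or svm structures.

```c
#include <assert.h>
#include <stddef.h>

struct list_node {
    struct list_node *prev, *next;
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct drm_lru_entity {
    struct list_node lru;
    int (*evict_func)(struct drm_lru_entity *entity);
};

/* Stand-in for ttm_resource; the entity is deliberately not the first member. */
struct ttm_resource_sketch {
    size_t size;
    struct drm_lru_entity lru_entity;
    int bo_evicted;
};

/* Stand-in for svm_resource, with a different layout and entity offset. */
struct svm_resource_sketch {
    unsigned long pfn;
    unsigned long npages;
    struct drm_lru_entity lru_entity;
    int pages_migrated;
};

/* Each callback knows its own container type, so the cast is safe here. */
static int ttm_evict(struct drm_lru_entity *e)
{
    struct ttm_resource_sketch *res =
        container_of(e, struct ttm_resource_sketch, lru_entity);

    res->bo_evicted = 1;
    return 0;
}

static int svm_evict(struct drm_lru_entity *e)
{
    struct svm_resource_sketch *res =
        container_of(e, struct svm_resource_sketch, lru_entity);

    res->pages_migrated = 1;
    return 0;
}

/* What the LRU walker sees: just a node. It can only assume the node is
 * inside a drm_lru_entity, and lets the callback do the type-specific part. */
static int evict_node(struct list_node *node)
{
    struct drm_lru_entity *e = container_of(node, struct drm_lru_entity, lru);

    return e->evict_func(e);
}
```

With a bare lru node embedded directly in struct page and ttm_resource, evict_node would have no common wrapper to cast to, which is the difficulty described in the note.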

Thanks,
Oak

-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 41996 bytes
Desc: image001.png
URL: <https://lists.freedesktop.org/archives/dri-devel/attachments/20231019/7749becf/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.png
Type: image/png
Size: 39944 bytes
Desc: image002.png
URL: <https://lists.freedesktop.org/archives/dri-devel/attachments/20231019/7749becf/attachment-0003.png>