[PATCH v4 0/4] TTM unlockable restartable LRU list iteration
Somalapuram, Amaranath
asomalap at amd.com
Fri Mar 8 07:43:55 UTC 2024
The patches were tested on an AMD platform.
Repeated stress test with Unigine Heaven: fill memory (VRAM + GTT + system
swap), then free it.
No errors or warnings in the kernel log.
Any suggestions for specific tests?
Regards,
S.Amarnath
On 3/6/2024 12:31 PM, Thomas Hellström wrote:
> This patch-set is a prerequisite for a standalone TTM shrinker
> and for exhaustive TTM eviction using sleeping dma_resv locks,
> which is the motivation for it.
>
> Currently when unlocking the TTM LRU list lock, iteration needs
> to be restarted from the beginning, rather than from the next LRU
> list node. This can potentially be a big problem, because if
> eviction or shrinking fails for whatever reason after unlock,
> restarting is likely to cause the same failure over and over again.
>
> There are various schemes to be able to continue the list
> iteration from where we left off. One such scheme used by the
> GEM LRU list traversal is to pull items already considered off
> the LRU list and reinsert them when iteration is done.
> This has the drawback that concurrent list iteration doesn't see
> the complete list (which is bad for exhaustive eviction) and also
> doesn't lend itself well to bulk-move sublists since these will
> be split in the process where items from those lists are
> temporarily pulled from the list and moved to the list tail.
>
> The approach taken here is that list iterators insert themselves
> into the list at the next position using a special list node.
> Iteration then uses that list node as the starting point when
> restarting. Concurrent iterators just skip over the special list
> nodes.
>
> This is implemented in patches 1 and 2.
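
For readers following along, here is a minimal userspace sketch of the
hitch idea as described above. It is not the code from patches 1 and 2:
every name in it (lru_node, lru_cursor, cursor_next and so on) is
hypothetical, locking is omitted, and the caller is assumed to hold the
list lock around each call.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified types; NOT the TTM structures from the series. */
struct lru_node {
	struct lru_node *prev, *next;
	bool is_hitch;	/* true for an iterator's placeholder node */
	int id;		/* payload of a real item, unused for hitches */
};

struct lru_cursor {
	struct lru_node hitch;	/* the special node parked in the list */
	struct lru_node *head;	/* list head being iterated */
};

static void lru_init(struct lru_node *head)
{
	head->prev = head->next = head;
	head->is_hitch = false;
	head->id = -1;
}

/* Unlink a node and leave it pointing at itself. */
static void lru_del(struct lru_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}

static void lru_add_after(struct lru_node *n, struct lru_node *pos)
{
	n->prev = pos;
	n->next = pos->next;
	pos->next->prev = n;
	pos->next = n;
}

static void lru_add_tail(struct lru_node *n, struct lru_node *head)
{
	lru_add_after(n, head->prev);
}

/* Park the hitch at the start of the list. */
static void cursor_init(struct lru_cursor *c, struct lru_node *head)
{
	c->head = head;
	c->hitch.is_hitch = true;
	c->hitch.id = -1;
	lru_add_after(&c->hitch, head);
}

/*
 * Return the next real item after the hitch, skipping the hitches of
 * other iterators, and re-park our hitch right after that item.
 * Returns NULL, with the hitch removed, once the end is reached.
 */
static struct lru_node *cursor_next(struct lru_cursor *c)
{
	struct lru_node *n = c->hitch.next;

	while (n != c->head && n->is_hitch)
		n = n->next;

	lru_del(&c->hitch);
	if (n == c->head)
		return NULL;

	lru_add_after(&c->hitch, n);
	return n;
}

/* Remove the hitch when iteration is abandoned early. */
static void cursor_fini(struct lru_cursor *c)
{
	lru_del(&c->hitch);
}
```

Because the hitch stays in the list while the lock is dropped, items
before or after it can be removed without invalidating the iterator's
position, and the next cursor_next() call resumes from the hitch.
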
>
> For bulk move sublists the approach is the same, but when a bulk
> move sublist is moved to the tail, the iterator is moved with it,
> causing us to skip parts of the list. That is undesirable.
> To deal with that, when an iterator detects it is traversing a
> sublist, it registers with the ttm_lru_bulk_move struct using a
> linked list. When that bulk move sublist is moved to the tail,
> any iterator registered with it is first moved to the tail of
> the sublist.
> This is implemented in patch 3.
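
A rough sketch of how I read the bulk-move handling, again with
hypothetical names and not taken from patch 3. Two simplifying
assumptions: a sublist tracks a single registered hitch instead of a
linked list of them, and "moved to the tail of the sublist" is taken to
mean "placed right after the sublist's last node", so the hitch is left
behind when the sublist is spliced to the list tail.

```c
#include <stddef.h>

/* Hypothetical, simplified types; NOT the TTM definitions. */
struct node {
	struct node *prev, *next;
};

struct bulk_range {
	struct node *first, *last;	/* sublist spans first..last inclusive */
	struct node *cursor_hitch;	/* one registered hitch, or NULL */
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

static void node_unlink(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

static void node_insert_after(struct node *n, struct node *pos)
{
	n->prev = pos;
	n->next = pos->next;
	pos->next->prev = n;
	pos->next = n;
}

/* Step a registered hitch out of the sublist, to just past its last node. */
static void bulk_adjust_cursor(struct bulk_range *r)
{
	if (!r->cursor_hitch)
		return;

	node_unlink(r->cursor_hitch);
	node_insert_after(r->cursor_hitch, r->last);
	r->cursor_hitch = NULL;	/* the hitch is no longer inside the sublist */
}

/* Move the whole sublist to the list tail, leaving any hitch in place. */
static void bulk_move_tail(struct bulk_range *r, struct node *head)
{
	bulk_adjust_cursor(r);

	/* Detach first..last from its current position. */
	r->first->prev->next = r->last->next;
	r->last->next->prev = r->first->prev;

	/* Splice it in just before the list head, i.e. at the tail. */
	r->first->prev = head->prev;
	r->last->next = head;
	head->prev->next = r->first;
	head->prev = r->last;
}
```

With the hitch stepped out first, an iterator that was in the middle of
the sublist continues with whatever followed the sublist before the
move, instead of jumping to the list tail along with it.
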
>
> The restartable property is used in patch 4 to restart swapout if
> needed, but the main purpose is to pave the way for the shrinker
> and for exhaustive eviction.
>
> v2:
> - Rework patch 3 completely.
> v3:
> - Fix a NULL pointer dereference found by Xe CI.
> v4:
> - Remove some leftover code causing build problems.
>
> Cc: Somalapuram Amaranath <Amaranath.Somalapuram at amd.com>
> Cc: Christian König <christian.koenig at amd.com>
> Cc: <dri-devel at lists.freedesktop.org>
>
> Thomas Hellström (4):
>   drm/ttm: Allow TTM LRU list nodes of different types
>   drm/ttm: Use LRU hitches
>   drm/ttm, drm/amdgpu, drm/xe: Consider hitch moves within bulk sublist
>     moves
>   drm/ttm: Allow continued swapout after -ENOSPC failure
>
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 +
> drivers/gpu/drm/ttm/ttm_bo.c | 1 +
> drivers/gpu/drm/ttm/ttm_device.c | 33 +++-
> drivers/gpu/drm/ttm/ttm_resource.c | 228 ++++++++++++++++++++-----
> drivers/gpu/drm/xe/xe_vm.c | 4 +
> include/drm/ttm/ttm_device.h | 2 +
> include/drm/ttm/ttm_resource.h | 96 +++++++++--
> 7 files changed, 308 insertions(+), 60 deletions(-)
>