[PATCH 00/11] Introduce drm evictable lru

Thomas Hellström thomas.hellstrom at linux.intel.com
Thu Dec 21 13:12:14 UTC 2023


Hi Oak, Christian

On 11/2/23 05:32, Oak Zeng wrote:
> We plan to implement xe driver's shared virtual memory
> manager (aka SVM) without buffer object concept. This
> means we won't build our shared virtual memory manager
> upon TTM infrastructure like amdgpu does.
>
> Even though this approach is more efficient, it does
> create a problem for memory eviction when there is
> memory pressure: memory allocated by SVM and memory
> allocated by TTM should be able to mutually evict
> from each other. TTM's resource manager maintains
> a LRU list for each memory type and this list is used
> to pick up the memory eviction victim. Since we don't
> use TTM for SVM implementation, SVM allocated memory
> can't be added to TTM resource manager's LRU list. Thus
> SVM allocated memory and TTM allocated memory are not
> mutually evictable.
>
> See more discussion on this topic here:
> https://www.spinics.net/lists/dri-devel/msg410740.html
>
> This series solve this problem by creating a shared
> LRU list b/t SVM and TTM, or any other resource manager.
>
> The basic idea is, abstract a drm_lru_entity structure
> which is supposed to be embedded in ttm_resource structure,
> or any other resource manager. The resource LRU list is a
> list of drm_lru_entity. drm_lru_entity has eviction function
> pointers which can be used to call back drivers' specific
> eviction function to evict a memory resource.
>
> Introduce global drm_lru_manager to struct drm_device
> to manage LRU lists. Each memory type or memory region
> can have a LRU list. TTM resource manager's LRU list functions
> including bulk move functions are moved to drm lru manager.
> drm lru manager provides a evict_first function to evict
> the first memory resource from LRU list. This function can
> be called from TTM, SVM or any other resource manager, so
> all the memory allocated in the drm sub-system can be mutually
> evicted.
>
> The lru_lock is also moved from struct ttm_device to struct
> drm_device.
>
> Opens:
> 1) memory accounting: currently the ttm resource manager's
> memory accounting functions is kept at ttm resource manager.
> Since memory accounting should be cross TTM and SVM, it should
> be ideally in the drm lru manager layer. This will be polished
> in the future.
>
> 2) eviction callback function interface: The current eviction
> function interface is designed to meet TTM memory eviction
> requirements. When SVM is in the picture, this interface
> need to be futher tunned to meet SVM requirement also.
>
> This series is not tested and it is only compiled for xe
> driver. Some minor changes are needed for other driver
> such as amdgpu, nouveau etc. I intended to send this out
> as a request for comment series to get some early feedback,
> to see whether this is the right direction to go. I will
> futher polish this series after a direction is agreed.
>
> Oak Zeng (11):
>    drm/ttm: re-parameter ttm_device_init
>    drm: move lru_lock from ttm_device to drm_device
>    drm: introduce drm evictable LRU
>    drm: Add evict function pointer to drm lru entity
>    drm: Replace ttm macros with drm macros
>    drm/ttm: Set lru manager to ttm resource manager
>    drm/ttm: re-parameterize a few ttm functions
>    drm: Initialize drm lru manager
>    drm/ttm: Use drm LRU manager iterator
>    drm/ttm: Implement ttm memory evict functions
>    drm/ttm: Write ttm functions using drm lru manager functions
>
>   drivers/gpu/drm/Makefile                      |   1 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c   |   6 +
>   .../gpu/drm/amd/amdgpu/amdgpu_preempt_mgr.c   |   6 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       |  10 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        |   6 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h        |   2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c  |  10 +-
>   drivers/gpu/drm/drm_drv.c                     |   1 +
>   drivers/gpu/drm/drm_evictable_lru.c           | 266 ++++++++++++++++++
>   drivers/gpu/drm/drm_gem_vram_helper.c         |  10 +-
>   drivers/gpu/drm/i915/gem/i915_gem_ttm.c       |   6 +-
>   drivers/gpu/drm/i915/i915_ttm_buddy_manager.c |  10 +
>   drivers/gpu/drm/i915/intel_region_ttm.c       |   4 +-
>   drivers/gpu/drm/i915/selftests/mock_region.c  |   2 +-
>   drivers/gpu/drm/loongson/lsdc_ttm.c           |  10 +-
>   drivers/gpu/drm/nouveau/nouveau_ttm.c         |  22 +-
>   drivers/gpu/drm/qxl/qxl_ttm.c                 |   6 +-
>   drivers/gpu/drm/radeon/radeon_ttm.c           |  10 +-
>   drivers/gpu/drm/ttm/tests/ttm_device_test.c   |   2 +-
>   drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c |   2 +-
>   drivers/gpu/drm/ttm/ttm_bo.c                  | 247 ++++++++++++----
>   drivers/gpu/drm/ttm/ttm_bo_util.c             |  20 +-
>   drivers/gpu/drm/ttm/ttm_bo_vm.c               |   2 +-
>   drivers/gpu/drm/ttm/ttm_device.c              |  55 ++--
>   drivers/gpu/drm/ttm/ttm_module.h              |   3 +-
>   drivers/gpu/drm/ttm/ttm_range_manager.c       |  14 +-
>   drivers/gpu/drm/ttm/ttm_resource.c            | 242 +++-------------
>   drivers/gpu/drm/ttm/ttm_sys_manager.c         |   8 +-
>   drivers/gpu/drm/vmwgfx/vmwgfx_bo.c            |   2 +-
>   drivers/gpu/drm/vmwgfx/vmwgfx_bo.h            |   2 +-
>   drivers/gpu/drm/vmwgfx/vmwgfx_drv.c           |   6 +-
>   .../gpu/drm/vmwgfx/vmwgfx_system_manager.c    |   6 +
>   drivers/gpu/drm/xe/xe_bo.c                    |  48 ++--
>   drivers/gpu/drm/xe/xe_bo.h                    |   5 +-
>   drivers/gpu/drm/xe/xe_device.c                |   2 +-
>   drivers/gpu/drm/xe/xe_dma_buf.c               |   4 +-
>   drivers/gpu/drm/xe/xe_exec.c                  |   6 +-
>   drivers/gpu/drm/xe/xe_migrate.c               |   6 +-
>   drivers/gpu/drm/xe/xe_res_cursor.h            |  10 +-
>   drivers/gpu/drm/xe/xe_ttm_sys_mgr.c           |   8 +-
>   drivers/gpu/drm/xe/xe_ttm_vram_mgr.c          |  18 +-
>   drivers/gpu/drm/xe/xe_vm.c                    |   6 +-
>   drivers/gpu/drm/xe/xe_vm_types.h              |   2 +-
>   include/drm/drm_device.h                      |  12 +
>   include/drm/drm_evictable_lru.h               | 260 +++++++++++++++++
>   include/drm/ttm/ttm_bo.h                      |  10 +-
>   include/drm/ttm/ttm_device.h                  |  13 +-
>   include/drm/ttm/ttm_range_manager.h           |  17 +-
>   include/drm/ttm/ttm_resource.h                | 117 +++-----
>   49 files changed, 1042 insertions(+), 501 deletions(-)
>   create mode 100644 drivers/gpu/drm/drm_evictable_lru.c
>   create mode 100644 include/drm/drm_evictable_lru.h
>
Some additional comments after looking through this patchset and the 
comments.

1) General: IMO a good start.

2) Where should this reside? I'm not a big fan of *prescribing* that a 
component should be part of a specific device structure. IMO that leads 
to dumping the whole drm namespace into this component rather than to 
keep it isolated with as few dependencies as possible. I would make the 
part that should reside in the drm device a "struct drm_lru_device", and 
the resource base class a "struct drm_lru_resource". Whether the drm 
device then wants to embed the struct drm_lru_device (nice if you need 
this component) or have a pointer to it (nice if you don't need this 
component) and whether the ttm_device should subclass the drm device 
becomes pretty orthogonal to the actual implementation, and we'd avoid 
hard to follow code cross-references all over the place. The drm_gpuvm 
IMO did a good job with this, but I know there are other opinions on 
this, and I don't want it to become a blocker.

3) Christian's comment about cursors into the LRU for LRU traversal restart
This is quickly becoming a must for the Xe driver, and a very-nice to 
have for a working TTM shrinker, but I think we should start to bring 
what we have in TTM currently over and do the cursors as a separate 
task. I have it pretty much on top of my priority list currently. Both 
downstream i915 and drm_gpuvm have a way to handle this nicely by 
permutating the LRU list just before the lock needs to be released, so I 
was thinking something similar. The complication would be the bulk move 
tracking.

4) The struct drm_lru_evict_ctx - This one needs to be common across all 
users since it's set up by the evictor and used by the evictee ops.

/Thomas




More information about the Intel-xe mailing list