[PATCH 00/11] Introduce drm evictable lru
Thomas Hellström
thomas.hellstrom at linux.intel.com
Thu Dec 21 13:12:14 UTC 2023
Hi Oak, Christian
On 11/2/23 05:32, Oak Zeng wrote:
> We plan to implement xe driver's shared virtual memory
> manager (aka SVM) without buffer object concept. This
> means we won't build our shared virtual memory manager
> upon TTM infrastructure like amdgpu does.
>
> Even though this approach is more efficient, it does
> create a problem for memory eviction when there is
> memory pressure: memory allocated by SVM and memory
> allocated by TTM should be able to mutually evict
> from each other. TTM's resource manager maintains
> a LRU list for each memory type and this list is used
> to pick up the memory eviction victim. Since we don't
> use TTM for SVM implementation, SVM allocated memory
> can't be added to TTM resource manager's LRU list. Thus
> SVM allocated memory and TTM allocated memory are not
> mutually evictable.
>
> See more discussion on this topic here:
> https://www.spinics.net/lists/dri-devel/msg410740.html
>
> This series solve this problem by creating a shared
> LRU list b/t SVM and TTM, or any other resource manager.
>
> The basic idea is, abstract a drm_lru_entity structure
> which is supposed to be embedded in ttm_resource structure,
> or any other resource manager. The resource LRU list is a
> list of drm_lru_entity. drm_lru_entity has eviction function
> pointers which can be used to call back drivers' specific
> eviction function to evict a memory resource.
>
> Introduce global drm_lru_manager to struct drm_device
> to manage LRU lists. Each memory type or memory region
> can have a LRU list. TTM resource manager's LRU list functions
> including bulk move functions are moved to drm lru manager.
> drm lru manager provides a evict_first function to evict
> the first memory resource from LRU list. This function can
> be called from TTM, SVM or any other resource manager, so
> all the memory allocated in the drm sub-system can be mutually
> evicted.
>
> The lru_lock is also moved from struct ttm_device to struct
> drm_device.
>
> Opens:
> 1) memory accounting: currently the ttm resource manager's
> memory accounting functions is kept at ttm resource manager.
> Since memory accounting should be cross TTM and SVM, it should
> be ideally in the drm lru manager layer. This will be polished
> in the future.
>
> 2) eviction callback function interface: The current eviction
> function interface is designed to meet TTM memory eviction
> requirements. When SVM is in the picture, this interface
> need to be futher tunned to meet SVM requirement also.
>
> This series is not tested and it is only compiled for xe
> driver. Some minor changes are needed for other driver
> such as amdgpu, nouveau etc. I intended to send this out
> as a request for comment series to get some early feedback,
> to see whether this is the right direction to go. I will
> futher polish this series after a direction is agreed.
>
> Oak Zeng (11):
> drm/ttm: re-parameter ttm_device_init
> drm: move lru_lock from ttm_device to drm_device
> drm: introduce drm evictable LRU
> drm: Add evict function pointer to drm lru entity
> drm: Replace ttm macros with drm macros
> drm/ttm: Set lru manager to ttm resource manager
> drm/ttm: re-parameterize a few ttm functions
> drm: Initialize drm lru manager
> drm/ttm: Use drm LRU manager iterator
> drm/ttm: Implement ttm memory evict functions
> drm/ttm: Write ttm functions using drm lru manager functions
>
> drivers/gpu/drm/Makefile | 1 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 6 +
> .../gpu/drm/amd/amdgpu/amdgpu_preempt_mgr.c | 6 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 10 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 10 +-
> drivers/gpu/drm/drm_drv.c | 1 +
> drivers/gpu/drm/drm_evictable_lru.c | 266 ++++++++++++++++++
> drivers/gpu/drm/drm_gem_vram_helper.c | 10 +-
> drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 6 +-
> drivers/gpu/drm/i915/i915_ttm_buddy_manager.c | 10 +
> drivers/gpu/drm/i915/intel_region_ttm.c | 4 +-
> drivers/gpu/drm/i915/selftests/mock_region.c | 2 +-
> drivers/gpu/drm/loongson/lsdc_ttm.c | 10 +-
> drivers/gpu/drm/nouveau/nouveau_ttm.c | 22 +-
> drivers/gpu/drm/qxl/qxl_ttm.c | 6 +-
> drivers/gpu/drm/radeon/radeon_ttm.c | 10 +-
> drivers/gpu/drm/ttm/tests/ttm_device_test.c | 2 +-
> drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c | 2 +-
> drivers/gpu/drm/ttm/ttm_bo.c | 247 ++++++++++++----
> drivers/gpu/drm/ttm/ttm_bo_util.c | 20 +-
> drivers/gpu/drm/ttm/ttm_bo_vm.c | 2 +-
> drivers/gpu/drm/ttm/ttm_device.c | 55 ++--
> drivers/gpu/drm/ttm/ttm_module.h | 3 +-
> drivers/gpu/drm/ttm/ttm_range_manager.c | 14 +-
> drivers/gpu/drm/ttm/ttm_resource.c | 242 +++-------------
> drivers/gpu/drm/ttm/ttm_sys_manager.c | 8 +-
> drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 2 +-
> drivers/gpu/drm/vmwgfx/vmwgfx_bo.h | 2 +-
> drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 6 +-
> .../gpu/drm/vmwgfx/vmwgfx_system_manager.c | 6 +
> drivers/gpu/drm/xe/xe_bo.c | 48 ++--
> drivers/gpu/drm/xe/xe_bo.h | 5 +-
> drivers/gpu/drm/xe/xe_device.c | 2 +-
> drivers/gpu/drm/xe/xe_dma_buf.c | 4 +-
> drivers/gpu/drm/xe/xe_exec.c | 6 +-
> drivers/gpu/drm/xe/xe_migrate.c | 6 +-
> drivers/gpu/drm/xe/xe_res_cursor.h | 10 +-
> drivers/gpu/drm/xe/xe_ttm_sys_mgr.c | 8 +-
> drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 18 +-
> drivers/gpu/drm/xe/xe_vm.c | 6 +-
> drivers/gpu/drm/xe/xe_vm_types.h | 2 +-
> include/drm/drm_device.h | 12 +
> include/drm/drm_evictable_lru.h | 260 +++++++++++++++++
> include/drm/ttm/ttm_bo.h | 10 +-
> include/drm/ttm/ttm_device.h | 13 +-
> include/drm/ttm/ttm_range_manager.h | 17 +-
> include/drm/ttm/ttm_resource.h | 117 +++-----
> 49 files changed, 1042 insertions(+), 501 deletions(-)
> create mode 100644 drivers/gpu/drm/drm_evictable_lru.c
> create mode 100644 include/drm/drm_evictable_lru.h
>
Some additional comments after looking through this patchset and the
comments.
1) General: IMO a good start.
2) Where should this reside? I'm not a big fan of *prescribing* that a
component should be part of a specific device structure. IMO that leads
to dumping the whole drm namespace into this component rather than to
keep it isolated with as few dependencies as possible. I would make the
part that should reside in the drm device a "struct drm_lru_device", and
the resource base class a "struct drm_lru_resource". Whether the drm
device then wants to embed the struct drm_lru_device (nice if you need
this component) or have a pointer to it (nice if you don't need this
component) and whether the ttm_device should subclass the drm device
becomes pretty orthogonal to the actual implementation, and we'd avoid
hard to follow code cross-references all over the place. The drm_gpuvm
IMO did a good job with this, but I know there are other opinions on
this, and I don't want it to become a blocker.
3) Christian's comment about cursors into the LRU for LRU traversal restart
This is quickly becoming a must for the Xe driver, and a very-nice to
have for a working TTM shrinker, but I think we should start to bring
what we have in TTM currently over and do the cursors as a separate
task. I have it pretty much on top of my priority list currently. Both
downstream i915 and drm_gpuvm have a way to handle this nicely by
permutating the LRU list just before the lock needs to be released, so I
was thinking something similar. The complication would be the bulk move
tracking.
4) The struct drm_lru_evict_ctx - This one needs to be common across all
users since it's set up by the evictor and used by the evictee ops.
/Thomas
More information about the Intel-xe
mailing list