[RFC 03/11] drm: introduce drm evictable LRU

Zeng, Oak oak.zeng at intel.com
Fri Nov 3 04:04:39 UTC 2023



> -----Original Message-----
> From: Christian König <christian.koenig at amd.com>
> Sent: Thursday, November 2, 2023 9:24 AM
> To: Zeng, Oak <oak.zeng at intel.com>; dri-devel at lists.freedesktop.org; intel-
> xe at lists.freedesktop.org
> Cc: Thomas.Hellstrom at linux.intel.com; felix.kuehling at amd.com;
> airlied at gmail.com; Welty, Brian <brian.welty at intel.com>
> Subject: Re: [RFC 03/11] drm: introduce drm evictable LRU
> 
> On 02.11.23 at 05:32, Oak Zeng wrote:
> > The drm LRU manager is introduced for resource eviction purposes. It maintains
> > an LRU list per resource type.
> 
> Shouldn't we first add the possible resource types in a separate patch?

"Resource type" in my description message is not a good name. For resource types we have:
System memory
Gpu stolen memory
Normal gpu vram

Some devices, such as Intel's PVC, have a sub-device concept (called a tile in the xe driver). For such devices we create multiple vram-type ttm resource managers and multiple lru managers, one per sub-device... So currently we only define DRM_NUM_MEM_TYPES in the lru manager, without defining each individual resource (memory) type.
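
To illustrate that layout, here is a small user-space sketch (not the patch's code; `device_init_mgrs`, the slot assignment, and the fields shown are invented for illustration): one lru manager slot per memory type, with one vram manager per tile:

```c
#include <stdbool.h>
#include <stdint.h>

#define DRM_NUM_MEM_TYPES 8	/* matches the patch; individual types not enumerated */

/* user-space stand-in for the per-device lru manager array */
struct lru_mgr {
	bool used;		/* whether this slot is backed by a real memory region */
	uint64_t size;		/* size of the managed resource */
};

struct dev_sketch {
	struct lru_mgr lru_mgr[DRM_NUM_MEM_TYPES];
};

/* hypothetical init: slot 0 is system memory, slots 1..num_tiles are
 * per-tile vram, and the remaining slots stay unused */
static void device_init_mgrs(struct dev_sketch *dev, int num_tiles,
			     uint64_t vram_per_tile)
{
	dev->lru_mgr[0].used = true;
	for (int i = 1; i <= num_tiles; i++) {
		dev->lru_mgr[i].used = true;
		dev->lru_mgr[i].size = vram_per_tile;
	}
}
```

So a two-tile device would end up with three used managers (system memory plus two vram regions) and five unused slots.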
> 
> > >   It provides functions to add or remove
> > a resource to or from the list. It also provides a function to retrieve the
> > first entity on the LRU list.
> 
> + functions to iterate over them.

Yes, basic iterator functions are implemented in this patch. I will add that to the description message.
> 
> >
> > drm LRU manager also provides functions for bulk moving resources
> > on the LRU lists.
> >
> > drm LRU manager also does very basic memory accounting, i.e., the
> > LRU manager keeps the size of this resource type and a usage member
> > tracking how much of the resource has been added to this LRU manager's LRU
> > list. TTM resource manager memory accounting functions such as
> > struct ttm_resource_manager::size and struct ttm_resource_manager::usage
> > are still kept. In the future, when the SVM code is in the picture,
> > those memory accounting functions will need some rework to account for
> > the memory used by both TTM and SVM.
> 
> Please keep in mind that this structure needs to extremely small to be
> usable for SVM. E.g. struct page size small :)
> 
> At least HMM based implementations ideally wants to have one for each
> page or something like that.

Very good point. The list node and the eviction function pointer are necessary for drm_lru_entity. I will look into whether we can remove the other members. At least we can drop the drm_device pointer if we make drm_device the base class of ttm_device, as you suggested in a previous patch comment.

And mem_type and priority can use bitfields, so a dword is enough for both.
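
A quick compile-time check of that claim, as a user-space sketch (the 3-bit and 2-bit widths are my assumption, sized from DRM_NUM_MEM_TYPES == 8 and DRM_MAX_LRU_PRIORITY == 4 in the patch):

```c
#include <stdint.h>

#define DRM_MAX_LRU_PRIORITY 4
#define DRM_NUM_MEM_TYPES 8

/* hypothetical packed layout: both fields share one 32-bit word */
struct lru_entity_packed {
	uint32_t mem_type : 3;	/* holds 0..7, enough for DRM_NUM_MEM_TYPES */
	uint32_t priority : 2;	/* holds 0..3, enough for DRM_MAX_LRU_PRIORITY */
};

/* the two bitfields pack into a single dword */
_Static_assert(sizeof(struct lru_entity_packed) == sizeof(uint32_t),
	       "mem_type + priority fit in one dword");
```

That leaves 27 spare bits in the same dword for any future flags.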


> 
> > For one device, a global drm LRU manager per resource type should be
> > created/initialized at device initialization time. Drm LRU manager
> > instances are embedded in struct drm_device.
> >
> > It is pretty much moving some of the ttm resource manager functions
> > to the drm layer. The reason for this refactoring is that we want to
> > create a single LRU list for memory allocated from BO (buffer object)
> > based drivers and hmm/svm (shared virtual memory) based drivers, so that the BO
> > driver and the svm driver can evict memory from each other.
> >
> > Previously the LRU list in the TTM resource manager (the lru field in struct
> > ttm_resource_manager) was coupled with the ttm_buffer_object concept, i.e.,
> > each ttm resource is backed by a ttm_buffer_object and the LRU list
> > is essentially a list of ttm_buffer_objects.
> 
> Actually it's the other way around. The resource provides the backing of
> the BO.

You are right. Will fix this description.
> 
> And when a BO moves around it can temporary be that multiple resource
> point to the same BO.
> 
> I also want to have a more advanced iterator at some point where we grab
> the BO lock for keeping a reference into the LRU list. Not sure how to
> do this if we don't have the BO here any more.
> 
> Need to think about that further,

I don't quite get what you want to do with the advanced iterator. But with this work, the lru entity is a base class of ttm_resource, or of any other resource struct in hmm/svm. The LRU is decoupled from the BO concept; this is why this LRU can be shared with svm code, which is BO-less.
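
To illustrate the base-class idea, here is a minimal user-space sketch of the embedding (`my_resource` and `to_resource` are invented names; the real series embeds the entity in ttm_resource, see the next patch):

```c
#include <stddef.h>
#include <stdint.h>

/* minimal container_of, as used in the kernel */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* base-class entity: only what the LRU itself needs, no BO anywhere */
struct lru_entity {
	uint64_t size;
};

/* a TTM-style (or SVM-style) resource embeds the entity ... */
struct my_resource {
	int placement;
	struct lru_entity entity;
};

/* ... so eviction code that holds only the entity pointer can recover
 * the driver-specific container it lives in */
static struct my_resource *to_resource(struct lru_entity *e)
{
	return container_of(e, struct my_resource, entity);
}
```

The LRU code only ever walks `lru_entity` objects; whether the container is a ttm_resource or an svm allocation is recovered by the owner via container_of (or, in the patch, via the per-entity evict function pointer).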

Oak 

> Christian.
> 
> > >   Due to this behavior, the
> > TTM resource manager can't be used by an hmm/svm driver, as we don't plan
> > to have the BO concept for the hmm/svm implementation. So we decouple
> > the evictable LRU list from the BO concept in this series.
> >
> > The design goal of the drm lru manager is to make it as lean as possible.
> > So each lru entity only has a list node member used to link this entity
> > to the evictable LRU list, and the basic resource size/type/priority
> > of this entity. It doesn't carry any driver-specific information. A lru
> > entity also has a function pointer to an evict function. This is used to
> > implement ttm- or svm-specific eviction functions. A lru entity is supposed
> > to be embedded in a driver-specific structure such as struct
> > ttm_resource; see the usage in the next patch of this series.
> >
> > The ttm resource manager, and some of the ttm_bo functions such as
> > ttm_mem_evict_first, will be rewritten using the new drm lru manager
> > library; see the next patch in this series.
> > >
> > The future hmm/svm implementation will call lru manager functions to add
> > hmm/svm allocations to the shared evictable lru list.
> >
> > Lock design: previously the ttm_resource LRU list was protected by a device
> > global ttm_device::lru_lock (bdev->lru_lock in the code). This lock also
> > protects ttm_buffer_object::pin_count, ttm_resource_manager::usage,
> > ttm_resource::bo, the ttm_device::pinned list, etc. With this refactoring,
> > lru_lock is moved out of ttm_device and added to struct drm_device, so
> > it can be shared between ttm code and svm code.
> >
> > Signed-off-by: Oak Zeng <oak.zeng at intel.com>
> > ---
> >   drivers/gpu/drm/Makefile            |   1 +
> >   drivers/gpu/drm/drm_evictable_lru.c | 232 ++++++++++++++++++++++++++++
> >   include/drm/drm_device.h            |   7 +
> >   include/drm/drm_evictable_lru.h     | 188 ++++++++++++++++++++++
> >   4 files changed, 428 insertions(+)
> >   create mode 100644 drivers/gpu/drm/drm_evictable_lru.c
> >   create mode 100644 include/drm/drm_evictable_lru.h
> >
> > diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
> > index 1ad88efb1752..13953b0d271b 100644
> > --- a/drivers/gpu/drm/Makefile
> > +++ b/drivers/gpu/drm/Makefile
> > @@ -46,6 +46,7 @@ drm-y := \
> >   	drm_vblank_work.o \
> >   	drm_vma_manager.o \
> >   	drm_gpuva_mgr.o \
> > +	drm_evictable_lru.o \
> >   	drm_writeback.o
> >   drm-$(CONFIG_DRM_LEGACY) += \
> >   	drm_agpsupport.o \
> > diff --git a/drivers/gpu/drm/drm_evictable_lru.c
> b/drivers/gpu/drm/drm_evictable_lru.c
> > new file mode 100644
> > index 000000000000..2ba9105cca03
> > --- /dev/null
> > +++ b/drivers/gpu/drm/drm_evictable_lru.c
> > @@ -0,0 +1,232 @@
> > +// SPDX-License-Identifier: MIT
> > +/*
> > + * Copyright © 2023 Intel Corporation
> > + */
> > +
> > +#include <linux/lockdep.h>
> > +#include <linux/container_of.h>
> > +#include <drm/drm_evictable_lru.h>
> > +#include <drm/drm_device.h>
> > +
> > +static inline struct drm_lru_mgr *entity_to_mgr(struct drm_lru_entity *entity)
> > +{
> > +	struct drm_lru_mgr *mgr;
> > +
> > +	mgr = &entity->drm->lru_mgr[entity->mem_type];
> > +	BUG_ON(!mgr->used);
> > +
> > +	return mgr;
> > +}
> > +
> > +void drm_lru_entity_init(struct drm_lru_entity *entity, struct drm_device *drm,
> > +			uint32_t mem_type, uint64_t size, uint32_t priority)
> > +{
> > +	entity->drm = drm;
> > +	entity->mem_type = mem_type;
> > +	entity->size = size;
> > +	entity->priority = priority;
> > +	INIT_LIST_HEAD(&entity->lru);
> > +}
> > +
> > +/**
> > + * drm_lru_mgr_init
> > + *
> > + * @mgr: drm lru manager to init
> > + * @size: size of the resource managed by this manager
> > + * @lock: pointer of the global lru_lock
> > + *
> > + * Initialize a drm lru manager
> > + */
> > +void drm_lru_mgr_init(struct drm_lru_mgr *mgr, uint64_t size, spinlock_t *lock)
> > +{
> > +	unsigned j;
> > +
> > +	mgr->used = true;
> > +	mgr->size = size;
> > +	mgr->usage = 0;
> > +	mgr->lru_lock = lock;
> > +
> > +	for(j = 0; j < DRM_MAX_LRU_PRIORITY; j++)
> > +		INIT_LIST_HEAD(&mgr->lru[j]);
> > +}
> > +
> > +void drm_lru_bulk_move_init(struct drm_lru_bulk_move *bulk_move)
> > +{
> > +	memset(bulk_move, 0, sizeof(*bulk_move));
> > +}
> > +
> > +/**
> > + * drm_lru_first
> > + *
> > + * @mgr: drm lru manager to iterate over
> > + * @cursor: cursor of the current position
> > + *
> > + * Returns the first entity in drm lru manager
> > + */
> > +struct drm_lru_entity *
> > +drm_lru_first(struct drm_lru_mgr *mgr, struct drm_lru_cursor *cursor)
> > +{
> > +	struct drm_lru_entity *entity;
> > +
> > +	lockdep_assert_held(mgr->lru_lock);
> > +
> > +	for(cursor->priority = 0; cursor->priority < DRM_MAX_LRU_PRIORITY; ++cursor->priority)
> > +		list_for_each_entry(entity, &mgr->lru[cursor->priority], lru)
> > +			return entity;
> > +
> > +	return NULL;
> > +}
> > +
> > +/**
> > + * drm_lru_next
> > + *
> > + * @mgr: drm lru manager to iterate over
> > + * @cursor: cursor of the current position
> > + * @entity: the current lru entity pointer
> > + *
> > + * Returns the next entity from drm lru manager
> > + */
> > +struct drm_lru_entity *
> > +drm_lru_next(struct drm_lru_mgr *mgr, struct drm_lru_cursor *cursor,
> > +		struct drm_lru_entity *entity)
> > +{
> > +	lockdep_assert_held(mgr->lru_lock);
> > +
> > +	list_for_each_entry_continue(entity, &mgr->lru[cursor->priority], lru)
> > +		return entity;
> > +
> > +	for(++cursor->priority; cursor->priority < DRM_MAX_LRU_PRIORITY; ++cursor->priority)
> > +		list_for_each_entry(entity, &mgr->lru[cursor->priority], lru)
> > +			return entity;
> > +
> > +	return NULL;
> > +}
> > +
> > +/**
> > + * drm_lru_move_to_tail
> > + *
> > + * @entity: the lru entity to move to lru tail
> > + *
> > + * Move a lru entity to lru tail
> > + */
> > +void drm_lru_move_to_tail(struct drm_lru_entity * entity)
> > +{
> > +	struct list_head *lru;
> > +	struct drm_lru_mgr *mgr;
> > +
> > +	mgr = entity_to_mgr(entity);
> > +	lockdep_assert_held(mgr->lru_lock);
> > +	lru = &mgr->lru[entity->priority];
> > +	list_move_tail(&entity->lru, lru);
> > +}
> > +
> > +/**
> > + * drm_lru_bulk_move_range_tail
> > + *
> > + * @range: bulk move range
> > + * @entity: lru_entity to move
> > + *
> > + * Move a lru_entity to the tail of a bulk move range
> > + */
> > +void drm_lru_bulk_move_range_tail(struct drm_lru_bulk_move_range *range,
> > +									struct drm_lru_entity *entity)
> > +{
> > +	if (entity == range->last)
> > +		return;
> > +
> > +	if (entity == range->first)
> > +		range->first = container_of(entity->lru.next, struct drm_lru_entity, lru);
> > +
> > +	if (range->last)
> > +		list_move(&entity->lru, &range->last->lru);
> > +
> > +	range->last = entity;
> > +}
> > +EXPORT_SYMBOL(drm_lru_bulk_move_range_tail);
> > +
> > +/**
> > + * drm_lru_bulk_move_tail - bulk move range of entities to the LRU tail.
> > + *
> > + * @bulk: bulk_move structure
> > + *
> > + * Bulk move entities to the LRU tail, only valid to use when driver makes sure that
> > + * resource order never changes.
> > + */
> > +void drm_lru_bulk_move_tail(struct drm_lru_bulk_move *bulk)
> > +{
> > +
> > +	unsigned i, j;
> > +
> > +	for (i = 0; i < DRM_NUM_MEM_TYPES; ++i) {
> > +		for (j = 0; j < DRM_MAX_LRU_PRIORITY; ++j) {
> > +			struct drm_lru_bulk_move_range *range = &bulk->range[i][j];
> > +			struct drm_lru_mgr *mgr;
> > +
> > +			if (!range->first)
> > +				continue;
> > +
> > +			mgr = entity_to_mgr(range->first);
> > +			lockdep_assert_held(mgr->lru_lock);
> > +			list_bulk_move_tail(&mgr->lru[range->first->priority], &range->first->lru,
> > +					&range->last->lru);
> > +		}
> > +	}
> > +}
> > +EXPORT_SYMBOL(drm_lru_bulk_move_tail);
> > +
> > +/**
> > + * drm_lru_add_bulk_move
> > + *
> > + * @entity: the lru entity to add to the bulk move range
> > + * @bulk_move: the bulk move ranges to add the entity
> > + *
> > + * Add a lru entity to the tail of a bulk move range
> > + */
> > +void drm_lru_add_bulk_move(struct drm_lru_entity *entity,
> > +						struct drm_lru_bulk_move *bulk_move)
> > +{
> > +	struct drm_lru_bulk_move_range *range;
> > +
> > +	range = &bulk_move->range[entity->mem_type][entity->priority];
> > +
> > +	if (!range->first) {
> > +		range->first = entity;
> > +		range->last = entity;
> > +		return;
> > +	}
> > +
> > +	drm_lru_bulk_move_range_tail(range, entity);
> > +}
> > +
> > +EXPORT_SYMBOL(drm_lru_add_bulk_move);
> > +/**
> > + * drm_lru_del_bulk_move
> > + *
> > + * @entity: the lru entity to move from the bulk move range
> > + * @bulk_move: the bulk move ranges to move the entity out of
> > + *
> > + * Move a lru entity out of bulk move range. This doesn't
> > + * delete entity from lru manager's lru list.
> > + */
> > +void drm_lru_del_bulk_move(struct drm_lru_entity *entity,
> > +					struct drm_lru_bulk_move *bulk_move)
> > +{
> > +	struct drm_lru_bulk_move_range *range;
> > +
> > +	range = &bulk_move->range[entity->mem_type][entity->priority];
> > +
> > +	if (unlikely(WARN_ON(!range->first || !range->last) ||
> > +			(range->first == entity && range->last == entity))) {
> > +		range->first = NULL;
> > +		range->last = NULL;
> > +	} else if (range->first == entity) {
> > +		range->first = container_of(entity->lru.next,
> > +				struct drm_lru_entity, lru);
> > +	} else if (range->last == entity) {
> > +		range->last = container_of(entity->lru.prev,
> > +				struct drm_lru_entity, lru);
> > +	} else {
> > +		list_move(&entity->lru, &range->last->lru);
> > +	}
> > +}
> > +EXPORT_SYMBOL(drm_lru_del_bulk_move);
> > diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
> > index d0b5f42786be..1bdcd34d3f6b 100644
> > --- a/include/drm/drm_device.h
> > +++ b/include/drm/drm_device.h
> > @@ -8,6 +8,7 @@
> >
> >   #include <drm/drm_legacy.h>
> >   #include <drm/drm_mode_config.h>
> > +#include <drm/drm_evictable_lru.h>
> >
> >   struct drm_driver;
> >   struct drm_minor;
> > @@ -331,6 +332,12 @@ struct drm_device {
> >   	 */
> >   	spinlock_t lru_lock;
> >
> > +	/**
> > +	 * @lru_mgr: Device global lru managers per memory type or memory
> > +	 * region. Each lru manager manages a lru list of this memory type.
> > +	 */
> > +	struct drm_lru_mgr lru_mgr[DRM_NUM_MEM_TYPES];
> > +
> >   	/* Everything below here is for legacy driver, never use! */
> >   	/* private: */
> >   #if IS_ENABLED(CONFIG_DRM_LEGACY)
> > diff --git a/include/drm/drm_evictable_lru.h b/include/drm/drm_evictable_lru.h
> > new file mode 100644
> > index 000000000000..3fd6bd2475d9
> > --- /dev/null
> > +++ b/include/drm/drm_evictable_lru.h
> > @@ -0,0 +1,188 @@
> > +// SPDX-License-Identifier: MIT
> > +/*
> > + * Copyright © 2023 Intel Corporation
> > + */
> > +
> > +#ifndef _DRM_EVICTABLE_LRU_H_
> > +#define _DRM_EVICTABLE_LRU_H_
> > +
> > +#include <linux/list.h>
> > +#include <linux/spinlock_types.h>
> > +#include <linux/spinlock.h>
> > +
> > +struct drm_device;
> > +
> > +#define DRM_MAX_LRU_PRIORITY 4
> > +#define DRM_NUM_MEM_TYPES 8
> > +
> > +/**
> > + * struct drm_lru_entity
> > + *
> > + * @drm: drm device that this entity belongs to
> > + * @mem_type: The memory type that this entity belongs to
> > + * @size: resource size of this entity
> > + * @priority: The priority of this entity
> > + * @lru: least recently used list node, see &drm_lru_mgr.lru
> > + *
> > + * This structure represents an entity in drm_lru_mgr's
> > + * list. This structure is supposed to be embedded in
> > + * user's data structure.
> > + */
> > +struct drm_lru_entity {
> > +	struct drm_device *drm;
> > +	uint32_t mem_type;
> > +	uint64_t size;
> > +	uint32_t priority;
> > +	struct list_head lru;
> > +};
> > +
> > +/**
> > + * struct drm_lru_mgr
> > + *
> > + * @used: whether this lru manager is used or not
> > + * @size: size of the resource
> > + * @usage: how much resource has been used
> > + * @lru_lock: a weak reference to the global lru_lock
> > + * @lru: least recently used lists, one per priority
> > + *
> > + * This structure maintains all the buffer allocations
> > + * in a least recently used list, so a victim for eviction
> > + * can be easily found.
> > + */
> > +struct drm_lru_mgr {
> > +	bool used;
> > +	uint64_t size;
> > +	uint64_t usage;
> > +	spinlock_t *lru_lock;
> > +	struct list_head lru[DRM_MAX_LRU_PRIORITY];
> > +};
> > +
> > +/**
> > + * struct drm_lru_cursor
> > + *
> > + * @priority: the current priority
> > + *
> > + * Cursor to iterate over all entities in lru manager.
> > + */
> > +struct drm_lru_cursor {
> > +	unsigned priority;
> > +};
> > +
> > +/**
> > + * struct drm_lru_bulk_move_range
> > + *
> > + * @first: the first entity in the range
> > + * @last: the last entity in the range
> > + *
> > + * Range of entities on a lru list.
> > + */
> > +struct drm_lru_bulk_move_range
> > +{
> > +	struct drm_lru_entity *first;
> > +	struct drm_lru_entity *last;
> > +};
> > +
> > +/**
> > + * struct drm_lru_bulk_move
> > + *
> > + * @range: An array of bulk move ranges, each corresponding to the drm_lru_mgr's
> > + * lru list of the same memory type and same priority.
> > + *
> > + * A collection of bulk_move ranges which can be used to move drm_lru_entity
> > + * objects on the lru list in a bulk way. It should be initialized through
> > + * drm_lru_bulk_move_init. Adding/deleting a drm_lru_entity to/from a bulk move
> > + * should go through drm_lru_add_bulk_move/drm_lru_del_bulk_move.
> > + */
> > +struct drm_lru_bulk_move {
> > +	struct drm_lru_bulk_move_range range[DRM_NUM_MEM_TYPES][DRM_MAX_LRU_PRIORITY];
> > +};
> > +
> > +
> > +
> > +/**
> > + * drm_lru_add_entity
> > + *
> > + * @entity: the lru entity to add
> > + * @mgr: the drm lru manager
> > + * @priority: specify which priority list to add
> > + *
> > + * Add an entity to lru list
> > + */
> > +static inline void drm_lru_add_entity(struct drm_lru_entity *entity,
> > +		struct drm_lru_mgr *mgr, unsigned priority)
> > +{
> > +	lockdep_assert_held(mgr->lru_lock);
> > +	list_add_tail(&entity->lru, &mgr->lru[priority]);
> > +	mgr->usage += entity->size;
> > +}
> > +
> > +/**
> > + * drm_lru_remove_entity
> > + *
> > + * @entity: the lru entity to remove
> > + * @mgr: the drm lru manager
> > + *
> > + * Remove an entity from lru list
> > + */
> > +static inline void drm_lru_remove_entity(struct drm_lru_entity *entity,
> > +		struct drm_lru_mgr *mgr)
> > +{
> > +	lockdep_assert_held(mgr->lru_lock);
> > +	list_del_init(&entity->lru);
> > +	mgr->usage -= entity->size;
> > +}
> > +
> > +/**
> > + * drm_lru_mgr_fini
> > + *
> > + * @mgr: the drm lru manager
> > + *
> > + * de-initialize a lru manager
> > + */
> > +static inline void drm_lru_mgr_fini(struct drm_lru_mgr *mgr)
> > +{
> > +	mgr->used = false;
> > +}
> > +
> > +void drm_lru_entity_init(struct drm_lru_entity *entity, struct drm_device *drm,
> > +			uint32_t mem_type, uint64_t size, uint32_t priority);
> > +
> > +struct drm_lru_entity *
> > +drm_lru_first(struct drm_lru_mgr *mgr, struct drm_lru_cursor *cursor);
> > +
> > +struct drm_lru_entity *
> > +drm_lru_next(struct drm_lru_mgr *mgr, struct drm_lru_cursor *cursor,
> > +		struct drm_lru_entity *entity);
> > +
> > +void drm_lru_mgr_init(struct drm_lru_mgr *mgr, uint64_t size,
> > +		spinlock_t *lru_lock);
> > +
> > +void drm_lru_move_to_tail(struct drm_lru_entity * entity);
> > +
> > +void drm_lru_bulk_move_init(struct drm_lru_bulk_move *bulk_move);
> > +
> > +
> > +void drm_lru_bulk_move_tail(struct drm_lru_bulk_move *bulk);
> > +
> > +void drm_lru_bulk_move_range_tail(struct drm_lru_bulk_move_range *range,
> > +		struct drm_lru_entity *entity);
> > +
> > +void drm_lru_add_bulk_move(struct drm_lru_entity *entity,
> > +		struct drm_lru_bulk_move *bulk_move);
> > +
> > +void drm_lru_del_bulk_move(struct drm_lru_entity *entity,
> > +		struct drm_lru_bulk_move *bulk_move);
> > +/**
> > + * drm_lru_for_each_entity
> > + *
> > + * @mgr: the drm lru manager
> > + * @cursor: cursor for the current position
> > + * @entity: the current drm_lru_entity
> > + *
> > + * Iterate over all entities in drm lru manager
> > + */
> > +#define drm_lru_for_each_entity(mgr, cursor, entity)		\
> > +	for (entity = drm_lru_first(mgr, cursor); entity;	\
> > +	     entity = drm_lru_next(mgr, cursor, entity))
> > +
> > +#endif


