[PATCH v5 05/12] drm/ttm: Provide a generic LRU walker helper

Wed Jun 19 07:31:33 UTC 2024

Hi, Matthew. 

Thanks for reviewing.

On Tue, 2024-06-18 at 22:11 +0000, Matthew Brost wrote:
> On Tue, Jun 18, 2024 at 09:18:13AM +0200, Thomas Hellström wrote:
> 
> Replying to correct version...
> 
> > Provide a generic LRU walker in TTM, in the spirit of
> > drm_gem_lru_scan()
> > but building on the restartable TTM LRU functionality.
> > 
> > The LRU walker optionally supports locking objects as part of
> > a ww mutex locking transaction, to mimic to some extent the
> > current functionality in ttm. However any -EDEADLK return
> > is converted to -ENOMEM, so that the driver will need to back
> > off and possibly retry without being able to keep the
> > ticket.
> > 
> 
> Wouldn't the backoff be unlock everything but keep the ticket?

We can't do that (yet) since we don't have the drm_exec or similar
functionality. The missing part is that if keep the ticket, it's in
contended state which means we need to slow-lock the contending lock as
the first lock. And the caller doesn't know which lock is the
contending one....

That is all addressed in the RFC part of the series that I left out for
now. This is only trying to mimic current functionality.

> 
> > v3:
> > - Move the helper to core ttm.
> > - Remove the drm_exec usage from it for now, it will be
> >   reintroduced later in the series.
> > v4:
> > - Handle the -EALREADY case if ticketlocking.
> > 
> > Cc: Christian König <christian.koenig at amd.com>
> > Cc: Somalapuram Amaranath <Amaranath.Somalapuram at amd.com>
> > Cc: Matthew Brost <matthew.brost at intel.com>
> > Cc: <dri-devel at lists.freedesktop.org>
> > Signed-off-by: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> > ---
> >  drivers/gpu/drm/ttm/ttm_bo_util.c | 145
> > ++++++++++++++++++++++++++++++
> >  include/drm/ttm/ttm_bo.h          |  32 +++++++
> >  2 files changed, 177 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c
> > b/drivers/gpu/drm/ttm/ttm_bo_util.c
> > index 0b3f4267130c..45fcaf6f8644 100644
> > --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> > +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> > @@ -768,3 +768,148 @@ int ttm_bo_pipeline_gutting(struct
> > ttm_buffer_object *bo)
> >  	ttm_tt_destroy(bo->bdev, ttm);
> >  	return ret;
> >  }
> > +
> > +static bool ttm_lru_walk_trylock(struct ttm_lru_walk *walk,
> > +				 struct ttm_buffer_object *bo,
> > +				 bool *needs_unlock)
> > +{
> > +	struct ttm_operation_ctx *ctx = walk->ctx;
> > +
> > +	*needs_unlock = false;
> > +
> > +	if (dma_resv_trylock(bo->base.resv)) {
> > +		*needs_unlock = true;
> > +		return true;
> > +	}
> > +
> > +	if (bo->base.resv == ctx->resv && ctx->allow_res_evict) {
> > +		dma_resv_assert_held(bo->base.resv);
> > +		return true;
> > +	}
> > +i
> 
> Any reason this is done after the try lock? Just kinda goofy as if
> this
> statement is true the dma_resv_trylock will always fail.

It should work either way. I guess I had viewed it as "trylock first,
if that fails, attempt any exception". I guess if we want to optimize
performance for shared lock implementations, moving it first might
avoid the atomic trylock operation, but I wouldn't expect a noticeable
difference.

> 
> > +	return false;
> > +}
> > +
> > +static int ttm_lru_walk_ticketlock(struct ttm_lru_walk *walk,
> > +				   struct ttm_buffer_object *bo,
> > +				   bool *needs_unlock)
> > +{
> > +	struct dma_resv *resv = bo->base.resv;
> > +	int ret;
> > +
> 
> I suppose we don't have asserts here like in Xe but if we did,
> assert(walk->ticket)?

I agree. I think we'd really want a TTM assert or warning that could be
compiled away. In any case, I only expect a single caller of this
function.

> 
> > +	if (walk->ctx->interruptible)
> > +		ret = dma_resv_lock_interruptible(resv, walk-
> > >ticket);
> > +	else
> > +		ret = dma_resv_lock(resv, walk->ticket);
> > +
> > +	if (!ret) {
> > +		*needs_unlock = true;
> > +		/* Only a single ticketlock per loop. */
> > +		walk->ticket = NULL;
> 
> Can you explain this a bit more? I see that once the walk->ticket is
> set
> to NULL this function will not be called again (i.e. only try locking
> will be used). I want to understand the reasoning for this.
> 
> It might be helpful for a more lengthly explaination in the comments
> of
> the code too.

I can add a more thorough explanation, Again, this is trying to mimic
the current code, that does a walk of trylocking, then a single ticket
lock more as a sort of "wait a while to be able to make progress" and
then resorts to trylock again.

The real reason to avoid multiple ticketlocks here is that each have a
chance of failing with -EDEADLK, (in particular with the -EDEADLK
injection enabled) which would translate to an -ENOMEM in the caller.

> 
> > +	} else if (ret == -EDEADLK) {
> > +		/* Caller needs to exit the ww transaction. */
> > +		ret = -ENOSPC;
> 
> The commit message says -ENOMEM.

The -ENOSPC is converted to -ENOMEM for the driver in
ttm_bo_validate(). I could explain a bit more in the commit message.

> 
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +static void ttm_lru_walk_unlock(struct ttm_buffer_object *bo, bool
> > locked)
> > +{
> > +	if (locked)
> > +		dma_resv_unlock(bo->base.resv);
> > +}
> > +
> > +/**
> > + * ttm_lru_walk_for_evict() - Perform a LRU list walk, with
> > actions taken on
> > + * valid items.
> > + * @walk: describe the walks and actions taken
> > + * @bdev: The TTM device.
> > + * @man: The struct ttm_resource manager whose LRU lists we're
> > walking.
> > + * @target: The end condition for the walk.
> > + *
> > + * The LRU lists of @man are walk, and for each struct
> > ttm_resource encountered,
> > + * the corresponding ttm_buffer_object is locked and taken a
> > reference on, and
> > + * the LRU lock is dropped. the LRU lock may be dropped before
> > locking and, in
> > + * that case, it's verified that the item actually remains on the
> > LRU list after
> > + * the lock, and that the buffer object didn't switch resource in
> > between.
> > + *
> > + * With a locked object, the actions indicated by @walk-
> > >process_bo are
> > + * performed, and after that, the bo is unlocked, the refcount
> > dropped and the
> > + * next struct ttm_resource is processed. Here, the walker relies
> > on
> > + * TTM's restartable LRU list implementation.
> > + *
> > + * Typically @walk->process_bo() would return the number of pages
> > evicted,
> > + * swapped or shrunken, so that when the total exceeds @target, or
> > when the
> > + * LRU list has been walked in full, iteration is terminated. It's
> > also terminated
> > + * on error. Note that the definition of @target is done by the
> > caller, it
> > + * could have a different meaning than the number of pages.
> > + *
> > + * Note that the way dma_resv individualization is done, locking
> > needs to be done
> > + * either with the LRU lock held (trylocking only) or with a
> > reference on the
> > + * object.
> > + *
> > + * Return: The progress made towards target or negative error code
> > on error.
> > + */
> > +long ttm_lru_walk_for_evict(struct ttm_lru_walk *walk, struct
> > ttm_device *bdev,
> > +			    struct ttm_resource_manager *man, long
> > target)
> > +{
> > +	struct ttm_resource_cursor cursor;
> > +	struct ttm_resource *res;
> > +	long sofar = 0;
> 
> s/sofar/evicted?

That's not always the case. When used for eviction it is actually 0 if
the new allocation failed, 1 on success. It should be interpreted as
"progress towards target", so perhaps progress?

> 
> > +	long lret;
> > +
> > +	spin_lock(&bdev->lru_lock);
> > +	ttm_resource_manager_for_each_res(man, &cursor, res) {
> > +		struct ttm_buffer_object *bo = res->bo;
> > +		bool bo_needs_unlock = false;
> > +		bool bo_locked = false;
> > +		int mem_type;
> > +
> > +		if (!bo || bo->resource != res)
> > +			continue;
> > +
> > +		if (ttm_lru_walk_trylock(walk, bo,
> > &bo_needs_unlock))
> > +			bo_locked = true;
> > +		else if ((!walk->ticket) || walk->ctx->no_wait_gpu
> > ||
> 
> Nit - (!walk->ticket) could just be !walk->ticket.

Will fix.

> 
> > +			 walk->trylock_only)
> > +			continue;
> > +
> > +		if (!ttm_bo_get_unless_zero(bo)) {
> > +			ttm_lru_walk_unlock(bo, bo_needs_unlock);
> > +			continue;
> > +		}
> > +
> 
> This kinda goofy pattern too, typically in code a get_unless_zero is
> done before trying to lock the object not after. Even odder here, the
> could or could not be locked depending on the outcome of
> ttm_lru_walk_trylock. This is covering individualization case? Would
> it
> make more sense to move ttm_bo_get_unless_zero before the try lock or
> is
> that to avoid a put on try lock failure + continue?

I guess this is still a remnant from the old code: ttm_bos can't be put
in atomic context, so we'd had to unlock the lru_lock to put ( which we
still do). However, without the restartable lists that we have with
this series, list traversal would have to be reset.

If we were to change the order of trylock and get_unless_zero now, we
could do that, but that would mean unnecessary refcounting and dropping
the lru_lock in the case of trylock failure.

> 
> > +		mem_type = res->mem_type;
> > +		spin_unlock(&bdev->lru_lock);
> > +
> > +		lret = 0;
> > +		if (!bo_locked && walk->ticket)
> 
> As above could you explain the ticket usage a bit more?
> 
> Also you shouldn't need to check the ticket here there is !walk-
> >ticket
> above which triggers a continue.

I hope I explained sufficiently above. Please get back otherwise, I'll
remove the walk->ticket check.

> 
> > +			lret = ttm_lru_walk_ticketlock(walk, bo,
> > &bo_needs_unlock);
> > +
> > +		/*
> > +		 * Note that in between the release of the lru
> > lock and the
> > +		 * ticketlock, the bo may have switched resource,
> > +		 * and also memory type, since the resource may
> > have been
> > +		 * freed and allocated again with a different
> > memory type.
> > +		 * In that case, just skip it.
> > +		 */
> > +		if (!lret && bo->resource == res && res->mem_type
> > == mem_type)
> > +			lret = walk->ops->process_bo(walk, bo);
> > +
> > +		ttm_lru_walk_unlock(bo, bo_needs_unlock);
> > +		ttm_bo_put(bo);
> > +		if (lret == -EBUSY || lret == -EALREADY)
> > +			lret = 0;
> 
> What is usage of these error codes?
> 
> -EALREADY means the resv is locked with the current ticket, right?
> Wouldn't we want to call process_bo in this case too?

No, then we might evict our own working set. Processing of shared bos
is handled as before with ctx->allow_res_evict.

> 
> -EBUSY I need some help figuring out.

-EBUSY is the return of ttm_bo_wait_ctx() in case we're in no-wait-gpu
context. Like direct reclaim. If the bo is then busy, we simply skip to
the next bo. If process_bo() waits for gpu idle using the above
function, this is to catch that error code.
Should probably add that to the process_bo() docs.

> 
> > +		sofar = (lret < 0) ? lret : sofar + lret;
> > +		if (sofar < 0 || sofar >= target)
> > +			goto out;
> > +
> 
> Here we have dropped the BO unlock. What prevents the BO from being
> moved back to the resource we just evicted it from resulting in sofar
> not being accurate?

For eviction, the byte-count is not used, but rather the success of
allocating the new resource.

For shrinking, any process could allocate what we just shrunk, and even
so shrinking does not always mean that the memory is immediately
available so the byte-count will always be a rough estimate.

A related question is "what prevents anyone else from stealing memory
that we just evicted", and the answer to that is "Nothing... yet". The
exhaustive eviction part handles that with drm_exec keeping the bo
locks of evicted objects until we make progress. Eventually we'd hold
enough locks to block all other competitors.

Unrelated to this patch, me and Christian had a discussion on whether
to unlock those when we made progress in the sense of

a) A validation succeeded,
b) An exec submission succeeded.

/Thomas

> 
> Matt
> 
> > +		cond_resched();
> > +		spin_lock(&bdev->lru_lock);
> > +	}
> > +	spin_unlock(&bdev->lru_lock);
> > +out:
> > +	ttm_resource_cursor_fini(&cursor);
> > +	return sofar;
> > +}
> > diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
> > index 6ccf96c91f3a..8b032298d66e 100644
> > --- a/include/drm/ttm/ttm_bo.h
> > +++ b/include/drm/ttm/ttm_bo.h
> > @@ -190,6 +190,38 @@ struct ttm_operation_ctx {
> >  	uint64_t bytes_moved;
> >  };
> >  
> > +struct ttm_lru_walk;
> > +
> > +/** struct ttm_lru_walk_ops - Operations for a LRU walk. */
> > +struct ttm_lru_walk_ops {
> > +	/**
> > +	 * process_bo - Process this bo.
> > +	 * @walk: struct ttm_lru_walk describing the walk.
> > +	 * @bo: A locked and referenced buffer object.
> > +	 *
> > +	 * Return: Negative error code on error, Number of
> > processed pages on
> > +	 * success. 0 also indicates success.
> > +	 */
> > +	long (*process_bo)(struct ttm_lru_walk *walk, struct
> > ttm_buffer_object *bo);
> > +};
> > +
> > +/**
> > + * struct ttm_lru_walk - Structure describing a LRU walk.
> > + */
> > +struct ttm_lru_walk {
> > +	/** @ops: Pointer to the ops structure. */
> > +	const struct ttm_lru_walk_ops *ops;
> > +	/** @ctx: Pointer to the struct ttm_operation_ctx. */
> > +	struct ttm_operation_ctx *ctx;
> > +	/** @ticket: The struct ww_acquire_ctx if any. */
> > +	struct ww_acquire_ctx *ticket;
> > +	/** @tryock_only: Only use trylock for locking. */
> > +	bool trylock_only;
> > +};
> > +
> > +long ttm_lru_walk_for_evict(struct ttm_lru_walk *walk, struct
> > ttm_device *bdev,
> > +			    struct ttm_resource_manager *man, long
> > target);
> > +
> >  /**
> >   * ttm_bo_get - reference a struct ttm_buffer_object
> >   *
> > -- 
> > 2.44.0
> >