[PATCH 4/7] drm/ttm: move LRU walk defines into new internal header

Thomas Hellström thomas.hellstrom at linux.intel.com
Mon Aug 19 11:38:56 UTC 2024


Hi, Christian,

On Mon, 2024-08-19 at 13:03 +0200, Christian König wrote:
> Am 06.08.24 um 10:29 schrieb Thomas Hellström:
> > Hi, Christian.
> > 
> > On Thu, 2024-07-11 at 14:01 +0200, Christian König wrote:
> > > Am 10.07.24 um 20:19 schrieb Matthew Brost:
> > > > On Wed, Jul 10, 2024 at 02:42:58PM +0200, Christian König
> > > > wrote:
> > > > > That is something drivers really shouldn't mess with.
> > > > > 
> > > > Thomas uses this in Xe to implement a shrinker [1]. Seems to
> > > > need
> > > > to
> > > > remain available for drivers.
> > > No, that is exactly what I try to prevent with that.
> > > 
> > > This is an internally thing of TTM and drivers should never use
> > > it
> > > directly.
> > That driver-facing LRU walker is a direct response/solution to this
> > comment that you made in the first shrinker series:
> > 
> > https://lore.kernel.org/linux-mm/b7491378-defd-4f1c-31e2-29e4c77e2d67@amd.com/T/#ma918844aa8a6efe8768fdcda0c6590d5c93850c9
> 
> Ah, yeah that was about how we are should be avoiding middle layer
> design.
> 
> But a function which returns the next candidate for eviction and a 
> function which calls a callback for eviction is exactly the opposite.
> 
> > That is also mentioned in the cover letter of the recent shrinker
> > series, and this walker has been around in that shrinker series for
> > more than half a year now so if you think it's not the correct
> > driver
> > facing API IMHO that should be addressed by a review comment in
> > that
> > series rather than in posting a conflicting patch?
> 
> I actually outlined that in the review comments for the patch series.
> E.g. a walker function with a callback is basically a middle layer.
> 
> What outlined in the link above is that a function which returns the 
> next eviction candidate is a better approach than a callback.
> 
> > So assuming that we still want the driver to register the shrinker,
> > IMO that helper abstracts away all the nasty locking and pitfalls
> > for a
> > driver-registered shrinker, and is similar in structure to for
> > example
> > the pagewalk helper in mm/pagewalk.c.
> > 
> > An alternative that could be tried as a driver-facing API is to
> > provide
> > a for_each_bo_in_lru_lock() macro where the driver open-codes
> > "process_bo()" inside the for loop but I tried this and found it
> > quite
> > fragile since the driver might exit the loop without performing the
> > necessary cleanup.
> 
> The point is that the shrinker should *never* need to have context.
> E.g. 
> a walker which allows going over multiple BOs for eviction is exactly
> the wrong approach for that.
> 
> The shrinker should evict always only exactly one BO and the next 
> invocation of a shrinker should not depend on the result of the
> previous 
> one.
> 
> Or am I missing something vital here?

A couple of things,

1) I'd like to think of the middle-layer vs helper like the helper has
its own ops, and can be used optionally from the driver. IIRC, the
atomic modesetting / pageflip ops are implemented in exactly this way.

Sometimes a certain loop operation can't be easily or at least robustly
implemented using a for_each_.. approach. Like for example
mm/pagewalk.c. In this shrinking case I think it's probably possible
using the scoped_guard() in cleanup.h. This way we could get rid of
this middle layer discussion by turning the interface inside-out:

for_each_bo_on_lru_locked(xxx)
	driver_shrink();

But I do think the currently suggested approach is less fragile and
prone to driver error.

FWIW in addition to the above examples, also drm_gem_lru_scan works
like this.

2) The shrinkers suggest to shrink a number of pages, not a single bo,
again drm_gem_lru_scan works like this. i915 works like this. I think
we should align with those.

3) Even if we had a function to obtain the driver to shrink, the driver
needs to have its say about the suitability for shrinking, so a
callback is needed anyway (eviction_valuable).
In addition, if shrinking fails for some reason, what would stop that
function to return the same bo, again and again just like the problem
we previously had in TTM?

So basically all the restartable LRU work was motivated by this use-
case in mind, so making that private I must say comes as a pretty major
surprise.

I could have a look at the 

for_each_bo_on_lru_locked(xxx)
	driver_shrink();

approach, but having TTM just blindly return a single bo without
context will not work IMO.

Thanks,
/Thomas






> 
> Regards,
> Christian.
> 
> > 
> > However with using scoped_guard() in linux/cleanup.h that could
> > probably be mitigated to some exent, but I still think that a
> > walker
> > helper like this is the safer choice and given the complexity of a
> > for_
> > macro involving scoped_guard(), I think the walker helper is the
> > easiest-to-maintain solution moving forward.
> > 
> > But open to suggestions.
> > 
> > Thanks
> > Thomas
> > 
> > 
> > > Regards,
> > > Christian.
> > > 
> > > > Matt
> > > > 
> > > > [1]
> > > > https://patchwork.freedesktop.org/patch/602165/?series=131815&rev=6
> > > > 
> > > > > Signed-off-by: Christian König <christian.koenig at amd.com>
> > > > > ---
> > > > >    drivers/gpu/drm/ttm/ttm_bo.c      |  1 +
> > > > >    drivers/gpu/drm/ttm/ttm_bo_util.c |  2 +
> > > > >    drivers/gpu/drm/ttm/ttm_bo_util.h | 67
> > > > > +++++++++++++++++++++++++++++++
> > > > >    include/drm/ttm/ttm_bo.h          | 35 ----------------
> > > > >    4 files changed, 70 insertions(+), 35 deletions(-)
> > > > >    create mode 100644 drivers/gpu/drm/ttm/ttm_bo_util.h
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
> > > > > b/drivers/gpu/drm/ttm/ttm_bo.c
> > > > > index 0131ec802066..41bee8696e69 100644
> > > > > --- a/drivers/gpu/drm/ttm/ttm_bo.c
> > > > > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> > > > > @@ -45,6 +45,7 @@
> > > > >    #include <linux/dma-resv.h>
> > > > >    
> > > > >    #include "ttm_module.h"
> > > > > +#include "ttm_bo_util.h"
> > > > >    
> > > > >    static void ttm_bo_mem_space_debug(struct
> > > > > ttm_buffer_object
> > > > > *bo,
> > > > >    					struct ttm_placement
> > > > > *placement)
> > > > > diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > > > b/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > > > index 3c07f4712d5c..03e28e3d0d03 100644
> > > > > --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > > > +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > > > @@ -37,6 +37,8 @@
> > > > >    
> > > > >    #include <drm/drm_cache.h>
> > > > >    
> > > > > +#include "ttm_bo_util.h"
> > > > > +
> > > > >    struct ttm_transfer_obj {
> > > > >    	struct ttm_buffer_object base;
> > > > >    	struct ttm_buffer_object *bo;
> > > > > diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.h
> > > > > b/drivers/gpu/drm/ttm/ttm_bo_util.h
> > > > > new file mode 100644
> > > > > index 000000000000..c19b50809208
> > > > > --- /dev/null
> > > > > +++ b/drivers/gpu/drm/ttm/ttm_bo_util.h
> > > > > @@ -0,0 +1,67 @@
> > > > > +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
> > > > > +/***********************************************************
> > > > > ****
> > > > > ***********
> > > > > + * Copyright 2024 Advanced Micro Devices, Inc.
> > > > > + *
> > > > > + * Permission is hereby granted, free of charge, to any
> > > > > person
> > > > > obtaining a
> > > > > + * copy of this software and associated documentation files
> > > > > (the
> > > > > "Software"),
> > > > > + * to deal in the Software without restriction, including
> > > > > without limitation
> > > > > + * the rights to use, copy, modify, merge, publish,
> > > > > distribute,
> > > > > sublicense,
> > > > > + * and/or sell copies of the Software, and to permit persons
> > > > > to
> > > > > whom the
> > > > > + * Software is furnished to do so, subject to the following
> > > > > conditions:
> > > > > + *
> > > > > + * The above copyright notice and this permission notice
> > > > > shall
> > > > > be included in
> > > > > + * all copies or substantial portions of the Software.
> > > > > + *
> > > > > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
> > > > > KIND, EXPRESS OR
> > > > > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> > > > > MERCHANTABILITY,
> > > > > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN
> > > > > NO
> > > > > EVENT SHALL
> > > > > + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY
> > > > > CLAIM,
> > > > > DAMAGES OR
> > > > > + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT
> > > > > OR
> > > > > OTHERWISE,
> > > > > + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE
> > > > > OR
> > > > > THE USE OR
> > > > > + * OTHER DEALINGS IN THE SOFTWARE.
> > > > > + *
> > > > > +
> > > > > *************************************************************
> > > > > ****
> > > > > *********/
> > > > > +#ifndef _TTM_BO_UTIL_H_
> > > > > +#define _TTM_BO_UTIL_H_
> > > > > +
> > > > > +struct ww_acquire_ctx;
> > > > > +
> > > > > +struct ttm_buffer_object;
> > > > > +struct ttm_operation_ctx;
> > > > > +struct ttm_lru_walk;
> > > > > +
> > > > > +/** struct ttm_lru_walk_ops - Operations for a LRU walk. */
> > > > > +struct ttm_lru_walk_ops {
> > > > > +	/**
> > > > > +	 * process_bo - Process this bo.
> > > > > +	 * @walk: struct ttm_lru_walk describing the walk.
> > > > > +	 * @bo: A locked and referenced buffer object.
> > > > > +	 *
> > > > > +	 * Return: Negative error code on error, User-
> > > > > defined
> > > > > positive value
> > > > > +	 * (typically, but not always, size of the processed
> > > > > bo)
> > > > > on success.
> > > > > +	 * On success, the returned values are summed by the
> > > > > walk and the
> > > > > +	 * walk exits when its target is met.
> > > > > +	 * 0 also indicates success, -EBUSY means this bo
> > > > > was
> > > > > skipped.
> > > > > +	 */
> > > > > +	s64 (*process_bo)(struct ttm_lru_walk *walk,
> > > > > +			  struct ttm_buffer_object *bo);
> > > > > +};
> > > > > +
> > > > > +/**
> > > > > + * struct ttm_lru_walk - Structure describing a LRU walk.
> > > > > + */
> > > > > +struct ttm_lru_walk {
> > > > > +	/** @ops: Pointer to the ops structure. */
> > > > > +	const struct ttm_lru_walk_ops *ops;
> > > > > +	/** @ctx: Pointer to the struct ttm_operation_ctx.
> > > > > */
> > > > > +	struct ttm_operation_ctx *ctx;
> > > > > +	/** @ticket: The struct ww_acquire_ctx if any. */
> > > > > +	struct ww_acquire_ctx *ticket;
> > > > > +	/** @tryock_only: Only use trylock for locking. */
> > > > > +	bool trylock_only;
> > > > > +};
> > > > > +
> > > > > +s64 ttm_lru_walk_for_evict(struct ttm_lru_walk *walk, struct
> > > > > ttm_device *bdev,
> > > > > +			   struct ttm_resource_manager *man,
> > > > > s64
> > > > > target);
> > > > > +
> > > > > +#endif
> > > > > diff --git a/include/drm/ttm/ttm_bo.h
> > > > > b/include/drm/ttm/ttm_bo.h
> > > > > index d1a732d56259..5f7c967222a2 100644
> > > > > --- a/include/drm/ttm/ttm_bo.h
> > > > > +++ b/include/drm/ttm/ttm_bo.h
> > > > > @@ -194,41 +194,6 @@ struct ttm_operation_ctx {
> > > > >    	uint64_t bytes_moved;
> > > > >    };
> > > > >    
> > > > > -struct ttm_lru_walk;
> > > > > -
> > > > > -/** struct ttm_lru_walk_ops - Operations for a LRU walk. */
> > > > > -struct ttm_lru_walk_ops {
> > > > > -	/**
> > > > > -	 * process_bo - Process this bo.
> > > > > -	 * @walk: struct ttm_lru_walk describing the walk.
> > > > > -	 * @bo: A locked and referenced buffer object.
> > > > > -	 *
> > > > > -	 * Return: Negative error code on error, User-
> > > > > defined
> > > > > positive value
> > > > > -	 * (typically, but not always, size of the processed
> > > > > bo)
> > > > > on success.
> > > > > -	 * On success, the returned values are summed by the
> > > > > walk and the
> > > > > -	 * walk exits when its target is met.
> > > > > -	 * 0 also indicates success, -EBUSY means this bo
> > > > > was
> > > > > skipped.
> > > > > -	 */
> > > > > -	s64 (*process_bo)(struct ttm_lru_walk *walk, struct
> > > > > ttm_buffer_object *bo);
> > > > > -};
> > > > > -
> > > > > -/**
> > > > > - * struct ttm_lru_walk - Structure describing a LRU walk.
> > > > > - */
> > > > > -struct ttm_lru_walk {
> > > > > -	/** @ops: Pointer to the ops structure. */
> > > > > -	const struct ttm_lru_walk_ops *ops;
> > > > > -	/** @ctx: Pointer to the struct ttm_operation_ctx.
> > > > > */
> > > > > -	struct ttm_operation_ctx *ctx;
> > > > > -	/** @ticket: The struct ww_acquire_ctx if any. */
> > > > > -	struct ww_acquire_ctx *ticket;
> > > > > -	/** @tryock_only: Only use trylock for locking. */
> > > > > -	bool trylock_only;
> > > > > -};
> > > > > -
> > > > > -s64 ttm_lru_walk_for_evict(struct ttm_lru_walk *walk, struct
> > > > > ttm_device *bdev,
> > > > > -			   struct ttm_resource_manager *man,
> > > > > s64
> > > > > target);
> > > > > -
> > > > >    /**
> > > > >     * ttm_bo_get - reference a struct ttm_buffer_object
> > > > >     *
> > > > > -- 
> > > > > 2.34.1
> > > > > 
> 



More information about the dri-devel mailing list