[PATCH v6 10/12] drm/ttm: Use fault-injection to test error paths

Matthew Brost matthew.brost at intel.com
Wed Aug 7 23:43:28 UTC 2024


On Wed, Jul 03, 2024 at 05:38:11PM +0200, Thomas Hellström wrote:
> Use fault-injection to test partial TTM swapout and interrupted swapin.
> Return -EINTR for swapin to test the caller's ability to handle and
> restart the swapin, and on swapout perform a partial swapout to test
> the swapin and release_shrunken functionality.
> 
> Cc: Christian König <christian.koenig at amd.com>
> Cc: Somalapuram Amaranath <Amaranath.Somalapuram at amd.com>
> Cc: Matthew Brost <matthew.brost at intel.com>
> Cc: <dri-devel at lists.freedesktop.org>
> Signed-off-by: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> ---
>  drivers/gpu/drm/Kconfig        | 10 ++++++++++
>  drivers/gpu/drm/ttm/ttm_pool.c | 17 ++++++++++++++++-
>  2 files changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
> index fd0749c0c630..9f27271bfab8 100644
> --- a/drivers/gpu/drm/Kconfig
> +++ b/drivers/gpu/drm/Kconfig
> @@ -272,6 +272,16 @@ config DRM_GPUVM
>  	  GPU-VM representation providing helpers to manage a GPUs virtual
>  	  address space
>  
> +config DRM_TTM_BACKUP_FAULT_INJECT
> +	bool "Enable fault injection during TTM backup"
> +	depends on DRM_TTM
> +	default n
> +	help
> +	  Inject recoverable failures during TTM backup and recovery of
> +	  backed-up objects. For DRM driver developers only.
> +
> +	  If in doubt, choose N.
> +
>  config DRM_BUDDY
>  	tristate
>  	depends on DRM
> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
> index 38e50cf81b0a..d32a1f2e5e50 100644
> --- a/drivers/gpu/drm/ttm/ttm_pool.c
> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> @@ -431,6 +431,7 @@ static int ttm_pool_restore_tt(struct ttm_pool_tt_restore *restore,
>  			       struct ttm_backup *backup,
>  			       struct ttm_operation_ctx *ctx)
>  {
> +	static unsigned long __maybe_unused swappedin;
>  	unsigned int i, nr = 1 << restore->order;
>  	int ret = 0;
>  
> @@ -446,6 +447,13 @@ static int ttm_pool_restore_tt(struct ttm_pool_tt_restore *restore,
>  			if (handle == 0)
>  				continue;
>  
> +			if (IS_ENABLED(CONFIG_DRM_TTM_BACKUP_FAULT_INJECT) &&
> +			    ctx->interruptible &&
> +			    ++swappedin % 100 == 0) {
> +				ret = -EINTR;
> +				break;
> +			}

So here the -EINTR would be propagated back to the user IOCTL that
triggered the BO validate, which then gets retried? And the restore should
then be able to pick up where it left off?
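
To make the question concrete, here is a minimal userspace sketch of the
restart behaviour I would expect. It is not the TTM code: struct
demo_restore, demo_restore_pages() and demo_restore_with_retry() are
hypothetical names, and the retry loop merely stands in for whatever
restarts the IOCTL. The assumption is that progress is kept in the restore
state, so a retry resumes rather than starts over.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the restore state; not struct ttm_pool_tt_restore. */
struct demo_restore {
	size_t next;         /* index of the first page not yet restored */
	size_t nr;           /* total number of pages to restore */
	unsigned long calls; /* copy attempts so far; drives the injected fault */
};

/*
 * Restore pages until done, or until an injected -EINTR. Progress is kept
 * in r->next, so a later call resumes instead of starting over.
 * A fail_every of 0 disables injection; 1 would never make progress.
 */
static int demo_restore_pages(struct demo_restore *r, bool interruptible,
			      unsigned long fail_every)
{
	assert(r->next <= r->nr);

	while (r->next < r->nr) {
		/* Mirrors the "++swappedin % 100 == 0" check in the patch. */
		if (interruptible && fail_every &&
		    ++r->calls % fail_every == 0)
			return -EINTR;

		/* A real implementation would copy the backed-up page here. */
		r->next++;
	}

	return 0;
}

/* Caller-side retry loop; returns how many times the restore was restarted. */
static unsigned int demo_restore_with_retry(struct demo_restore *r,
					    unsigned long fail_every)
{
	unsigned int restarts = 0;

	while (demo_restore_pages(r, true, fail_every) == -EINTR)
		restarts++;

	return restarts;
}
```

In this sketch, restoring 250 pages with a fault on every 100th attempt is
interrupted twice and still ends with all 250 pages restored, since no
page is copied twice.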

> +
>  			ret = backup->ops->copy_backed_up_page
>  				(backup, restore->first_page[i],
>  				 handle, ctx->interruptible);
> @@ -892,7 +900,14 @@ long ttm_pool_backup_tt(struct ttm_pool *pool, struct ttm_tt *ttm, bool purge,
>  
>  	alloc_gfp = GFP_KERNEL | __GFP_HIGH | __GFP_NOWARN | __GFP_RETRY_MAYFAIL;
>  
> -	for (i = 0; i < ttm->num_pages; ++i) {
> +	num_pages = ttm->num_pages;
> +
> +	/* Pretend doing fault injection by shrinking only half of the pages. */
> +
> +	if (IS_ENABLED(CONFIG_DRM_TTM_BACKUP_FAULT_INJECT))
> +		num_pages = DIV_ROUND_UP(num_pages, 2);

So what happens here? Only half the pages get swapped out, and then upon
restore only that half gets swapped back in? Does the shrinker continue to
walk until enough pages have been swapped out?
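
For reference, a small userspace model of what I understand the partial
swapout to do. This is only a sketch under my own assumptions: struct
demo_tt, demo_backup() and demo_pages_to_restore() are made-up names, a
byte flag stands in for a backup handle, and DEMO_DIV_ROUND_UP() just
repeats the kernel macro's arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Same rounding as the kernel's DIV_ROUND_UP(), redefined for userspace. */
#define DEMO_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical object: backed_up[i] is nonzero once page i is swapped out. */
struct demo_tt {
	size_t num_pages;
	unsigned char *backed_up;
};

/*
 * Back up pages that are still resident. With inject set, only the first
 * half of the object (rounded up) is considered, as in the patch.
 * Returns the number of pages newly backed up by this call.
 */
static size_t demo_backup(struct demo_tt *tt, int inject)
{
	size_t limit = tt->num_pages;
	size_t i, shrunken = 0;

	assert(tt->backed_up != NULL);

	if (inject)
		limit = DEMO_DIV_ROUND_UP(limit, 2);

	for (i = 0; i < limit; i++) {
		if (tt->backed_up[i])
			continue; /* already swapped out earlier */
		tt->backed_up[i] = 1;
		shrunken++;
	}

	return shrunken;
}

/* Number of pages a later restore would have to read back. */
static size_t demo_pages_to_restore(const struct demo_tt *tt)
{
	size_t i, n = 0;

	for (i = 0; i < tt->num_pages; i++)
		n += tt->backed_up[i] ? 1 : 0;

	return n;
}
```

In this model a 7-page object has 4 pages backed up on the first pass, and
a second pass with injection still enabled makes no further progress on the
same object. Whether the real shrinker behaves like that is exactly what I
am asking.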

Matt

> +
> +	for (i = 0; i < num_pages; ++i) {
>  		page = ttm->pages[i];
>  		if (unlikely(!page))
>  			continue;
> -- 
> 2.44.0
> 

