[PATCH v2] drm/amdkfd: Handle deallocated VPGRs in gfx11+ trap handler

Lancelot SIX Lancelot.Six at amd.com
Wed May 29 21:07:36 UTC 2024



On 29/05/2024 20:35, Jay Cornwall wrote:
> A wavefront may deallocate its VGPRs at the end of a program while
> waiting for memory transactions to complete. If it subsequently
> receives a context save exception it will be unable to save,
> since this requires VGPRs. In this case the trap handler should
> terminate the wavefront.
> 
> Fixes intermittent VM faults under context switching load.
> 
> V2: Use S_ENDPGM instead of S_ENDPGM_SAVED for performance counters

Hi Jay,

Thanks for the V2.

FYI,as far as I can see, the .h part of the patch does not seem to apply 
directly on current amd-staging-drm-next, but I guess we just have a 
different bases.

> 
> Signed-off-by: Jay Cornwall <jay.cornwall at amd.com>
> Cc: Lancelot Six <lancelot.six at amd.com>
> ---
>   .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h    | 695 +++++++++---------
>   .../amd/amdkfd/cwsr_trap_handler_gfx10.asm    |  17 +
>   2 files changed, 366 insertions(+), 346 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm
> index 18e012e04493..ac3702b8e3c4 100644
> --- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm
> +++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm
> @@ -97,6 +97,7 @@ var S_STATUS_HALT_MASK				= SQ_WAVE_STATE_PRIV_HALT_MASK
>   var S_SAVE_PC_HI_TRAP_ID_MASK			= 0xF0000000
>   #endif
>   
> +var SQ_WAVE_STATUS_NO_VGPRS_SHIFT		= 24
>   var SQ_WAVE_LDS_ALLOC_LDS_SIZE_SHIFT		= 12
>   var SQ_WAVE_LDS_ALLOC_LDS_SIZE_SIZE		= 9
>   var SQ_WAVE_GPR_ALLOC_VGPR_SIZE_SIZE		= 8
> @@ -451,6 +452,22 @@ L_EXIT_TRAP:
>   	s_rfe_b64	[ttmp0, ttmp1]
>   
>   L_SAVE:
> +	// If VGPRs have been deallocated then terminate the wavefront.
> +	// It has no remaining program to run and cannot save without VGPRs.
> +#if ASIC_FAMILY == CHIP_PLUM_BONITO
> +	s_bitcmp1_b32	s_save_status, SQ_WAVE_STATUS_NO_VGPRS_SHIFT
> +	s_cbranch_scc0	L_HAVE_VGPRS
> +	s_endpgm
> +L_HAVE_VGPRS:
> +#endif
> +#if ASIC_FAMILY >= CHIP_GFX12

This is mostly cosmetic, but you could use

     #elif ASIC_FAMILY >= CHIP_GFX12

instead on #endif + #if.  Those 2 blocks are not independent, they 
achieve the same goal for different configurations.

I do not mind if you prefer to keep this as it is.  Either way, the 
change look good to me:

Reviewed-by: Lancelot Six <lancelot.six at amd.com>

Best,
Lancelot.

> +	s_getreg_b32	s_save_tmp, hwreg(HW_REG_WAVE_STATUS)
> +	s_bitcmp1_b32	s_save_tmp, SQ_WAVE_STATUS_NO_VGPRS_SHIFT
> +	s_cbranch_scc0	L_HAVE_VGPRS
> +	s_endpgm
> +L_HAVE_VGPRS:
> +#endif
> +
>   	s_and_b32	s_save_pc_hi, s_save_pc_hi, 0x0000ffff			//pc[47:32]
>   	s_mov_b32	s_save_tmp, 0
>   	s_setreg_b32	hwreg(S_TRAPSTS_HWREG, S_TRAPSTS_SAVE_CONTEXT_SHIFT, 1), s_save_tmp	//clear saveCtx bit


More information about the amd-gfx mailing list