[PATCH v2] drm/amdkfd: Handle deallocated VPGRs in gfx11+ trap handler
Lancelot SIX
Lancelot.Six at amd.com
Wed May 29 21:07:36 UTC 2024
On 29/05/2024 20:35, Jay Cornwall wrote:
> A wavefront may deallocate its VGPRs at the end of a program while
> waiting for memory transactions to complete. If it subsequently
> receives a context save exception it will be unable to save,
> since this requires VGPRs. In this case the trap handler should
> terminate the wavefront.
>
> Fixes intermittent VM faults under context switching load.
>
> V2: Use S_ENDPGM instead of S_ENDPGM_SAVED for performance counters
Hi Jay,
Thanks for the V2.
FYI,as far as I can see, the .h part of the patch does not seem to apply
directly on current amd-staging-drm-next, but I guess we just have a
different bases.
>
> Signed-off-by: Jay Cornwall <jay.cornwall at amd.com>
> Cc: Lancelot Six <lancelot.six at amd.com>
> ---
> .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h | 695 +++++++++---------
> .../amd/amdkfd/cwsr_trap_handler_gfx10.asm | 17 +
> 2 files changed, 366 insertions(+), 346 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm
> index 18e012e04493..ac3702b8e3c4 100644
> --- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm
> +++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm
> @@ -97,6 +97,7 @@ var S_STATUS_HALT_MASK = SQ_WAVE_STATE_PRIV_HALT_MASK
> var S_SAVE_PC_HI_TRAP_ID_MASK = 0xF0000000
> #endif
>
> +var SQ_WAVE_STATUS_NO_VGPRS_SHIFT = 24
> var SQ_WAVE_LDS_ALLOC_LDS_SIZE_SHIFT = 12
> var SQ_WAVE_LDS_ALLOC_LDS_SIZE_SIZE = 9
> var SQ_WAVE_GPR_ALLOC_VGPR_SIZE_SIZE = 8
> @@ -451,6 +452,22 @@ L_EXIT_TRAP:
> s_rfe_b64 [ttmp0, ttmp1]
>
> L_SAVE:
> + // If VGPRs have been deallocated then terminate the wavefront.
> + // It has no remaining program to run and cannot save without VGPRs.
> +#if ASIC_FAMILY == CHIP_PLUM_BONITO
> + s_bitcmp1_b32 s_save_status, SQ_WAVE_STATUS_NO_VGPRS_SHIFT
> + s_cbranch_scc0 L_HAVE_VGPRS
> + s_endpgm
> +L_HAVE_VGPRS:
> +#endif
> +#if ASIC_FAMILY >= CHIP_GFX12
This is mostly cosmetic, but you could use
#elif ASIC_FAMILY >= CHIP_GFX12
instead on #endif + #if. Those 2 blocks are not independent, they
achieve the same goal for different configurations.
I do not mind if you prefer to keep this as it is. Either way, the
change look good to me:
Reviewed-by: Lancelot Six <lancelot.six at amd.com>
Best,
Lancelot.
> + s_getreg_b32 s_save_tmp, hwreg(HW_REG_WAVE_STATUS)
> + s_bitcmp1_b32 s_save_tmp, SQ_WAVE_STATUS_NO_VGPRS_SHIFT
> + s_cbranch_scc0 L_HAVE_VGPRS
> + s_endpgm
> +L_HAVE_VGPRS:
> +#endif
> +
> s_and_b32 s_save_pc_hi, s_save_pc_hi, 0x0000ffff //pc[47:32]
> s_mov_b32 s_save_tmp, 0
> s_setreg_b32 hwreg(S_TRAPSTS_HWREG, S_TRAPSTS_SAVE_CONTEXT_SHIFT, 1), s_save_tmp //clear saveCtx bit
More information about the amd-gfx
mailing list