[PATCH] Revert "drm/amdgpu: remove vm sanity check from amdgpu_vm_make_compute"

Christian König ckoenig.leichtzumerken at gmail.com
Mon Oct 23 15:15:43 UTC 2023


On 23.10.23 at 15:06, Daniel Tang wrote:
> That commit causes the screen to freeze a few moments after running
> clinfo on v6.6-rc7 and ROCm 5.6. Sometimes the rest of the computer,
> including ssh, also freezes. On v6.5-rc1, it only results in a NULL pointer
> dereference message in dmesg and the process becoming a zombie that
> cannot be killed, which prevents shutdown without REISUB. llama.cpp and
> hashcat, which worked with v6.2 and ROCm 5.6, have since broken and are
> not fixed by this revert. However, pytorch-rocm now works stably, without
> the whole-computer freezes caused by accidentally running clinfo.
>
> This reverts commit 1d7776cc148b9f2f3ebaf1181662ba695a29f639.

That result doesn't make much sense. Felix, please correct me, but AFAIK
the ATS stuff has been completely removed by now.

Are you sure that this is pure v6.6-rc7 with no other patches
applied? If so, then we must have missed something.

Regards,
Christian.

>
> Closes: https://github.com/RadeonOpenCompute/ROCm/issues/2596
> Signed-off-by: Daniel Tang <danielzgtg.opensource at gmail.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 12 ++++++------
>   1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 82f25996ff5e..602f311ab766 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -2243,16 +2243,16 @@ int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>   	if (r)
>   		return r;
>   
> +	/* Sanity checks */
> +	if (!amdgpu_vm_pt_is_root_clean(adev, vm)) {
> +		r = -EINVAL;
> +		goto unreserve_bo;
> +	}
> +
>   	/* Check if PD needs to be reinitialized and do it before
>   	 * changing any other state, in case it fails.
>   	 */
>   	if (pte_support_ats != vm->pte_support_ats) {
> -		/* Sanity checks */
> -		if (!amdgpu_vm_pt_is_root_clean(adev, vm)) {
> -			r = -EINVAL;
> -			goto unreserve_bo;
> -		}
> -
>   		vm->pte_support_ats = pte_support_ats;
>   		r = amdgpu_vm_pt_clear(adev, vm, to_amdgpu_bo_vm(vm->root.bo),
>   				       false);
> --
> 2.40.1
>
>
>
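For readers following the diff above, the behavioural difference between the two orderings can be modelled in isolation. This is only a minimal sketch, not the driver code: struct vm_model, its root_clean field and both function names are hypothetical stand-ins, with root_clean standing for whatever amdgpu_vm_pt_is_root_clean() would report, and the PD reinitialisation reduced to a single assignment.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state the quoted hunk looks at. */
struct vm_model {
	bool pte_support_ats;	/* current ATS setting of the VM */
	bool root_clean;	/* models the result of amdgpu_vm_pt_is_root_clean() */
};

/* Ordering with the revert applied: the sanity check always runs. */
int make_compute_with_revert(struct vm_model *vm, bool pte_support_ats)
{
	if (!vm->root_clean)
		return -EINVAL;

	if (pte_support_ats != vm->pte_support_ats)
		vm->pte_support_ats = pte_support_ats;	/* PD reinit would go here */

	return 0;
}

/* Ordering without the revert: the check runs only if the ATS setting changes. */
int make_compute_without_revert(struct vm_model *vm, bool pte_support_ats)
{
	if (pte_support_ats != vm->pte_support_ats) {
		if (!vm->root_clean)
			return -EINVAL;

		vm->pte_support_ats = pte_support_ats;	/* PD reinit would go here */
	}

	return 0;
}
```

In this model the two variants differ only when the root is not clean and the ATS setting is unchanged: the first returns -EINVAL, the second returns 0 and carries on. Whether that case is what triggers the reported freeze is not established in this thread.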


