[PATCH] Revert "drm/amdgpu: remove vm sanity check from amdgpu_vm_make_compute"
Daniel Tang
danielzgtg.opensource at gmail.com
Mon Oct 23 13:06:34 UTC 2023
That commit causes the screen to freeze a few moments after running
clinfo on v6.6-rc7 and ROCm 5.6. Sometimes the rest of the computer
including ssh also freezes. On v6.5-rc1, it only results in a NULL pointer
deference message in dmesg and the process to become a zombie whose
unkillableness prevents shutdown without REISUB. Although llama.cpp and
hashcat were working in v6.2 and ROCm 5.6, broke, and are not fixed by
this revert, pytorch-rocm is now working with stability and without
whole-computer freezes caused by any accidental running of clinfo.
This reverts commit 1d7776cc148b9f2f3ebaf1181662ba695a29f639.
Closes: https://github.com/RadeonOpenCompute/ROCm/issues/2596
Signed-off-by: Daniel Tang <danielzgtg.opensource at gmail.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 82f25996ff5e..602f311ab766 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2243,16 +2243,16 @@ int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm)
if (r)
return r;
+ /* Sanity checks */
+ if (!amdgpu_vm_pt_is_root_clean(adev, vm)) {
+ r = -EINVAL;
+ goto unreserve_bo;
+ }
+
/* Check if PD needs to be reinitialized and do it before
* changing any other state, in case it fails.
*/
if (pte_support_ats != vm->pte_support_ats) {
- /* Sanity checks */
- if (!amdgpu_vm_pt_is_root_clean(adev, vm)) {
- r = -EINVAL;
- goto unreserve_bo;
- }
-
vm->pte_support_ats = pte_support_ats;
r = amdgpu_vm_pt_clear(adev, vm, to_amdgpu_bo_vm(vm->root.bo),
false);
--
2.40.1
More information about the amd-gfx
mailing list