[PATCH] drm/amdgpu: Lock reset domain when VF get host FLR work message

Wed Jun 4 11:39:47 UTC 2025

> @@ -6098,7 +6097,8 @@  static int amdgpu_device_halt_activities(struct amdgpu_device *adev,
>       /* We need to lock reset domain only once both for XGMI and single device */
>       tmp_adev = list_first_entry(device_list_handle, struct amdgpu_device,
>                                   reset_list);
> -     amdgpu_device_lock_reset_domain(tmp_adev->reset_domain);
> +     if (!test_bit(AMDGPU_HOST_FLR, &reset_context->flags))
> +             amdgpu_device_lock_reset_domain(tmp_adev->reset_domain);
>
>       /* block all schedulers and reset given job's ring */
>       list_for_each_entry(tmp_adev, device_list_handle, reset_list) {

The host should be waiting for amdgpu_virt_ready_to_reset before it resets, and that happens after amdgpu_device_halt_activities, so I think taking the lock here is fine. Is the host-side wait timing out for you? If so, the root cause would be that we take too long to halt guest activity.

Teddy