[PATCH 1/2] amd/amdkfd: sync all devices to wait all processes being evicted
Felix Kuehling
felix.kuehling at amd.com
Tue Apr 2 22:32:56 UTC 2024
On 2024-04-01 17:53, Zhigang Luo wrote:
> If more than one device is being reset in parallel, the first
> device calls kfd_suspend_all_processes() to evict all processes
> on all devices, and this call takes time to finish. The other devices
> start reset and recovery without waiting. If a process has not been
> evicted before recovery starts, it will be restored, which then causes
> a page fault.
>
> Signed-off-by: Zhigang Luo<Zhigang.Luo at amd.com>
> Change-Id: Ib1eddb56b69ecd41fe703abd169944154f48b0cd
Please remove the Change-Id: before you push. Other than that, this patch is
> ---
> drivers/gpu/drm/amd/amdkfd/kfd_device.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index 041ec3de55e7..55f89c858c7a 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -969,11 +969,11 @@ void kgd2kfd_suspend(struct kfd_dev *kfd, bool run_pm)
> 	if (!run_pm) {
> 		mutex_lock(&kfd_processes_mutex);
> 		count = ++kfd_locked;
> -		mutex_unlock(&kfd_processes_mutex);
>
> 		/* For first KFD device suspend all the KFD processes */
> 		if (count == 1)
> 			kfd_suspend_all_processes();
This could be simplified now. The variable "count" was only needed for
the broken attempt to call suspend outside the lock. Now you can just do:

	mutex_lock(&kfd_processes_mutex);
	if (++kfd_locked == 1)
		kfd_suspend_all_processes();
	mutex_unlock(&kfd_processes_mutex);

To be consistent, we probably need to make a similar change in
kgd2kfd_resume and run kfd_resume_all_processes under the lock as well.
Otherwise there could be a race condition between suspend and resume.
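
A minimal userspace sketch of the symmetric locking described above, for illustration only. It assumes a pthread mutex as a stand-in for the kernel mutex, and the names device_suspend(), device_resume(), suspend_all_processes(), resume_all_processes() and the processes_evicted flag are hypothetical; this is not the actual kfd_device.c code. The point it shows is that both the counter update and the evict/restore call happen inside the same critical section, so a second device cannot proceed past the lock while the first is still evicting.

```c
#include <pthread.h>

/* Userspace stand-ins (assumptions): a pthread mutex replaces the kernel mutex. */
static pthread_mutex_t kfd_processes_mutex = PTHREAD_MUTEX_INITIALIZER;
static int kfd_locked;        /* number of devices currently suspended */
static int processes_evicted; /* hypothetical flag: 1 while processes are evicted */

/* Hypothetical placeholders for the real evict/restore routines. */
static void suspend_all_processes(void) { processes_evicted = 1; }
static void resume_all_processes(void)  { processes_evicted = 0; }

/* The first device to suspend evicts everything, while holding the lock. */
void device_suspend(void)
{
	pthread_mutex_lock(&kfd_processes_mutex);
	if (++kfd_locked == 1)
		suspend_all_processes();
	pthread_mutex_unlock(&kfd_processes_mutex);
}

/* The last device to resume restores everything, also while holding the lock. */
void device_resume(void)
{
	pthread_mutex_lock(&kfd_processes_mutex);
	if (--kfd_locked == 0)
		resume_all_processes();
	pthread_mutex_unlock(&kfd_processes_mutex);
}
```

Because any other device entering device_suspend() blocks on the mutex until the eviction has finished, it cannot start recovery with processes still running. The trade-off is that the lock is held for the full duration of the eviction.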
Regards,
Felix
> +		mutex_unlock(&kfd_processes_mutex);
> 	}
>
> 	for (i = 0; i < kfd->num_nodes; i++) {