[PATCH] drm/amd/amdkfd: Evict all queues even HWS remove queue failed
Zha, YiFan(Even)
Yifan.Zha at amd.com
Fri Mar 7 09:10:47 UTC 2025
Hi Felix,
Thanks. Patch v2 has been submitted. It should make sure an error is returned even if a later remove_queue_mes call succeeds.
Could you please help review it again?
Thanks.
Best regards,
Yifan Zha
________________________________
From: Kuehling, Felix <Felix.Kuehling at amd.com>
Sent: Thursday, March 6, 2025 8:23 AM
To: Zha, YiFan(Even) <Yifan.Zha at amd.com>; amd-gfx at lists.freedesktop.org <amd-gfx at lists.freedesktop.org>; Deucher, Alexander <Alexander.Deucher at amd.com>; Zhang, Hawking <Hawking.Zhang at amd.com>
Cc: Chang, HaiJun <HaiJun.Chang at amd.com>; Chen, Horace <Horace.Chen at amd.com>; Yin, ZhenGuo (Chris) <ZhenGuo.Yin at amd.com>
Subject: Re: [PATCH] drm/amd/amdkfd: Evict all queues even HWS remove queue failed
On 2025-03-05 00:42, Yifan Zha wrote:
> [Why]
> If a reset is detected and kfd needs to evict the working queues, removing a queue through HWS will fail.
> The remaining queues are then not evicted and stay in the active state.
>
> After the reset is done, kfd uses HWS to terminate the remaining active queues, but HWS has been reset.
> So removing the queues will fail again.
>
> [How]
> Keep removing all queues even if HWS returns a failure.
> This will not affect cpsch, as it checks reset_domain->sem.
>
> Signed-off-by: Yifan Zha <Yifan.Zha at amd.com>
> ---
> drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index f3f2fd6ee65c..b213a845bd5b 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -1223,7 +1223,6 @@ static int evict_process_queues_cpsch(struct device_queue_manager *dqm,
>  			if (retval) {
>  				dev_err(dev, "Failed to evict queue %d\n",
>  					q->properties.queue_id);
> -				goto out;
Is every subsequent call to remove_queue_mes guaranteed to also return
an error? If not, you need a way to make sure an error is returned if
any queue failed to be removed even if the last queue succeeded.
Regards,
Felix
>  			}
>  		}
>  	}
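
The concern raised in the review (a later successful removal overwriting an earlier failure once the `goto out` is dropped) can be illustrated with a small standalone sketch. This is not the v2 patch and not kernel code: `fake_remove_queue()`, `evict_all_queues()` and the `results` array are hypothetical stand-ins for `remove_queue_mes()` and the queue list, used only to show the "keep going, but remember the first error" pattern.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for remove_queue_mes(): returns 0 on success or a
 * negative errno-style value on failure, taken from a caller-supplied table. */
static int fake_remove_queue(int queue_id, const int *results)
{
	return results[queue_id];
}

/* Attempt to evict every queue. A failure does not stop the loop, but the
 * first error is kept so that a later success cannot mask it. */
static int evict_all_queues(const int *results, int num_queues, int *attempted)
{
	int ret = 0;
	int i;

	*attempted = 0;
	for (i = 0; i < num_queues; i++) {
		int err = fake_remove_queue(i, results);

		(*attempted)++;
		if (err) {
			fprintf(stderr, "Failed to evict queue %d\n", i);
			if (!ret)
				ret = err; /* remember only the first failure */
		}
	}

	return ret;
}
```

With `results = {0, -EIO, 0}`, all three queues are attempted and the function still returns `-EIO`, even though the last removal succeeded. Assigning each call's result directly to the return variable would instead return 0 in that case, which is the situation the review comment warns about.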