[PATCH] drm/amdkfd: Reset GPU on queue preemption failure
Joshi, Mukul
Mukul.Joshi at amd.com
Tue Mar 26 21:14:44 UTC 2024
Reviewed-by: Mukul Joshi <mukul.joshi at amd.com>
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Harish
> Kasiviswanathan
> Sent: Tuesday, March 26, 2024 4:02 PM
> To: amd-gfx at lists.freedesktop.org
> Cc: Kasiviswanathan, Harish <Harish.Kasiviswanathan at amd.com>
> Subject: [PATCH] drm/amdkfd: Reset GPU on queue preemption failure
>
> Currently, with F32 HWS, a GPU reset is triggered only when unmapping the
> queues fails.
>
> However, if a compute queue doesn't respond to a preemption request in time,
> unmap returns without any error. In this case, only a preemption error is
> logged and no reset is triggered. Trigger a GPU reset in this case as well.
>
> Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan at amd.com>
> ---
> drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index 151fabf84040..c08b6ee25289 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -2000,6 +2000,7 @@ static int unmap_queues_cpsch(struct device_queue_manager *dqm,
>  	if (mqd_mgr->check_preemption_failed(mqd_mgr, dqm->packet_mgr.priv_queue->queue->mqd)) {
>  		while (halt_if_hws_hang)
>  			schedule();
> +		kfd_hws_hang(dqm);
>  		return -ETIME;
>  	}
>
> --
> 2.34.1
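
For readers following the control flow, here is a minimal user-space sketch of
what the commit message describes. It is not the driver code: struct fake_dqm,
fake_hws_hang() and fake_unmap_queues() are hypothetical stand-ins, and the
return value on the unmap-failure path is an assumption. The only point is that
both failure paths now end in a reset request.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef ETIME
#define ETIME 62 /* fallback (Linux value) for platforms lacking ETIME */
#endif

/* Hypothetical stand-in for struct device_queue_manager. */
struct fake_dqm {
	bool unmap_fails;       /* simulate: the unmap request itself fails */
	bool preemption_failed; /* simulate: a queue ignored the preemption */
	int reset_requests;     /* number of times a reset was requested */
};

/* Stand-in for kfd_hws_hang(); it only records that a reset was requested. */
static void fake_hws_hang(struct fake_dqm *dqm)
{
	dqm->reset_requests++;
}

/* Simplified model of the tail of unmap_queues_cpsch(). */
static int fake_unmap_queues(struct fake_dqm *dqm)
{
	assert(dqm != NULL);

	/* Existing behaviour per the commit message: unmap failure resets. */
	if (dqm->unmap_fails) {
		fake_hws_hang(dqm);
		return -ETIME; /* assumed error code for this path */
	}

	/* Unmap reported success, but a queue did not preempt in time. */
	if (dqm->preemption_failed) {
		fake_hws_hang(dqm); /* models the one line the patch adds */
		return -ETIME;
	}

	return 0;
}
```

With preemption_failed set and unmap_fails clear, fake_unmap_queues() returns
-ETIME and reset_requests becomes 1. Without the added call, that path would
return the error but leave reset_requests at 0.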