[PATCH] drm/amdkfd: Reset GPU on queue preemption failure

Alex Deucher alexdeucher at gmail.com
Wed Mar 27 02:14:29 UTC 2024


On Tue, Mar 26, 2024 at 4:12 PM Harish Kasiviswanathan
<Harish.Kasiviswanathan at amd.com> wrote:
>
> Currently, with F32 HWS, a GPU reset is triggered only when unmapping queues fails.
>
> However, if a compute queue doesn't respond to a preemption request in time,
> unmap will return without any error. In this case, only the preemption error
> is logged and a reset is not triggered. Trigger a GPU reset in this case as well.
>
> Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan at amd.com>

Reviewed-by: Alex Deucher <alexander.deucher at amd.com>

> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index 151fabf84040..c08b6ee25289 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -2000,6 +2000,7 @@ static int unmap_queues_cpsch(struct device_queue_manager *dqm,
>         if (mqd_mgr->check_preemption_failed(mqd_mgr, dqm->packet_mgr.priv_queue->queue->mqd)) {
>                 while (halt_if_hws_hang)
>                         schedule();
> +               kfd_hws_hang(dqm);
>                 return -ETIME;
>         }
>
> --
> 2.34.1
>
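
For readers skimming the thread, the control flow described in the commit message can be modeled in user space roughly as follows. This is only a sketch: `struct model_dqm`, `model_hws_hang()` and `model_unmap_queues()` are hypothetical stand-ins, not the amdkfd types or functions, the error code in the first branch is an assumption, and the `halt_if_hws_hang` debug loop is left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver state; not the real amdkfd struct. */
struct model_dqm {
	bool unmap_failed;      /* the unmap request itself failed */
	bool preemption_failed; /* a queue did not respond to preemption in time */
	int resets_requested;   /* how many times the reset hook was called */
};

/* Stand-in for kfd_hws_hang(): it only records that a reset was requested. */
static void model_hws_hang(struct model_dqm *dqm)
{
	dqm->resets_requested++;
}

/* Models the two failure paths of the unmap routine after the patch. */
static int model_unmap_queues(struct model_dqm *dqm)
{
	if (dqm->unmap_failed) {
		/* Path that already requested a reset before the patch. */
		model_hws_hang(dqm);
		return -ETIME; /* assumed error code for this sketch */
	}

	if (dqm->preemption_failed) {
		/* Before the patch this path only returned -ETIME. */
		model_hws_hang(dqm); /* the call the patch adds */
		return -ETIME;
	}

	return 0;
}
```

In this model a preemption failure returns -ETIME and also bumps `resets_requested`, which mirrors the behavior the one-line change is meant to add.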

