[PATCH] drm/amd/amdkfd: Fix kernel panic when reset failed and been triggered again
Felix Kuehling
felix.kuehling at amd.com
Mon Nov 15 16:22:15 UTC 2021
On 2021-11-15 at 11:20 a.m., shaoyunl wrote:
> In an SRIOV configuration, a reset may fail to bring the ASIC back to normal even though
> stop_cpsch has already been called; start_cpsch will then not be called, since there is no
> resume in this case. When a reset is triggered again, the driver should avoid doing the
> uninitialization a second time.
>
> Signed-off-by: shaoyunl <shaoyun.liu at amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>
> ---
> drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index 42b2cc999434..62fe28244a80 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -1225,6 +1225,11 @@ static int stop_cpsch(struct device_queue_manager *dqm)
>  	bool hanging;
>  
>  	dqm_lock(dqm);
> +	if (!dqm->sched_running) {
> +		dqm_unlock(dqm);
> +		return 0;
> +	}
> +
>  	if (!dqm->is_hws_hang)
>  		unmap_queues_cpsch(dqm, KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0);
>  	hanging = dqm->is_hws_hang || dqm->is_resetting;
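
For readers outside the driver, the guard added above can be sketched in user space roughly as follows. This is only an illustration of the pattern (check a "running" flag under the lock and return early if the scheduler was never restarted), not the kernel code: struct toy_dqm, toy_start, toy_stop and teardown_count are hypothetical names, a pthread mutex stands in for dqm_lock/dqm_unlock, and the assumption that the stop path clears the flag after teardown is mine, since that part of stop_cpsch is not shown in the hunk.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct device_queue_manager; not the kernel type. */
struct toy_dqm {
	pthread_mutex_t lock;	/* plays the role of dqm_lock()/dqm_unlock() */
	bool sched_running;	/* assumed: set by start, cleared by stop */
	int teardown_count;	/* counts how often teardown actually ran */
};

/* Assumed counterpart of start_cpsch: marks the scheduler as running. */
static int toy_start(struct toy_dqm *dqm)
{
	pthread_mutex_lock(&dqm->lock);
	dqm->sched_running = true;
	pthread_mutex_unlock(&dqm->lock);
	return 0;
}

/* Sketch of the guarded stop path: a second stop without a start is a no-op. */
static int toy_stop(struct toy_dqm *dqm)
{
	pthread_mutex_lock(&dqm->lock);
	if (!dqm->sched_running) {
		/* Already stopped, e.g. an earlier reset never resumed. */
		pthread_mutex_unlock(&dqm->lock);
		return 0;
	}

	dqm->sched_running = false;
	dqm->teardown_count++;	/* stands in for releasing scheduler resources */
	pthread_mutex_unlock(&dqm->lock);
	return 0;
}
```

Calling toy_start() once and then toy_stop() twice leaves teardown_count at 1 in this sketch, which mirrors the case in the commit message where a failed reset is followed by a second reset with no resume in between.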