[PATCH] drm/amd/amdkfd: Fix kernel panic when reset failed and been triggered again
Felix Kuehling
felix.kuehling at amd.com
Mon Nov 15 16:06:51 UTC 2021
On 2021-11-14 at 12:55 p.m., shaoyunl wrote:
> In an SRIOV configuration, a reset may fail to bring the ASIC back to normal after stop_cpsch
> has already been called. start_cpsch will then not be called, since there is no resume in this
> case. When a reset is triggered again, the driver should avoid doing the uninitialization again.
>
> Signed-off-by: shaoyunl <shaoyun.liu at amd.com>
If there is a possibility that stop_cpsch is called multiple times, I
think the check for that should be at the start of the function.
Something like:
	if (!dqm->sched_running)
		return 0;
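
To illustrate why the check belongs at the top, here is a minimal user-space sketch. It is not the driver code: struct mock_dqm, mock_start(), mock_stop() and teardown_count are hypothetical stand-ins, and only the sched_running flag comes from the patch. Locking is left out on purpose.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for struct device_queue_manager. Only the
 * sched_running flag is taken from the patch; teardown_count exists
 * purely to make repeated teardown observable in this sketch.
 */
struct mock_dqm {
	bool sched_running;
	int teardown_count;
};

/* Hypothetical counterpart of start_cpsch(): marks the scheduler as running. */
static int mock_start(struct mock_dqm *dqm)
{
	dqm->sched_running = true;
	return 0;
}

/* Hypothetical counterpart of stop_cpsch() with the guard at the top. */
static int mock_stop(struct mock_dqm *dqm)
{
	/* A stop without a preceding start has nothing to tear down. */
	if (!dqm->sched_running)
		return 0;

	dqm->sched_running = false;
	/* Stands in for pm_release_ib(), kfd_gtt_sa_free() and pm_uninit(). */
	dqm->teardown_count++;
	return 0;
}
```

With the early return, a second mock_stop() call without an intervening mock_start() skips everything, not just the frees, so no part of the stop path runs against state that has already been torn down.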
Regards,
Felix
> ---
> drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index 42b2cc999434..bcc8980d77e0 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -1228,12 +1228,14 @@ static int stop_cpsch(struct device_queue_manager *dqm)
>  	if (!dqm->is_hws_hang)
>  		unmap_queues_cpsch(dqm, KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0);
>  	hanging = dqm->is_hws_hang || dqm->is_resetting;
> -	dqm->sched_running = false;
>
> -	pm_release_ib(&dqm->packet_mgr);
> +	if (dqm->sched_running) {
> +		dqm->sched_running = false;
> +		pm_release_ib(&dqm->packet_mgr);
> +		kfd_gtt_sa_free(dqm->dev, dqm->fence_mem);
> +		pm_uninit(&dqm->packet_mgr, hanging);
> +	}
>
> -	kfd_gtt_sa_free(dqm->dev, dqm->fence_mem);
> -	pm_uninit(&dqm->packet_mgr, hanging);
>  	dqm_unlock(dqm);
>
>  	return 0;