[PATCH] drm/amd/amdkfd: Fix kernel panic when reset failed and been triggered again

Felix Kuehling felix.kuehling at amd.com
Mon Nov 15 16:22:15 UTC 2021


On 2021-11-15 at 11:20 a.m., shaoyunl wrote:
> In an SRIOV configuration, a reset may fail to bring the ASIC back to normal after
> stop_cpsch has already been called. In that case start_cpsch will not be called, since
> there is no resume. When a reset is triggered again, the driver should avoid repeating
> the uninitialization.
>
> Signed-off-by: shaoyunl <shaoyun.liu at amd.com>

Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>


> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index 42b2cc999434..62fe28244a80 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -1225,6 +1225,11 @@ static int stop_cpsch(struct device_queue_manager *dqm)
>  	bool hanging;
>  
>  	dqm_lock(dqm);
> +	if (!dqm->sched_running) {
> +		dqm_unlock(dqm);
> +		return 0;
> +	}
> +
>  	if (!dqm->is_hws_hang)
>  		unmap_queues_cpsch(dqm, KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0);
>  	hanging = dqm->is_hws_hang || dqm->is_resetting;
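
The guard in the hunk above makes the stop path idempotent: a second stop without an intervening start returns early instead of tearing down state that is already gone. Below is a minimal user-space sketch of that pattern. It is not the kernel code: `struct mock_dqm`, `mock_start`, `mock_stop`, and the lock counter are hypothetical stand-ins, and the assumption that the stop path clears the running flag is made only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct device_queue_manager; only the
 * fields needed to illustrate the guard are modelled. */
struct mock_dqm {
	bool sched_running;  /* set by start, cleared by stop (assumed) */
	int lock_depth;      /* stands in for dqm_lock()/dqm_unlock() */
	int teardown_count;  /* how many times teardown actually ran */
};

static void mock_lock(struct mock_dqm *dqm)
{
	dqm->lock_depth++;
}

static void mock_unlock(struct mock_dqm *dqm)
{
	/* Catch an unlock without a matching lock. */
	assert(dqm->lock_depth > 0);
	dqm->lock_depth--;
}

static int mock_start(struct mock_dqm *dqm)
{
	mock_lock(dqm);
	dqm->sched_running = true;
	mock_unlock(dqm);
	return 0;
}

static int mock_stop(struct mock_dqm *dqm)
{
	mock_lock(dqm);
	/* Same shape as the patch: nothing to do if the scheduler was
	 * never started or has already been stopped. The lock must be
	 * released on this early-return path as well. */
	if (!dqm->sched_running) {
		mock_unlock(dqm);
		return 0;
	}

	/* Placeholder for the real teardown work. */
	dqm->teardown_count++;
	dqm->sched_running = false;
	mock_unlock(dqm);
	return 0;
}
```

Calling `mock_stop` twice after a single `mock_start` runs the teardown once and leaves the lock balanced, which mirrors the failed-reset-then-reset-again sequence described in the commit message.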

