[PATCH 1/2] drm/amdgpu: Handle xgmi device removal and add reset wq.

Fri Nov 30 09:03:21 UTC 2018

Am 29.11.18 um 21:36 schrieb Andrey Grodzovsky:
> XGMI hive has some resources allocted on device init which
> needs to be deallocated when the device is unregistered.
>
> Add per hive wq to allow all the nodes in hive to run resets
> concurently - this should speed up the total reset time to avoid
> breaching the PSP FW timeout.

Do you really want a workqueue? That sounds like the exact opposite of 
what you describe here.

E.g. a workqueue is there to serialize all work items scheduled to it.

What you most likely rather want is a work item per device which are 
scheduled, run in parallel and are joined back together before dropping 
the per hive lock.

Christian.

>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 24 ++++++++++++++++++++++++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h |  3 +++
>   2 files changed, 27 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> index fb37e69..9ac2dc5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> @@ -61,6 +61,8 @@ struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev)
>   	INIT_LIST_HEAD(&tmp->device_list);
>   	mutex_init(&tmp->hive_lock);
>   
> +	tmp->reset_queue = alloc_workqueue("xgmi-hive", WQ_UNBOUND | WQ_HIGHPRI, 0);
> +
>   	return tmp;
>   }
>   
> @@ -135,3 +137,25 @@ int amdgpu_xgmi_add_device(struct amdgpu_device *adev)
>   	mutex_unlock(&xgmi_mutex);
>   	return ret;
>   }
> +
> +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev)
> +{
> +	struct amdgpu_hive_info *hive;
> +
> +	if ((adev->asic_type < CHIP_VEGA20) || (adev->flags & AMD_IS_APU))
> +		return;
> +
> +	mutex_lock(&xgmi_mutex);
> +
> +	hive = amdgpu_get_xgmi_hive(adev);
> +	if (!hive)
> +		goto exit;
> +
> +	if (!(hive->number_devices--)) {
> +		mutex_destroy(&hive->hive_lock);
> +		destroy_workqueue(hive->reset_queue);
> +	}
> +
> +exit:
> +	mutex_unlock(&xgmi_mutex);
> +}
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
> index 6335bfd..285ab93 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
> @@ -30,10 +30,13 @@ struct amdgpu_hive_info {
>   	struct psp_xgmi_topology_info	topology_info;
>   	int number_devices;
>   	struct mutex hive_lock;
> +	/* hive members reset wq */
> +	struct workqueue_struct *reset_queue;
>   };
>   
>   struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev);
>   int amdgpu_xgmi_update_topology(struct amdgpu_hive_info *hive, struct amdgpu_device *adev);
>   int amdgpu_xgmi_add_device(struct amdgpu_device *adev);
> +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev);
>   
>   #endif