[PATCH] drm/amdgpu fix incorrect sysfs remove behavior for xgmi

Christian König ckoenig.leichtzumerken at gmail.com
Mon May 18 07:12:33 UTC 2020


On 18.05.20 at 06:44, Jack Zhang wrote:
> Under an xgmi setup, some sysfs nodes fail to be created the second time
> the kmd driver is loaded. This is because the sysfs nodes are not removed
> properly during the previous unload.
>
> Changes of this patch:
> 1. remove sysfs for dev_attr_xgmi_error
> 2. remove sysfs_link adev->dev->kobj with target name.
>     And it only needs to be removed once for a xgmi setup
> 3. remove sysfs_link hive->kobj with target name
>
> In amdgpu_xgmi_remove_device:
> 1. amdgpu_xgmi_sysfs_rem_dev_info needs to be run per device
> 2. amdgpu_xgmi_sysfs_destroy needs to be run only when the last device
> in the hive is removed.
>
> Signed-off-by: Jack Zhang <Jack.Zhang1 at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 22 +++++++++++++++-------
>   1 file changed, 15 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> index e9e59bc..bfe2468 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> @@ -325,9 +325,17 @@ static int amdgpu_xgmi_sysfs_add_dev_info(struct amdgpu_device *adev,
>   static void amdgpu_xgmi_sysfs_rem_dev_info(struct amdgpu_device *adev,
>   					  struct amdgpu_hive_info *hive)
>   {
> +	char node[10] = { 0 };

Please don't initialize things like this; use memset() instead.

Regards,
Christian.
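
For illustration only, the memset() style requested above might look
roughly like the sketch below. build_node_name() is a hypothetical
userspace helper, not a function from amdgpu_xgmi.c, and the switch to
snprintf() with an explicit length check is an extra assumption on top of
the review comment, not something the patch or the reviewer asked for.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of amdgpu: build a "node<N>" link name. */
static int build_node_name(char *buf, size_t len, int index)
{
	int n;

	assert(buf != NULL && len > 0);

	/* Zero the buffer explicitly instead of using "= { 0 }". */
	memset(buf, 0, len);

	/* snprintf() writes at most len bytes, including the trailing NUL. */
	n = snprintf(buf, len, "node%d", index);

	/* Report an encoding error or truncation to the caller. */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With a 10-byte buffer as in the patch, an index of 3 yields "node3",
while a seven-digit index no longer fits and the helper returns -1.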

>   	device_remove_file(adev->dev, &dev_attr_xgmi_device_id);
> -	sysfs_remove_link(&adev->dev->kobj, adev->ddev->unique);
> -	sysfs_remove_link(hive->kobj, adev->ddev->unique);
> +	device_remove_file(adev->dev, &dev_attr_xgmi_error);
> +
> +	if (adev != hive->adev) {
> +		sysfs_remove_link(&adev->dev->kobj,"xgmi_hive_info");
> +	}
> +
> +	sprintf(node, "node%d", hive->number_devices);
> +	sysfs_remove_link(hive->kobj, node);
> +
>   }
>   
>   
> @@ -583,14 +591,14 @@ int amdgpu_xgmi_remove_device(struct amdgpu_device *adev)
>   	if (!hive)
>   		return -EINVAL;
>   
> -	if (!(hive->number_devices--)) {
> +	task_barrier_rem_task(&hive->tb);
> +	amdgpu_xgmi_sysfs_rem_dev_info(adev, hive);
> +	mutex_unlock(&hive->hive_lock);
> +
> +	if(!(--hive->number_devices)){
>   		amdgpu_xgmi_sysfs_destroy(adev, hive);
>   		mutex_destroy(&hive->hive_lock);
>   		mutex_destroy(&hive->reset_lock);
> -	} else {
> -		task_barrier_rem_task(&hive->tb);
> -		amdgpu_xgmi_sysfs_rem_dev_info(adev, hive);
> -		mutex_unlock(&hive->hive_lock);
>   	}
>   
>   	return psp_xgmi_terminate(&adev->psp);
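
As a side note on the hunk above: the old test post-decrements the
counter, so it checks the value before the decrement, while the new test
pre-decrements and checks the value afterwards. A minimal sketch of the
difference follows; struct toy_hive and both helpers are hypothetical
stand-ins, not code from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the hive's device counter. */
struct toy_hive {
	int number_devices;
};

/* Old test: post-decrement, checks the value before decrementing. */
static bool is_last_post_dec(struct toy_hive *hive)
{
	assert(hive != NULL);
	return !(hive->number_devices--);
}

/* New test: pre-decrement, checks the value after decrementing. */
static bool is_last_pre_dec(struct toy_hive *hive)
{
	assert(hive != NULL);
	return !(--hive->number_devices);
}
```

Starting from a count of 2, the post-decrement variant returns false for
both removals, so the destroy path is never reached. The pre-decrement
variant returns true on the second removal, when the count reaches 0.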


