[PATCH v3 10/12] drm/amdgpu: Avoid sysfs dirs removal post device unplug

Andrey Grodzovsky Andrey.Grodzovsky at amd.com
Tue Nov 24 22:27:42 UTC 2020


On 11/24/20 9:49 AM, Daniel Vetter wrote:
> On Sat, Nov 21, 2020 at 12:21:20AM -0500, Andrey Grodzovsky wrote:
>> Avoids NULL ptr due to kobj->sd being unset on device removal.
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c   | 4 +++-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 4 +++-
>>   2 files changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>> index caf828a..812e592 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>> @@ -27,6 +27,7 @@
>>   #include <linux/uaccess.h>
>>   #include <linux/reboot.h>
>>   #include <linux/syscalls.h>
>> +#include <drm/drm_drv.h>
>>   
>>   #include "amdgpu.h"
>>   #include "amdgpu_ras.h"
>> @@ -1043,7 +1044,8 @@ static int amdgpu_ras_sysfs_remove_feature_node(struct amdgpu_device *adev)
>>   		.attrs = attrs,
>>   	};
>>   
>> -	sysfs_remove_group(&adev->dev->kobj, &group);
>> +	if (!drm_dev_is_unplugged(&adev->ddev))
>> +		sysfs_remove_group(&adev->dev->kobj, &group);
> This looks wrong. sysfs, like any other interface, should be
> unconditionally thrown out when we do the drm_dev_unregister. Whether
> hotunplugged or not should matter at all. Either this isn't needed at all,
> or something is wrong with the ordering here. But definitely fishy.
> -Daniel


So technically this is needed because kobejct's sysfs directory entry kobj->sd 
is set to NULL
on device removal (from sysfs_remove_dir) but because we don't finalize the device
until last reference to drm file is dropped (which can happen later) we end up 
calling sysfs_remove_file/dir after
this pointer is NULL. sysfs_remove_file checks for NULL and aborts while 
sysfs_remove_dir
is not and that why I guard against calls to sysfs_remove_dir.
But indeed the whole approach in the driver is incorrect, as Greg pointed out - 
we should use
default groups attributes instead of explicit calls to sysfs interface and this 
would save those troubles.
But again. the issue here of scope of work, converting all of amdgpu to default 
groups attributes is somewhat
lengthy process with extra testing as the entire driver is papered with sysfs 
references and seems to me more of a standalone
cleanup, just like switching to devm_ and drmm_ work. To me at least it seems 
that it makes more sense
to finalize and push the hot unplug patches so that this new functionality can 
be part of the driver sooner
and then incrementally improve it by working on those other topics. Just as 
devm_/drmm_ I also added sysfs cleanup
to my TODO list in the RFC patch.

Andrey


>
>>   
>>   	return 0;
>>   }
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
>> index 2b7c90b..54331fc 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
>> @@ -24,6 +24,7 @@
>>   #include <linux/firmware.h>
>>   #include <linux/slab.h>
>>   #include <linux/module.h>
>> +#include <drm/drm_drv.h>
>>   
>>   #include "amdgpu.h"
>>   #include "amdgpu_ucode.h"
>> @@ -464,7 +465,8 @@ int amdgpu_ucode_sysfs_init(struct amdgpu_device *adev)
>>   
>>   void amdgpu_ucode_sysfs_fini(struct amdgpu_device *adev)
>>   {
>> -	sysfs_remove_group(&adev->dev->kobj, &fw_attr_group);
>> +	if (!drm_dev_is_unplugged(&adev->ddev))
>> +		sysfs_remove_group(&adev->dev->kobj, &fw_attr_group);
>>   }
>>   
>>   static int amdgpu_ucode_init_single_fw(struct amdgpu_device *adev,
>> -- 
>> 2.7.4
>>


More information about the dri-devel mailing list