[PATCH] drm/amdgpu: bail on INFO IOCTL if the GPU is in reset

Christian König ckoenig.leichtzumerken at gmail.com
Thu Feb 15 07:27:57 UTC 2024


Well, using this in sysfs is a bug to begin with. This would prevent 
new applications from starting and crash applications which don't expect 
to get an -EPERM in return here.

If we need to make operations mutually exclusive with resets then we need 
to take the appropriate locks and *not* work around it by abusing 
amdgpu_in_reset().

The only purpose of amdgpu_in_reset() is to let lower level functions 
check whether they are running inside the higher level reset thread, 
*not* to protect anybody from concurrent access.

I think we should probably completely nuke the underlying flag and use 
the thread owner of the lock instead to prevent such abuse.
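
As a rough userspace sketch of what an owner-based check could look like (again an analogy with hypothetical names, not the amdgpu implementation): the lock records which thread took it, and the "am I in reset" question is answered with true only for that thread, so other threads cannot use it as a substitute for locking.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reset domain; all names here are invented for the example. */
struct demo_reset_domain {
	pthread_mutex_t lock;	/* held for the duration of a reset */
	pthread_mutex_t guard;	/* protects owner and has_owner */
	pthread_t owner;
	bool has_owner;
};

static void demo_reset_domain_init(struct demo_reset_domain *d)
{
	pthread_mutex_init(&d->lock, NULL);
	pthread_mutex_init(&d->guard, NULL);
	d->has_owner = false;
}

static void demo_reset_lock(struct demo_reset_domain *d)
{
	pthread_mutex_lock(&d->lock);
	pthread_mutex_lock(&d->guard);
	d->owner = pthread_self();	/* remember who runs the reset */
	d->has_owner = true;
	pthread_mutex_unlock(&d->guard);
}

static void demo_reset_unlock(struct demo_reset_domain *d)
{
	pthread_mutex_lock(&d->guard);
	d->has_owner = false;
	pthread_mutex_unlock(&d->guard);
	pthread_mutex_unlock(&d->lock);
}

/* True only when the caller is the thread that holds the reset lock. */
static bool demo_is_reset_thread(struct demo_reset_domain *d)
{
	bool ret;

	pthread_mutex_lock(&d->guard);
	ret = d->has_owner && pthread_equal(d->owner, pthread_self());
	pthread_mutex_unlock(&d->guard);
	return ret;
}

static void *demo_probe(void *arg)
{
	return (void *)(intptr_t)demo_is_reset_thread(arg);
}

/* Ask the same question from a different thread. */
static bool demo_probe_from_other_thread(struct demo_reset_domain *d)
{
	pthread_t t;
	void *res = NULL;

	if (pthread_create(&t, NULL, demo_probe, d))
		return false;
	pthread_join(t, &res);
	return res != NULL;
}
```

With this shape, a caller outside the reset thread always gets false, even while a reset is running, which makes the helper useless as a guard against concurrent access and leaves proper locking as the only option.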

Regards,
Christian.

On 12.02.24 at 21:56, Deucher, Alexander wrote:
> [AMD Official Use Only - General]
>
> Ping?
>
>> -----Original Message-----
>> From: Deucher, Alexander <Alexander.Deucher at amd.com>
>> Sent: Monday, January 29, 2024 10:56 AM
>> To: amd-gfx at lists.freedesktop.org
>> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>
>> Subject: [PATCH] drm/amdgpu: bail on INFO IOCTL if the GPU is in reset
>>
>> This avoids queries to read registers or query the SMU for telemetry data while
>> the GPU is in reset. This mirrors what we already do for sysfs.
>>
>> Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> index a2df3025a754..d522e99c6f81 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> @@ -607,6 +607,9 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
>>        int i, found, ret;
>>        int ui32_size = sizeof(ui32);
>>
>> +     if (amdgpu_in_reset(adev))
>> +             return -EPERM;
>> +
>>        if (!info->return_size || !info->return_pointer)
>>                return -EINVAL;
>>
>> --
>> 2.42.0


