[PATCH] drm/amdgpu: bail on INFO IOCTL if the GPU is in reset

Christian König ckoenig.leichtzumerken at gmail.com
Thu Feb 15 14:43:01 UTC 2024


Am 15.02.24 um 15:36 schrieb Alex Deucher:
> On Thu, Feb 15, 2024 at 2:53 AM Christian König
> <ckoenig.leichtzumerken at gmail.com> wrote:
>> Well using this is in sysfs is a bug to begin with. This would prevent
>> starting new applications and crashing applications which don't expect
>> to get an -EPERM in return here.
>>
>> If we need to make operations mutual exclusive with resets then we need
>> to take the appropriate locks and *not* work around by abusing
>> amdgpu_in_reset().
>>
>> The functionality of amdgpu_in_reset() is just to check in lower level
>> functions if we are inside the higher level reset thread and *not*
>> protect anybody from concurrent access.
>>
>> I think we should probably completely nuke the underlying flag and using
>> the thread owner of the lock to prevent such an abuse.
> Can we land some variant of this for now?

I don't think so, it most likely will break existing use cases.

What we might be able to do is to frame this with 
amdgpu_device_lock_reset_domain() / amdgpu_device_unlock_reset_domain().

> It fixes known issues and it's the same behavior we have in sysfs and debugfs already.

Yeah, as I said that is broken to begin with. It's just that for sysfs 
and debugfs nobody notices.

Regards,
Christian.

>    It's not
> clear to me how this should best be handled.  We basically want to
> block any access to the GPU (registers, firmwares, etc.) while the GPU
> is going through a reset.  Just locking the reset domain doesn't seem
> like the right solution.
>
> Alex
>
>> Regards,
>> Christian.
>>
>> Am 12.02.24 um 21:56 schrieb Deucher, Alexander:
>>> [AMD Official Use Only - General]
>>>
>>> Ping?
>>>
>>>> -----Original Message-----
>>>> From: Deucher, Alexander <Alexander.Deucher at amd.com>
>>>> Sent: Monday, January 29, 2024 10:56 AM
>>>> To: amd-gfx at lists.freedesktop.org
>>>> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>
>>>> Subject: [PATCH] drm/amdgpu: bail on INFO IOCTL if the GPU is in reset
>>>>
>>>> This avoids queries to read registers or query the SMU for telemetry data while
>>>> the GPU is in reset. This mirrors what we already do for sysfs.
>>>>
>>>> Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 3 +++
>>>>    1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>>> index a2df3025a754..d522e99c6f81 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>>> @@ -607,6 +607,9 @@ int amdgpu_info_ioctl(struct drm_device *dev, void
>>>> *data, struct drm_file *filp)
>>>>         int i, found, ret;
>>>>         int ui32_size = sizeof(ui32);
>>>>
>>>> +     if (amdgpu_in_reset(adev))
>>>> +             return -EPERM;
>>>> +
>>>>         if (!info->return_size || !info->return_pointer)
>>>>                 return -EINVAL;
>>>>
>>>> --
>>>> 2.42.0



More information about the amd-gfx mailing list