[PATCH] drm/amdgpu: bail on INFO IOCTL if the GPU is in reset

Alex Deucher alexdeucher at gmail.com
Thu Feb 15 14:36:20 UTC 2024


On Thu, Feb 15, 2024 at 2:53 AM Christian König
<ckoenig.leichtzumerken at gmail.com> wrote:
>
> Well using this is in sysfs is a bug to begin with. This would prevent
> starting new applications and crashing applications which don't expect
> to get an -EPERM in return here.
>
> If we need to make operations mutual exclusive with resets then we need
> to take the appropriate locks and *not* work around by abusing
> amdgpu_in_reset().
>
> The functionality of amdgpu_in_reset() is just to check in lower level
> functions if we are inside the higher level reset thread and *not*
> protect anybody from concurrent access.
>
> I think we should probably completely nuke the underlying flag and using
> the thread owner of the lock to prevent such an abuse.

Can we land some variant of this for now?  It fixes known issues and
it's the same behavior we have in sysfs and debugfs already.  It's not
clear to me how this should best be handled.  We basically want to
block any access to the GPU (registers, firmwares, etc.) while the GPU
is going through a reset.  Just locking the reset domain doesn't seem
like the right solution.

Alex

>
> Regards,
> Christian.
>
> Am 12.02.24 um 21:56 schrieb Deucher, Alexander:
> > [AMD Official Use Only - General]
> >
> > Ping?
> >
> >> -----Original Message-----
> >> From: Deucher, Alexander <Alexander.Deucher at amd.com>
> >> Sent: Monday, January 29, 2024 10:56 AM
> >> To: amd-gfx at lists.freedesktop.org
> >> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>
> >> Subject: [PATCH] drm/amdgpu: bail on INFO IOCTL if the GPU is in reset
> >>
> >> This avoids queries to read registers or query the SMU for telemetry data while
> >> the GPU is in reset. This mirrors what we already do for sysfs.
> >>
> >> Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 3 +++
> >>   1 file changed, 3 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> >> index a2df3025a754..d522e99c6f81 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> >> @@ -607,6 +607,9 @@ int amdgpu_info_ioctl(struct drm_device *dev, void
> >> *data, struct drm_file *filp)
> >>        int i, found, ret;
> >>        int ui32_size = sizeof(ui32);
> >>
> >> +     if (amdgpu_in_reset(adev))
> >> +             return -EPERM;
> >> +
> >>        if (!info->return_size || !info->return_pointer)
> >>                return -EINVAL;
> >>
> >> --
> >> 2.42.0
>


More information about the amd-gfx mailing list