[PATCH] drm/amdgpu: return an error for hw access in INFO ioctl when in reset

Christian König christian.koenig at amd.com
Fri Jul 3 07:50:31 UTC 2020


On 03.07.20 at 08:05, Felix Kuehling wrote:
> On 2020-07-01 at 10:34 a.m., Li, Dennis wrote:
>> Hi, Christian and Alex
>>        Not only the amdgpu ioctls but also the amdkfd ioctls have the same issue.
> Most KFD ioctls don't access HW directly. The only place in KFD that
> interacts with HW is the device queue manager (DQM) and, beneath it, the
> packet manager. In DQM we already have protections to avoid HW access
> while a reset is in progress.
>
> For other HW access, KFD goes through helper functions in amdgpu.
>
> Memory management ioctls indirectly access HW for page table updates.
> However, that requires validating the page table BOs first. Are VRAM BOs
> considered "valid" during a GPU reset? When using SDMA for page table
> updates, the DRM GPU scheduler is also involved. Is that suspended
> during a GPU reset?

That should work concurrently. The scheduler is stopped during a
reset, but we can still push new jobs to the queues.

Operations such as TLB flushes are also harmless, since after a reset
we can safely assume that the TLB is completely empty.
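
To illustrate the queueing behaviour described above, here is a toy model
only. The names toy_sched, toy_push and toy_run are made up for this sketch
and are not the DRM GPU scheduler API; it only assumes that a stopped
scheduler executes nothing while its queue keeps accepting jobs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_QUEUE_CAP 16

/* Hypothetical toy model of a job queue; not the real DRM scheduler. */
struct toy_sched {
	int jobs[TOY_QUEUE_CAP];
	size_t head;     /* index of the next job to execute */
	size_t tail;     /* index of the next free slot */
	bool stopped;    /* true for the duration of a (simulated) reset */
	int executed;    /* number of jobs executed so far */
};

/* Pushing never looks at 'stopped': jobs simply pile up in the queue. */
static bool toy_push(struct toy_sched *s, int job)
{
	if (s->tail == TOY_QUEUE_CAP)
		return false; /* queue full */
	s->jobs[s->tail++] = job;
	return true;
}

/* Executes all pending jobs unless stopped; returns how many ran. */
static int toy_run(struct toy_sched *s)
{
	int ran = 0;

	if (s->stopped)
		return 0;
	while (s->head < s->tail) {
		s->head++;
		s->executed++;
		ran++;
	}
	return ran;
}
```

Jobs pushed while stopped is set are held back and run only once the flag
is cleared again.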

> The only other KFD ioctl that looks like it might access HW during a GPU
> reset is kfd_ioctl_get_clock_counters by calling
> amdgpu_amdkfd_get_gpu_clock_counter.

Yeah, that is indeed a problem which needs handling.

Christian.

>
> Regards,
>    Felix
>
>
>
>> Best Regards
>> Dennis Li
>> -----Original Message-----
>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Christian König
>> Sent: Wednesday, July 1, 2020 4:20 PM
>> To: Alex Deucher <alexdeucher at gmail.com>; amd-gfx list <amd-gfx at lists.freedesktop.org>
>> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>
>> Subject: Re: [PATCH] drm/amdgpu: return an error for hw access in INFO ioctl when in reset
>>
>> I don't think this is a good idea; we should rather wait for the GPU reset to finish by taking the appropriate lock.
>>
>> Christian.
>>
>> On 01.07.20 at 07:33, Alex Deucher wrote:
>>> ping?
>>>
>>> On Fri, Jun 26, 2020 at 10:04 AM Alex Deucher <alexdeucher at gmail.com> wrote:
>>>> When the GPU is in reset, accessing the hw is unreliable and could
>>>> interfere with the reset.  Return an error in those cases.
>>>>
>>>> Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 6 ++++++
>>>>    1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>>> index 341d072edd95..fd51d6554ee2 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>>> @@ -684,6 +684,9 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file
>>>>                   if (info->read_mmr_reg.count > 128)
>>>>                           return -EINVAL;
>>>>
>>>> +               if (adev->in_gpu_reset)
>>>> +                       return -EPERM;
>>>> +
>>>>                   regs = kmalloc_array(info->read_mmr_reg.count, sizeof(*regs), GFP_KERNEL);
>>>>                   if (!regs)
>>>>                           return -ENOMEM;
>>>> @@ -854,6 +857,9 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file
>>>>                   if (!adev->pm.dpm_enabled)
>>>>                           return -ENOENT;
>>>>
>>>> +               if (adev->in_gpu_reset)
>>>> +                       return -EPERM;
>>>> +
>>>>                   switch (info->sensor_info.type) {
>>>>                   case AMDGPU_INFO_SENSOR_GFX_SCLK:
>>>>                           /* get sclk in Mhz */
>>>> --
>>>> 2.25.4
>>>>
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


