[PATCH 0/4] Refine GPU recovery sequence to enhance its stability
Andrey Grodzovsky
andrey.grodzovsky at amd.com
Fri Apr 9 18:18:40 UTC 2021
On 2021-04-09 12:39 p.m., Christian König wrote:
> Am 09.04.21 um 17:42 schrieb Andrey Grodzovsky:
>>
>> On 2021-04-09 3:01 a.m., Christian König wrote:
>>> Am 09.04.21 um 08:53 schrieb Christian König:
>>>> Am 08.04.21 um 22:39 schrieb Andrey Grodzovsky:
>>>>> [SNIP]
>>>>> But inserting drm_dev_enter/exit at the highest level in drm_ioctl
>>>>> is much less effort and leaves less room for error than going
>>>>> through each IOCTL and trying to identify at what point (possibly
>>>>> multiple points) it is about to access HW; some of this is hidden
>>>>> deep in HAL layers such as the DC layer in the display driver or
>>>>> the multiple layers of the powerplay/SMU libraries. Also, we can't
>>>>> limit ourselves to the back-end, if by that you mean ASIC-specific
>>>>> functions which access registers. We also need to take care of any
>>>>> MMIO kernel BOs (VRAM BOs) where we may access MMIO space directly
>>>>> by pointer from the front end of the driver (HW agnostic) and the
>>>>> TTM/DRM layers.
>>>>
>>>> Exactly, yes. The key point is that we need to identify such places
>>>> anyway for GPU reset to work properly. So we could just piggyback
>>>> hotplug on top of that work and be done.
>>
>>
>> I see most of this was done by Denis in this patch
>> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=drm-misc-next&id=df9c8d1aa278c435c30a69b8f2418b4a52fcb929,
>> but indeed it doesn't cover the direct by-pointer accesses of MMIO,
>> and it will require many more guards like this; as people write new
>> code, new places to cover will keep popping up, leading to
>> regressions and extra work to fix. It would really be much better if
>> we could blanket-cover it at the very top, such as the root of all
>> IOCTLs or, for any queued work/timer, at the top-level function, and
>> handle it once and for all.
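To make this concrete, here is a minimal sketch of what I mean by
blanket coverage at the ioctl root (the wrapper below is hypothetical
and only illustrates the idea, it is not an actual patch):

/*
 * Hypothetical wrapper: bail out of any IOCTL once the device is
 * unplugged, instead of guarding each HW access individually.
 * Needs <drm/drm_drv.h>, <drm/drm_file.h> and <drm/drm_ioctl.h>.
 */
static long amdgpu_guarded_ioctl(struct file *filp, unsigned int cmd,
				 unsigned long arg)
{
	struct drm_file *file_priv = filp->private_data;
	struct drm_device *dev = file_priv->minor->dev;
	long ret;
	int idx;

	if (!drm_dev_enter(dev, &idx))
		return -ENODEV;	/* device is gone, block the IOCTL */

	ret = drm_ioctl(filp, cmd, arg); /* existing DRM dispatch */

	drm_dev_exit(idx);
	return ret;
}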
>
> And that's exactly what is not possible. At least for the reset case
> you need to look into each hardware access and handle it bit by bit,
> and I think that for the hotplug case we should go down that route as
> well.
>
>>>>
>>>>>
>>>>> Our problem here is how to signal all the existing fences on one
>>>>> hand and, on the other, prevent any new dma_fence waits after we
>>>>> have finished signaling the existing ones. Once we solve this,
>>>>> there is no problem using drm_dev_unplug in conjunction with
>>>>> drm_dev_enter/exit at the highest level of drm_ioctl to flush any
>>>>> IOCTLs in flight and block any new ones.
>>>>>
>>>>> IMHO when we speak about signalling all fences we don't mean ALL
>>>>> the currently existing dma_fence structs (they are spread all over
>>>>> the place) but rather all the HW fences, because the HW is what's
>>>>> gone and we can't expect those fences to ever be signaled. All the
>>>>> rest, such as scheduler fences, user fences, drm_gem reservation
>>>>> objects etc., either depend on those HW fences, and hence
>>>>> signaling the HW fences will in turn signal them, or are not
>>>>> impacted by the HW being gone and hence can still be waited on and
>>>>> will complete. If this assumption is correct then I think we
>>>>> should use some flag to prevent any new submission to HW which
>>>>> creates HW fences (somewhere around amdgpu_fence_emit), then
>>>>> traverse all existing HW fences (currently they are spread across
>>>>> a few places so maybe we need to track them in a list) and signal
>>>>> them. After that it's safe to call drm_dev_unplug and be sure
>>>>> synchronize_srcu won't stall because of dma_fence_wait. After that
>>>>> we can proceed to canceling work items, stopping schedulers, etc.
>>>>
>>>> That is problematic as well, since you need to make sure that the
>>>> scheduler is not creating a new hardware fence at the moment you
>>>> try to signal all of them. It would require another SRCU or lock
>>>> for this.
>>
>>
>> What if we use a list and a flag called 'emit_allowed' under a lock,
>> such that in amdgpu_fence_emit we take the lock, check the flag and,
>> if true, add the new HW fence to the list and proceed to HW emission
>> as normal, otherwise return -ENODEV. In amdgpu_pci_remove we take
>> the lock, set the flag to false, and then iterate the list and
>> force-signal each fence. Will this not prevent any new HW fence
>> creation from then on, from any place trying to do so?
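Something like this is what I have in mind (a sketch only; the
function names, the 'fence_emit_lock', 'emit_allowed' and
'hw_fence_list' fields and the 'node' member are hypothetical
additions, not existing code):

/* Called from amdgpu_fence_emit() before touching the HW. */
static int amdgpu_fence_track_or_reject(struct amdgpu_device *adev,
					struct amdgpu_fence *fence)
{
	spin_lock(&adev->fence_emit_lock);
	if (!adev->emit_allowed) {
		spin_unlock(&adev->fence_emit_lock);
		return -ENODEV;	/* device is gone, refuse new HW fences */
	}
	list_add_tail(&fence->node, &adev->hw_fence_list);
	spin_unlock(&adev->fence_emit_lock);
	return 0;
}

/* Called from amdgpu_pci_remove() before drm_dev_unplug(). */
static void amdgpu_fence_force_signal_all(struct amdgpu_device *adev)
{
	struct amdgpu_fence *fence;

	spin_lock(&adev->fence_emit_lock);
	adev->emit_allowed = false;
	list_for_each_entry(fence, &adev->hw_fence_list, node)
		dma_fence_signal(&fence->base);
	spin_unlock(&adev->fence_emit_lock);
}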
>
> Way too much overhead. The fence processing is intentionally lock-free
> to avoid cache line bouncing, because the IRQ can move from CPU to CPU.
>
> We need something which doesn't affect the processing of fences in the
> interrupt handler at all.
As far as I can see in the code, amdgpu_fence_emit is only called from
task context. Also, we can skip the list I proposed and just use
amdgpu_fence_driver_force_completion for each ring to signal all the
HW fences that were created.
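Roughly, something along these lines (a sketch; the loop mirrors the
per-ring checks amdgpu_fence_driver_fini already does, the function
name is made up):

/* Force-complete all outstanding HW fences on every initialized ring,
 * so a subsequent drm_dev_unplug()/synchronize_srcu() won't stall on
 * a dma_fence_wait.
 */
static void amdgpu_force_complete_all_rings(struct amdgpu_device *adev)
{
	int i;

	for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
		struct amdgpu_ring *ring = adev->rings[i];

		if (!ring || !ring->fence_drv.initialized)
			continue;

		amdgpu_fence_driver_force_completion(ring);
	}
}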
>
>>>
>>> Alternatively grabbing the reset write side and stopping and then
>>> restarting the scheduler could work as well.
>>>
>>> Christian.
>>
>>
>> I didn't get the above, and I don't see why I need to reuse the GPU
>> reset rw_lock; I rely on the SRCU unplug flag for unplug. Also, it's
>> not clear to me why we are focusing on the scheduler threads: any
>> code path that generates HW fences should be covered, so any code
>> leading to amdgpu_fence_emit needs to be taken into account, such as
>> direct IB submissions, VM flushes, etc.
>
> You need to work together with the reset lock anyway, because a
> hotplug could run at the same time as a reset.
If we go my way, I indeed now see that I have to take the reset
write-side lock during HW fence signalling in order to protect against
scheduler/HW fence detachment and reattachment during scheduler
stop/restart. But if we go with your approach, then calling
drm_dev_unplug and scoping amdgpu_job_timedout with drm_dev_enter/exit
should be enough to prevent any concurrent GPU resets during unplug. In
fact I already do that anyway -
https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=drm-misc-next&id=ef0ea4dd29ef44d2649c5eda16c8f4869acc36b1
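Roughly, the shape of that change (a sketch with the locals and error
handling abbreviated; see the linked commit for the real thing):

static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
{
	struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
	struct amdgpu_device *adev = ring->adev;
	int idx;

	if (!drm_dev_enter(adev_to_drm(adev), &idx))
		return DRM_GPU_SCHED_STAT_ENODEV; /* device gone, skip recovery */

	/* ... normal timeout handling / amdgpu_device_gpu_recover() ... */

	drm_dev_exit(idx);
	return DRM_GPU_SCHED_STAT_NOMINAL;
}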
Andrey
>
>
> Christian.
>
>>
>> Andrey
>>
>>
>>>
>>>>
>>>> Christian.
>>>>
>>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>>>
>>>>>>> Andrey
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>