[PATCH 0/4] Refine GPU recovery sequence to enhance its stability

Thu Apr 8 20:39:03 UTC 2021

On 2021-04-08 2:58 p.m., Christian König wrote:
> Am 08.04.21 um 18:08 schrieb Andrey Grodzovsky:
>> On 2021-04-08 4:32 a.m., Christian König wrote:
>>> Am 08.04.21 um 10:22 schrieb Christian König:
>>>> [SNIP]
>>>>>>
>>>>>>
>>>>>>> Beyond blocking all delayed works and scheduler threads we also 
>>>>>>> need to guarantee no  IOCTL can access MMIO post device unplug 
>>>>>>> OR in flight IOCTLs are done before we finish pci_remove 
>>>>>>> (amdgpu_pci_remove for us).
>>>>>>> For this I suggest we do something like what we worked on with 
>>>>>>> Takashi Iwai the ALSA maintainer recently when he helped 
>>>>>>> implementing PCI BARs move support for snd_hda_intel. Take a 
>>>>>>> look at
>>>>>>> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=yadro/pcie_hotplug/movable_bars_v9.1&id=cbaa324799718e2b828a8c7b5b001dd896748497 
>>>>>>> and
>>>>>>> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=yadro/pcie_hotplug/movable_bars_v9.1&id=e36365d9ab5bbc30bdc221ab4b3437de34492440 
>>>>>>>
>>>>>>> We also had same issue there, how to prevent MMIO accesses while 
>>>>>>> the BARs are migrating. What was done there is a refcount was 
>>>>>>> added to count all IOCTLs in flight, for any in flight  IOCTL 
>>>>>>> the BAR migration handler would
>>>>>>> block for the refcount to drop to 0 before it would proceed, for 
>>>>>>> any later IOCTL it stops and wait if device is in migration 
>>>>>>> state. We even don't need the wait part, nothing to wait for, we 
>>>>>>> just return with -ENODEV for this case.
>>>>>>>
>>>>>>
>>>>>> This is essentially what the DRM SRCU is doing as well.
>>>>>>
>>>>>> For the hotplug case we could do this in the toplevel since we 
>>>>>> can signal the fence and don't need to block memory management.
>>>>>
>>>>>
>>>>> To make SRCU 'wait for' all IOCTLs in flight we would need to wrap 
>>>>> every IOCTL ( practically - just drm_ioctl function) with 
>>>>> drm_dev_enter/drm_dev_exit - can we do it ?
>>>>>
>>>
>>> Sorry totally missed this question.
>>>
>>> Yes, exactly that. As discussed for the hotplug case we can do this.
>>
>>
>> Thinking more about it - assuming we are  treating synchronize_srcu 
>> as a 'wait for completion' of any in flight {drm_dev_enter, 
>> drm_dev_exit} scope, some of those scopes might do dma_fence_wait 
>> inside. Since we haven't force signaled the fences yet we will end up 
>> a deadlock. We have to signal all the various fences before doing the 
>> 'wait for'. But we can't signal the fences before setting 
>> 'dev->unplugged = true' to reject further CS and other stuff which 
>> might create more fences we were supposed-to force signal and now 
>> missed them. Effectively setting 'dev->unplugged = true' and doing 
>> synchronize_srcu in one call like drm_dev_unplug does without 
>> signalling all the fences in the device in between these two steps 
>> looks luck a possible deadlock to me - what do you think ?
>>
>
> Indeed, that is a really good argument to handle it the same way as 
> the reset lock.
>
> E.g. not taking it at the high level IOCTL, but rather when the 
> frontend of the driver has acquired all the necessary locks (BO resv, 
> VM lock etc...) before calling into the backend to actually do things 
> with the hardware.
>
> Christian.

 From what you said I understand that you want to solve this problem by 
using drm_dev_enter/exit brackets low enough in the code such that it 
will not include and fence wait.

But inserting dmr_dev_enter/exit on the highest level in drm_ioctl is 
much less effort and less room for error then going through each IOCTL 
and trying to identify at what point (possibly multiple points) they are 
about to access HW, some of this is hidden deep in HAL layers such as DC 
layer in display driver or the multi layers of powerplay/SMU libraries. 
Also, we can't only limit our-self to back-end if by this you mean ASIC 
specific functions which access registers. We also need to take care of 
any MMIO kernel BO (VRAM BOs) where we may access directly MMIO space by 
pointer from the front end of the driver (HW agnostic) and TTM/DRM layers.

Our problem here is how to signal all the existing  fences on one hand 
and on the other prevent any new dma_fence waits after we finished 
signaling existing fences. Once we solved this then there is no problem 
using drm_dev_unplug in conjunction with drm_dev_enter/exit at the 
highest level of drm_ioctl to flush any IOCTLs in flight and block any 
new ones.

IMHO when we speak about signalling all fences we don't mean ALL the 
currently existing dma_fence structs (they are spread all over the 
place) but rather signal all the HW fences because HW is what's gone and 
we can't expect for those fences to be ever signaled. All the rest such 
as: scheduler fences,  user fences, drm_gem reservation objects e.t.c. 
are either dependent on those HW fences and hence signaling the HW 
fences will in turn signal them or, are not impacted by the HW being 
gone and hence can still be waited on and will complete. If this 
assumption is correct then I think that we should use some flag to 
prevent any new submission to HW which creates HW fences (somewhere 
around amdgpu_fence_emit), then traverse all existing HW fences 
(currently they are spread in a few places so maybe we need to track 
them in a list) and signal them. After that it's safe to cal 
drm_dev_unplug and be sure synchronize_srcu won't stall because of of 
dma_fence_wait. After that we can proceed to canceling work items, 
stopping schedulers e.t.c.

Andrey

>
>> Andrey
>>
>>
>