[PATCH v4 11/14] drm/amdgpu: Guard against write accesses after device removal

Sun Feb 7 21:28:17 UTC 2021

On 2/5/21 5:10 PM, Daniel Vetter wrote:
> On Fri, Feb 5, 2021 at 5:22 PM Andrey Grodzovsky
> <Andrey.Grodzovsky at amd.com> wrote:
>>
>> Daniel, ping. Also, please refer to the other thread with Bjorn from pci-dev
>> on the same topic I added you to.
> 
> Summarizing my take over there for here plus maybe some more
> clarification. There's two problems:
> 
> - You must guarantee that after the ->remove callback of your driver
> is finished, there's no more mmio or any other hw access. A
> combination of stopping stuff and drm_dev_enter/exit can help with
> that. This prevents the use-after-free issue.
> 
> - For the actual hotunplug time, i.e. anything that can run while your
> driver is used up to the point where ->remove callback has finished
> stopp hw access you must guarantee that code doesn't blow up when it
> gets bogus reads (in the form of 0xff values). drm_dev_enter/exit
> can't help you with that. Plus you should make sure that we're not
> spending forever waiting for a big pile of mmio access all to time out
> because you never bail out - some coarse-grained drm_dev_enter/exit
> might help here.
> 
> Plus finally the userspace access problem: You must guarantee that
> after ->remove has finished that none of the uapi or cross-driver
> access points (driver ioctl, dma-buf, dma-fence, anything else that
> hangs around) can reach the data structures/memory mappings/whatever
> which have been released as part of your ->remove callback.
> drm_dev_enter/exit is again the tool of choice here.
> 
> So you have to use drm_dev_enter/exit for some of the problems we face
> on hotunplug, but it's not the tool that can handle the actual hw
> hotunplug race conditions for you.
> 
> Unfortunately the hw hotunplug race condition is an utter pain to
> test, since essentially you need to validate your driver against
> spurious 0xff reads at any moment. And I don't even have a clever idea
> to simulate this, e.g. by forcefully replacing the iobar mapping: What
> we'd need is a mapping that allows reads (so we can fill a page with
> 0xff and use that everywhere), but instead of rejecting writes, allows
> them, but drops them (so that the 0xff stays intact). Maybe we could
> simulate this with some kernel debug tricks (kinda like mmiotrace)
> with a read-only mapping and dropping every write every time we fault.

Clarification - as far as I know there are no page fault handlers for kernel
mappings. And we are talking about kernel mappings here, right ?  If there were 
I could solve all those issues the same as I do for user mappings, by
invalidating all existing mappings in the kernel (both kmaps and ioreamps)and 
insert dummy zero or ~0 filled page instead.
Also, I assume forcefully remapping the IO BAR to ~0 filled page would involve
ioremap API and it's not something that I think can be easily done according to
am answer i got to a related topic a few weeks ago 
https://www.spinics.net/lists/linux-pci/msg103396.html (that was the only reply 
i got)

> But ugh ...
> 
> Otoh validating an entire driver like amdgpu without such a trick
> against 0xff reads is practically impossible. So maybe you need to add
> this as one of the tasks here?

Or I could just for validation purposes return ~0 from all reg reads in the code
and ignore writes if drm_dev_unplugged, this could already easily validate a big 
portion of the code flow under such scenario.

Andrey

> -Daniel
> 
>>
>> Andrey
>>
>> On 1/29/21 2:25 PM, Christian König wrote:
>>> Am 29.01.21 um 18:35 schrieb Andrey Grodzovsky:
>>>>
>>>> On 1/29/21 10:16 AM, Christian König wrote:
>>>>> Am 28.01.21 um 18:23 schrieb Andrey Grodzovsky:
>>>>>>
>>>>>> On 1/19/21 1:59 PM, Christian König wrote:
>>>>>>> Am 19.01.21 um 19:22 schrieb Andrey Grodzovsky:
>>>>>>>>
>>>>>>>> On 1/19/21 1:05 PM, Daniel Vetter wrote:
>>>>>>>>> [SNIP]
>>>>>>>> So say writing in a loop to some harmless scratch register for many times
>>>>>>>> both for plugged
>>>>>>>> and unplugged case and measure total time delta ?
>>>>>>>
>>>>>>> I think we should at least measure the following:
>>>>>>>
>>>>>>> 1. Writing X times to a scratch reg without your patch.
>>>>>>> 2. Writing X times to a scratch reg with your patch.
>>>>>>> 3. Writing X times to a scratch reg with the hardware physically disconnected.
>>>>>>>
>>>>>>> I suggest to repeat that once for Polaris (or older) and once for Vega or
>>>>>>> Navi.
>>>>>>>
>>>>>>> The SRBM on Polaris is meant to introduce some delay in each access, so it
>>>>>>> might react differently then the newer hardware.
>>>>>>>
>>>>>>> Christian.
>>>>>>
>>>>>>
>>>>>> See attached results and the testing code. Ran on Polaris (gfx8) and
>>>>>> Vega10(gfx9)
>>>>>>
>>>>>> In summary, over 1 million WWREG32 in loop with and without this patch you
>>>>>> get around 10ms of accumulated overhead ( so 0.00001 millisecond penalty for
>>>>>> each WWREG32) for using drm_dev_enter check when writing registers.
>>>>>>
>>>>>> P.S Bullet 3 I cannot test as I need eGPU and currently I don't have one.
>>>>>
>>>>> Well if I'm not completely mistaken that are 100ms of accumulated overhead.
>>>>> So around 100ns per write. And even bigger problem is that this is a ~67%
>>>>> increase.
>>>>
>>>>
>>>> My bad, and 67% from what ? How u calculate ?
>>>
>>> My bad, (308501-209689)/209689=47% increase.
>>>
>>>>>
>>>>> I'm not sure how many write we do during normal operation, but that sounds
>>>>> like a bit much. Ideas?
>>>>
>>>> Well, u suggested to move the drm_dev_enter way up but as i see it the problem
>>>> with this is that it increase the chance of race where the
>>>> device is extracted after we check for drm_dev_enter (there is also such
>>>> chance even when it's placed inside WWREG but it's lower).
>>>> Earlier I propsed that instead of doing all those guards scattered all over
>>>> the code simply delay release of system memory pages and unreserve of
>>>> MMIO ranges to until after the device itself is gone after last drm device
>>>> reference is dropped. But Daniel opposes delaying MMIO ranges unreserve to after
>>>> PCI remove code because according to him it will upset the PCI subsytem.
>>>
>>> Yeah, that's most likely true as well.
>>>
>>> Maybe Daniel has another idea when he's back from vacation.
>>>
>>> Christian.
>>>
>>>>
>>>> Andrey
>>>>
>>>>>
>>>>> Christian.
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx at lists.freedesktop.org
>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=04%7C01%7CAndrey.Grodzovsky%40amd.com%7C7810d8d6f03443ce2e0408d8ca22ea99%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637481598615581693%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=zTV6FTpL3titmMTVEPxxVT8e5lTKVsLViwZudEsNn%2Bw%3D&reserved=0
>>>>
>>>
> 
> 
>