[PATCH v4 11/14] drm/amdgpu: Guard against write accesses after device removal

Sat Feb 6 14:18:23 UTC 2021

On Sat, Feb 6, 2021 at 12:09 AM Andrey Grodzovsky
<Andrey.Grodzovsky at amd.com> wrote:
>
>
>
> On 2/5/21 5:10 PM, Daniel Vetter wrote:
> > On Fri, Feb 5, 2021 at 5:22 PM Andrey Grodzovsky
> > <Andrey.Grodzovsky at amd.com> wrote:
> >>
> >> Daniel, ping. Also, please refer to the other thread with Bjorn from pci-dev
> >> on the same topic I added you to.
> >
> > Summarizing my take over there for here plus maybe some more
> > clarification. There's two problems:
> >
> > - You must guarantee that after the ->remove callback of your driver
> > is finished, there's no more mmio or any other hw access. A
> > combination of stopping stuff and drm_dev_enter/exit can help with
> > that. This prevents the use-after-free issue.
> >
> > - For the actual hotunplug time, i.e. anything that can run while your
> > driver is used up to the point where ->remove callback has finished
> > stopp hw access you must guarantee that code doesn't blow up when it
> > gets bogus reads (in the form of 0xff values). drm_dev_enter/exit
> > can't help you with that. Plus you should make sure that we're not
> > spending forever waiting for a big pile of mmio access all to time out
> > because you never bail out - some coarse-grained drm_dev_enter/exit
> > might help here.
> >
> > Plus finally the userspace access problem: You must guarantee that
> > after ->remove has finished that none of the uapi or cross-driver
> > access points (driver ioctl, dma-buf, dma-fence, anything else that
> > hangs around) can reach the data structures/memory mappings/whatever
> > which have been released as part of your ->remove callback.
> > drm_dev_enter/exit is again the tool of choice here.
> >
> > So you have to use drm_dev_enter/exit for some of the problems we face
> > on hotunplug, but it's not the tool that can handle the actual hw
> > hotunplug race conditions for you.
> >
> > Unfortunately the hw hotunplug race condition is an utter pain to
> > test, since essentially you need to validate your driver against
> > spurious 0xff reads at any moment. And I don't even have a clever idea
> > to simulate this, e.g. by forcefully replacing the iobar mapping: What
> > we'd need is a mapping that allows reads (so we can fill a page with
> > 0xff and use that everywhere), but instead of rejecting writes, allows
> > them, but drops them (so that the 0xff stays intact). Maybe we could
> > simulate this with some kernel debug tricks (kinda like mmiotrace)
> > with a read-only mapping and dropping every write every time we fault.
> > But ugh ...
> >
> > Otoh validating an entire driver like amdgpu without such a trick
> > against 0xff reads is practically impossible. So maybe you need to add
> > this as one of the tasks here?
> > -Daniel
>
> Not sure it's not a dump idea but still, worth asking -  what if I
> just simply quietly return early from the .remove  callback  without
> doing anything there, the driver will not be aware that the device
> is removed and will at least try to continue working as usual including
> IOCTLs, job scheduling e.t.c. On the other hand all MMIO read accesses will
> start returning ~0, regarding rejecting writes - I don't see anywhere
> we test for result of writing (e.g. amdgpu_mm_wreg8) so seems they will
> just seamlessly  go through... Or is it the pci_dev that will be freed
> by PCI core itself and so I will immediately crash ?

This still requires that you physically unplug the device, so not
something you can do in CI. Plus it doesn't allow you to easily fake a
hotunplug in the middle of something interesting like an atomic
modeset commit. If you instead punch out the mmio mapping with some
pte trick, you can intercept the faults and count down until you
actually switch over to only returning 0xff. This allows you to sweep
through entire complex execution flows so that you have a guarantee
you've actually caught everything.

If otoh you just hotunplug and don't clean up (or equivalent, insert a
long sleep at the beginning of your ->remove hook) then you just check
that at the beginning of each operation there's a check that bails
out.

It's better than nothing for prototyping, but I don't think it's
useful in a CI setting to assure stuff stays fixed.
-Daniel

> Andrey
>
> >
> >>
> >> Andrey
> >>
> >> On 1/29/21 2:25 PM, Christian König wrote:
> >>> Am 29.01.21 um 18:35 schrieb Andrey Grodzovsky:
> >>>>
> >>>> On 1/29/21 10:16 AM, Christian König wrote:
> >>>>> Am 28.01.21 um 18:23 schrieb Andrey Grodzovsky:
> >>>>>>
> >>>>>> On 1/19/21 1:59 PM, Christian König wrote:
> >>>>>>> Am 19.01.21 um 19:22 schrieb Andrey Grodzovsky:
> >>>>>>>>
> >>>>>>>> On 1/19/21 1:05 PM, Daniel Vetter wrote:
> >>>>>>>>> [SNIP]
> >>>>>>>> So say writing in a loop to some harmless scratch register for many times
> >>>>>>>> both for plugged
> >>>>>>>> and unplugged case and measure total time delta ?
> >>>>>>>
> >>>>>>> I think we should at least measure the following:
> >>>>>>>
> >>>>>>> 1. Writing X times to a scratch reg without your patch.
> >>>>>>> 2. Writing X times to a scratch reg with your patch.
> >>>>>>> 3. Writing X times to a scratch reg with the hardware physically disconnected.
> >>>>>>>
> >>>>>>> I suggest to repeat that once for Polaris (or older) and once for Vega or
> >>>>>>> Navi.
> >>>>>>>
> >>>>>>> The SRBM on Polaris is meant to introduce some delay in each access, so it
> >>>>>>> might react differently then the newer hardware.
> >>>>>>>
> >>>>>>> Christian.
> >>>>>>
> >>>>>>
> >>>>>> See attached results and the testing code. Ran on Polaris (gfx8) and
> >>>>>> Vega10(gfx9)
> >>>>>>
> >>>>>> In summary, over 1 million WWREG32 in loop with and without this patch you
> >>>>>> get around 10ms of accumulated overhead ( so 0.00001 millisecond penalty for
> >>>>>> each WWREG32) for using drm_dev_enter check when writing registers.
> >>>>>>
> >>>>>> P.S Bullet 3 I cannot test as I need eGPU and currently I don't have one.
> >>>>>
> >>>>> Well if I'm not completely mistaken that are 100ms of accumulated overhead.
> >>>>> So around 100ns per write. And even bigger problem is that this is a ~67%
> >>>>> increase.
> >>>>
> >>>>
> >>>> My bad, and 67% from what ? How u calculate ?
> >>>
> >>> My bad, (308501-209689)/209689=47% increase.
> >>>
> >>>>>
> >>>>> I'm not sure how many write we do during normal operation, but that sounds
> >>>>> like a bit much. Ideas?
> >>>>
> >>>> Well, u suggested to move the drm_dev_enter way up but as i see it the problem
> >>>> with this is that it increase the chance of race where the
> >>>> device is extracted after we check for drm_dev_enter (there is also such
> >>>> chance even when it's placed inside WWREG but it's lower).
> >>>> Earlier I propsed that instead of doing all those guards scattered all over
> >>>> the code simply delay release of system memory pages and unreserve of
> >>>> MMIO ranges to until after the device itself is gone after last drm device
> >>>> reference is dropped. But Daniel opposes delaying MMIO ranges unreserve to after
> >>>> PCI remove code because according to him it will upset the PCI subsytem.
> >>>
> >>> Yeah, that's most likely true as well.
> >>>
> >>> Maybe Daniel has another idea when he's back from vacation.
> >>>
> >>> Christian.
> >>>
> >>>>
> >>>> Andrey
> >>>>
> >>>>>
> >>>>> Christian.
> >>>> _______________________________________________
> >>>> amd-gfx mailing list
> >>>> amd-gfx at lists.freedesktop.org
> >>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=04%7C01%7CAndrey.Grodzovsky%40amd.com%7C7810d8d6f03443ce2e0408d8ca22ea99%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637481598615581693%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=zTV6FTpL3titmMTVEPxxVT8e5lTKVsLViwZudEsNn%2Bw%3D&reserved=0
> >>>>
> >>>
> >
> >
> >

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch