[PATCH v4 11/14] drm/amdgpu: Guard against write accesses after device removal

Daniel Vetter daniel.vetter at ffwll.ch
Sun Feb 7 21:50:56 UTC 2021


On Sun, Feb 7, 2021 at 10:28 PM Andrey Grodzovsky
<Andrey.Grodzovsky at amd.com> wrote:
>
>
>
> On 2/5/21 5:10 PM, Daniel Vetter wrote:
> > On Fri, Feb 5, 2021 at 5:22 PM Andrey Grodzovsky
> > <Andrey.Grodzovsky at amd.com> wrote:
> >>
> >> Daniel, ping. Also, please refer to the other thread with Bjorn from pci-dev
> >> on the same topic I added you to.
> >
> > Summarizing my take over there for here plus maybe some more
> > clarification. There's two problems:
> >
> > - You must guarantee that after the ->remove callback of your driver
> > is finished, there's no more mmio or any other hw access. A
> > combination of stopping stuff and drm_dev_enter/exit can help with
> > that. This prevents the use-after-free issue.
> >
> > - For the actual hotunplug time, i.e. anything that can run while your
> > driver is in use up to the point where the ->remove callback has finished
> > stopping hw access, you must guarantee that the code doesn't blow up when
> > it gets bogus reads (in the form of 0xff values). drm_dev_enter/exit
> > can't help you with that. Plus you should make sure that we're not
> > spending forever waiting for a big pile of mmio accesses to all time out
> > because you never bail out - some coarse-grained drm_dev_enter/exit
> > might help here.
> >
> > Plus finally the userspace access problem: You must guarantee that
> > after ->remove has finished, none of the uapi or cross-driver
> > access points (driver ioctl, dma-buf, dma-fence, anything else that
> > hangs around) can reach the data structures/memory mappings/whatever
> > which have been released as part of your ->remove callback.
> > drm_dev_enter/exit is again the tool of choice here.
> >
> > So you have to use drm_dev_enter/exit for some of the problems we face
> > on hotunplug, but it's not the tool that can handle the actual hw
> > hotunplug race conditions for you.
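
A minimal sketch of the drm_dev_enter/exit pattern described above, assuming an
amdgpu-style device with an rmmio BAR mapping; my_wreg32(), the rmmio field and
the offset math are illustrative placeholders, not the actual helpers touched by
this patch:

#include <drm/drm_drv.h>	/* drm_dev_enter() / drm_dev_exit() */
#include <linux/io.h>		/* writel() */

static void my_wreg32(struct amdgpu_device *adev, u32 reg, u32 v)
{
	int idx;

	/* Bail out if the drm device has already been marked unplugged. */
	if (!drm_dev_enter(adev_to_drm(adev), &idx))
		return;

	writel(v, adev->rmmio + (reg << 2));

	drm_dev_exit(idx);
}
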
> >
> > Unfortunately the hw hotunplug race condition is an utter pain to
> > test, since essentially you need to validate your driver against
> > spurious 0xff reads at any moment. And I don't even have a clever idea
> > to simulate this, e.g. by forcefully replacing the iobar mapping: What
> > we'd need is a mapping that allows reads (so we can fill a page with
> > 0xff and use that everywhere), but instead of rejecting writes, allows
> > them, but drops them (so that the 0xff stays intact). Maybe we could
> > simulate this with some kernel debug tricks (kinda like mmiotrace)
> > with a read-only mapping and dropping every write every time we fault.
>
> Clarification - as far as I know there are no page fault handlers for kernel
> mappings. And we are talking about kernel mappings here, right? If there were,
> I could solve all those issues the same way I do for user mappings, by
> invalidating all existing mappings in the kernel (both kmaps and ioremaps) and
> inserting a dummy zero- or ~0-filled page instead.
> Also, I assume forcefully remapping the IO BAR to a ~0-filled page would involve
> the ioremap API, and it's not something that I think can be easily done, according
> to an answer I got on a related topic a few weeks ago:
> https://www.spinics.net/lists/linux-pci/msg103396.html (that was the only reply
> I got).

mmiotrace can, but only for debug, and only on x86 platforms:

https://www.kernel.org/doc/html/latest/trace/mmiotrace.html

Should be feasible (but maybe not worth the effort) to extend this to
support fake unplug.

>
> > But ugh ...
> >
> > Otoh validating an entire driver like amdgpu without such a trick
> > against 0xff reads is practically impossible. So maybe you need to add
> > this as one of the tasks here?
>
> Or I could, just for validation purposes, return ~0 from all reg reads in the code
> and ignore writes if drm_dev_unplugged; this could already easily validate a big
> portion of the code flow under such a scenario.
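
A quick sketch of that validation idea, with the same caveat that my_rreg32() and
the fields it touches are placeholders rather than the actual amdgpu helpers: once
the drm device is marked unplugged, reads return all-ones (what a dead PCI device
would give you) and the corresponding write path simply drops the access.

/* same includes as the write sketch above */
static u32 my_rreg32(struct amdgpu_device *adev, u32 reg)
{
	int idx;
	u32 ret = ~0U;	/* mimic reads from a removed PCI device */

	if (drm_dev_enter(adev_to_drm(adev), &idx)) {
		ret = readl(adev->rmmio + (reg << 2));
		drm_dev_exit(idx);
	}

	return ret;
}
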

Hm yeah, if you really wrap them all, that should work too. Since
io mappings have the __iomem pointer type, as long as amdgpu is sparse
warning free, it should be doable to guarantee this.
-Daniel

> Andrey
>
> > -Daniel
> >
> >>
> >> Andrey
> >>
> >> On 1/29/21 2:25 PM, Christian König wrote:
> >>> Am 29.01.21 um 18:35 schrieb Andrey Grodzovsky:
> >>>>
> >>>> On 1/29/21 10:16 AM, Christian König wrote:
> >>>>> Am 28.01.21 um 18:23 schrieb Andrey Grodzovsky:
> >>>>>>
> >>>>>> On 1/19/21 1:59 PM, Christian König wrote:
> >>>>>>> Am 19.01.21 um 19:22 schrieb Andrey Grodzovsky:
> >>>>>>>>
> >>>>>>>> On 1/19/21 1:05 PM, Daniel Vetter wrote:
> >>>>>>>>> [SNIP]
> >>>>>>>> So say writing to some harmless scratch register in a loop many times,
> >>>>>>>> both for the plugged
> >>>>>>>> and unplugged case, and measuring the total time delta?
> >>>>>>>
> >>>>>>> I think we should at least measure the following:
> >>>>>>>
> >>>>>>> 1. Writing X times to a scratch reg without your patch.
> >>>>>>> 2. Writing X times to a scratch reg with your patch.
> >>>>>>> 3. Writing X times to a scratch reg with the hardware physically disconnected.
> >>>>>>>
> >>>>>>> I suggest repeating that once for Polaris (or older) and once for Vega or
> >>>>>>> Navi.
> >>>>>>>
> >>>>>>> The SRBM on Polaris is meant to introduce some delay in each access, so it
> >>>>>>> might react differently than the newer hardware.
> >>>>>>>
> >>>>>>> Christian.
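
A hedged sketch of the kind of measurement Christian describes, assuming a
ktime-based loop over a scratch register; mmSCRATCH_REG0 and the WREG32() usage
here stand in for whatever register and helper the actual test used:

#include <linux/ktime.h>

static void measure_wreg_overhead(struct amdgpu_device *adev)
{
	const u32 n = 1000000;
	ktime_t start, end;
	u32 i;

	start = ktime_get();
	for (i = 0; i < n; i++)
		WREG32(mmSCRATCH_REG0, 0xdeadbeef);	/* harmless scratch write */
	end = ktime_get();

	dev_info(adev->dev, "%u register writes took %lld us\n",
		 n, ktime_us_delta(end, start));
}
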
> >>>>>>
> >>>>>>
> >>>>>> See attached results and the testing code. Ran on Polaris (gfx8) and
> >>>>>> Vega10 (gfx9).
> >>>>>>
> >>>>>> In summary, over 1 million WWREG32 calls in a loop, with and without this
> >>>>>> patch, you get around 10ms of accumulated overhead (so a 0.00001 millisecond
> >>>>>> penalty for each WWREG32) for using the drm_dev_enter check when writing registers.
> >>>>>>
> >>>>>> P.S. Bullet 3 I cannot test, as I need an eGPU and currently don't have one.
> >>>>>
> >>>>> Well, if I'm not completely mistaken, that is 100ms of accumulated overhead.
> >>>>> So around 100ns per write. And an even bigger problem is that this is a ~67%
> >>>>> increase.
> >>>>
> >>>>
> >>>> My bad, and 67% of what? How did you calculate that?
> >>>
> >>> My bad, (308501-209689)/209689=47% increase.
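
For reference, assuming the two totals above are microseconds for the 10^6 writes
(which lines up with the 100ms / ~100ns figures earlier): 308501 - 209689 = 98812 us
of added time, i.e. roughly 100ms accumulated and about 99ns per write, and
98812 / 209689 is a ~47% relative increase.
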
> >>>
> >>>>>
> >>>>> I'm not sure how many writes we do during normal operation, but that sounds
> >>>>> like a bit much. Ideas?
> >>>>
> >>>> Well, you suggested moving the drm_dev_enter way up, but as I see it the problem
> >>>> with this is that it increases the chance of a race where the
> >>>> device is extracted after we check drm_dev_enter (there is also such a
> >>>> chance even when it's placed inside WWREG, but it's lower).
> >>>> Earlier I proposed that instead of scattering all those guards all over
> >>>> the code we simply delay the release of system memory pages and the unreserve of
> >>>> MMIO ranges until after the device itself is gone, i.e. after the last drm device
> >>>> reference is dropped. But Daniel opposes delaying the MMIO range unreserve to after
> >>>> the PCI remove code because according to him it will upset the PCI subsystem.
> >>>
> >>> Yeah, that's most likely true as well.
> >>>
> >>> Maybe Daniel has another idea when he's back from vacation.
> >>>
> >>> Christian.
> >>>
> >>>>
> >>>> Andrey
> >>>>
> >>>>>
> >>>>> Christian.
> >>>>
> >>>
> >
> >
> >



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

