[PATCH v4 11/14] drm/amdgpu: Guard against write accesses after device removal

Daniel Vetter daniel at ffwll.ch
Mon Feb 8 10:11:47 UTC 2021


On Mon, Feb 08, 2021 at 11:03:15AM +0100, Christian König wrote:
> Am 08.02.21 um 10:48 schrieb Daniel Vetter:
> > On Mon, Feb 08, 2021 at 10:37:19AM +0100, Christian König wrote:
> > > Am 07.02.21 um 22:50 schrieb Daniel Vetter:
> > > > [SNIP]
> > > > > Clarification - as far as I know there are no page fault handlers for kernel
> > > > > mappings. And we are talking about kernel mappings here, right ?  If there were
> > > > > I could solve all those issues the same as I do for user mappings, by
> > > > > invalidating all existing mappings in the kernel (both kmaps and ioreamps)and
> > > > > insert dummy zero or ~0 filled page instead.
> > > > > Also, I assume forcefully remapping the IO BAR to ~0 filled page would involve
> > > > > ioremap API and it's not something that I think can be easily done according to
> > > > > am answer i got to a related topic a few weeks ago
> > > > > https://www.spinics.net/lists/linux-pci/msg103396.html (that was the only reply
> > > > > i got)
> > > > mmiotrace can, but only for debug, and only on x86 platforms:
> > > > 
> > > > https://www.kernel.org/doc/html/latest/trace/mmiotrace.html
> > > > 
> > > > Should be feasible (but maybe not worth the effort) to extend this to
> > > > support fake unplug.
> > > Mhm, interesting idea you guys brought up here.
> > > 
> > > We don't need a page fault for this to work, all we need to do is to insert
> > > dummy PTEs into the kernels page table at the place where previously the
> > > MMIO mapping has been.
> > Simply pte trick isn't enough, because we need:
> > - drop all writes silently
> > - all reads return 0xff
> > 
> > ptes can't do that themselves, we minimally need write protection and then
> > silently proceed on each write fault without restarting the instruction.
> > Better would be to only catch reads, but x86 doesn't do write-only pte
> > permissions afaik.
> 
> You are not thinking far enough :)
> 
> The dummy PTE is point to a dummy MMIO page which is just never used.
> 
> That hast the exact same properties than our removed MMIO space just doesn't
> goes bananas when a new device is MMIO mapped into that and our driver still
> tries to write there.

Hm, but where do we get such a "guaranteed never used" mmio page from?

It's a nifty idea indeed otherwise ...
-Daniel

> 
> Regards,
> Christian.
> 
> 
> > 
> > > > > > But ugh ...
> > > > > > 
> > > > > > Otoh validating an entire driver like amdgpu without such a trick
> > > > > > against 0xff reads is practically impossible. So maybe you need to add
> > > > > > this as one of the tasks here?
> > > > > Or I could just for validation purposes return ~0 from all reg reads in the code
> > > > > and ignore writes if drm_dev_unplugged, this could already easily validate a big
> > > > > portion of the code flow under such scenario.
> > > > Hm yeah if your really wrap them all, that should work too. Since
> > > > iommappings have __iomem pointer type, as long as amdgpu is sparse
> > > > warning free, should be doable to guarantee this.
> > > Problem is that ~0 is not always a valid register value.
> > > 
> > > You would need to audit every register read that it doesn't use the returned
> > > value blindly as index or similar. That is quite a bit of work.
> > Yeah that's the entire crux here :-/
> > -Daniel
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


More information about the amd-gfx mailing list