[PATCH v4 11/14] drm/amdgpu: Guard against write accesses after device removal

Christian König christian.koenig at amd.com
Mon Feb 8 13:59:50 UTC 2021


Am 08.02.21 um 11:11 schrieb Daniel Vetter:
> On Mon, Feb 08, 2021 at 11:03:15AM +0100, Christian König wrote:
>> Am 08.02.21 um 10:48 schrieb Daniel Vetter:
>>> On Mon, Feb 08, 2021 at 10:37:19AM +0100, Christian König wrote:
>>>> Am 07.02.21 um 22:50 schrieb Daniel Vetter:
>>>>> [SNIP]
>>>>>> Clarification - as far as I know there are no page fault handlers for kernel
>>>>>> mappings. And we are talking about kernel mappings here, right ?  If there were
>>>>>> I could solve all those issues the same as I do for user mappings, by
>>>>>> invalidating all existing mappings in the kernel (both kmaps and ioreamps)and
>>>>>> insert dummy zero or ~0 filled page instead.
>>>>>> Also, I assume forcefully remapping the IO BAR to ~0 filled page would involve
>>>>>> ioremap API and it's not something that I think can be easily done according to
>>>>>> am answer i got to a related topic a few weeks ago
>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.spinics.net%2Flists%2Flinux-pci%2Fmsg103396.html&data=04%7C01%7Cchristian.koenig%40amd.com%7C0ab6d16bc49443d7dd2708d8cc19f3aa%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637483759137213247%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=mLqR3PoMBvOodcNJA6K6XP1AJ7hiz847y%2Bw%2BcGegSZE%3D&reserved=0 (that was the only reply
>>>>>> i got)
>>>>> mmiotrace can, but only for debug, and only on x86 platforms:
>>>>>
>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.kernel.org%2Fdoc%2Fhtml%2Flatest%2Ftrace%2Fmmiotrace.html&data=04%7C01%7Cchristian.koenig%40amd.com%7C0ab6d16bc49443d7dd2708d8cc19f3aa%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637483759137213247%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=yjEaR73m8rjL4ARo0upHnjSAtE4yw%2BHAISWCSgmjOoY%3D&reserved=0
>>>>>
>>>>> Should be feasible (but maybe not worth the effort) to extend this to
>>>>> support fake unplug.
>>>> Mhm, interesting idea you guys brought up here.
>>>>
>>>> We don't need a page fault for this to work, all we need to do is to insert
>>>> dummy PTEs into the kernels page table at the place where previously the
>>>> MMIO mapping has been.
>>> Simply pte trick isn't enough, because we need:
>>> - drop all writes silently
>>> - all reads return 0xff
>>>
>>> ptes can't do that themselves, we minimally need write protection and then
>>> silently proceed on each write fault without restarting the instruction.
>>> Better would be to only catch reads, but x86 doesn't do write-only pte
>>> permissions afaik.
>> You are not thinking far enough :)
>>
>> The dummy PTE is point to a dummy MMIO page which is just never used.
>>
>> That hast the exact same properties than our removed MMIO space just doesn't
>> goes bananas when a new device is MMIO mapped into that and our driver still
>> tries to write there.
> Hm, but where do we get such a "guaranteed never used" mmio page from?

Well we have tons of unused IO space on 64bit systems these days.

Doesn't really needs to be PCIe address space, doesn't it?

Christian.

>
> It's a nifty idea indeed otherwise ...
> -Daniel
>
>> Regards,
>> Christian.
>>
>>
>>>>>>> But ugh ...
>>>>>>>
>>>>>>> Otoh validating an entire driver like amdgpu without such a trick
>>>>>>> against 0xff reads is practically impossible. So maybe you need to add
>>>>>>> this as one of the tasks here?
>>>>>> Or I could just for validation purposes return ~0 from all reg reads in the code
>>>>>> and ignore writes if drm_dev_unplugged, this could already easily validate a big
>>>>>> portion of the code flow under such scenario.
>>>>> Hm yeah if your really wrap them all, that should work too. Since
>>>>> iommappings have __iomem pointer type, as long as amdgpu is sparse
>>>>> warning free, should be doable to guarantee this.
>>>> Problem is that ~0 is not always a valid register value.
>>>>
>>>> You would need to audit every register read that it doesn't use the returned
>>>> value blindly as index or similar. That is quite a bit of work.
>>> Yeah that's the entire crux here :-/
>>> -Daniel



More information about the amd-gfx mailing list