[PATCH v4 11/14] drm/amdgpu: Guard against write accesses after device removal

Tue Feb 9 14:30:52 UTC 2021

On 2/9/21 2:58 AM, Christian König wrote:
>
> Am 08.02.21 um 23:15 schrieb Andrey Grodzovsky:
>>
>> On 2/8/21 11:23 AM, Daniel Vetter wrote:
>>> On Mon, Feb 8, 2021 at 3:00 PM Christian König <christian.koenig at amd.com> 
>>> wrote:
>>>> Am 08.02.21 um 11:11 schrieb Daniel Vetter:
>>>>> On Mon, Feb 08, 2021 at 11:03:15AM +0100, Christian König wrote:
>>>>>> Am 08.02.21 um 10:48 schrieb Daniel Vetter:
>>>>>>> On Mon, Feb 08, 2021 at 10:37:19AM +0100, Christian König wrote:
>>>>>>>> Am 07.02.21 um 22:50 schrieb Daniel Vetter:
>>>>>>>>> [SNIP]
>>>>>>>>>> Clarification - as far as I know there are no page fault handlers for 
>>>>>>>>>> kernel
>>>>>>>>>> mappings. And we are talking about kernel mappings here, right ?  If 
>>>>>>>>>> there were
>>>>>>>>>> I could solve all those issues the same as I do for user mappings, by
>>>>>>>>>> invalidating all existing mappings in the kernel (both kmaps and 
>>>>>>>>>> ioreamps)and
>>>>>>>>>> insert dummy zero or ~0 filled page instead.
>>>>>>>>>> Also, I assume forcefully remapping the IO BAR to ~0 filled page 
>>>>>>>>>> would involve
>>>>>>>>>> ioremap API and it's not something that I think can be easily done 
>>>>>>>>>> according to
>>>>>>>>>> am answer i got to a related topic a few weeks ago
>>>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.spinics.net%2Flists%2Flinux-pci%2Fmsg103396.html&data=04%7C01%7CAndrey.Grodzovsky%40amd.com%7C9d1bdf4cee504cd71b4908d8cc4df310%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637483982454608249%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=Anw%2BOwJ%2B5tvjW3tmkVNdz13%2BZ18vdpfOLWqsUZL7D2I%3D&reserved=0 
>>>>>>>>>> (that was the only reply
>>>>>>>>>> i got)
>>>>>>>>> mmiotrace can, but only for debug, and only on x86 platforms:
>>>>>>>>>
>>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.kernel.org%2Fdoc%2Fhtml%2Flatest%2Ftrace%2Fmmiotrace.html&data=04%7C01%7CAndrey.Grodzovsky%40amd.com%7C9d1bdf4cee504cd71b4908d8cc4df310%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637483982454608249%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=Wa7BFNySVQJLyD6WY4pZHTP1QfeZwD7F5ydBrXuxppQ%3D&reserved=0 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Should be feasible (but maybe not worth the effort) to extend this to
>>>>>>>>> support fake unplug.
>>>>>>>> Mhm, interesting idea you guys brought up here.
>>>>>>>>
>>>>>>>> We don't need a page fault for this to work, all we need to do is to 
>>>>>>>> insert
>>>>>>>> dummy PTEs into the kernels page table at the place where previously the
>>>>>>>> MMIO mapping has been.
>>>>>>> Simply pte trick isn't enough, because we need:
>>>>>>> - drop all writes silently
>>>>>>> - all reads return 0xff
>>>>>>>
>>>>>>> ptes can't do that themselves, we minimally need write protection and then
>>>>>>> silently proceed on each write fault without restarting the instruction.
>>>>>>> Better would be to only catch reads, but x86 doesn't do write-only pte
>>>>>>> permissions afaik.
>>>>>> You are not thinking far enough :)
>>>>>>
>>>>>> The dummy PTE is point to a dummy MMIO page which is just never used.
>>>>>>
>>>>>> That hast the exact same properties than our removed MMIO space just doesn't
>>>>>> goes bananas when a new device is MMIO mapped into that and our driver still
>>>>>> tries to write there.
>>>>> Hm, but where do we get such a "guaranteed never used" mmio page from?
>>>> Well we have tons of unused IO space on 64bit systems these days.
>>>>
>>>> Doesn't really needs to be PCIe address space, doesn't it?
>>> That sounds very trusting to modern systems not decoding random
>>> ranges. E.g. the pci code stopped extending the host bridge windows on
>>> its own, entirely relying on the acpi provided ranges, to avoid
>>> stomping on stuff that's the but not listed anywhere.
>>>
>>> I guess if we have a range behind a pci bridge, which isn't used by
>>> any device, but decoded by the bridge, then that should be safe
>>> enough. Maybe could even have an option in upstream to do that on
>>> unplug, if a certain flag is set, or a cmdline option.
>>> -Daniel
>>
>>
>> Question - Why can't we just set those PTEs to point to system memory 
>> (another RO dummy page)
>> filled with 1s ?
>
>
> Then writes are not discarded. E.g. the 1s would change to something else.
>
> Christian.

I see but, what about marking the mappings as RO and discarding the write access 
page faults continuously until the device is finalized ?
Regarding using an unused range behind the upper bridge as Daniel suggested, I 
wonder will this interfere with
the upcoming feature to support BARs movement  during hot plug - 
https://www.spinics.net/lists/linux-pci/msg103195.html ?

Andrey

>
>
>>
>> Andrey
>>
>>
>>>
>>>> Christian.
>>>>
>>>>> It's a nifty idea indeed otherwise ...
>>>>> -Daniel
>>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>>>
>>>>>>
>>>>>>>>>>> But ugh ...
>>>>>>>>>>>
>>>>>>>>>>> Otoh validating an entire driver like amdgpu without such a trick
>>>>>>>>>>> against 0xff reads is practically impossible. So maybe you need to add
>>>>>>>>>>> this as one of the tasks here?
>>>>>>>>>> Or I could just for validation purposes return ~0 from all reg reads 
>>>>>>>>>> in the code
>>>>>>>>>> and ignore writes if drm_dev_unplugged, this could already easily 
>>>>>>>>>> validate a big
>>>>>>>>>> portion of the code flow under such scenario.
>>>>>>>>> Hm yeah if your really wrap them all, that should work too. Since
>>>>>>>>> iommappings have __iomem pointer type, as long as amdgpu is sparse
>>>>>>>>> warning free, should be doable to guarantee this.
>>>>>>>> Problem is that ~0 is not always a valid register value.
>>>>>>>>
>>>>>>>> You would need to audit every register read that it doesn't use the 
>>>>>>>> returned
>>>>>>>> value blindly as index or similar. That is quite a bit of work.
>>>>>>> Yeah that's the entire crux here :-/
>>>>>>> -Daniel
>>>