[PATCH v4 11/14] drm/amdgpu: Guard against write accesses after device removal
Christian König
ckoenig.leichtzumerken at gmail.com
Tue Feb 9 08:27:03 UTC 2021
Am 08.02.21 um 23:09 schrieb Andrey Grodzovsky:
>
>
> On 2/8/21 4:37 AM, Christian König wrote:
>> Am 07.02.21 um 22:50 schrieb Daniel Vetter:
>>> [SNIP]
>>>> Clarification - as far as I know there are no page fault handlers
>>>> for kernel
>>>> mappings. And we are talking about kernel mappings here, right ?
>>>> If there were
>>>> I could solve all those issues the same as I do for user mappings, by
>>>> invalidating all existing mappings in the kernel (both kmaps and
>>>> ioreamps)and
>>>> insert dummy zero or ~0 filled page instead.
>>>> Also, I assume forcefully remapping the IO BAR to ~0 filled page
>>>> would involve
>>>> ioremap API and it's not something that I think can be easily done
>>>> according to
>>>> am answer i got to a related topic a few weeks ago
>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.spinics.net%2Flists%2Flinux-pci%2Fmsg103396.html&data=04%7C01%7CAndrey.Grodzovsky%40amd.com%7Cb159d3ce264944486c8008d8cc15233a%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637483738446813868%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=6eP0nhS%2BZwp1Y54CwfX8vaV3FTWbW8IylW5JFaf92pY%3D&reserved=0
>>>> (that was the only reply
>>>> i got)
>>> mmiotrace can, but only for debug, and only on x86 platforms:
>>>
>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.kernel.org%2Fdoc%2Fhtml%2Flatest%2Ftrace%2Fmmiotrace.html&data=04%7C01%7CAndrey.Grodzovsky%40amd.com%7Cb159d3ce264944486c8008d8cc15233a%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637483738446813868%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=QBF9J%2BVIRUkUTTjvNoZR8NqFNt8CpHkcknH2qKX7dd8%3D&reserved=0
>>>
>>>
>>> Should be feasible (but maybe not worth the effort) to extend this to
>>> support fake unplug.
>>
>> Mhm, interesting idea you guys brought up here.
>>
>> We don't need a page fault for this to work, all we need to do is to
>> insert dummy PTEs into the kernels page table at the place where
>> previously the MMIO mapping has been.
>
>
> But that exactly what Mathew from linux-mm says is not a trivial thing
> to do, quote:
>
> "
>
> ioremap() is done through the vmalloc space. It would, in theory, be
> possible to reprogram the page tables used for vmalloc to point to your
> magic page. I don't think we have such a mechanism today, and there are
> lots of problems with things like TLB flushes. It's probably going to
> be harder than you think.
> "
I haven't followed the full discussion, but I don't see much preventing
this.
All you need is a new ioremap_dummy() function which takes the old start
and length of the mapping.
Still a bit core and maybe even platform code, but rather useful I think.
Christian.
>
> If you believe it's actually doable then it would be useful not only for simulating device
> unplugged situation with all MMIOs returning 0xff... but for actual handling of driver accesses
> to MMIO after device is gone and, we could then drop entirely this patch as there would be no need
> to guard against such accesses post device unplug.
>
>
>>
>>>>> But ugh ...
>>>>>
>>>>> Otoh validating an entire driver like amdgpu without such a trick
>>>>> against 0xff reads is practically impossible. So maybe you need to
>>>>> add
>>>>> this as one of the tasks here?
>>>> Or I could just for validation purposes return ~0 from all reg
>>>> reads in the code
>>>> and ignore writes if drm_dev_unplugged, this could already easily
>>>> validate a big
>>>> portion of the code flow under such scenario.
>>> Hm yeah if your really wrap them all, that should work too. Since
>>> iommappings have __iomem pointer type, as long as amdgpu is sparse
>>> warning free, should be doable to guarantee this.
>>
>> Problem is that ~0 is not always a valid register value.
>>
>> You would need to audit every register read that it doesn't use the
>> returned value blindly as index or similar. That is quite a bit of work.
>
>
> But ~0 is the value that will be returned for every read post device
> unplug, regardless if it's valid or not, and we have to cope with
> it then, no ?
>
> Andrey
>
>
>>
>> Regards,
>> Christian.
>>
>>> -Daniel
>>>
>>>> Andrey
>>>>
>>
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20210209/d955d351/attachment-0001.htm>
More information about the amd-gfx
mailing list