[PATCH v4 11/14] drm/amdgpu: Guard against write accesses after device removal

Andrey Grodzovsky Andrey.Grodzovsky at amd.com
Fri Feb 12 15:00:05 UTC 2021


Ping

Andrey

On 2/10/21 5:01 PM, Andrey Grodzovsky wrote:
>
> On 2/9/21 10:40 AM, Christian König wrote:
>> Am 09.02.21 um 15:30 schrieb Andrey Grodzovsky:
>>> [SNIP]
>>>>> Question - Why can't we just set those PTEs to point to system memory 
>>>>> (another RO dummy page)
>>>>> filled with 1s ?
>>>>
>>>>
>>>> Then writes are not discarded. E.g. the 1s would change to something else.
>>>>
>>>> Christian.
>>>
>>>
>>> I see but, what about marking the mappings as RO and discarding the write 
>>> access page faults continuously until the device is finalized ?
>>> Regarding using an unused range behind the upper bridge as Daniel suggested, 
>>> I wonder will this interfere with
>>> the upcoming feature to support BARs movement  during hot plug - 
>>> https://www.spinics.net/lists/linux-pci/msg103195.html ?
>>
>> In the picture in my head the address will never be used.
>>
>> But it doesn't even needs to be an unused range of the root bridge. That one 
>> is usually stuffed full by the BIOS.
>>
>> For my BAR resize work I looked at quite a bunch of NB documentation. At 
>> least for AMD CPUs we should always have an MMIO address which is never ever 
>> used. That makes this platform/CPU dependent, but the code is just minimal.
>>
>> The really really nice thing about this approach is that you could unit test 
>> and audit the code for problems even without *real* hotplug hardware. E.g. we 
>> can swap the kernel PTEs and see how the whole power and display code reacts 
>> to that etc....
>>
>> Christian.
>
>
> Tried to play with hacking mmio tracer as Daniel suggested but just hanged the 
> machine so...
>
> Can you tell me how to dynamically obtain this kind of unused MMIO address ? I 
> think given such address, writing
> to which is dropped and reading from return all 1s, I can then do something 
> like bellow, if that what u meant -
>
> for (address = adev->rmmio; address < adev->rmmio_size; adress += PAGE_SIZE) {
>
>     old_pte = get_locked_pte(init_mm, address)
>     dummy_pte = pfn_pte(fake_mmio_address, 0);
>     set_pte(&old_pte, dummy_pte)
>
> }
>
> flush_tlb ???
>
> P.S I hope to obtain thunderbolt eGPU adapter soon so even if this won't work 
> I still will be able to test how the driver handles all 1s.
>
> Andrey
>
>>
>>>
>>> Andrey
>>>
>>>
>>>>
>>>>
>>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>>> It's a nifty idea indeed otherwise ...
>>>>>>>> -Daniel
>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>>>> But ugh ...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Otoh validating an entire driver like amdgpu without such a trick
>>>>>>>>>>>>>> against 0xff reads is practically impossible. So maybe you need 
>>>>>>>>>>>>>> to add
>>>>>>>>>>>>>> this as one of the tasks here?
>>>>>>>>>>>>> Or I could just for validation purposes return ~0 from all reg 
>>>>>>>>>>>>> reads in the code
>>>>>>>>>>>>> and ignore writes if drm_dev_unplugged, this could already easily 
>>>>>>>>>>>>> validate a big
>>>>>>>>>>>>> portion of the code flow under such scenario.
>>>>>>>>>>>> Hm yeah if your really wrap them all, that should work too. Since
>>>>>>>>>>>> iommappings have __iomem pointer type, as long as amdgpu is sparse
>>>>>>>>>>>> warning free, should be doable to guarantee this.
>>>>>>>>>>> Problem is that ~0 is not always a valid register value.
>>>>>>>>>>>
>>>>>>>>>>> You would need to audit every register read that it doesn't use the 
>>>>>>>>>>> returned
>>>>>>>>>>> value blindly as index or similar. That is quite a bit of work.
>>>>>>>>>> Yeah that's the entire crux here :-/
>>>>>>>>>> -Daniel
>>>>>>
>>


More information about the amd-gfx mailing list