[PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use

Thu Dec 17 20:38:32 UTC 2020

On 12/17/20 3:10 PM, Christian König wrote:
> [SNIP]
>>>> By eliminating such users, and replacing them with local maps which
>>>>> are strictly bound in how long they can exist (and hence we can
>>>>> serialize against them finishing in our hotunplug code).
>>>> Not sure I see how serializing against BO map/unmap helps - our problem as
>>>> you described is that once
>>>> device is extracted and then something else quickly takes it's place in the
>>>> PCI topology
>>>> and gets assigned same physical IO ranges, then our driver will start 
>>>> accessing this
>>>> new device because our 'zombie' BOs are still pointing to those ranges.
>>> Until your driver's remove callback is finished the ranges stay reserved.
>>
>>
>> The ranges stay reserved until unmapped which happens in bo->destroy
>
> I'm not sure of that. Why do you think that?

Because of this sequence 
ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap->...->iounmap
Is there another place I am missing ?

>
>> which for most internally allocated buffers is during sw_fini when last drm_put
>> is called.
>>
>>
>>> If that's not the case, then hotunplug would be fundamentally impossible
>>> ot handle correctly.
>>>
>>> Of course all the mmio actions will time out, so it might take some time
>>> to get through it all.
>>
>>
>> I found that PCI code provides pci_device_is_present function
>> we can use to avoid timeouts - it reads device vendor and checks if all 1s is 
>> returned
>> or not. We can call it from within register accessors before trying read/write
>
> That's way to much overhead! We need to keep that much lower or it will result 
> in quite a performance drop.
>
> I suggest to rather think about adding drm_dev_enter/exit guards.

Sure, this one is just a bit upstream to the disconnect event. Eventually none 
of them is watertight.

Andrey

>
> Christian.
>
>>
>>>> Another point regarding serializing - problem  is that some of those BOs are
>>>> very long lived, take for example the HW command
>>>> ring buffer Christian mentioned before -
>>>> (amdgpu_ring_init->amdgpu_bo_create_kernel), it's life span
>>>> is basically for the entire time the device exists, it's destroyed only in
>>>> the SW fini stage (when last drm_dev
>>>> reference is dropped) and so should I grab it's dma_resv_lock from
>>>> amdgpu_pci_remove code and wait
>>>> for it to be unmapped before proceeding with the PCI remove code ? This can
>>>> take unbound time and that why I don't understand
>>>> how serializing will help.
>>> Uh you need to untangle that. After hw cleanup is done no one is allowed
>>> to touch that ringbuffer bo anymore from the kernel.
>>
>>
>> I would assume we are not allowed to touch it once we identified the device is
>> gone in order to minimize the chance of accidental writes to some other 
>> device which might now
>> occupy those IO ranges ?
>>
>>
>>>   That's what
>>> drm_dev_enter/exit guards are for. Like you say we cant wait for all sw
>>> references to disappear.
>>
>>
>> Yes, didn't make sense to me why would we use vmap_local for internally
>> allocated buffers. I think we should also guard registers read/writes for the
>> same reason as above.
>>
>>
>>>
>>> The vmap_local is for mappings done by other drivers, through the dma-buf
>>> interface (where "other drivers" can include fbdev/fbcon, if you use the
>>> generic helpers).
>>> -Daniel
>>
>>
>> Ok, so I assumed that with vmap_local you were trying to solve the problem of 
>> quick reinsertion
>> of another device into same MMIO range that my driver still points too but 
>> actually are you trying to solve
>> the issue of exported dma buffers outliving the device ? For this we have 
>> drm_device refcount in the GEM layer
>> i think.
>>
>> Andrey
>>
>>
>>>
>>>> Andrey
>>>>
>>>>
>>>>> It doesn't
>>>>> solve all your problems, but it's a tool to get there.
>>>>> -Daniel
>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>> - handle fbcon somehow. I think shutting it all down should work out.
>>>>>>> - worst case keep the system backing storage around for shared dma-buf
>>>>>>> until the other non-dynamic driver releases it. for vram we require
>>>>>>> dynamic importers (and maybe it wasn't such a bright idea to allow
>>>>>>> pinning of importer buffers, might need to revisit that).
>>>>>>>
>>>>>>> Cheers, Daniel
>>>>>>>
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>> Andrey
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> -Daniel
>>>>>>>>>>
>>>>>>>>>>> Christian.
>>>>>>>>>>>
>>>>>>>>>>>> I loaded the driver with vm_update_mode=3
>>>>>>>>>>>> meaning all VM updates done using CPU and hasn't seen any OOPs after
>>>>>>>>>>>> removing the device. I guess i can test it more by allocating GTT and
>>>>>>>>>>>> VRAM BOs
>>>>>>>>>>>> and trying to read/write to them after device is removed.
>>>>>>>>>>>>
>>>>>>>>>>>> Andrey
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> amd-gfx mailing list
>>>>>>>>>>>> amd-gfx at lists.freedesktop.org
>>>>>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=04%7C01%7CAndrey.Grodzovsky%40amd.com%7C92654f053679415de74808d8a2838b3e%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637438033181843512%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=%2BeS7v5CrHRfblj2FFCd4nrDLxUxzam6EyHM6poPkGc4%3D&reserved=0 
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>
>