How to gracefully handle pci remove

Daniel Vetter daniel at ffwll.ch
Wed Aug 29 15:07:34 UTC 2018


On Wed, Aug 29, 2018 at 4:43 PM, Andrey Grodzovsky
<Andrey.Grodzovsky at amd.com> wrote:
> Just another ping...
>
> Daniel, Dave - maybe you could give some advice on that?
>
> P.S. I tried the same with an Intel card (i915 driver) on a 4.18.1 kernel to
> get a reference point, but it just hung.

drm_device hot-unplug is de facto unsolved. We've only just started to
fix the most obvious races around the refcounting of drm_device
itself, see the work from Noralf Tronnes around drm_dev_get/put.
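
To illustrate the kind of refcounting meant here, below is a minimal
userspace sketch, not kernel code. All toy_dev_* names are hypothetical
and are not the DRM API. It is single-threaded and leaves out the atomic
counters and locking a real driver would need. The idea it shows: the
unplug path marks the device as gone and drops only the driver's own
reference, so the object stays valid until the last open client lets go.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace model of a refcounted device; not the DRM API. */
struct toy_dev {
    int refcount;    /* holders, including the driver's own reference */
    bool unplugged;  /* set once the hardware is gone */
    int *freed_flag; /* illustration hook: set to 1 when the object is freed */
};

/* Allocate a device that starts with one reference, owned by the driver. */
struct toy_dev *toy_dev_alloc(int *freed_flag)
{
    struct toy_dev *dev = calloc(1, sizeof(*dev));

    if (!dev)
        return NULL;
    dev->refcount = 1;
    dev->freed_flag = freed_flag;
    return dev;
}

/* Take an extra reference, e.g. when a client opens the device. */
void toy_dev_get(struct toy_dev *dev)
{
    assert(dev->refcount > 0);
    dev->refcount++;
}

/* Drop a reference; the last holder frees the object. */
void toy_dev_put(struct toy_dev *dev)
{
    assert(dev->refcount > 0);
    if (--dev->refcount == 0) {
        if (dev->freed_flag)
            *dev->freed_flag = 1;
        free(dev);
    }
}

/* Remove path: mark the device gone, then drop only the driver's reference. */
void toy_dev_unplug(struct toy_dev *dev)
{
    dev->unplugged = true;
    toy_dev_put(dev);
}

/* An ioctl-like operation: fail after unplug instead of touching hardware. */
int toy_dev_ioctl(struct toy_dev *dev)
{
    return dev->unplugged ? -1 : 0;
}
```

A client that called toy_dev_get() before the unplug can still call
toy_dev_ioctl() afterwards and gets an error back rather than a
use-after-free; the memory is released only on its final toy_dev_put().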

No one has even started to think about what it would take to correctly
refcount a full-blown memory manager to handle hot-unplug. I'd expect
lots of nightmares. The real horror is that it's not just the
drm_device, but also lots of things we're exporting: dma_buf,
dma_fence, ... All of that must be handled one way or the other.

So expect your kernel to Oops when you unplug a device.

Wrt userspace handling this: Probably an even bigger question. No
idea, and will depend upon what userspace you're running.
-Daniel

>
> Andrey
>
>
>
>
> On 08/27/2018 12:04 PM, Andrey Grodzovsky wrote:
>>
>> Hi everybody, I am trying to resolve various problems I observe when
>> logically removing an AMDGPU device from PCI:
>>
>>   echo 1 > /sys/class/drm/card0/device/remove
>>
>> One of the problems I encountered was hitting WARNs in
>> amdgpu_gem_force_release. It complains about still-open client FDs and BO
>> allocations, which is obvious, since we didn't let user space clients
>> know about the device removal and hence they won't release their
>> allocations or close their FDs.
>>
>> Question - how do other drivers handle this use case, especially eGPUs,
>> since they may indeed be removed at any moment? Is there any way to notify
>> Xorg and other clients about this so they have a chance to release all
>> their allocations and probably terminate? Maybe some kind of uevent?
>>
>> Andrey
>>
>



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


More information about the amd-gfx mailing list