How to gracefully handle pci remove

Deucher, Alexander Alexander.Deucher at amd.com
Wed Aug 29 18:52:18 UTC 2018


Take a look at what the udl drm driver does. It's a USB display chip.
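
Very roughly, the disconnect path there looks something like the below -
this is a from-memory sketch rather than udl's actual code, and the
my_usb_disconnect name is made up; only the drm_*/usb_* calls are real
APIs:

#include <linux/usb.h>
#include <drm/drm_crtc_helper.h>
#include <drm/drm_drv.h>

static void my_usb_disconnect(struct usb_interface *interface)
{
	struct drm_device *dev = usb_get_intfdata(interface);

	/* Stop connector polling and any other hardware access first. */
	drm_kms_helper_poll_disable(dev);

	/*
	 * Unregister the device and mark it unplugged: subsequent ioctls
	 * fail with -ENODEV, while clients that still have the device
	 * open keep the drm_device alive until they close their FDs.
	 */
	drm_dev_unplug(dev);
}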


Alex

________________________________
From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> on behalf of Andrey Grodzovsky <Andrey.Grodzovsky at amd.com>
Sent: Wednesday, August 29, 2018 2:28:42 PM
To: Daniel Vetter; Noralf Trønnes
Cc: Dave Airlie; amd-gfx at lists.freedesktop.org; ML dri-devel; Koenig, Christian
Subject: Re: How to gracefully handle pci remove

Actually, I've just spotted drm_dev_unplug() - does it make sense to use it
in our pci_driver.remove hook instead of explicitly doing
drm_dev_unregister() and drm_dev_put(dev)?

This way at least any following IOCTL will fail with -ENODEV.
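
Something like the below is what I have in mind - only a rough sketch
against a 4.18-ish tree, not tested; whether an explicit drm_dev_put() is
still required after drm_dev_unplug() seems to depend on the exact kernel
version:

#include <linux/pci.h>
#include <drm/drm_drv.h>

static void amdgpu_pci_remove(struct pci_dev *pdev)
{
	struct drm_device *dev = pci_get_drvdata(pdev);

	/*
	 * Instead of drm_dev_unregister() + drm_dev_put():
	 * drm_dev_unplug() unregisters the device and marks it as
	 * unplugged, so drm_ioctl() bails out with -ENODEV for anything
	 * issued afterwards, while already-open FDs keep the drm_device
	 * structure itself alive until the last one is closed.
	 */
	drm_dev_unplug(dev);

	/*
	 * Teardown of anything that really touches the hardware would
	 * still have to happen here (or be deferred until the last
	 * reference is dropped) - that is the hard part.
	 */
}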

Andrey


On 08/29/2018 11:07 AM, Daniel Vetter wrote:
> On Wed, Aug 29, 2018 at 4:43 PM, Andrey Grodzovsky
> <Andrey.Grodzovsky at amd.com> wrote:
>> Just another ping...
>>
>> Daniel, Dave - maybe you could give some advice on that?
>>
>> P.S. I tried the same with an Intel card (i915 driver) on a 4.18.1 kernel
>> to get some reference point, but it just hung.
> drm_device hot-unplug is de facto unsolved. We've only just started to
> fix the most obvious races around the refcounting of drm_device
> itself; see the work from Noralf Trønnes around drm_dev_get/put.
>
> No one has even started to think about what it would take to correctly
> refcount a full-blown memory manager to handle hot-unplug. I'd expect
> lots of nightmares. The real horror is that it's not just the
> drm_device, but also lots of things we're exporting: dma_buf,
> dma_fence, ... All of that must be handled one way or the other.
>
> So expect your kernel to Oops when you unplug a device.
>
> Wrt userspace handling this: probably an even bigger question. No
> idea; it will depend on what userspace you're running.
> -Daniel
>
>> Andrey
>>
>>
>>
>>
>> On 08/27/2018 12:04 PM, Andrey Grodzovsky wrote:
>>> Hi everybody, I am trying to resolve various problems I observe when
>>> logically removing an AMDGPU device from PCI: echo 1 >
>>> /sys/class/drm/card0/device/remove
>>>
>>> One of the problems I encountered was hitting WARNs in
>>> amdgpu_gem_force_release. It complains about still-open client FDs and BO
>>> allocations, which is obvious since we didn't let user-space clients know
>>> about the device removal, and hence they won't release their allocations
>>> or close their FDs.
>>>
>>> Question: how do other drivers handle this use case, especially eGPUs,
>>> since they indeed may be extracted at any moment? Is there any way to
>>> notify Xorg and other clients about this so they have a chance to release
>>> all their allocations and probably terminate? Maybe some kind of uevent?
>>>
>>> Andrey
>>>
>
>

_______________________________________________
amd-gfx mailing list
amd-gfx at lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx