[PATCH 0/6] RFC Support hot device unplug in amdgpu

Lukas Wunner lukas at wunner.de
Mon May 11 14:08:40 UTC 2020


On Mon, May 11, 2020 at 02:21:57PM +0200, Daniel Vetter wrote:
> On Mon, May 11, 2020 at 1:43 PM Lukas Wunner <lukas at wunner.de> wrote:
> > On Mon, May 11, 2020 at 11:54:33AM +0200, Daniel Vetter wrote:
> > > - One unfortunate thing with drm_dev_unplug is that the driver core is
> > >   very opinionated and doesn't tell you whether it's a hotunplug or a
> > >   driver unload. In the former case trying to shut down hw just wastes
> > >   time (and might hit driver bugs), in the latter case driver engineers
> > >   very much expect everything to be shut down.
> > 
> > You can get that information at the PCI bus level with
> > pci_dev_is_disconnected().
> 
> Ok, so at least for pci devices you could do something like
> 
> if (pci_dev_is_disconnected())
>     drm_dev_unplug();
> else
>     drm_dev_unregister();
>
> In the ->remove callback and both users and developers should be
> happy.

Basically yes.  But if the driver is unbound e.g. via sysfs and the
device is hot-removed while it is being unbound, that approach fails.

So you'll need checks for pci_dev_is_disconnected() further below in
the call stack as well to avoid unpleasant side effects such as unduly
delaying unbinding or ending up in infinite loops when reading "all ones"
from PCI BARs, etc.

It may also be worth checking for pci_dev_is_disconnected() in ioctls
as well and directly returning -ENODEV, though of course that suffers
from the same race.  (The device may disappear after the check for
pci_dev_is_disconnected(), or it may have already disappeared but
pciehp hasn't updated the device's channel state yet.)

Thanks,

Lukas


More information about the amd-gfx mailing list