[PATCH 0/6] RFC Support hot device unplug in amdgpu
Daniel Vetter
daniel at ffwll.ch
Mon May 11 14:14:53 UTC 2020
On Mon, May 11, 2020 at 4:08 PM Lukas Wunner <lukas at wunner.de> wrote:
>
> On Mon, May 11, 2020 at 02:21:57PM +0200, Daniel Vetter wrote:
> > On Mon, May 11, 2020 at 1:43 PM Lukas Wunner <lukas at wunner.de> wrote:
> > > On Mon, May 11, 2020 at 11:54:33AM +0200, Daniel Vetter wrote:
> > > > - One unfortunate thing with drm_dev_unplug is that the driver core is
> > > > very opinionated and doesn't tell you whether it's a hotunplug or a
> > > > driver unload. In the former case trying to shut down hw just wastes
> > > > time (and might hit driver bugs), in the latter case driver engineers
> > > > very much expect everything to be shut down.
> > >
> > > You can get that information at the PCI bus level with
> > > pci_dev_is_disconnected().
> >
> > Ok, so at least for pci devices you could do something like
> >
> > if (pci_dev_is_disconnected())
> > drm_dev_unplug();
> > else
> > drm_dev_unregister();
> >
> > In the ->remove callback and both users and developers should be
> > happy.
>
> Basically yes. But if the driver is unbound e.g. via sysfs and the
> device is hot-removed while it is being unbound, that approach fails.
>
> So you'll need checks for pci_dev_is_disconnected() further below in
> the call stack as well to avoid unpleasant side effects such as unduly
> delaying unbinding or ending up in infinite loops when reading "all ones"
> from PCI BARs, etc.
>
> It may also be worth checking for pci_dev_is_disconnected() in ioctls
> as well and directly returning -ENODEV, though of course that suffers
> from the same race. (The device may disappear after the check for
> pci_dev_is_disconnected(), or it may have already disappeared but
> pciehp hasn't updated the device's channel state yet.)
I guess we could do a drm_pci_dev_enter which combines drm_dev_enter +
pci_dev_is_connected. Not perfect, but well then the only real
solution is just unconditionaly drm_dev_unplug in ->remove. I think if
we do an additional developer_mode module parameter, and if that's not
explicitly set, ignore the pci_dev_is_disconnected and just always
call drm_dev_unplug() that would be about as good as it gets.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
More information about the amd-gfx
mailing list