[Nouveau] nouveau regression [bisected] hotplug broken on gf108 since 4.1
Hans de Goede
hdegoede at redhat.com
Tue Nov 26 15:26:04 UTC 2019
Hi All,
I'm having this really weird issue with broken hotplug on a Dell Latitude E6430 with
hybrid gfx, with a gf108 as discrete GPU.
The LCD panel and VGA are connected through a mux to the iGPU, but the HDMI is only
connected to the nvidia dGPU. Plugging in an HDMI monitor when the dGPU is suspended
works fine, the ACPI event fires and everything works as it should.
Unplugging the HDMI OTOH, or plugging it in when the dGPU is _not_ suspended does
not work until I manually run xrandr.
I've bisected this and the bisect points to:
cfea88a4d866 ("drm/nouveau: Start using new drm_dev initialization helpers"),
which landed in 4.20
Which functionally makes no changes other than some subtle changes to the
initialization order. I've been poking things surrounding this the entire
day but no luck.
One thing which I did find out is that I can break the commit before the
troublesome commit to behave in the exact same way by commenting out the
drm_sysfs_hotplug_event() call from drm_sysfs_connector_add().
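To check from userspace whether that event arrives at all, one option is to look at the raw kernel uevent payloads. The sketch below is only a helper for that, not part of the driver: it assumes the payload is the usual NUL-separated list of KEY=VALUE entries as delivered on a NETLINK_KOBJECT_UEVENT socket, and that drm_sysfs_hotplug_event() tags its change event with HOTPLUG=1. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if the NUL-separated uevent payload contains an entry
 * that matches `entry` exactly (e.g. "HOTPLUG=1").
 */
static bool uevent_has_entry(const char *buf, size_t len, const char *entry)
{
    size_t entry_len = strlen(entry);
    size_t pos = 0;

    while (pos < len) {
        /* Length of the current entry, bounded by the buffer end. */
        const char *end = memchr(buf + pos, '\0', len - pos);
        size_t n = end ? (size_t)(end - (buf + pos)) : len - pos;

        if (n == entry_len && memcmp(buf + pos, entry, n) == 0)
            return true;
        pos += n + 1; /* skip the entry and its NUL separator */
    }
    return false;
}

/* Hypothetical helper: does this payload look like a DRM hotplug uevent? */
static bool is_drm_hotplug_uevent(const char *buf, size_t len)
{
    return uevent_has_entry(buf, len, "SUBSYSTEM=drm") &&
           uevent_has_entry(buf, len, "HOTPLUG=1");
}
```

Feeding each message received from the uevent socket into is_drm_hotplug_uevent() while plugging and unplugging the monitor would show whether the kernel emits the event in the broken case, or whether it is lost before that point.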
One thing to note here is that drm_connector_register() has the following at
the top:

	if (!connector->dev->registered)
		return 0;

Before the troublesome commit, when we still had a load callback, this
check would not be hit when nouveau_drm_load() (now nouveau_drm_device_init())
ran, as drm_dev_register() does:

	dev->registered = true;

	if (dev->driver->load) {
		ret = dev->driver->load(dev, flags);
		if (ret)
			goto err_minors;
	}

So we would register the connectors directly from nouveau_drm_load()/nouveau_drm_device_init()
(through the drm_connector_register() call at the end of nouveau_connector_create()).
Since the troublesome commit we do hit the if in question, since we now call
nouveau_drm_device_init() before drm_dev_register(), turning the drm_connector_register()
call at the end of nouveau_connector_create() into a no-op for non dp-mst connectors.
The connectors do get registered a bit later, when drm_dev_register() calls
drm_modeset_register_all().
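The two orderings can be compared with a small userspace model. This is a sketch only, not kernel code: the model_* structs and functions are invented stand-ins that reproduce just the dev->registered check and the call order described above, and the "already registered" guard is an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>

/* Where the (single) model connector ended up being registered. */
enum reg_point { REG_NEVER = 0, REG_DURING_INIT, REG_AT_REGISTER_ALL };

struct model_connector {
    bool registered;
    enum reg_point when;
};

struct model_device {
    bool registered;              /* stand-in for dev->registered */
    bool in_init;                 /* true while the driver init runs */
    struct model_connector conn;
};

/* Stand-in for drm_connector_register(): no-op until the device is registered. */
static int model_connector_register(struct model_device *dev)
{
    if (!dev->registered)
        return 0;
    if (dev->conn.registered)     /* model assumption: register only once */
        return 0;
    dev->conn.registered = true;
    dev->conn.when = dev->in_init ? REG_DURING_INIT : REG_AT_REGISTER_ALL;
    return 0;
}

/* Stand-in for nouveau_drm_load() / nouveau_drm_device_init(). */
static int model_driver_init(struct model_device *dev)
{
    dev->in_init = true;
    model_connector_register(dev); /* end of connector creation */
    dev->in_init = false;
    return 0;
}

/* Stand-in for drm_dev_register(), with or without a load callback. */
static int model_dev_register(struct model_device *dev, bool has_load)
{
    dev->registered = true;
    if (has_load) {
        int ret = model_driver_init(dev);
        if (ret)
            return ret;
    }
    return model_connector_register(dev); /* the later register-all step */
}

/* Old ordering: init runs from the load callback inside register. */
static enum reg_point model_probe_old(void)
{
    struct model_device dev = { 0 };
    model_dev_register(&dev, true);
    return dev.conn.when;
}

/* New ordering: init runs first, then register without a load callback. */
static enum reg_point model_probe_new(void)
{
    struct model_device dev = { 0 };
    model_driver_init(&dev);
    model_dev_register(&dev, false);
    return dev.conn.when;
}
```

In this model model_probe_old() registers the connector while the driver init is still running, and model_probe_new() registers it only in the later register-all step. The model says nothing about why that difference breaks hotplug; it only pins down the ordering change.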
This subtle change in ordering is why I started poking at drm_connector_register
and the functions it calls and how I found out that commenting out the
drm_sysfs_hotplug_event() call reproduces the behavior on older working kernels.
I have tried replacing this with a sleep call to rule out timing issues, but that
does not help.
Other things I have checked are that one commit before the troublesome commit
nouveau_connector_hotplug() properly runs, and that after the troublesome
commit we still properly call nvif_notify_get(&conn->hpd).
So now I'm all out of ideas how to debug this further and I hope someone on
the list has an idea how to debug this.
Regards,
Hans
p.s.
I've also tried a little hack like this:
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 2b2baf6e0e0d..efc7ba666b1b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -459,7 +459,7 @@ nouveau_accel_init(struct nouveau_drm *drm)
 }
 
 static int
-nouveau_drm_device_init(struct drm_device *dev)
+nouveau_drm_load(struct drm_device *dev, unsigned long flags)
 {
 	struct nouveau_drm *drm;
 	int ret;
@@ -647,9 +647,9 @@ static int nouveau_drm_probe(struct pci_dev *pdev,
 	drm_dev->pdev = pdev;
 	pci_set_drvdata(pdev, drm_dev);
 
-	ret = nouveau_drm_device_init(drm_dev);
+/*	ret = nouveau_drm_device_init(drm_dev);
 	if (ret)
-		goto fail_pci;
+		goto fail_pci; */
 
 	ret = drm_dev_register(drm_dev, pent->driver_data);
 	if (ret)
@@ -1051,6 +1051,7 @@ driver_stub = {
 		DRIVER_GEM | DRIVER_MODESET | DRIVER_PRIME | DRIVER_RENDER |
 		DRIVER_KMS_LEGACY_CONTEXT,
 
+	.load = nouveau_drm_load,
 	.open = nouveau_drm_open,
 	.postclose = nouveau_drm_postclose,
 	.lastclose = nouveau_vga_lastclose,

With 4.20 that fixes things; on 5.4 I get an oops with this hack,
which I have not debugged further since this is not a proper solution.