[PATCH v2] drm/i915: Fix harmfull driver register/unregister assymetry
Sebastian Brzezinka
sebastian.brzezinka at intel.com
Thu Feb 20 16:10:57 UTC 2025
Hi Janusz
On Wed Feb 19, 2025 at 6:37 PM UTC, Janusz Krzysztofik wrote:
> Starting with commit ec3e00b4ee27 ("drm/i915: stop registering if
> drm_dev_register() fails"), we return immediately from
> i915_driver_register() if drm_dev_register() fails, skipping remaining
> registration steps. However, the _unregister() counterpart called at
> device remove knows nothing about that skip and executes reverts of all
> those steps. As a consequence, a number of kernel warnings that taint the
> kernel are triggered:
>
> <3> [525.823143] i915 0000:00:02.0: [drm] *ERROR* Failed to register driver for
> userspace access!
> ...
> <4> [525.831069] ------------[ cut here ]------------
> <4> [525.831071] i915 0000:00:02.0: [drm] drm_WARN_ON(power_domains->init_wakere
> f)
> <4> [525.831095] WARNING: CPU: 6 PID: 3440 at drivers/gpu/drm/i915/display/intel
> _display_power.c:2074 intel_power_domains_disable+0xc2/0xd0 [i915]
> ...
> <4> [525.831328] CPU: 6 UID: 0 PID: 3440 Comm: i915_module_loa Tainted: G U
> 6.14.0-rc1-CI_DRM_16076-g7a632b6798b6+ #1
> ...
> <4> [525.831334] RIP: 0010:intel_power_domains_disable+0xc2/0xd0 [i915]
> ...
> <4> [525.831483] Call Trace:
> <4> [525.831484] <TASK>
> ...
> <4> [525.831943] i915_driver_remove+0x4b/0x140 [i915]
> <4> [525.832028] i915_pci_remove+0x1e/0x40 [i915]
> <4> [525.832099] pci_device_remove+0x3e/0xb0
> <4> [525.832103] device_remove+0x40/0x80
> <4> [525.832107] device_release_driver_internal+0x215/0x280
> ...
> <4> [525.947666] ------------[ cut here ]------------
> <4> [525.947669] kobject: '(null)' (ffff88814f62a218): is not initialized, yet kobject_put() is being called.
> <4> [525.947707] WARNING: CPU: 6 PID: 3440 at lib/kobject.c:734 kobject_put+0xe4/0x200
> ...
> <4> [525.947875] RIP: 0010:kobject_put+0xe4/0x200
> ...
> <4> [525.947909] Call Trace:
> <4> [525.947911] <TASK>
> ...
> <4> [525.947963] intel_gt_sysfs_unregister+0x25/0x40 [i915]
> <4> [525.948133] intel_gt_driver_unregister+0x14/0x80 [i915]
> <4> [525.948291] i915_driver_remove+0x6c/0x140 [i915]
> <4> [525.948411] i915_pci_remove+0x1e/0x40 [i915]
> ...
> <4> [526.441186] ------------[ cut here ]------------
> <4> [526.441191] kernfs: can not remove 'error', no directory
> <4> [526.441211] WARNING: CPU: 1 PID: 3440 at fs/kernfs/dir.c:1684 kernfs_remove_by_name_ns+0xbc/0xc0
> ...
> <4> [526.441536] RIP: 0010:kernfs_remove_by_name_ns+0xbc/0xc0
> ...
> <4> [526.441578] Call Trace:
> <4> [526.441581] <TASK>
> ...
> <4> [526.441686] sysfs_remove_bin_file+0x17/0x30
> <4> [526.441691] i915_gpu_error_sysfs_teardown+0x1d/0x30 [i915]
> <4> [526.442226] i915_teardown_sysfs+0x1c/0x60 [i915]
> <4> [526.442369] i915_driver_remove+0x9d/0x140 [i915]
> <4> [526.442473] i915_pci_remove+0x1e/0x40 [i915]
> ...
> <4> [526.685700] ------------[ cut here ]------------
> <4> [526.685706] i915 0000:00:02.0: [drm] i915 raw-wakerefs=1 wakelocks=1 on cle
> anup
> <4> [526.685734] WARNING: CPU: 1 PID: 3440 at drivers/gpu/drm/i915/intel_runtime
> _pm.c:443 intel_runtime_pm_driver_release+0x75/0x90 [i915]
> ...
> <4> [526.686090] RIP: 0010:intel_runtime_pm_driver_release+0x75/0x90 [i915]
> ...
> <4> [526.686294] Call Trace:
> <4> [526.686296] <TASK>
> ...
> <4> [526.687025] i915_driver_release+0x7e/0xb0 [i915]
> <4> [526.687243] drm_dev_put.part.0+0x47/0x90
> <4> [526.687250] devm_drm_dev_init_release+0x13/0x30
> <4> [526.687255] devm_action_release+0x12/0x30
> <4> [526.687261] release_nodes+0x3a/0x120
> <4> [526.687268] devres_release_all+0x97/0xe0
> <4> [526.687277] device_unbind_cleanup+0x12/0x80
> <4> [526.687282] device_release_driver_internal+0x23a/0x280
> ...
>
> Introduce a flag that indicates device registration status, raise it on
> device registration success. When that flag is found not set while
> unregistering the driver, jump over reverts of registration steps omitted
> on device registration failure.
>
> To make it possible, move i915_pmu_unregister(), a counterpart of
> i915_pmu_unregister() that is called unconditionally before registering
> the device, down so it is not skipped in any case. Also, move
> intel_pxp_fini(), that apparently needs to be called on driver remove
> whether the device was registered successfully or not, down right below
> the jump target (though it doesn't look like a step that belongs to driver
> unregistration, but that's beyond the scope of this patch).
>
> v2: Check in _unregister whether the drm_dev_register has succeeded and
> skip some of the _unregister() steps. (Andi)
>
> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12817
> Link: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/9820
> Link: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10047
> Link: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10131
> Link: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10887
> Cc: Chris Wilson <chris.p.wilson at linux.intel.com>
> Cc: Lucas De Marchi <lucas.demarchi at intel.com>
> Cc: Alan Previn <alan.previn.teres.alexis at intel.com>
> Cc: Andi Shyti <andi.shyti at linux.intel.com>
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik at linux.intel.com>
> ---
> drivers/gpu/drm/i915/i915_driver.c | 13 ++++++++++---
> drivers/gpu/drm/i915/i915_drv.h | 2 ++
> 2 files changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
> index 91a7748f44926..d3b30c3690bbe 100644
> --- a/drivers/gpu/drm/i915/i915_driver.c
> +++ b/drivers/gpu/drm/i915/i915_driver.c
> @@ -639,6 +639,8 @@ static void i915_driver_register(struct drm_i915_private *dev_priv)
> return;
> }
>
> + dev_priv->registered = true;
> +
> i915_debugfs_register(dev_priv);
> i915_setup_sysfs(dev_priv);
>
> @@ -673,6 +675,9 @@ static void i915_driver_unregister(struct drm_i915_private *dev_priv)
> struct intel_gt *gt;
> unsigned int i;
>
> + if (!dev_priv->registered)
> + goto not_registered;
> +
> i915_switcheroo_unregister(dev_priv);
>
> intel_unregister_dsm_handler();
> @@ -682,17 +687,19 @@ static void i915_driver_unregister(struct drm_i915_private *dev_priv)
>
> intel_display_driver_unregister(display);
>
> - intel_pxp_fini(dev_priv);
> -
> for_each_gt(gt, dev_priv, i)
> intel_gt_driver_unregister(gt);
>
> i915_hwmon_unregister(dev_priv);
>
> i915_perf_unregister(dev_priv);
> - i915_pmu_unregister(dev_priv);
>
> i915_teardown_sysfs(dev_priv);
> +
> +not_registered:
> + intel_pxp_fini(dev_priv);
> + i915_pmu_unregister(dev_priv);
> +
> drm_dev_unplug(&dev_priv->drm);
>
> i915_gem_driver_unregister(dev_priv);
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index ffc346379cc2c..27a23b1eb9de0 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -347,6 +347,8 @@ struct drm_i915_private {
> /* The TTM device structure. */
> struct ttm_device bdev;
>
> + bool registered;
Isn't `struct·drm_device` already has a registered field that could
be used, instead of introducing new variable. It's set in
`int·drm_dev_register(struct·drm_device·*dev,·unsigned·long·flags)`
--
Best regards,
Sebastian
More information about the dri-devel
mailing list