[PATCH v2] drm/i915/hwmon: Fix locking inversion in sysfs getter
Dixit, Ashutosh
ashutosh.dixit at intel.com
Tue Mar 12 16:25:14 UTC 2024
On Mon, 11 Mar 2024 13:34:58 -0700, Janusz Krzysztofik wrote:
>
> In i915 hwmon sysfs getter path we now take a hwmon_lock, then acquire an
> rpm wakeref. That results in lock inversion:
>
> <4> [197.079335] ======================================================
> <4> [197.085473] WARNING: possible circular locking dependency detected
> <4> [197.091611] 6.8.0-rc7-Patchwork_129026v7-gc4dc92fb1152+ #1 Not tainted
> <4> [197.098096] ------------------------------------------------------
> <4> [197.104231] prometheus-node/839 is trying to acquire lock:
> <4> [197.109680] ffffffff82764d80 (fs_reclaim){+.+.}-{0:0}, at: __kmalloc+0x9a/0x350
> <4> [197.116939]
> but task is already holding lock:
> <4> [197.122730] ffff88811b772a40 (&hwmon->hwmon_lock){+.+.}-{3:3}, at: hwm_energy+0x4b/0x100 [i915]
> <4> [197.131543]
> which lock already depends on the new lock.
> ...
> <4> [197.507922] Chain exists of:
> fs_reclaim --> >->reset.mutex --> &hwmon->hwmon_lock
> <4> [197.518528] Possible unsafe locking scenario:
> <4> [197.524411] CPU0 CPU1
> <4> [197.528916] ---- ----
> <4> [197.533418] lock(&hwmon->hwmon_lock);
> <4> [197.537237] lock(>->reset.mutex);
> <4> [197.543376] lock(&hwmon->hwmon_lock);
> <4> [197.549682] lock(fs_reclaim);
> ...
> <4> [197.632548] Call Trace:
> <4> [197.634990] <TASK>
> <4> [197.637088] dump_stack_lvl+0x64/0xb0
> <4> [197.640738] check_noncircular+0x15e/0x180
> <4> [197.652968] check_prev_add+0xe9/0xce0
> <4> [197.656705] __lock_acquire+0x179f/0x2300
> <4> [197.660694] lock_acquire+0xd8/0x2d0
> <4> [197.673009] fs_reclaim_acquire+0xa1/0xd0
> <4> [197.680478] __kmalloc+0x9a/0x350
> <4> [197.689063] acpi_ns_internalize_name.part.0+0x4a/0xb0
> <4> [197.694170] acpi_ns_get_node_unlocked+0x60/0xf0
> <4> [197.720608] acpi_ns_get_node+0x3b/0x60
> <4> [197.724428] acpi_get_handle+0x57/0xb0
> <4> [197.728164] acpi_has_method+0x20/0x50
> <4> [197.731896] acpi_pci_set_power_state+0x43/0x120
> <4> [197.736485] pci_power_up+0x24/0x1c0
> <4> [197.740047] pci_pm_default_resume_early+0x9/0x30
> <4> [197.744725] pci_pm_runtime_resume+0x2d/0x90
> <4> [197.753911] __rpm_callback+0x3c/0x110
> <4> [197.762586] rpm_callback+0x58/0x70
> <4> [197.766064] rpm_resume+0x51e/0x730
> <4> [197.769542] rpm_resume+0x267/0x730
> <4> [197.773020] rpm_resume+0x267/0x730
> <4> [197.776498] rpm_resume+0x267/0x730
> <4> [197.779974] __pm_runtime_resume+0x49/0x90
> <4> [197.784055] __intel_runtime_pm_get+0x19/0xa0 [i915]
> <4> [197.789070] hwm_energy+0x55/0x100 [i915]
> <4> [197.793183] hwm_read+0x9a/0x310 [i915]
> <4> [197.797124] hwmon_attr_show+0x36/0x120
> <4> [197.800946] dev_attr_show+0x15/0x60
> <4> [197.804509] sysfs_kf_seq_show+0xb5/0x100
>
> Acquire the wakeref before the lock and hold it as long as the lock is
> also held. Follow that pattern across the whole source file where similar
> lock inversion can happen.
>
> v2: Keep hardware read under the lock so the whole operation of updating
> energy from hardware is still atomic (Guenter),
> - instead, acquire the rpm wakeref before the lock and hold it as long
> as the lock is held,
> - use the same aproach for other similar places across the i915_hwmon.c
> source file (Rodrigo).
>
> Fixes: c41b8bdcc297 ("drm/i915/hwmon: Show device level energy usage")
I would think that the lock inversion issue was introduced here:
1b44019a93e2 ("drm/i915/guc: Disable PL1 power limit when loading GuC firmware")
This is the commit which introduced this sequence:
lock(>->reset.mutex);
lock(&hwmon->hwmon_lock);
Before this, everything was fine. So perhaps the Fixes tag should reference
this commit?
Otherwise the patch LGTM:
Reviewed-by: Ashutosh Dixit <ashutosh.dixit at intel.com>
More information about the Intel-gfx
mailing list