[PATCH] drm/i915/hwmon: Remove i915_hwmon_unregister() during driver unbind
Dixit, Ashutosh
ashutosh.dixit at intel.com
Wed Mar 27 19:37:29 UTC 2024
On Tue, 26 Mar 2024 05:48:38 -0700, Badal Nilawar wrote:
>
Hi Badal,
> i915_hwmon and its resources are managed resources of i915 dev.
> During i915 driver unregister flow the function i915_hwmon_unregister()
> explicitly makes i915_hwmon resource NULL. This happen before
> hwmon is actually unregistered. Doing so may cause UAF if hwmon
> sysfs is being accessed:
I don't agree with this patch. For even if we remove setting to NULL, it
doesn't explain what we see, which is that the devm_kzalloc'd i915->hwmon
has been *freed* (since ddat (i915->hwmon->ddat) is 0x6b6b6b6b6b6b6dbf):
(gdb) list *hwm_energy+0x2b
0x161f8b is in hwm_energy (drivers/gpu/drm/i915/i915_hwmon.c:134).
129 struct hwm_energy_info *ei = &ddat->ei;
130 intel_wakeref_t wakeref;
131 i915_reg_t rgaddr;
132 u32 reg_val;
133
134 if (ddat->gt_n >= 0)
135 rgaddr = hwmon->rg.energy_status_tile;
136 else
137 rgaddr = hwmon->rg.energy_status_all;
138
If we don't have i915_hwmon_unregister equivalent in xe, that's because xe
has not implemented equivalents of i915_hwmon_power_max_disable and
i915_hwmon_power_max_restore, where the check is used.
Also, we should verify any fix before sending a patch out, either via local
repro (say use a script to read hwmon energy file while running the
selftests) or CI. Though in this case CI doesn't show any failures but I am
wondering if that is only by chance.
Fwiu, the issue is somehow being caused by ddat being *freed* before hwmon
sysfs has disappeared, at module unload time (since selftests use module
unload), allowing prometheus-node process to call into the energy read
sysfs and cause the crash.
Thanks.
--
Ashutosh
>
> <7> [518.386591] i915 0000:03:00.0: [drm] intel_gt_set_wedged called from intel_gt_set_wedged_on_fini+0xd/0x30 [i915]
> <7> [518.471128] i915 0000:03:00.0: [drm:drm_client_release] drm_fb_helper
> <4> [518.501476] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6dbf: 0000 [#1] PREEMPT SMP NOPTI
> <4> [518.512264] CPU: 6 PID: 679 Comm: prometheus-node Tainted: G U 6.9.0-rc1-CI_DRM_14482-g4a8fabcf2f1a+ #1
> <4> [518.522951] Hardware name: Intel Corporation Raptor Lake Client Platform/RPL-S ADP-S DDR5 UDIMM CRB, BIOS RPLSFWI1.R00.4221.A00.2305271351 05/27/2023
> <4> [518.536217] RIP: 0010:hwm_energy+0x2b/0x100 [i915]
> <4> [518.541159] Code: 48 89 e5 41 57 41 56 41 55 41 54 53 48 89 fb 48 83 e4 f0 48 83 ec 10 4c 8b 77 08 4c 8b 2f 8b 7f 34 48 89 74 24 08 85 ff 78 2b <45> 8b bd 54 02 00 00 49 8b 7e 18 e8 35 e4 ea ff 49 89 c4 48 85 c0
> <4> [518.559746] RSP: 0018:ffffc900077efd00 EFLAGS: 00010202
> <4> [518.564943] RAX: 0000000000000000 RBX: ffff8881e3078428 RCX: 0000000000000000
> <4> [518.572025] RDX: 0000000000000001 RSI: ffffc900077efda0 RDI: 000000006b6b6b6b
> <4> [518.579107] RBP: ffffc900077efd40 R08: ffffc900077efda0 R09: 0000000000000001
> <4> [518.586186] R10: 0000000000000001 R11: ffff88810c19aa00 R12: ffff888243e1a010
> <4> [518.593264] R13: 6b6b6b6b6b6b6b6b R14: 6b6b6b6b6b6b6b6b R15: ffff8881e3078428
> <4> [518.600343] FS: 00007f9def400700(0000) GS:ffff88888d100000(0000) knlGS:0000000000000000
> <4> [518.608373] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4> [518.614077] CR2: 0000564f19bff288 CR3: 0000000119f94000 CR4: 0000000000f50ef0
> <4> [518.621158] PKRU: 55555554
> <4> [518.623858] Call Trace:
> <4> [518.626303] <TASK>
> <4> [518.628400] ? __die_body+0x1a/0x60
> <4> [518.631881] ? die_addr+0x38/0x60
> <4> [518.635182] ? exc_general_protection+0x1a1/0x400
> <4> [518.639862] ? asm_exc_general_protection+0x26/0x30
> <4> [518.644710] ? hwm_energy+0x2b/0x100 [i915]
> <4> [518.649007] hwm_read+0x9a/0x310 [i915]
> <4> [518.652953] hwmon_attr_show+0x36/0x140
> <4> [518.656775] dev_attr_show+0x15/0x60
>
> Fixes: b3b088e28183 ("drm/i915/hwmon: Add HWMON infrastructure")
> Closes: https://gitlab.freedesktop.org/drm/intel/issues/10366
> Cc: Ashutosh Dixit <ashutosh.dixit at intel.com>
> Cc: Janusz Krzysztofik <janusz.krzysztofik at linux.intel.com>
> Signed-off-by: Badal Nilawar <badal.nilawar at intel.com>
> ---
> drivers/gpu/drm/i915/i915_driver.c | 2 --
> drivers/gpu/drm/i915/i915_hwmon.c | 5 -----
> 2 files changed, 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
> index 4b9233c07a22..a95b455964b7 100644
> --- a/drivers/gpu/drm/i915/i915_driver.c
> +++ b/drivers/gpu/drm/i915/i915_driver.c
> @@ -660,8 +660,6 @@ static void i915_driver_unregister(struct drm_i915_private *dev_priv)
> for_each_gt(gt, dev_priv, i)
> intel_gt_driver_unregister(gt);
>
> - i915_hwmon_unregister(dev_priv);
> -
> i915_perf_unregister(dev_priv);
> i915_pmu_unregister(dev_priv);
>
> diff --git a/drivers/gpu/drm/i915/i915_hwmon.c b/drivers/gpu/drm/i915/i915_hwmon.c
> index c9169e56b9a1..91f171752d34 100644
> --- a/drivers/gpu/drm/i915/i915_hwmon.c
> +++ b/drivers/gpu/drm/i915/i915_hwmon.c
> @@ -841,8 +841,3 @@ void i915_hwmon_register(struct drm_i915_private *i915)
> ddat_gt->hwmon_dev = hwmon_dev;
> }
> }
> -
> -void i915_hwmon_unregister(struct drm_i915_private *i915)
> -{
> - fetch_and_zero(&i915->hwmon);
> -}
> --
> 2.25.1
>
More information about the Intel-gfx
mailing list