[Intel-gfx] [PATCH] ALSA: hda: fix general protection fault in azx_runtime_idle

Takashi Iwai tiwai at suse.de
Wed Nov 10 21:55:23 UTC 2021


On Wed, 10 Nov 2021 22:03:07 +0100,
Kai Vehmanen wrote:
> 
> Fix a corner case between PCI device driver remove callback and
> runtime PM idle callback.
> 
> Following sequence of events can happen:
>   - at azx_create, context is allocated with devm_kzalloc() and
>     stored as pci_set_drvdata()
>   - user-space requests to unbind audio driver
>   - dd.c:__device_release_driver() calls PCI remove
>   - pci-driver.c:pci_device_remove() calls the audio
>     driver azx_remove() callback and this is completed
>   - pci-driver.c:pm_runtime_put_sync() leads to a call
>     to rpm_idle() which again calls azx_runtime_idle()
>   - the azx context object, as returned by dev_get_drvdata(),
>     is no longer valid
>   -> access fault in azx_runtime_idle when executing
> 	struct snd_card *card = dev_get_drvdata(dev);
> 	chip = card->private_data;
> 	if (chip->disabled || hda->init_failed)
> 
> This was discovered by i915_module_load test with 5.15.0 based
> linux-next tree.
> 
> Example log caught by i915_module_load test with linux-next
> https://intel-gfx-ci.01.org/tree/linux-next/
> 
> <4> [264.038232] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b73f0: 0000 [#1] PREEMPT SMP NOPTI
> <4> [264.038248] CPU: 0 PID: 5374 Comm: i915_module_loa Not tainted 5.15.0-next-20211109-gc8109c2ba35e-next-20211109 #1
> [...]
> <4> [264.038267] RIP: 0010:azx_runtime_idle+0x12/0x60 [snd_hda_intel]
> [...]
> <4> [264.038355] Call Trace:
> <4> [264.038359]  <TASK>
> <4> [264.038362]  __rpm_callback+0x3d/0x110
> <4> [264.038371]  rpm_idle+0x27f/0x380
> <4> [264.038376]  __pm_runtime_idle+0x3b/0x100
> <4> [264.038382]  pci_device_remove+0x6d/0xa0
> <4> [264.038388]  device_release_driver_internal+0xef/0x1e0
> <4> [264.038395]  unbind_store+0xeb/0x120
> <4> [264.038400]  kernfs_fop_write_iter+0x11a/0x1c0
> 
> Fix the issue by setting drvdata to NULL at end of azx_remove().
> 
> Signed-off-by: Kai Vehmanen <kai.vehmanen at linux.intel.com>
> ---
>  sound/pci/hda/hda_intel.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> Some non-persistent direct links showing the bug trigger on
> different platforms with linux-next 20211109:
>  - https://intel-gfx-ci.01.org/tree/linux-next/next-20211109/fi-tgl-1115g4/igt@i915_module_load@reload.html
>  - https://intel-gfx-ci.01.org/tree/linux-next/next-20211109/fi-jsl-1/igt@i915_module_load@reload.html
> 
> Notably with 20211110 linux-next, the bug does not trigger:
>  - https://intel-gfx-ci.01.org/tree/linux-next/next-20211110/fi-tgl-1115g4/igt@i915_module_load@reload.html

Is this seen with CONFIG_DEBUG_KOBJECT_RELEASE enabled?
That would be the only logical explanation I can think of for now.

In any case, the code change itself looks good, so I took the fix now.
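
As a rough illustration of the pattern under discussion (not the actual
patch), a user-space sketch in C might look like the following. All
model_* names are hypothetical stand-ins for the kernel's device, card
and chip structures, and the NULL check in the idle callback is an
assumption about how the cleared drvdata is consumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; these are NOT the kernel definitions. */
struct model_chip { int disabled; };
struct model_card { struct model_chip *private_data; };
struct model_device { void *drvdata; };

static void *model_get_drvdata(const struct model_device *dev)
{
	return dev->drvdata;
}

static void model_set_drvdata(struct model_device *dev, void *data)
{
	dev->drvdata = data;
}

/* Models probe: allocate the context and publish it as drvdata. */
static int model_probe(struct model_device *dev)
{
	struct model_card *card = calloc(1, sizeof(*card));

	if (!card)
		return -1;
	card->private_data = calloc(1, sizeof(*card->private_data));
	if (!card->private_data) {
		free(card);
		return -1;
	}
	model_set_drvdata(dev, card);
	return 0;
}

/* Models remove: free the context, then clear the stale pointer last. */
static void model_remove(struct model_device *dev)
{
	struct model_card *card;

	assert(dev != NULL);
	card = model_get_drvdata(dev);
	if (card) {
		free(card->private_data);
		free(card);
	}
	/* The step the patch description names: drvdata becomes NULL. */
	model_set_drvdata(dev, NULL);
}

/*
 * Models an idle callback that may run after remove has completed.
 * Returns 0 when there is nothing to do, -1 as a stand-in for "busy".
 * The NULL check is assumed; without it, a late call would dereference
 * freed memory.
 */
static int model_runtime_idle(struct model_device *dev)
{
	struct model_card *card = model_get_drvdata(dev);

	if (!card)
		return 0;
	if (card->private_data->disabled)
		return 0;
	return -1;
}
```

Calling model_probe(), then model_remove(), then model_runtime_idle()
on the same device takes the early-return path in this model instead of
touching the freed context.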


thanks,

Takashi

