[Intel-gfx] [PATCH] ALSA: hda: fix general protection fault in azx_runtime_idle

Takashi Iwai tiwai at suse.de
Thu Nov 11 13:29:33 UTC 2021


On Wed, 10 Nov 2021 23:15:40 +0100,
Kai Vehmanen wrote:
> 
> Hey,
> 
> On Wed, 10 Nov 2021, Takashi Iwai wrote:
> 
> > On Wed, 10 Nov 2021 22:03:07 +0100, Kai Vehmanen wrote:
> > > Fix a corner case between PCI device driver remove callback and
> > > runtime PM idle callback.
> [...]
> > > Some non-persistent direct links showing the bug trigger on
> > > different platforms with linux-next 20211109:
> > >  - https://intel-gfx-ci.01.org/tree/linux-next/next-20211109/fi-tgl-1115g4/igt@i915_module_load@reload.html
> > >  - https://intel-gfx-ci.01.org/tree/linux-next/next-20211109/fi-jsl-1/igt@i915_module_load@reload.html
> > > 
> > > Notably with 20211110 linux-next, the bug does not trigger:
> > >  - https://intel-gfx-ci.01.org/tree/linux-next/next-20211110/fi-tgl-1115g4/igt@i915_module_load@reload.html
> > 
> > Is this the case with CONFIG_DEBUG_KOBJECT_RELEASE?
> > This would be the only logical explanation I can think of for now.
> 
> hmm, that doesn't seem to be used. Here's a link to kconfig used in the 
> failing CI run:
> https://intel-gfx-ci.01.org/tree/linux-next/next-20211109/kconfig.txt

OK, then it's not due to the delayed release, but the cause should be
the same, I suppose.

> It's still a bit odd, especially given Scott just reported the other HDA 
> related regression in 5.15 today. The two issues don't seem to be related, 
> although both are fixed by clearing drvdata (in different places of 
> hda_intel.c).

I don't think it's the same issue, rather a coincidence of the
timing.  There have been many changes in 5.15, after all :)

> I'll try to run some more tests tomorrow. The fix should be good in any 
> case, but it would be interesting to understand better what change made 
> this more (?) likely to hit than before. This is not a new test and the 
> problem happens on fairly old platforms, so something has changed.

A potential problem with the current code is that it doesn't disable
runtime PM in the release procedure.  Could you try the patch
below?  You can also put WARN_ON(!chip) in azx_runtime_idle() to
catch the invalid runtime call.


thanks,

Takashi

--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -1347,8 +1347,13 @@ static void azx_free(struct azx *chip)
 	if (hda->freed)
 		return;
 
-	if (azx_has_pm_runtime(chip) && chip->running)
+	if (azx_has_pm_runtime(chip) && chip->running) {
 		pm_runtime_get_noresume(&pci->dev);
+		pm_runtime_forbid(&pci->dev);
+		pm_runtime_dont_use_autosuspend(&pci->dev);
+		pm_runtime_disable(&pci->dev);
+	}
+
 	chip->running = 0;
 
 	azx_del_card_list(chip);
@@ -2320,6 +2325,7 @@ static int azx_probe_continue(struct azx *chip)
 	set_default_power_save(chip);
 
 	if (azx_has_pm_runtime(chip)) {
+		pm_runtime_enable(&pci->dev);
 		pm_runtime_use_autosuspend(&pci->dev);
 		pm_runtime_allow(&pci->dev);
 		pm_runtime_put_autosuspend(&pci->dev);

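To illustrate the ordering the patch relies on, here is a minimal user-space sketch. It is not kernel code. It assumes a simplified PM core that invokes the idle callback only while a disable counter is zero. Every `model_*` name is hypothetical, and the NULL check on `chip` stands in for the `WARN_ON(!chip)` suggested above. The enable in `model_probe()` and the disable in `model_free()` mirror the two hunks of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical user-space model, not kernel code: model_dev stands in for
 * struct device, drvdata for dev_get_drvdata(), and disable_depth for the
 * runtime PM disable counter. Callbacks only run while disable_depth is 0.
 */
struct model_chip {
	int running;
};

struct model_dev {
	struct model_chip *drvdata;
	int disable_depth;
	int idle_calls;	/* times the idle callback actually ran */
};

/* Stand-in for azx_runtime_idle() with the suggested WARN_ON(!chip). */
static int model_runtime_idle_cb(struct model_dev *dev)
{
	struct model_chip *chip = dev->drvdata;

	dev->idle_calls++;
	if (!chip) {
		fprintf(stderr, "WARNING: idle callback with NULL chip\n");
		return -1;
	}
	return 0;
}

/* Stand-in for the PM core: skip the callback while runtime PM is disabled. */
static int model_pm_idle(struct model_dev *dev)
{
	if (dev->disable_depth > 0)
		return 1;	/* callback not invoked */
	return model_runtime_idle_cb(dev);
}

/* Stand-in for azx_probe_continue(): set up the chip, then enable runtime PM. */
static int model_probe(struct model_dev *dev)
{
	dev->drvdata = calloc(1, sizeof(*dev->drvdata));
	if (!dev->drvdata)
		return -1;
	dev->drvdata->running = 1;
	dev->disable_depth--;			/* models the added pm_runtime_enable() */
	assert(dev->disable_depth >= 0);	/* negative would mean an unbalanced enable */
	return 0;
}

/* Stand-in for azx_free(): disable runtime PM before freeing the chip. */
static void model_free(struct model_dev *dev)
{
	struct model_chip *chip = dev->drvdata;

	if (chip && chip->running)
		dev->disable_depth++;	/* models the added pm_runtime_disable() */
	free(chip);
	dev->drvdata = NULL;
}

/* Hypothetical probe/remove loop, loosely mirroring a module reload test. */
static int model_reload_cycles(struct model_dev *dev, int cycles)
{
	for (int i = 0; i < cycles; i++) {
		if (model_probe(dev))
			return -1;
		model_pm_idle(dev);	/* runs: the chip is valid */
		model_free(dev);
		model_pm_idle(dev);	/* skipped: runtime PM is disabled */
	}
	return 0;
}
```

For example, starting from `struct model_dev dev = { .disable_depth = 1 };` and calling `model_reload_cycles(&dev, 3)` leaves `idle_calls` at 3 and `disable_depth` back at 1. The idle callback runs once per cycle while the chip exists and never after `model_free()`. The starting depth of 1 is an assumption of the model, chosen so that the patch's enable/disable pair balances across cycles. The real initial state of a PCI device's runtime PM counter is not modeled here.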
