[Intel-gfx] [PATCH] ALSA: hda: fix general protection fault in azx_runtime_idle
Takashi Iwai
tiwai at suse.de
Thu Nov 11 13:29:33 UTC 2021
On Wed, 10 Nov 2021 23:15:40 +0100,
Kai Vehmanen wrote:
>
> Hey,
>
> On Wed, 10 Nov 2021, Takashi Iwai wrote:
>
> > On Wed, 10 Nov 2021 22:03:07 +0100, Kai Vehmanen wrote:
> > > Fix a corner case between PCI device driver remove callback and
> > > runtime PM idle callback.
> [...]
> > > Some non-persistent direct links showing the bug trigger on
> > > different platforms with linux-next 20211109:
> > > - https://intel-gfx-ci.01.org/tree/linux-next/next-20211109/fi-tgl-1115g4/igt@i915_module_load@reload.html
> > > - https://intel-gfx-ci.01.org/tree/linux-next/next-20211109/fi-jsl-1/igt@i915_module_load@reload.html
> > >
> > > Notably with 20211110 linux-next, the bug does not trigger:
> > > - https://intel-gfx-ci.01.org/tree/linux-next/next-20211110/fi-tgl-1115g4/igt@i915_module_load@reload.html
> >
> > Is this the case with CONFIG_DEBUG_KOBJECT_RELEASE?
> > This would be the only logical explanation I can think of for now.
>
> hmm, that doesn't seem to be used. Here's a link to kconfig used in the
> failing CI run:
> https://intel-gfx-ci.01.org/tree/linux-next/next-20211109/kconfig.txt

OK, then it's not due to the delayed release, but I suppose the cause is
the same.
> It's still a bit odd, especially given Scott just reported the other HDA
> related regression in 5.15 today. The two issues don't seem to be related,
> though both are fixed by clearing drvdata (albeit in different
> places of hda_intel.c).

I don't think it's the same issue; it's more likely a coincidence of
timing. There have been many changes in 5.15, after all :)
> I'll try to run some more tests tomorrow. The fix should be good in any
> case, but it would be interesting to understand better what change made
> this more (?) likely to hit than before. This is not a new test and the
> problem happens on fairly old platforms, so something has changed.

A potential problem with the current code is that it doesn't disable
runtime PM in the release procedure, so the runtime PM core may still
invoke azx_runtime_idle() during or after the release. Could you try
the patch below? You can also put WARN_ON(!chip) in azx_runtime_idle()
to catch the invalid runtime call.
thanks,
Takashi
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -1347,8 +1347,13 @@ static void azx_free(struct azx *chip)
 	if (hda->freed)
 		return;
 
-	if (azx_has_pm_runtime(chip) && chip->running)
+	if (azx_has_pm_runtime(chip) && chip->running) {
 		pm_runtime_get_noresume(&pci->dev);
+		pm_runtime_forbid(&pci->dev);
+		pm_runtime_dont_use_autosuspend(&pci->dev);
+		pm_runtime_disable(&pci->dev);
+	}
+
 	chip->running = 0;
 
 	azx_del_card_list(chip);
@@ -2320,6 +2325,7 @@ static int azx_probe_continue(struct azx *chip)
 	set_default_power_save(chip);
 
 	if (azx_has_pm_runtime(chip)) {
+		pm_runtime_enable(&pci->dev);
 		pm_runtime_use_autosuspend(&pci->dev);
 		pm_runtime_allow(&pci->dev);
 		pm_runtime_put_autosuspend(&pci->dev);