[Nouveau] [PATCH 1/9] drm/nouveau: Don't leak runtime pm ref on driver unload
Peter Wu
peter at lekensteyn.nl
Mon May 30 17:03:46 UTC 2016
On Sun, May 29, 2016 at 05:50:06PM +0200, Lukas Wunner wrote:
> Hi Peter,
>
> On Fri, May 27, 2016 at 03:07:33AM +0200, Peter Wu wrote:
> > On Tue, May 24, 2016 at 06:03:27PM +0200, Lukas Wunner wrote:
> > > nouveau_drm_load() calls pm_runtime_put() if nouveau_runtime_pm != 0,
> > > but nouveau_drm_unload() calls pm_runtime_get_sync() unconditionally.
> > > We therefore leak a runtime pm ref whenever nouveau is loaded with
> > > runpm=0 and then unloaded. The GPU will subsequently never runtime
> > > suspend even if nouveau is loaded again with runpm=1.
> > >
> > > Fix by taking the runtime pm ref under the same condition that it was
> > > released on driver load.
> > >
> > > Fixes: 5addcf0a5f0f ("nouveau: add runtime PM support (v0.9)")
> > > Cc: Dave Airlie <airlied at redhat.com>
> > > Reported-by: Karol Herbst <nouveau at karolherbst.de>
> > > Tested-by: Karol Herbst <nouveau at karolherbst.de>
> > > Signed-off-by: Lukas Wunner <lukas at wunner.de>
> >
> > Looks good, I tested this scenario:
> >
> > ru(){ cat /sys/bus/pci/devices/0000\:01:00.0/power/runtime_usage;}
> > ru # reports 1
> > modprobe nouveau runpm=0
> > ru # reports 2
> > rmmod nouveau
> > ru # reports 1
> >
> > Without runpm=0 the count drops to 0 in the second step and stays 0 in
> > the third step. After applying patch 2/9, this correctly reports 1 as
> > expected (this is the same as manually setting power/control to on).
>
> How exactly did you reach the situation where the root port didn't wake
> up when you tried to load nouveau again? (IRC conversation this week.)
Ensure that the pci/pm patches are applied, then:
0. Unload nouveau (I have blacklisted it for testing).
1. Enable rpm for the root port and children (control = auto).
2. Verify in the kernel logs that the devices are sleeping:
pcieport 0000:00:01.0: power state changed by ACPI to D3cold
3. (Optional, to rule out issues with delays:) Disable rpm for the
Nvidia device (control = on).
4. modprobe nouveau.
The above test with v4.6 + 4 pci/pm patches (8b71f565) gives:
50.245795 MXM: GUID detected in BIOS
50.245948 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.GFX0._DSM] at AML address ffffc90000013b11 length 492
50.246016 ACPI Warning: \_SB.PCI0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95)
50.246044 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.GFX0._DSM] at AML address ffffc90000013b11 length 492
50.246110 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.PEG0.PEGP._DSM] at AML address ffffc90000018297 length 1F
50.246256 ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95)
50.246289 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.PEG0.PEGP._DSM] at AML address ffffc90000018297 length 1F
50.246443 ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95)
50.246457 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.PEG0.PEGP._DSM] at AML address ffffc90000018297 length 1F
50.246932 pci 0000:01:00.0: optimus capabilities: enabled, status dynamic power, hda bios codec supported
50.247005 VGA switcheroo: detected Optimus DSM method \_SB_.PCI0.PEG0.PEGP handle
50.247084 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.PEG0.PG00._ON] at AML address ffffc9000001086e length 11D
50.390140 pcieport 0000:00:01.0: power state changed by ACPI to D0
50.491893 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.PEG0._DSW] at AML address ffffc90000010a2d length 1D
50.492285 pcieport 0000:00:01.0: PME# disabled
50.492583 nouveau 0000:01:00.0: unknown chipset (ffffffff)
50.492687 nouveau: probe of 0000:01:00.0 failed with error -12
50.501990 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.PEG0._S0W] at AML address ffffc90000010a8e length 2
50.502403 pcieport 0000:00:01.0: PME# enabled
50.502601 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.PEG0._DSW] at AML address ffffc90000010a2d length 1D
50.513005 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.PEG0.PG00._OFF] at AML address ffffc90000010994 length 6D
50.533258 pcieport 0000:00:01.0: power state changed by ACPI to D3cold
(Note that this patch is not included.) When nouveau is operating
normally, I see that _PS0 is also called (which does not happen above).
If you think that mixing power resources with DSM causes this issue, I
also tried to apply my power resources work for nouveau but it gives the
same problem:
20.183306 MXM: GUID detected in BIOS
20.183606 ACPI Warning: \_SB.PCI0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95)
20.184158 ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95)
20.184547 ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95)
20.185152 pci 0000:01:00.0: optimus capabilities: enabled, status dynamic power, hda bios codec supported
20.185351 VGA switcheroo: detected Optimus DSM method \_SB_.PCI0.PEG0.PEGP handle
20.185384 nouveau: detected PR support, will not use DSM
20.185552 nouveau 0000:01:00.0: enabling device (0000 -> 0003)
20.185873 nouveau 0000:01:00.0: unknown chipset (ffffffff)
20.185946 nouveau: probe of 0000:01:00.0 failed with error -12
> What's happening is, the PCI core will keep unbound devices (i.e.,
> without driver) in D0 but the runtime status is allowed to change
> to "suspended". So it'll appear to the kernel as if it was suspended
> but in reality it stays in D0.
>
> Once runtime pm for PCIe ports gets merged, the root port above the
> GPU will indeed go to D3 in such a situation because the check
> pm_children_suspended() (called from rpm_check_suspend_allowed())
> returns true.
>
> I'm not sure if this is desirable or not. If we keep unbound devices
> in D0, should we allow ports above them to go to D3?
Maybe Rafael (linux-pm / linux-pci) can answer this question better?
The comments in local_pci_probe, pci_pm_runtime_suspend and
pci_pm_runtime_resume suggest that unbound devices are assumed in D0
which is apparently not the case when runtime PM is enabled.
> In any case, when nouveau is loaded again, local_pci_probe() will
> call pm_runtime_get_sync(), which will implicitly set the runtime
> status to "active" and which should also wake parents. So how did
> you ever reach a point where you loaded nouveau and the root port
> stayed asleep? Clearly we have a bug there, question is where.
> This shouldn't work only if pm_runtime_forbid() was called on
> driver unload.
>
> Thanks for the extensive testing!
> Lukas
Both devices (root port and Nvidia) were resumed, but somehow the Nvidia
card was not fully initialized/ready (as you can see in the above logs).
Peter
> >
> > Peter
> >
> > > ---
> > > drivers/gpu/drm/nouveau/nouveau_drm.c | 5 ++++-
> > > 1 file changed, 4 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
> > > index 11f8dd9..faf7438 100644
> > > --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
> > > +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
> > > @@ -498,7 +498,10 @@ nouveau_drm_unload(struct drm_device *dev)
> > > {
> > > struct nouveau_drm *drm = nouveau_drm(dev);
> > >
> > > - pm_runtime_get_sync(dev->dev);
> > > + if (nouveau_runtime_pm != 0) {
> > > + pm_runtime_get_sync(dev->dev);
> > > + }
> > > +
> > > nouveau_fbcon_fini(dev);
> > > nouveau_accel_fini(drm);
> > > nouveau_hwmon_fini(dev);
> > > --
> > > 2.8.1
> > >
> > > _______________________________________________
> > > Nouveau mailing list
> > > Nouveau at lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/nouveau
More information about the Nouveau
mailing list