4.20.0-rc3 nouveau/Quadro P2000 Mobile: runpm causing ACPI errors, lockups

Mika Westerberg mika.westerberg at linux.intel.com
Wed Nov 28 15:55:44 UTC 2018


On Wed, Nov 28, 2018 at 10:09:22AM -0500, Michael S. Tsirkin wrote:
> Yea all this is weird, in particular I wonder why does everyone
> using dsm insists on saying Arg4
> when they actually mean Arg3. ACPI numbers arguments from 0.
> 
> So it's a bit ugly, and maybe worth fixing but unlikely to be
> an actual issue simply because we end up not using DSM in the end.

I agree.

> Poking at the probing code in nouveau_pr3_present, I started to wonder:
> should I try to hack it to disable d3cold and pr3 and see what
> happens?

I guess it is worth a try. You can do it from sysfs: the graphics PCI
device has an attribute, d3cold_allowed, that controls this.
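
A minimal sketch of flipping that attribute from a script, in case it helps.
The helper name set_d3cold_allowed and the address 0000:01:00.0 used in the
comment are illustrative assumptions, not values taken from this thread; take
the real address from lspci. Writing the attribute on a live system needs
root.

```python
from pathlib import Path


def set_d3cold_allowed(bdf, allowed, sysfs_root="/sys"):
    """Write 0 or 1 to the d3cold_allowed attribute of one PCI device.

    bdf is the full PCI address, e.g. "0000:01:00.0" (hypothetical here).
    sysfs_root is a parameter only so the helper can be tried on a fake tree.
    Returns the value read back from the attribute.
    """
    attr = Path(sysfs_root) / "bus" / "pci" / "devices" / bdf / "d3cold_allowed"
    attr.write_text("1" if allowed else "0")
    return attr.read_text().strip()


# Example (as root, with the address from lspci):
#   set_d3cold_allowed("0000:01:00.0", False)
```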

[snip]

> > > 00:14.3 Network controller: Intel Corporation Wireless-AC 9560 [Jefferson Peak] (rev 10)
> > > 
> > > so really shouldn't be affected, but go figure. If driver really is getting
> > > all-ones from the device, it just might try to poke at a wrong b:d.f by mistake
> > > maybe ...
> > 
> > Or if the power resource is shared by wifi as well.
> 
> Is there a way to find out through e.g. sysfs?

It is not shared; I checked the acpidump you provided. Possibly the
infinite loop in AML when executing the NVPO method has some effect on
this.

[snip]

> > No need to send, I can read it from the bugzilla just fine. Can you attach
> > acpidump there as well?
> 
> Done. lspci -x too just in case.

Looking at the dmesg:

[   52.917009] No Local Variables are initialized for Method [NVPO]
[   52.917011] No Arguments are initialized for method [NVPO]
[   52.917012] ACPI Error: Method parse/execution failed \_SB.PCI0.PEG0.PEGP.NVPO, AE_AML_LOOP_TIMEOUT (20181003/psparse-516)
[   52.917063] ACPI Error: Method parse/execution failed \_SB.PCI0.PGON, AE_AML_LOOP_TIMEOUT (20181003/psparse-516)
[   52.917084] ACPI Error: Method parse/execution failed \_SB.PCI0.PEG0.PG00._ON, AE_AML_LOOP_TIMEOUT (20181003/psparse-516)

So what happens here is that Linux turns on the power resource
\_SB.PCI0.PEG0.PG00 by calling its _ON method (this happens when the root
port is runtime resumed). This ends up calling \_SB.PCI0.PGON, which in
turn calls \_SB.PCI0.PEG0.PEGP.NVPO.
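
The call chain can be read off the three "ACPI Error" lines in the dmesg
excerpt, innermost method first. A rough sketch of extracting it from a saved
log follows; the regular expression only targets the message shape seen in
this log, and the function name failed_methods is made up for illustration.

```python
import re

# Matches e.g.:
#   ACPI Error: Method parse/execution failed \_SB.PCI0.PGON, AE_AML_LOOP_TIMEOUT (...)
_ACPI_ERROR = re.compile(r"ACPI Error: Method parse/execution failed (\S+), (\w+)")


def failed_methods(dmesg_text):
    """Return (method_path, status) pairs in the order they were logged."""
    return [match.groups() for match in _ACPI_ERROR.finditer(dmesg_text)]
```

ACPICA reports the failure as each nested call unwinds, so the first pair is
the method that actually timed out and the last one is the entry point.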

The last method looks like this:

        Method (NVPO, 0, NotSerialized)
        {
            While ((\_SB.PCI0.P0LS < 0x03))
            {
                Sleep (One)
            }

So basically it polls the P0LS register for as long as the value read is
less than 3, with no bound of its own. I suspect this is the issue, and
that it then makes other devices, like wifi, fail to execute their methods.
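
To illustrate the shape of the problem: the While loop in NVPO has no exit
condition other than the register value, so the only thing that stops it is
the interpreter giving up, which is what AE_AML_LOOP_TIMEOUT in the log
indicates. Below is a small simulation of that behaviour, not ACPICA code.
The names poll_until and LoopTimeout and the timeout values are assumptions
made for the example.

```python
import time


class LoopTimeout(Exception):
    """Stands in for the interpreter aborting a loop that ran too long."""


def poll_until(read_value, threshold=3, timeout_s=1.0, sleep_s=0.001):
    """Poll read_value() until it returns at least threshold.

    Mirrors `While (P0LS < 0x03) { Sleep (One) }`, plus an external
    deadline. Returns the number of sleeps taken before the value
    reached the threshold.
    """
    deadline = time.monotonic() + timeout_s
    sleeps = 0
    while read_value() < threshold:
        if time.monotonic() >= deadline:
            raise LoopTimeout("value never reached %d" % threshold)
        time.sleep(sleep_s)
        sleeps += 1
    return sleeps
```

If the register never reaches 3, every caller up the chain fails along with
the loop, which matches the three errors logged above.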

P0LS comes from this operation region:

        OperationRegion (OPG0, SystemMemory, (XBAS + 0x8000), 0x1000)
        Field (OPG0, AnyAcc, NoLock, Preserve)
        {
            ...
            Offset (0x216),
            P0LS,   4,

This is some host bridge register, but I am not sure which one because the
XBAS value cannot be determined from the acpidump.
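
One guess that can be checked without knowing XBAS itself: if XBAS is the
PCIe ECAM (MMCONFIG) base address, which the acpidump excerpt above does not
confirm, then the offset inside the window identifies the device directly.
The sketch below decodes an ECAM offset into bus, device, function and
config-space register; decode_ecam_offset is a name made up for this example.

```python
def decode_ecam_offset(offset):
    """Split an offset from the ECAM base into (bus, device, function, register).

    ECAM layout: bits 27..20 bus, 19..15 device, 14..12 function,
    11..0 register offset inside the 4 KiB config space.
    """
    bus = (offset >> 20) & 0xFF
    device = (offset >> 15) & 0x1F
    function = (offset >> 12) & 0x07
    register = offset & 0xFFF
    return bus, device, function, register


# OPG0 starts at XBAS + 0x8000 and P0LS sits at byte 0x216 within it.
bus, device, function, register = decode_ecam_offset(0x8000 + 0x216)
print("%02x:%02x.%x config offset 0x%03x" % (bus, device, function, register))
```

Under that assumption P0LS would be the low 4 bits of the byte at config
offset 0x216 of device 00:01.0.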

