[Nouveau] 4.20.0-rc3 nouveau/Quadro P2000 Mobile: runpm causing ACPI errors, lockups

Thu Nov 29 17:26:51 UTC 2018

yeah... I don't think that's gonna work that nicely. Anyway, you
probably need to allow the VM to access all of ACPI in order to
runtime suspend the GPU and on newer laptops the vbios is retrieved
via ACPI as well. It would probably be better if there were a way to
trace all that directly on a machine running Windows.
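
To get a rough idea of how much of ACPI touches the GPU, one heuristic
is to scan the DSDT/SSDT dumps for method names that typically show up
in GPU power management and vbios retrieval. The sketch below is only
an illustration under assumptions: the list of names in INTERESTING is
my own pick, the helper names are hypothetical, and a plain byte search
can produce false positives. It assumes the tables are read from
/sys/firmware/acpi/tables on a Linux host (usually needs root).

```python
import struct
from pathlib import Path

# NameSegs that commonly appear in GPU power-management AML. This selection
# is an assumption made for illustration, not a complete or verified list.
INTERESTING = (b"_DSM", b"_PR3", b"_PS0", b"_PS3", b"_ROM")

ACPI_HEADER_LEN = 36  # size of the standard ACPI table header


def parse_header(blob):
    """Return (signature, length, oem_table_id) from an ACPI table header."""
    if len(blob) < ACPI_HEADER_LEN:
        raise ValueError("too short for an ACPI table header")
    signature = blob[0:4].decode("ascii", "replace")
    (length,) = struct.unpack_from("<I", blob, 4)
    oem_table_id = blob[16:24].decode("ascii", "replace").rstrip("\x00 ")
    return signature, length, oem_table_id


def scan_table(blob):
    """Count occurrences of the INTERESTING names in the table body."""
    signature, length, oem_table_id = parse_header(blob)
    body = blob[ACPI_HEADER_LEN:length]
    hits = {n.decode(): body.count(n) for n in INTERESTING if n in body}
    return {
        "signature": signature,
        "oem_table_id": oem_table_id,
        "length": length,
        "hits": hits,
    }


def scan_directory(path="/sys/firmware/acpi/tables"):
    """Scan every DSDT/SSDT file found directly in the given directory."""
    results = []
    for entry in sorted(Path(path).iterdir()):
        if entry.is_file():
            data = entry.read_bytes()
            if data[:4] in (b"DSDT", b"SSDT"):
                results.append((entry.name, scan_table(data)))
    return results
```

Tables with hits are only candidates; whether the methods actually
belong to the GPU still has to be checked in the disassembled AML.
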
On Thu, Nov 29, 2018 at 6:12 PM Michael S. Tsirkin <mst at redhat.com> wrote:
>
> On Thu, Nov 29, 2018 at 11:53:53AM +0100, Karol Herbst wrote:
> > On Thu, Nov 29, 2018 at 2:29 AM Michael S. Tsirkin <mst at redhat.com> wrote:
> > >
> > > On Thu, Nov 29, 2018 at 12:21:31AM +0100, Karol Herbst wrote:
> > > > this was already debugged and there is no point in searching inside
> > > > the Firmware. It's not a firmware bug or anything.
> > > >
> > > > The proper fix is to do something inside Nouveau so that we don't
> > > > upset the device and are able to runtime resume it again.
> > > >
> > > > The initial thing we do inside Nouveau to cause those issues is to run
> > > > the so-called "DEVINIT" script inside the vbios to initialize the
> > > > GPU. The problem is that it changes something in the PCIe configuration
> > > > so that the GPU isn't able to runtime resume anymore. I am in contact
> > > > with Nvidia about that issue and hopefully we get the proper answers.
> > > > When I was digging into that myself I was able to make the situation
> > > > more stable by setting the PCIe link speed to the boot defaults, but
> > > > that was still pretty unstable.
> > > >
> > > > Anyway, because the binary driver fails here as well (through
> > > > bumblebee and so on) there isn't much reverse engineering we can do
> > > > besides guessing and trying it on literally every piece of hardware
> > > > until it works.
> > > >
> > > > We also have an upstream bug for this issue:
> > > > https://bugzilla.kernel.org/show_bug.cgi?id=156341
> > >
> > > If you like I can probably dump the PCIe registers on the card
> > > and/or the PCIe port under Windows. The card works there :)
> > > Let me know.
> > >
> > > --
> > > MST
> >
> > the problem is, we would need to know the registers right before
> > suspending the GPU. If someone were able to trace all PCIe
> > register reads and writes for the entire suspend/resume process,
> > that would be very helpful.
>
>
> Well, I can pass the card to a VM and trace it on the hypervisor; that
> isn't a problem.  The tricky thing is the ACPI tables: I would need to
> somehow know which ones are relevant in order to pass them to the
> guest ... ideas on that?
>
> --
> MST
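
For the PCIe side of the discussion above (link speed at boot versus
after DEVINIT, and register state right before suspend), a raw
config-space dump can be decoded and compared offline. The following
is a minimal sketch under assumptions, not what Nouveau does: the
function names are hypothetical, the input is assumed to be the bytes
of /sys/bus/pci/devices/<BDF>/config on Linux (or an equivalent dump
taken elsewhere), and reading that file may itself wake a
runtime-suspended device. The register offsets follow the standard PCI
Express capability layout.

```python
import struct

PCI_STATUS = 0x06              # status register offset
PCI_STATUS_CAP_LIST = 0x10     # bit 4: capability list present
PCI_CAPABILITY_POINTER = 0x34  # offset of first capability pointer
PCI_CAP_ID_EXP = 0x10          # PCI Express capability ID

# Common encoding of the link speed fields.
LINK_SPEEDS = {1: "2.5 GT/s", 2: "5.0 GT/s", 3: "8.0 GT/s", 4: "16.0 GT/s"}


def find_capability(cfg, cap_id):
    """Walk the capability list; return the offset of cap_id or None."""
    (status,) = struct.unpack_from("<H", cfg, PCI_STATUS)
    if not status & PCI_STATUS_CAP_LIST:
        return None
    ptr = cfg[PCI_CAPABILITY_POINTER] & 0xFC
    seen = set()  # guards against a looping list in a corrupt dump
    while ptr and ptr not in seen and ptr + 2 <= len(cfg):
        seen.add(ptr)
        if cfg[ptr] == cap_id:
            return ptr
        ptr = cfg[ptr + 1] & 0xFC
    return None


def decode_link(cfg):
    """Decode max, current and target link parameters from a dump."""
    cap = find_capability(cfg, PCI_CAP_ID_EXP)
    if cap is None or cap + 0x32 > len(cfg):
        return None
    (lnkcap,) = struct.unpack_from("<I", cfg, cap + 0x0C)   # Link Capabilities
    (lnksta,) = struct.unpack_from("<H", cfg, cap + 0x12)   # Link Status
    (lnkctl2,) = struct.unpack_from("<H", cfg, cap + 0x30)  # Link Control 2
    return {
        "max_speed": lnkcap & 0xF,
        "max_width": (lnkcap >> 4) & 0x3F,
        "current_speed": lnksta & 0xF,
        "current_width": (lnksta >> 4) & 0x3F,
        "target_speed": lnkctl2 & 0xF,
    }


def diff_config(before, after):
    """Return (offset, old_byte, new_byte) for every byte that differs."""
    return [(off, b, a)
            for off, (b, a) in enumerate(zip(before, after)) if b != a]
```

Taking one dump right after boot and one right before the GPU is
suspended, decode_link() on each shows whether the link speed moved
away from the boot default, and diff_config() lists every other byte
that changed. LINK_SPEEDS.get(info["current_speed"]) turns the raw
field into a readable rate.
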