[PATCH] drm/amdgpu: always reset asic when going into suspend

Wed Jan 15 07:44:39 UTC 2020

On Thu, Dec 19, 2019 at 10:08 PM Alex Deucher <alexdeucher at gmail.com> wrote:
> I think there may be some AMD specific handling needed in
> drivers/acpi/sleep.c.  My understanding from reading the modern
> standby documents from MS is that each vendor needs to provide a
> platform specific PEP driver.  I'm not sure how much of that current
> code is Intel specific or not.

I don't think there is anything Intel-specific in drivers/acpi/sleep.c.

Reading more about PEP, I see that Linux supports PEP devices with
ACPI ID INT33A1 or PNP0D80. Indeed the Intel platforms we work with
have INT33A1 devices in their ACPI tables.

This product has a \_SB.PEP ACPI device with _HID AMD0004 and _CID
PNP0D80. Full acpidump:
https://gist.github.com/dsd/ff3dfc0f63cdd9eba4a0fbd9e776e8be (see
ssdt7)

This PEP device responds to a _DSM with UUID argument
"e3f32452-febc-43ce-9039-932122d37721", which is not the one
documented at https://uefi.org/sites/default/files/resources/Intel_ACPI_Low_Power_S0_Idle.pdf

Nevertheless, there is some data about the GPU:
                    Package (0x04)
                    {
                        One,
                        "\\_SB.PCI0.GP17.VGA",
                        Zero,
                        0x03
                    },

However since this data is identical to many other devices that
suspend and resume just fine, I wonder if it is really important.

The one supported method does offer two calls which may mirror the
Display Off/On Notifications in the above spec:
                        Case (0x02)
                        {
                            \_SB.PCI0.SBRG.EC0.CSEE (0xB7)
                            Return (Zero)
                        }
                        Case (0x03)
                        {
                            \_SB.PCI0.SBRG.EC0.CSEE (0xB8)
                            Notify (\_SB.PCI0.SBRG.EC0.LID, 0x80) //
Status Change
                            Return (Zero)
                        }

but I tried executing this code after suspending amdgpu, and the
problem still stands, amdgpu cannot wakeup correctly.

There's nothing else really interesting in the PEP device as far as I can see.

PEP things aside, I am still quite suspicious about the fact that
calling amdgpu_device_suspend() then amdgpu_device_resume() on
multiple products (not just this one) fails. It seems that this code
flow is relying on the BIOS doing something in the S3 suspend/resume
path in order to make the device resumable by amdgpu_device_resume(),
which is why we have only encountered this issue for the first time on
our first AMD platform that does not support S3 suspend.

With that in mind, and lacking any better info, wouldn't it make sense
for amdgpu_device_resume() to always call reset? Maybe it's not
necessary in the S3 case, but it shouldn't harm anything. Or perhaps
it could check if the device is alive and reset it if it's not?

Alternatively do you have any other contacts within AMD that could
help us figure out the underlying question of how to correctly suspend
and resume these devices? Happy to ship an affected product sample
your way.

Thanks
Daniel