[Intel-gfx] [PATCH] drm/i915: fix failure to power off after hibernate

Thu Feb 26 10:50:48 PST 2015

On Thu, 2015-02-26 at 10:34 +0100, Bjørn Mork wrote:
> Imre Deak <imre.deak at intel.com> writes:
>
> >> That patch fixes the problem, with only pci_set_power_state commented
> >> out.  Do you still want me to try with pci_disable_device() commented
> >> out as well?
> >
> > No, but it would help if you could still try the two attached patches
> > separately, without any of the previous workarounds. Based on the
> > result, we'll follow up with a fix that adds for your specific platform
> > either of the attached workarounds or simply avoids putting the device
> > into D3 (corresponding to the patch you already tried).
>
> None of those patches made any difference.  The laptop still hangs at
> power-off.
>
> Not really surprising, is it?  Previous testing shows that the hang
> occurs at the pci_set_power_state(drm_dev->pdev, PCI_D3hot) call in the
> poweroff_late hook.  It is hard to see how replacing it by an attempt to
> set D3cold, or adding any call after this point, could possibly change
> anything.  The system is still hanging at the pci_set_power_state() call.

Judging from the blinking LED, I assume that it's not
pci_set_power_state() that hangs the machine, but the hang happens in
BIOS code.

> The generic pci-driver code will put the i915 device into PCI_D3hot for
> you, won't it? Why do you need to duplicate that in the driver,
> *knowing* that doing so breaks (at least some) systems?

Letting the PCI core put the device into D3 wouldn't get rid of the problem.
It's putting the device into D3 in the first place that causes it.

> I honestly don't think this "let's try some random code" is the proper
> way to fix this bug (or any other bug for that matter).  You need to
> start understanding the code you write, and the first step is by
> actually explaining the changes you make.

We have a good understanding of the issue: the BIOS on your platform
does something unexpected behind the back of the driver/kernel. In that
sense the patches were not random; for instance, the first one is based on
an existing workaround for an issue that closely resembles yours, see
pci_pm_poweroff_noirq().

> I also believe that you completely miss the fact that this bug has
> survived a full release cycle before you became aware of it, and the
> implications this has wrt other affected systems/users.  I assume you
> understand that my system isn't one-of-a-kind.  This means that there are
> other affected users with identical/similar systems.  Now, if none of
> those users reported the bug to you (we all know why: Linux kernel
> development is currently limited by the available testing resources, NOT
> by the available developer resources), then how do you know that there
> aren't a number of other systems affected as well?
>
> Let me answer that for you:  You don't.
>
> Which is why you must explain the mechanism triggering the bug, proving
> that it is a chipset/system specific issue.  Because that's the only way
> you will *know* that you have solved the problem not only for me, but for
> all affected users.
>
> IMHO, the only safe and sane solution at the moment is the revert patch
> I posted.  It's a simple fix, reverting back to the *known* working
> state before this regression was introduced.
>
> Then you can start over from there, trying to implement this properly.

The current way is the proper one that we want for the generic case. The issue
on your platform is the exception, so working around that is a sensible choice.
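
To make the shape of such a workaround concrete, here is a minimal,
standalone sketch of the decision logic only. It is not the attached patch
and not kernel code: the sketch_* names, the struct and the
bios_hangs_in_d3 flag are hypothetical stand-ins, and how an affected
platform would actually be detected is left open. The generic path still
puts the device into D3hot; only a platform flagged as affected is left in
D0 when powering off for hibernation.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the PCI power states; illustration only. */
enum sketch_power_state { SKETCH_D0, SKETCH_D3HOT };

/* Hypothetical device model, not the real struct pci_dev / drm_device. */
struct sketch_gpu {
	bool bios_hangs_in_d3;          /* assumed per-platform quirk flag */
	enum sketch_power_state state;  /* current (modelled) power state */
};

/*
 * Models the late power-off step: the generic case enters D3hot, while a
 * platform flagged as affected stays in D0 during hibernation, leaving
 * the final power-down to the BIOS.
 */
static void sketch_poweroff_late(struct sketch_gpu *gpu, bool hibernation)
{
	if (hibernation && gpu->bios_hangs_in_d3)
		return;                 /* exception: leave the device in D0 */

	gpu->state = SKETCH_D3HOT;      /* generic case */
}
```

With this split, a normal suspend on an affected platform still takes the
generic D3hot path; only the hibernation power-off is special-cased.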

Attached is the proposed fix for this issue.

--Imre

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-drm-i915-gm45-work-around-hang-during-hibernation.patch
Type: text/x-patch
Size: 3625 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/intel-gfx/attachments/20150226/52357a6d/attachment.bin>