[Intel-gfx] S4 resume breakage with i915 driver

Dave Gordon david.s.gordon at intel.com
Fri Sep 2 18:34:52 UTC 2016


On 29/08/16 14:32, Daniel Vetter wrote:
> On Fri, Aug 26, 2016 at 02:42:47PM +0300, Imre Deak wrote:
>> On pe, 2016-08-26 at 14:10 +0300, Imre Deak wrote:
>>> On pe, 2016-08-26 at 11:39 +0100, Chris Wilson wrote:
>>>> On Fri, Aug 26, 2016 at 12:25:01PM +0200, Takashi Iwai wrote:
>>>>> On Fri, 26 Aug 2016 11:18:00 +0200,
>>>>> Takashi Iwai wrote:
>>>>>> I had to modify the intel_gpu_reset() call because the test was done
>>>>>> on the older kernel, so it's like:
>>>>>>
>>>>>> +       intel_gpu_reset(dev_to_i915(dev)->dev);
>>>>>>
>>>>>> And it seems to be working on HSW! \o/
>>>>>>
>>>>>> A simple trick, better than the magical register write revert.
>>>>>> I'll check other machines, too, to see whether it has any negative
>>>>>> impact.
>>>>>
>>>>> The test results look good on all machines.
>>>>
>>>> The theory then is that the GPUs are active across the load of the
>>>> hibernation image, and so before the GTT is updated the memory currently
>>>> in use by the GPU is reused by the system.
>>>>
>>>> The key question then is: is the memory of the boot kernel still in
>>>> place during the hibernate restore phase?
>>>
>>> Before restoring the image all devices are quiesced by calling their
>>> freeze callback, so the GPU should already be idle
>>> in i915_pm_restore_early().
>>
>> But this happens in the loader kernel, so if that doesn't have the
>> driver built-in then the freeze callback won't be called either. So any
>> possible BIOS related GPU activity/setup should be quiesced from the
>> restore callback then.
>
> I thought the loader kernel has an entire initrd attached, to allow stuff
> like typing in the disk encryption passwd. Which means we very much do
> load i915 in the loader kernel already.
>
> So maybe we need to throw a gpu reset into the right hook (shutdown or
> whatever it was) to make sure the loader kernel really stops all gpu write
> cycles, including anything done due to power saving context restoring. We
> already know that the only way to get the gpu off the context image
> permanently is a gpu reset, so that would make some sense.
> -Daniel
>
>>
>>>> If we need to add a ->shutdown callback (if
>>>> that is even called before hibernate restore) then we can only fix
>>>> future kernels and are still susceptible to corruption when booting
>>>> from old kernels.
>>>>
>>>> Anyone familiar with the details of the hibernation restore? (And how
>>>> much relates to kexec?)
>>>> -Chris

Seems like:

1. the driver should quiesce the h/w *as much as possible* during the 
saving of the hibernation image -- which may mean a reset during this 
phase; otherwise there might be some risk of the saved image being 
incomplete or inconsistent.

2. the driver should make *no assumptions whatsoever* about the state of 
the h/w during resume-from-hibernation, as we don't know what state the 
BIOS may have left it in, or whether a (possibly completely different) 
version of the driver in the loader kernel played with it at all. So a 
hard reset should be mandatory during resume.

[Aside: that other (fault-tolerant) kernel I used to work on took this 
approach. Whenever a driver is *given* control of a device, it should 
assume *nothing*, reset the h/w, and reprogram from scratch. And when 
it's about to *lose* control of a device, it should quiesce all activity 
and then reset the h/w to the default state before handing it back. That 
way there are no surprises for anybody.]
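
To make the policy in points 1 and 2 and the aside concrete, here is a 
minimal sketch against a made-up device model. Everything in it (struct 
fake_gpu, the fake_gpu_*() helpers, the context-image field) is 
hypothetical and only illustrates the ordering; none of it is the i915 
or kernel PM API, and a real driver would hang the equivalent steps off 
whatever freeze/restore hooks turn out to be the right ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device model: none of these names are i915 or kernel APIs. */
struct fake_gpu {
    bool     engines_running;  /* h/w may still be issuing memory writes */
    uint32_t ctx_image_addr;   /* where the h/w saves context state; 0 = nowhere */
    uint32_t reset_count;      /* number of hard resets performed so far */
};

/* Hard reset: stop everything and return to the default state. */
static void fake_gpu_reset(struct fake_gpu *gpu)
{
    gpu->engines_running = false;
    gpu->ctx_image_addr = 0;
    gpu->reset_count++;
}

/* Quiesce: stop activity, but the h/w still points at its context image. */
static void fake_gpu_quiesce(struct fake_gpu *gpu)
{
    gpu->engines_running = false;
}

/*
 * Being *given* control (probe, or resume from hibernation): assume
 * nothing about what the BIOS or a loader kernel left behind, reset,
 * then reprogram from scratch.
 */
static void fake_gpu_acquire(struct fake_gpu *gpu, uint32_t ctx_image_addr)
{
    fake_gpu_reset(gpu);
    gpu->ctx_image_addr = ctx_image_addr;
    gpu->engines_running = true;
}

/*
 * About to *lose* control (freeze before the image is saved, or
 * shutdown): quiesce all activity, then reset so the h/w no longer
 * references any memory the next owner might reuse.
 */
static void fake_gpu_release(struct fake_gpu *gpu)
{
    fake_gpu_quiesce(gpu);
    fake_gpu_reset(gpu);
    assert(!gpu->engines_running && gpu->ctx_image_addr == 0);
}
```

The point of the sketch is only that the reset is unconditional on both 
edges: fake_gpu_acquire() never inspects the state it inherits, and 
fake_gpu_release() does not rely on quiescing alone to detach the h/w 
from memory.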

.Dave.


