[Intel-gfx] [PATCH 2/2] [v2] drm/i915: Disable GGTT PTEs on GEN6+ suspend

Chris Wilson chris at chris-wilson.co.uk
Thu Oct 17 11:24:07 CEST 2013

On Thu, Oct 17, 2013 at 09:41:09AM +0200, Takashi Iwai wrote:
> At Wed, 16 Oct 2013 18:27:33 +0100,
> Chris Wilson wrote:
> > 
> > On Wed, Oct 16, 2013 at 10:06:27AM -0700, Ben Widawsky wrote:
> > > On Wed, Oct 16, 2013 at 05:58:31PM +0100, Chris Wilson wrote:
> > > > So clearing the valid bit should result in the GPU reporting errors for
> > > > delayed accesses, but none were reported?
> > > 
> > > So I can't actually reproduce the problem for some reason. Paulo will
> > > need to answer. One theory is the fault information is lost on suspend.
> > > 
> > > The original patch put faults both in suspend, and resume. After this, I
> > > asked Paulo to wedge the GPU, and there I saw faults.
> > 
> > If we can capture the error, and it should be very possible to do so, we
> > should be able to pinpoint the cause quite quickly. If it is just deferred
> > writes, it should also be a problem across module unload - which should
> > be easier for getting debug information out.
> > 
> The bug is only about S4, thus it's not so easy to capture anything in
> the resume kernel, as all is lost after the transition to the restored
> kernel.
> 
> BTW, I also suspect that a similar problem might still happen in
> other cases, e.g. via kexec, even with this patch.

How are devices idled (or suspended) prior to hibernate resume or kexec?
From my reading, i915_drm_freeze() should be called before the resume
image is executed. What we can do is to make the first action of
i915_driver_unload() be i915_drm_freeze(), then clear the PTE valid
bits and wait a second or two for a GPU fault before proceeding with an
unload. By doing that we can debug our suspend paths - all that remains
is the possibility of rogue hardware state. And that should show up by
breaking module load.
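
To illustrate the "clear the PTE valid bits" step, here is a user-space
sketch only, not driver code. It assumes the valid flag is bit 0 of a
32-bit GEN6 GGTT PTE; the mock_* names and the plain array standing in
for the GGTT are hypothetical. In the driver, the same idea would run
after i915_drm_freeze() and be followed by the one or two second wait
for a fault.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: bit 0 marks a GEN6 GGTT PTE as valid. */
#define MOCK_PTE_VALID ((uint32_t)1u << 0)

/*
 * Clear the valid bit on every entry while leaving the address and
 * cache bits untouched. Returns how many entries were valid before.
 */
size_t mock_ggtt_clear_valid(uint32_t *ptes, size_t count)
{
	size_t cleared = 0;

	assert(ptes != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		if (ptes[i] & MOCK_PTE_VALID) {
			ptes[i] &= ~MOCK_PTE_VALID;
			cleared++;
		}
	}
	return cleared;
}

/*
 * Model a late (deferred) access through entry 'index': it would
 * fault if the valid bit of that entry has been cleared.
 */
int mock_access_faults(const uint32_t *ptes, size_t count, size_t index)
{
	assert(ptes != NULL);
	assert(index < count);
	return !(ptes[index] & MOCK_PTE_VALID);
}
```

Any access that still arrives after mock_ggtt_clear_valid() is the
kind of deferred write we want reported as a fault during the wait.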

Chris Wilson, Intel Open Source Technology Centre
