[Intel-gfx] [PATCH 2/2] [v2] drm/i915: Disable GGTT PTEs on GEN6+ suspend

Daniel Vetter daniel at ffwll.ch
Fri Oct 18 15:45:43 CEST 2013


On Wed, Oct 16, 2013 at 10:06:27AM -0700, Ben Widawsky wrote:
> On Wed, Oct 16, 2013 at 05:58:31PM +0100, Chris Wilson wrote:
> > On Wed, Oct 16, 2013 at 09:21:30AM -0700, Ben Widawsky wrote:
> > > Once the machine gets to a certain point in the suspend process, we
> > > expect the GPU to be idle. If it is not, we might corrupt memory.
> > > Empirically (with an early version of this patch) we have seen this is
> > > not the case. We cannot currently explain why the latent GPU writes
> > > occur.
> > > 
> > > In the technical sense, this patch is a workaround in that we have an
> > > issue we can't explain, and the patch indirectly solves the issue.
> > > However, it's really better than a workaround because we understand why
> > > it works, and it really should be a safe thing to do in all cases.
> > > 
> > > The noticeable effect other than the debug messages would be an increase
> > > > in the suspend time. I have not measured how expensive it actually is.
> > > 
> > > I think it would be good to spend further time to root cause why we're
> > > seeing these latent writes, but it shouldn't preclude preventing the
> > > fallout.
> > > 
> > > NOTE: It should be safe (and makes some sense IMO) to also keep the
> > > VALID bit unset on resume when we clear_range(). I've opted not to do
> > > this as properly clearing those bits at some later point would be extra
> > > work.
> > > 
> > > v2: Fix bugzilla link
> > 
> > And the other one?
> > 
> 
> I'm really amazing. If we move ahead with this patch, Daniel, can you just erase
> the extra bugs.freedesktop.org/6549://
> 
> > > Bugzilla: http://bugs.freedesktop.org/6549://bugs.freedesktop.org/show_bug.cgi?id=65496
> 
> Bugzilla: http://bugs.freedesktop.org/show_bug.cgi?id=65496

Fixed and merged with cc: stable.
-Daniel

> 
> > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=59321
> > > Tested-by: Takashi Iwai <tiwai at suse.de>
> > > Tested-by: Paulo Zanoni <paulo.r.zanoni at intel.com>
> > > Signed-off-by: Ben Widawsky <ben at bwidawsk.net>
> > 
> > So clearing the valid bit should result in the GPU reporting errors for
> > delayed accesses, but none were reported?
> > -Chris
> > 
> 
> So I can't actually reproduce the problem for some reason. Paulo will
> need to answer. One theory is the fault information is lost on suspend.
> 
> The original patch put faults both in suspend, and resume. After this, I
> asked Paulo to wedge the GPU, and there I saw faults.
> 
> -- 
> Ben Widawsky, Intel Open Source Technology Center
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
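
A rough sketch of the idea discussed in the thread, for readers who want to see why clearing the VALID bit turns a latent write into a fault rather than silent memory corruption. This is a toy model and not the i915 code: the `toy_` names, the PTE layout (bit 0 as the valid bit, 4 KiB pages), and the fault counter are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical PTE layout, for illustration only: bit 0 is the valid bit. */
#define TOY_PTE_VALID     ((uint32_t)1u)
#define TOY_PTE_ADDR_MASK (~(uint32_t)0xfffu)

struct toy_ggtt {
    uint32_t *ptes;        /* one entry per page of the global GTT */
    size_t num_entries;
    uint32_t scratch_addr; /* page-aligned address of the scratch page */
    unsigned int faults;   /* accesses that hit an invalid entry */
};

/* Build a PTE for a page address, optionally leaving it invalid. */
static uint32_t toy_pte_encode(uint32_t addr, bool valid)
{
    uint32_t pte = addr & TOY_PTE_ADDR_MASK;

    if (valid)
        pte |= TOY_PTE_VALID;
    return pte;
}

/*
 * Point a range of entries at the scratch page. With use_scratch == false
 * the entries are written without the valid bit, which is the suspend case
 * sketched here: any late access faults instead of reaching memory.
 */
static void toy_ggtt_clear_range(struct toy_ggtt *ggtt, size_t first,
                                 size_t count, bool use_scratch)
{
    assert(ggtt != NULL && ggtt->ptes != NULL);

    if (first >= ggtt->num_entries)
        return;
    if (count > ggtt->num_entries - first)
        count = ggtt->num_entries - first;

    for (size_t i = 0; i < count; i++)
        ggtt->ptes[first + i] =
            toy_pte_encode(ggtt->scratch_addr, use_scratch);
}

/* Model a GPU access: succeeds only through a valid entry. */
static bool toy_ggtt_translate(struct toy_ggtt *ggtt, size_t entry,
                               uint32_t *addr_out)
{
    if (entry >= ggtt->num_entries ||
        !(ggtt->ptes[entry] & TOY_PTE_VALID)) {
        ggtt->faults++;
        return false;
    }
    *addr_out = ggtt->ptes[entry] & TOY_PTE_ADDR_MASK;
    return true;
}
```

In this model, calling `toy_ggtt_clear_range()` with `use_scratch` set to false before suspend makes every later `toy_ggtt_translate()` fail and bump the fault counter, which corresponds to the question raised above about whether delayed accesses should be reported as errors.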


