[Intel-gfx] 5 bugs

Fri Jun 17 01:12:16 CEST 2011

On Thu, 16 Jun 2011 15:46:29 -0700, Bryce Harrington <bryce at canonical.com> wrote:
> On Thu, Jun 16, 2011 at 12:37:00PM +0100, Chris Wilson wrote:
> > On Wed, 15 Jun 2011 18:10:29 -0700, Bryce Harrington <bryce at canonical.com> wrote:
> > > Hi Max,
> > > 
> > > I currently am tracking 6 bug reports with the intel driver so far for
> > > the oneiric development cycle, of which 5 have been forwarded upstream:
> > > 
> > >   https://bugs.freedesktop.org/show_bug.cgi?id=36515
> > 
> > This looks to be a continuation of the WAIT_EVENT on a dead pipe that we
> > thought we had beaten into submission. The other reports provide more
> > circumstantial evidence to suggest that the hang coincides with a hotplug
> > event. I think the cause is a race between the kernel turning the pipe off
> > due to the hotplug and reprobing and that uevent reaching the ddx. In the
> > meantime, we've queued another video frame to execute on the dead pipe.
> > Worse we may have queued it up long before the hotplug event and due to
> > buffering in the GPU command stream it only gets executed afterwards.
> > 
> > commit 85345517fe6d4de27b0d6ca19fef9d28ac947c4a
> > Author: Chris Wilson <chris at chris-wilson.co.uk>
> > Date:   Sat Nov 13 09:49:11 2010 +0000
> > 
> >     drm/i915: Retire any pending operations on the old scanout when switching
> > 
> > Handles the case where we are changing modes. Unfortunately, disabling an
> > output takes a different path. Though, I think we can use a similar big
> > hammer approach there as well.
> 
> As luck would have it, my own i965 laptop locked up today with I guess
> this same bug.  IPEHR=0x01820000
> 
> Before I restart it, is there any data which could be gathered that
> would assist you?

My theory is based upon this still being a WAIT_EVENT on a disabled pipe.
The error state should support this if the DSP*CNTR is disabled for the
pipe we are waiting on. The other observation to make is whether you
know if a modeset happened at around the same time as the hang.
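
As a quick sanity check of the IPEHR value quoted above, the sketch below
splits the instruction header into its fields. It assumes the usual i915 MI
command layout (client in bits 31:29, opcode in bits 28:23, remaining bits
as flags) and that opcode 0x03 is MI_WAIT_FOR_EVENT; the function name
decode_mi_header is made up for illustration, and the meaning of the
individual flag bits is deliberately left undecoded.

```python
# Sketch only: field layout assumed from the i915 MI_INSTR(opcode, flags)
# convention, i.e. (opcode << 23) | flags with client bits 31:29 == 0.
MI_CLIENT = 0x0
MI_WAIT_FOR_EVENT_OPCODE = 0x03  # assumed opcode for MI_WAIT_FOR_EVENT


def decode_mi_header(ipehr):
    """Return (client, opcode, flags) for a 32-bit instruction header."""
    client = (ipehr >> 29) & 0x7
    opcode = (ipehr >> 23) & 0x3F
    flags = ipehr & 0x7FFFFF
    return client, opcode, flags


def is_wait_for_event(ipehr):
    """True if the header looks like an MI_WAIT_FOR_EVENT command."""
    client, opcode, _ = decode_mi_header(ipehr)
    return client == MI_CLIENT and opcode == MI_WAIT_FOR_EVENT_OPCODE
```

Under those assumptions, is_wait_for_event(0x01820000) is true, which is
consistent with the hang being a WAIT_EVENT. Which pipe the flag bits refer
to still has to be read off against the register headers.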

> 
> Otherwise, I can boot and test the patch you posted to the bug.

I'm confident that that patch closes another window for the bug. I'm
less confident that that's the only race condition we have.

> One of the difficulties with this type of bug is that it's so
> intermittent and uncertain to reproduce (and so easily confused with
> other unrelated freezes), that it's hard to tell for certain if a given
> patch has definitively helped the situation.  Do you have suggestions on
> ways of measuring this better, or techniques to help in triggering the
> bug more reliably?

If I am right, then we have two paths that cause WAIT_FOR_EVENT:
windowed swapbuffers (or sub_copy_swap) and video. So playing a number
of video streams, run in parallel with looping xrandr mode changes (in
particular disabling outputs), should increase the likelihood of the
bug.
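
A minimal sketch of the xrandr half of that stress test follows. The output
name, cycle count and delay are placeholders rather than values from this
thread, and the helper names are made up; it only uses the standard
"xrandr --output NAME --off" and "xrandr --output NAME --auto" invocations.
The video streams have to be started separately.

```python
import subprocess
import time


def toggle_commands(output, cycles):
    """Build the xrandr command list: disable, then re-enable, per cycle."""
    cmds = []
    for _ in range(cycles):
        cmds.append(["xrandr", "--output", output, "--off"])
        cmds.append(["xrandr", "--output", output, "--auto"])
    return cmds


def run_toggle_loop(output, cycles, delay=2.0):
    """Run the commands in order, pausing between each modeset."""
    for cmd in toggle_commands(output, cycles):
        # check=False: keep looping even if one xrandr call fails
        subprocess.run(cmd, check=False)
        time.sleep(delay)
```

For example, run_toggle_loop("VGA1", cycles=100) would switch a
hypothetical VGA1 output off and on 100 times; the names listed by
"xrandr -q" give the real output to use.
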
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre