more intel drm issues (was Re: [git pull] drm intel only fixes)

Thu Jan 20 08:07:02 PST 2011

On Thu, Jan 20, 2011 at 2:25 AM, Chris Wilson <chris at chris-wilson.co.uk> wrote:
>
> Right, the autoreported HEAD may have been already reset to 0 and so hit
> the wraparound bug which caused it to exit early without actually
> quiescing the ringbuffer.

Yeah, that would explain the issue.

> Another possibility is that I added a 3s timeout waiting for a request if
> IRQs were suspended:

No, if IRQs are actually suspended here, then that codepath is
totally buggy and would blow up (msleep() doesn't work, and jiffies
wouldn't advance on UP). So that's not it.

> Both of those I think are symptoms of another problem, that perhaps during
> suspend we are shutting down parts of the chip before idling?

That could be, but looking at the code, one thing strikes me: the
_normal_ case (of just waiting for "enough space" in the ring buffer)
doesn't need the exact head value, but the "wait for ring buffer to
be totally empty" case does.

Which means that the use of the "fast-but-inaccurate" 'head' sounds
wrong for the "wait for idle" case.
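
To make the distinction concrete, here is a minimal sketch with a toy
model of the ring. The struct and the toy_* names are made up for
illustration and are not the driver's code; the 8-byte gap between
tail and head is an assumption of the model. A head value that lags
behind the real one only underestimates the free space, which is
harmless when waiting for "enough space", but it can never satisfy an
"is the ring empty" test:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a ring; not the driver's real structure. */
struct toy_ring {
	uint32_t size;	/* ring size in bytes */
	uint32_t tail;	/* offset where the CPU writes next */
};

/* Free bytes for a given head value, keeping an 8-byte gap (assumed). */
static int toy_ring_space(const struct toy_ring *ring, uint32_t head)
{
	int space = (int)head - (int)(ring->tail + 8);

	if (space < 0)
		space += (int)ring->size;
	return space;
}

/* "Enough space": a lagging head only makes this more conservative. */
static int toy_has_space(const struct toy_ring *ring, uint32_t head, int n)
{
	return toy_ring_space(ring, head) >= n;
}

/* "Totally empty": true only once head has caught up with tail. */
static int toy_is_idle(const struct toy_ring *ring, uint32_t head)
{
	return toy_ring_space(ring, head) == (int)ring->size - 8;
}
```

With size 4096 and tail 1024, a stale head of 512 still passes
toy_has_space() for a 64-byte request, but toy_is_idle() only
succeeds once it is handed the real head of 1024.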

So can you explain the difference between

   intel_read_status_page(ring, 4);

vs

   I915_READ_HEAD(ring);

because from looking at the code, I get the notion that
"intel_read_status_page()" may not be exact. But what happens if that
inexact value matches our cached ring->actual_head, so we never even
try to read the exact value? Does it _stay_ inexact for arbitrarily
long times? If so, we might wait for the ring to empty forever (well,
until the timeout - the behavior I see), even though the ring really
_is_ empty. No?
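
The scenario I'm worried about can be sketched like this. It is a toy
model, not the actual driver loop: struct toy_hw, its fields and the
toy_* functions are hypothetical, and it simply _assumes_ the cheap
value can stay stale, which is exactly the open question:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two ways of reading the head. */
struct toy_hw {
	uint32_t reported_head;	/* cheap copy, may lag behind */
	uint32_t exact_head;	/* what a register read would return */
	uint32_t last_seen;	/* cached value from the previous poll */
	uint32_t tail;		/* offset where the CPU writes next */
};

/* Fall back to the exact value only if the cheap one went backwards. */
static uint32_t toy_poll_head(struct toy_hw *hw)
{
	uint32_t head = hw->reported_head;

	if (head < hw->last_seen)
		head = hw->exact_head;
	hw->last_seen = head;
	return head;
}

/* Poll up to max_polls times; return polls used, or -1 on timeout. */
static int toy_wait_idle(struct toy_hw *hw, int max_polls)
{
	int i;

	for (i = 1; i <= max_polls; i++)
		if (toy_poll_head(hw) == hw->tail)
			return i;
	return -1;
}

/* Same loop, but always using the exact value. */
static int toy_wait_idle_exact(const struct toy_hw *hw, int max_polls)
{
	int i;

	for (i = 1; i <= max_polls; i++)
		if (hw->exact_head == hw->tail)
			return i;
	return -1;
}
```

If reported_head and last_seen are both stuck at 512 while exact_head
and tail are both 1024, toy_wait_idle() never takes the fallback and
times out, while toy_wait_idle_exact() sees the empty ring on the
first poll.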

Also, isn't that "head < ring->actual_head" buggy? What about the
overflow case? Not that we care, because afaik, 'actual_head' is not
actually used anywhere, so it should be called 'pointless_head'?
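
One way to read the overflow concern, as a hedged sketch: ring offsets
wrap, so a plain "<" is true after every perfectly legitimate wrap and
false when the value simply hasn't moved. The helper names below are
made up, and toy_advance() assumes the ring size is a power of two:

```c
#include <assert.h>
#include <stdint.h>

/* Plain comparison: true when the new head is numerically smaller. */
static int toy_went_backwards(uint32_t prev, uint32_t cur)
{
	return cur < prev;
}

/* Bytes the head moved forward, modulo the ring size (power of two). */
static uint32_t toy_advance(uint32_t prev, uint32_t cur, uint32_t size)
{
	return (cur - prev) & (size - 1);
}
```

In a 4096-byte ring, a head going from 4000 to 32 "went backwards"
by the plain comparison even though it advanced 128 bytes, and a head
stuck at 512 is not flagged at all although its advance is zero.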

That code looks suspiciously bogus.

                    Linus