[Intel-gfx] [PATCH 05/10] drm/i915: Disable 'get seqno' workaround for VLV

Mon Dec 15 01:02:36 PST 2014

On Wed, Dec 10, 2014 at 05:11:09PM +0000, Dave Gordon wrote:
> On 10/12/14 10:42, Daniel Vetter wrote:
> > On Tue, Dec 09, 2014 at 12:59:08PM +0000, John.C.Harrison at Intel.com wrote:
> >> From: Dave Gordon <david.s.gordon at intel.com>
> >>
> >> There is a workaround for a hardware bug when reading the seqno from the status
> >> page. The bug does not exist on VLV; however, the workaround was still being
> >> applied.
> > 
> > Given how much trouble the missed seqno fun causes us I'd like more
> > justification. What kind of (stress)-testing has been done here? And
> > you're sure you've never seen the little dmesg notice that the kernel
> > switched to polling?
> > -Daniel
> 
> This was necessary during VLV scheduler/preemption work, where we
> stressed the command streamer and IRQ path probably more than any of the
> usual tests, since we were looking for out-of-sequence anomalies.
> 
> The comments in gen6_ring_get_seqno() say it's to fix a bug on ivb/snb,
> and nothing in the BSpec suggests it's needed on later chips.
> 
> I think the problem was really in the fact that the workaround
> implemented in gen6_ring_get_seqno() was reading a nonshadowed register,
> therefore triggering forcewake activity, which was pretty buggy; so we
> took out the gen6-specific code before finding that there were still
> other problems with forcewake.
> 
> [Aside]
> The original intention appears to have been to generate a single I/O
> READ cycle, forcing all pending inbound DMA to be flushed to memory, so
> that the subsequent memory read would see the latest value. But now, it
> doesn't just generate a single cycle but a whole flurry of writes and
> polling reads, because reading ACTHD requires forcewake (unless that was
> wrong, and has now been fixed). So should gen6_ring_get_seqno() instead
> read a register that doesn't require forcewake? I wouldn't object to
> using it in perpetuity if it really only read one register!
> [/aside]

The original intention was to work around missed interrupts. We still have
them on pretty much everything gen6+. The workaround we've done for the
performance issues this causes is the lazy_coherency trick, where we only
do the extra register read when waiting. It's all pretty nicely
cargo-culted and not in BSpec, but it's also fairly real. We've given up
on debugging it: current kernels switch all wait_request calls to polling
once the first missed interrupt has happened. See the logic around
gpu_error.missed_irq_rings.
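
To make the two mechanisms above concrete, here is a rough userspace
sketch, not the driver code. Everything prefixed sim_ is made up for
illustration; only the lazy_coherency flag and the missed_irq_rings
bitmask idea come from the real code, and the wraparound compare is
meant to mirror what i915_seqno_passed() does. The "posting read" is
modelled as simply making the pending seqno write visible.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a ring; not the i915 structures. */
struct sim_ring {
	unsigned int id;            /* ring index, used as a bit position */
	uint32_t hw_seqno;          /* seqno the GPU wrote, maybe not visible yet */
	uint32_t status_page_seqno; /* seqno the CPU currently sees */
	unsigned int mmio_reads;    /* how many flushing register reads we did */
};

/* Hypothetical stand-in for the error state holding the bitmask. */
struct sim_gpu_error {
	unsigned long missed_irq_rings; /* one bit per ring that missed an irq */
};

/* Model of the workaround read: it makes the pending write visible. */
static void sim_posting_read(struct sim_ring *ring)
{
	ring->mmio_reads++;
	ring->status_page_seqno = ring->hw_seqno;
}

/* Only pay for the register read when the caller is not being lazy. */
static uint32_t sim_get_seqno(struct sim_ring *ring, bool lazy_coherency)
{
	if (!lazy_coherency)
		sim_posting_read(ring);
	return ring->status_page_seqno;
}

/* Wraparound-safe "seq1 is at or after seq2". */
static bool sim_seqno_passed(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) >= 0;
}

/* Cheap check on the fast path, coherent check when actually waiting. */
static bool sim_request_completed(struct sim_ring *ring, uint32_t target,
				  bool waiting)
{
	return sim_seqno_passed(sim_get_seqno(ring, !waiting), target);
}

/* Record that this ring missed an interrupt. */
static void sim_note_missed_irq(struct sim_gpu_error *err,
				const struct sim_ring *ring)
{
	err->missed_irq_rings |= 1UL << ring->id;
}

/* Once a ring has missed an interrupt, waiters on it should poll. */
static bool sim_must_poll(const struct sim_gpu_error *err,
			  const struct sim_ring *ring)
{
	return (err->missed_irq_rings >> ring->id) & 1UL;
}
```

In this model a non-waiting caller never touches the register and can
see a stale seqno, while a waiter always forces the read first. That is
the trade-off the lazy_coherency flag is making.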

It might be that it's not real on vlv and only on big core machines, but
then we don't have a lot of people using vlv on upstream and reporting
bugs.

For added insult we don't have any nice way to reproduce this really :(
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch