[Intel-gfx] [PATCH] drm/i915: workaround bad DSL readout v2

Mon Sep 14 06:02:44 PDT 2015

On Mon, Sep 14, 2015 at 11:10:04AM +0200, Daniel Vetter wrote:
> On Fri, Sep 11, 2015 at 01:33:08AM +0300, Ville Syrjälä wrote:
> > On Thu, Sep 10, 2015 at 02:57:32PM -0700, Jesse Barnes wrote:
> > > On 09/10/2015 02:53 PM, Ville Syrjälä wrote:
> > > > On Thu, Sep 10, 2015 at 02:38:53PM -0700, Jesse Barnes wrote:
> > > >> On HSW at least (still testing other platforms, but should be harmless
> > > >> elsewhere), the DSL reg reads back as 0 when read around vblank start
> > > >> time.  This ends up confusing the atomic start/end checking code, since
> > > >> it causes the update to appear as if it crossed a frame count boundary.
> > > >> Avoid the problem by making sure we don't return scanline_offset from
> > > >> the get_crtc_scanline function.  In moving the code there, I add to add
> > > >> an additional delay since it could be called and have a legitimate 0
> > > >> result for some time (depending on the pixel clock).
> > > >>
> > > >> v2: move hsw dsl read hack to get_crtc_scanline (Ville)
> > > >>
> > > >> References: https://bugs.freedesktop.org/show_bug.cgi?id=91579
> > > >> Signed-off-by: Jesse Barnes <jbarnes at virtuousgeek.org>
> > > >> ---
> > > >>  drivers/gpu/drm/i915/i915_irq.c | 21 +++++++++++++++++++++
> > > >>  1 file changed, 21 insertions(+)
> > > >>
> > > >> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > > >> index 90bc6c2..97e5d52 100644
> > > >> --- a/drivers/gpu/drm/i915/i915_irq.c
> > > >> +++ b/drivers/gpu/drm/i915/i915_irq.c
> > > >> @@ -697,6 +697,27 @@ static int __intel_get_crtc_scanline(struct intel_crtc *crtc)
> > > >>  		position = __raw_i915_read32(dev_priv, PIPEDSL(pipe)) & DSL_LINEMASK_GEN3;
> > > >>  
> > > >>  	/*
> > > >> +	 * On HSW, the DSL reg (0x70000) appears to return 0 if we
> > > >> +	 * read it right around the start of vblank.  So try it again
> > > >> +	 * so we don't accidentally end up spanning a vblank frame
> > > >> +	 * increment, causing the pipe_update_end() code to squak at us.
> > > >> +	 */
> > > >> +	if (IS_HASWELL(dev) && !position) {
> > > >> +		int i, temp;
> > > >> +
> > > >> +		for (i = 0; i < 100; i++) {
> > > >> +			udelay(1);
> > > >> +			temp = __raw_i915_read32(dev_priv, PIPEDSL(pipe)) &
> > > >> +				DSL_LINEMASK_GEN3;
> > > >> +			if (temp != position) {
> > > >> +				position = temp;
> > > >> +				goto out;
> > > >> +			}
> > > >> +		}
> > > >> +	}
> > > > 
> > > > Hmm. Another idea. If it always happens at start of vbl, maybe have a
> > > > look at the ISR. If scanline reads 0, but ISR says we're in vblank, just
> > > > return vblank_start. That's assming ISR would in fact show that it's in
> > > > vblank when this happens.
> > > 
> > > Sounds a bit racy,
> > 
> > Not too much I think. We're under spinlock_irqsave here, and if the
> > scanline should magically start to tick just after we've read it we
> > would still give almost the right answer, even if we got magically
> > delayed by one or two scanlines. That's assuming that it has already
> > latched the double buffered registers when it goes wonky. If the
> > register latching happens only when the counter starts working again,
> > well then we're in deeper doodoo. But then I wouldn't expect ISR to
> > indicate that we're in vblank then either. So I think you should
> > definitely check what the ISR actually tells you while things are
> > in the bad state.
> 
> There's still a race
> 1. we read bogus vblank counter
> 2. vblank processing completes in hw
> 3. ISR indicates where out of vblank.
> 
> So you'd need at least a loop to make sure you're 0 vblank counter is
> stable across the ISR read.

Seems rather unlikely. That would mean something blocked us for the entire
duration of the vblank.

> -Daniel
> 
> > In case the badness happens just before start of vblank, well, then
> > I have no good idea how to handle it. This sort of retry loop is
> > the best that comes to mind right now :(
> > 
> > > though I guess it wouldn't hurt.  I'm actually a
> > > little dubious about this change anyway since it will mean longer delays
> > > when we really are at the top of the field/frame.  You sure you don't
> > > like the first one better?
> > 
> > The first one can't help vblank timestamps. Although I suppose that if
> > the badness always cures itself before the interrupt gets raised, we
> > would not hit this normally when we update the timestamp from the irq
> > handler. We could hit it when updating the timestamp outside the irq
> > handler though (ie. during vblank_get/put etc.).
> 
> On top of that there's patches floating from Chris to get a vblank
> timestamp without grabbing the interrupts temporarily or any spinlock.
> That helps when we immediately disable the interrupt and then a bit later
> the compositors asks for the current vblank to estimate the next frame.

I think we all already agreed that it's not going to fly. If we don't
ask the hw how many vblanks we lost we can't give userspace any kind
of accurate information.

> So if possible I'd definitely vote to share this "fix up bonghits vblank"
> logic if possible ...

I have no idea what you're looking to share and with whom. I do not want
workarounds for hardware specific bugs in the shared vblank code. We
already had that due to radeon, and that stuff has been doing nothing
for us except hinder attempts at making the code work sanely for sane
Intel hardware.

Which reminds me, I need to post a pile of vblank patches...

-- 
Ville Syrjälä
Intel OTC