[Intel-gfx] [RFC] drm/i915: Rework "Potential atomic update error" to handle PSR exit

Fri Apr 27 12:41:42 UTC 2018

On Thu, Apr 26, 2018 at 08:09:56PM -0700, Tarun Vyas wrote:
> On Thu, Apr 26, 2018 at 02:39:04PM -0700, Tarun Vyas wrote:
> > On Thu, Apr 26, 2018 at 10:47:40AM -0700, Dhinakaran Pandiyan wrote:
> > > 
> > > On Thu, 2018-04-26 at 16:41 +0300, Ville Syrjälä wrote:
> > > > On Wed, Apr 25, 2018 at 07:10:09PM -0700, tarun.vyas at intel.com wrote:
> > > > > From: Tarun <tarun.vyas at intel.com>
> > > > > 
> > > > > The Display scanline counter freezes on PSR entry. Inside
> > > > > intel_pipe_update_start, once Vblank interrupts are enabled, we start
> > > > > exiting PSR, but by the time the scanline counter is read, we may not
> > > > > have completely exited PSR which leads us to schedule out and check back
> > > > > later.
> > > > > On ChromeOS-4.4 kernel, which is fairly up-to-date w.r.t drm/i915 but
> > > > > lags w.r.t core kernel code, hot plugging an external display triggers
> > > > > tons of "potential atomic update errors" in the dmesg, on *pipe A*. A
> > > > > closer analysis reveals that we try to read the scanline 3 times and
> > > > > eventually timeout, b/c PSR hasn't exited fully leading to a PIPEDSL stuck @
> > > > > 1599.
> > > > > This issue is not seen on upstream kernels, b/c for *some* reason we
> > > > > loop inside intel_pipe_update_start for ~2+ msec which in this case is
> > > > > more than enough to exit PSR fully, hence an *unstuck* PIPEDSL counter,
> > > > > hence no error. On the other hand, the ChromeOS kernel spends ~1.1 msec
> > > > > looping inside intel_pipe_update_start and hence errors out b/c the
> > > > > source is still in PSR.
> > > > > 
> > > > > If PSR is enabled, then we should *wait* for the PSR
> > > > > state to move to IDLE before re-reading the PIPEDSL so as to avoid bogus
> > > > > and annoying "potential atomic update error" messages.
> > > > > 
> > > > > P.S: This scenario applies to a configuration with an additional pipe,
> > > > > as of now.
> > > > > 
> > > 
> > > Ville, 
> > > 
> > > Any idea what could be the reason the warnings start appearing when an
> > > external display is connected? We couldn't come up with an explanation.
> > > 
> > Another source of confusion for me is that on upstream kernels it
> > *appears* to take longer for us to get *re-scheduled* after we call
> > schedule_timeout(). With ~2+ msec spent in the loop, the timeout does
> > not seem to be working as intended, b/c we end up spending a lot more
> > time in the loop, which in turn contributes to this issue not being
> > seen on upstream kernels.
> > > 
> > > > > Signed-off-by: Tarun <tarun.vyas at intel.com>
> > > > > ---
> > > > >  drivers/gpu/drm/i915/intel_sprite.c | 19 +++++++++++++++----
> > > > >  1 file changed, 15 insertions(+), 4 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/i915/intel_sprite.c b/drivers/gpu/drm/i915/intel_sprite.c
> > > > > index aa1dfaa692b9..77dd3b936131 100644
> > > > > --- a/drivers/gpu/drm/i915/intel_sprite.c
> > > > > +++ b/drivers/gpu/drm/i915/intel_sprite.c
> > > > > @@ -92,11 +92,13 @@ void intel_pipe_update_start(const struct intel_crtc_state *new_crtc_state)
> > > > >  	struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
> > > > >  	const struct drm_display_mode *adjusted_mode = &new_crtc_state->base.adjusted_mode;
> > > > >  	long timeout = msecs_to_jiffies_timeout(1);
> > > > > -	int scanline, min, max, vblank_start;
> > > > > +	int scanline, min, max, vblank_start, old_scanline, new_scanline;
> > > > > +	bool retried = false;
> > > > >  	wait_queue_head_t *wq = drm_crtc_vblank_waitqueue(&crtc->base);
> > > > >  	bool need_vlv_dsi_wa = (IS_VALLEYVIEW(dev_priv) || IS_CHERRYVIEW(dev_priv)) &&
> > > > >  		intel_crtc_has_type(new_crtc_state, INTEL_OUTPUT_DSI);
> > > > >  	DEFINE_WAIT(wait);
> > > > > +	old_scanline = new_scanline = -1;
> > > > >  
> > > > >  	vblank_start = adjusted_mode->crtc_vblank_start;
> > > > >  	if (adjusted_mode->flags & DRM_MODE_FLAG_INTERLACE)
> > > > > @@ -126,15 +128,24 @@ void intel_pipe_update_start(const struct intel_crtc_state *new_crtc_state)
> > > > >  		 * read the scanline.
> > > > >  		 */
> > > > >  		prepare_to_wait(wq, &wait, TASK_UNINTERRUPTIBLE);
> > > > > -
> > > > > +retry:
> > > > >  		scanline = intel_get_crtc_scanline(crtc);
> > > > > +		old_scanline = new_scanline, new_scanline = scanline;
> > > > > +
> > > > >  		if (scanline < min || scanline > max)
> > > > >  			break;
> > > > >  
> > > > >  		if (timeout <= 0) {
> > > > > -			DRM_ERROR("Potential atomic update failure on pipe %c\n",
> > > > > +			if(!i915.enable_psr || retried) {
> > > 
> > > You could use the CAN_PSR() macro that checks for source and sink
> > > support.
> > > 
> > Will do.
> > > > > +				DRM_ERROR("Potential atomic update failure on pipe %c\n",
> > > > >  				  pipe_name(crtc->pipe));
> > > > > -			break;
> > > > > +				break;
> > > > > +			}
> > > > > +			else if(old_scanline == new_scanline && !retried) {
> > > > > +				retried = true;
> > > > > +				intel_wait_for_register(dev_priv, EDP_PSR_STATUS_CTL, EDP_PSR_STATUS_STATE_MASK, EDP_PSR_STATUS_STATE_IDLE, 10);
> > > > 
> > > > What's the point of obfuscating the loop with this stuff?
> > > > Just wait for the PSR exit before we even enter the loop?
> > > >
> > Agreed.
> On second thought, I was doing it wrong in the initial RFC. We can't do
> a wait_for_register with irqs disabled by local_irq_disable(). So we
> will have to *poll* the PSR_STATE, but is that desirable?

Do it before disabling the irqs? As long as we prevent it from
re-entering PSR after the wait it should be safe. Maybe the vblank irq
is the best way to prevent the re-entry?

> > > > > +				goto retry;
> > > > > +			}
> > > > >  		}
> > > > >  
> > > > >  		local_irq_enable();
> > > > > -- 
> > > > > 2.13.5
> > > > > 
> > > > 
> > > 

-- 
Ville Syrjälä
Intel