[Intel-gfx] [RFC] drm/i915: Rework "Potential atomic update error" to handle PSR exit
Ville Syrjälä
ville.syrjala at linux.intel.com
Fri Apr 27 12:41:42 UTC 2018
On Thu, Apr 26, 2018 at 08:09:56PM -0700, Tarun Vyas wrote:
> On Thu, Apr 26, 2018 at 02:39:04PM -0700, Tarun Vyas wrote:
> > On Thu, Apr 26, 2018 at 10:47:40AM -0700, Dhinakaran Pandiyan wrote:
> > >
> > > On Thu, 2018-04-26 at 16:41 +0300, Ville Syrjälä wrote:
> > > > On Wed, Apr 25, 2018 at 07:10:09PM -0700, tarun.vyas at intel.com wrote:
> > > > > From: Tarun <tarun.vyas at intel.com>
> > > > >
> > > > > The Display scanline counter freezes on PSR entry. Inside
> > > > > intel_pipe_update_start, once Vblank interrupts are enabled, we start
> > > > > exiting PSR, but by the time the scanline counter is read, we may not
> > > > > have completely exited PSR which leads us to schedule out and check back
> > > > > later.
> > > > > On ChromeOS-4.4 kernel, which is fairly up-to-date w.r.t drm/i915 but
> > > > > lags w.r.t core kernel code, hot plugging an external display triggers
> > > > > tons of "potential atomic update errors" in the dmesg, on *pipe A*. A
> > > > > > closer analysis reveals that we try to read the scanline 3 times and
> > > > > > eventually time out, b/c PSR hasn't exited fully, leaving PIPEDSL stuck @
> > > > > > 1599.
> > > > > > This issue is not seen on upstream kernels, b/c for *some* reason we
> > > > > > loop inside intel_pipe_update_start for ~2+ msec which in this case is
> > > > > more than enough to exit PSR fully, hence an *unstuck* PIPEDSL counter,
> > > > > hence no error. On the other hand, the ChromeOS kernel spends ~1.1 msec
> > > > > looping inside intel_pipe_update_start and hence errors out b/c the
> > > > > source is still in PSR.
> > > > >
> > > > > If PSR is enabled, then we should *wait* for the PSR
> > > > > state to move to IDLE before re-reading the PIPEDSL so as to avoid bogus
> > > > > and annoying "potential atomic update error" messages.
> > > > >
> > > > > P.S: This scenario applies to a configuration with an additional pipe,
> > > > > as of now.
> > > > >
> > >
> > > Ville,
> > >
> > > Any idea what could be the reason the warnings start appearing when an
> > > external display is connected? We couldn't come up with an explanation.
> > >
> > Another source of confusion for me is that on upstream kernels it
> > *appears* to take longer for us to get *re-scheduled* after we call
> > schedule_timeout(). With ~2+ msec spent in the loop, it does not seem to
> > be working as intended, b/c we end up spending a lot more time in the
> > loop, which in turn contributes to this issue not being seen on upstream
> > kernels.
> > >
> > > > > Signed-off-by: Tarun <tarun.vyas at intel.com>
> > > > > ---
> > > > > drivers/gpu/drm/i915/intel_sprite.c | 19 +++++++++++++++----
> > > > > 1 file changed, 15 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/i915/intel_sprite.c b/drivers/gpu/drm/i915/intel_sprite.c
> > > > > index aa1dfaa692b9..77dd3b936131 100644
> > > > > --- a/drivers/gpu/drm/i915/intel_sprite.c
> > > > > +++ b/drivers/gpu/drm/i915/intel_sprite.c
> > > > > @@ -92,11 +92,13 @@ void intel_pipe_update_start(const struct intel_crtc_state *new_crtc_state)
> > > > > struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
> > > > > const struct drm_display_mode *adjusted_mode = &new_crtc_state->base.adjusted_mode;
> > > > > long timeout = msecs_to_jiffies_timeout(1);
> > > > > - int scanline, min, max, vblank_start;
> > > > > + int scanline, min, max, vblank_start, old_scanline, new_scanline;
> > > > > + bool retried = false;
> > > > > wait_queue_head_t *wq = drm_crtc_vblank_waitqueue(&crtc->base);
> > > > > bool need_vlv_dsi_wa = (IS_VALLEYVIEW(dev_priv) || IS_CHERRYVIEW(dev_priv)) &&
> > > > > intel_crtc_has_type(new_crtc_state, INTEL_OUTPUT_DSI);
> > > > > DEFINE_WAIT(wait);
> > > > > + old_scanline = new_scanline = -1;
> > > > >
> > > > > vblank_start = adjusted_mode->crtc_vblank_start;
> > > > > if (adjusted_mode->flags & DRM_MODE_FLAG_INTERLACE)
> > > > > @@ -126,15 +128,24 @@ void intel_pipe_update_start(const struct intel_crtc_state *new_crtc_state)
> > > > > * read the scanline.
> > > > > */
> > > > > prepare_to_wait(wq, &wait, TASK_UNINTERRUPTIBLE);
> > > > > -
> > > > > +retry:
> > > > > scanline = intel_get_crtc_scanline(crtc);
> > > > > + old_scanline = new_scanline, new_scanline = scanline;
> > > > > +
> > > > > if (scanline < min || scanline > max)
> > > > > break;
> > > > >
> > > > > if (timeout <= 0) {
> > > > > - DRM_ERROR("Potential atomic update failure on pipe %c\n",
> > > > > + if(!i915.enable_psr || retried) {
> > >
> > > You could use the CAN_PSR() macro that checks for source and sink
> > > support.
> > >
> > Will do.
> > > > > + DRM_ERROR("Potential atomic update failure on pipe %c\n",
> > > > > pipe_name(crtc->pipe));
> > > > > - break;
> > > > > + break;
> > > > > + }
> > > > > + else if(old_scanline == new_scanline && !retried) {
> > > > > + retried = true;
> > > > > + intel_wait_for_register(dev_priv, EDP_PSR_STATUS_CTL, EDP_PSR_STATUS_STATE_MASK, EDP_PSR_STATUS_STATE_IDLE, 10);
> > > >
> > > > What's the point of obfuscating the loop with this stuff?
> > > > Just wait for the PSR exit before we even enter the loop?
> > > >
> > Agreed.
> On second thought, I was doing it wrong in the initial RFC. We can't do a
> wait_for_register with irqs disabled by local_irq_disable(). So we will
> have to *poll* the PSR_STATE, but would that be desirable?
Do it before disabling the irqs? As long as we prevent it from
re-entering PSR after the wait it should be safe. Maybe the vblank irq
is the best way to prevent the re-entry?
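
A minimal userspace sketch of that ordering, not i915 code: every name
below (struct mock_hw, the MOCK_PSR_* values, wait_for_psr_idle(),
pipe_update_start_sketch()) is hypothetical and only models the idea. The
wait for PSR idle happens while irqs are still enabled, and only then is
the irq-off section entered and the scanline read. It assumes the
scanline counter is frozen until the PSR state reads idle, and it does
not model how re-entry into PSR is prevented after the wait.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register layout; these are NOT the real EDP_PSR_STATUS bits. */
#define MOCK_PSR_STATE_MASK 0xe0000000u
#define MOCK_PSR_STATE_IDLE 0x00000000u
#define MOCK_PSR_STATE_BUSY 0x20000000u

/* Hypothetical model of the hardware state this thread is discussing. */
struct mock_hw {
	int polls_until_idle; /* status reads left before PSR exit completes */
	int frozen_scanline;  /* value the counter is stuck at while in PSR */
	int live_scanline;    /* value returned once PSR has exited */
	bool irqs_disabled;   /* stands in for the local_irq_disable() state */
};

/* Each status read moves the mock one step closer to idle. */
static uint32_t mock_read_psr_status(struct mock_hw *hw)
{
	if (hw->polls_until_idle > 0) {
		hw->polls_until_idle--;
		return MOCK_PSR_STATE_BUSY;
	}
	return MOCK_PSR_STATE_IDLE;
}

/* The scanline counter only moves once PSR has fully exited. */
static int mock_read_scanline(const struct mock_hw *hw)
{
	return hw->polls_until_idle > 0 ? hw->frozen_scanline : hw->live_scanline;
}

/*
 * Wait for the PSR state to read idle. Returns 0 on success, -1 if the
 * state is still busy after max_polls reads. A real wait may sleep, so
 * this must run before irqs are disabled; the assert models that rule.
 */
int wait_for_psr_idle(struct mock_hw *hw, int max_polls)
{
	assert(!hw->irqs_disabled);

	for (int i = 0; i < max_polls; i++) {
		if ((mock_read_psr_status(hw) & MOCK_PSR_STATE_MASK) ==
		    MOCK_PSR_STATE_IDLE)
			return 0;
	}
	return -1;
}

/* Wait first, then enter the irq-off section and read the scanline. */
int pipe_update_start_sketch(struct mock_hw *hw, int max_polls)
{
	int scanline;

	if (wait_for_psr_idle(hw, max_polls) != 0)
		return -1; /* PSR never went idle; the counter would be stale */

	hw->irqs_disabled = true;  /* stands in for local_irq_disable() */
	scanline = mock_read_scanline(hw);
	hw->irqs_disabled = false; /* stands in for local_irq_enable() */

	return scanline;
}
```

With the wait hoisted out like this, the scanline loop itself would not
need the retry label or the old_scanline/new_scanline bookkeeping.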
> > > > > + goto retry;
> > > > > + }
> > > > > }
> > > > >
> > > > > local_irq_enable();
> > > > > --
> > > > > 2.13.5
> > > > >
> > > > > _______________________________________________
> > > > > Intel-gfx mailing list
> > > > > Intel-gfx at lists.freedesktop.org
> > > > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> > > >
> > >
--
Ville Syrjälä
Intel