[Intel-gfx] [PATCH 2/2] drm/i915/display/psr: Use continuos full frame to handle frontbuffer invalidations

Fri Mar 25 14:46:16 UTC 2022

On Fri, 2022-03-25 at 14:21 +0000, Hogander, Jouni wrote:
> Hello Jose,
> 
> See my comments below.
> 
> On Thu, 2022-03-24 at 11:13 -0700, José Roberto de Souza wrote:
> > Instead of exit PSR when a frontbuffer invalidation happens, we can
> > enable the PSR2 selective fetch continuous full frame, that will keep
> > the panel updated like PSR was disabled but without keeping PSR
> > active.
> 
> with keeping PSR active? I don't think it's like PSR was disabled. New
> full frame is updated only via atomic commit. Having PSR disabled new
> full frame is updated all the time as PSR wasn't existing at all.
> 
> > 
> > So as soon as the frontbuffer flush happens we can disable the
> > continuous full frame and start to do selective fetches much quicker
> > than the path that would enable PSR, that will wait a few frames
> > to actually activate PSR.
> > 
> > Also this approach has proven to fix some glitches found in
> > Alderlake-P
> > when there are a lot of invalidations happening together with page
> > flips.
> > 
> > Some may ask why it is writing to CURSURFLIVE(), it is because
> > that is the way that hardware team provided us to poke display to
> > handle PSR updates, and it is being used since display 9.
> 
> Generic comments:
> 
> Current logic is to disable psr2 in invalidate callback and start
> sending fullframe updates on every vblank period. This is done until
> flush callback where psr2 is re-enabled. Intention is to update
> possible frontbuffer writes between invalidate/flush instantly.
> 
> Now you are changing the logic to update one full frame when

It is not enabling the one full frame, it is enabling the continuous full frame so at every vblank panel will be updated until this bit cleared.

> frontbuffer write starts (_psr_invalidate_handle) and another one when
> it stops (_psr_flush_handle) without disabling psr at all. Have I
> understood your patch correctly?
> 
> Propably we wont notice this change as we have these invalidate/flush
> calls scattered around in the code. Also parallel atomic commits are
> triggering updates. In theory we could observe latency in updates
> between invalidate/flush? Do we care? What do you think?
> 
> Do we need to send update in invalidate at all? Isn't that usually
> called before doing any frontbuffer writing? I.e. we would be sending
> frame that is already in RFB?
> 
> > 
> > Cc: Khaled Almahallawy <khaled.almahallawy at intel.com>
> > Cc: Shawn C Lee <shawn.c.lee at intel.com>
> > Cc: Jouni Högander <jouni.hogander at intel.com>
> > Cc: Mika Kahola <mika.kahola at intel.com>
> > Signed-off-by: José Roberto de Souza <jose.souza at intel.com>
> > ---
> >  drivers/gpu/drm/i915/display/intel_psr.c | 109 ++++++++++++++++++++-
> > --
> >  1 file changed, 95 insertions(+), 14 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/display/intel_psr.c
> > b/drivers/gpu/drm/i915/display/intel_psr.c
> > index d87b357806c91..f7b7b374374b1 100644
> > --- a/drivers/gpu/drm/i915/display/intel_psr.c
> > +++ b/drivers/gpu/drm/i915/display/intel_psr.c
> > @@ -1450,6 +1450,22 @@ static inline u32
> > man_trk_ctl_partial_frame_bit_get(struct drm_i915_private *dev
> >              PSR2_MAN_TRK_CTL_SF_PARTIAL_FRAME_UPDATE;
> >  }
> > 
> > +static inline u32 man_trk_ctl_continuos_full_frame(struct
> > drm_i915_private *dev_priv)
> > +{
> > +     return IS_ALDERLAKE_P(dev_priv) ?
> > +            ADLP_PSR2_MAN_TRK_CTL_SF_CONTINUOS_FULL_FRAME :
> > +            PSR2_MAN_TRK_CTL_SF_CONTINUOS_FULL_FRAME;
> > +}
> > +
> > +static inline u32 man_trk_ctl_su_region_start_end_mask(struct
> > drm_i915_private *dev_priv)
> > +{
> > +     if (IS_ALDERLAKE_P(dev_priv))
> > +             return ADLP_PSR2_MAN_TRK_CTL_SU_REGION_START_ADDR_MASK
> > > 
> > +                    ADLP_PSR2_MAN_TRK_CTL_SU_REGION_END_ADDR_MASK;
> > +     return PSR2_MAN_TRK_CTL_SU_REGION_START_ADDR_MASK |
> > +            PSR2_MAN_TRK_CTL_SU_REGION_END_ADDR_MASK;
> > +}
> > +
> >  static void psr_force_hw_tracking_exit(struct intel_dp *intel_dp)
> >  {
> >       struct drm_i915_private *dev_priv = dp_to_i915(intel_dp);
> > @@ -1546,8 +1562,9 @@ void intel_psr2_program_trans_man_trk_ctl(const
> > struct intel_crtc_state *crtc_st
> >       if (!crtc_state->enable_psr2_sel_fetch)
> >               return;
> > 
> > -     intel_de_write(dev_priv, PSR2_MAN_TRK_CTL(crtc_state-
> > > cpu_transcoder),
> > -                    crtc_state->psr2_man_track_ctl);
> > +     intel_de_rmw(dev_priv, PSR2_MAN_TRK_CTL(crtc_state-
> > > cpu_transcoder),
> > +                  man_trk_ctl_su_region_start_end_mask(dev_priv),
> > +                  crtc_state->psr2_man_track_ctl);
> 
> Should we actually now consider taking psr->lock here?

I don't think we need. mmio writes are syncronized, this will never set continuous full frame and will only clear su region start and end.
Also this function is called from a time sensitive section if we spend too much time here it will cause vblank evasion warnings.

> 
> >  }
> > 
> >  static void psr2_man_trk_ctl_calc(struct intel_crtc_state
> > *crtc_state,
> > @@ -2127,6 +2144,26 @@ static void intel_psr_work(struct work_struct
> > *work)
> >       mutex_unlock(&intel_dp->psr.lock);
> >  }
> > 
> > +static void _psr_invalidate_handle(struct intel_dp *intel_dp,
> > +                                unsigned int
> > prev_busy_frontbuffer_bits)
> > +{
> > +     struct drm_i915_private *dev_priv = dp_to_i915(intel_dp);
> > +
> > +     if (intel_dp->psr.psr2_sel_fetch_enabled) {
> > +             u32 val = man_trk_ctl_continuos_full_frame(dev_priv) |
> > +                       man_trk_ctl_partial_frame_bit_get(dev_priv);
> > +
> > +             /* continuos full frame is already enabled */
> > +             if (prev_busy_frontbuffer_bits)
> > +                     return;
> 
> Should we still trigger the update using CURSURFLIVE? Or do we need
> that at all in the first place?

*continuos full frame*

> 
> > +
> > +             intel_de_rmw(dev_priv, PSR2_MAN_TRK_CTL(intel_dp-
> > > psr.transcoder), 0, val);
> > +             intel_de_write(dev_priv, CURSURFLIVE(intel_dp-
> > > psr.pipe), 0);
> 
> So these two register writes here are triggering one full frame update.
> and leaving full frame update bit set so that coming updates are also
> full frame. Did I understood it correctly?

*continuos full frame*

> 
> As invalidate is called before frontbuffer is writen: Isn't it actually
> re-updating same frame which is already supposed to be in panel RFB?
> 
> > +     } else {
> > +             intel_psr_exit(intel_dp);
> > +     }
> > +}
> > +
> >  /**
> >   * intel_psr_invalidate - Invalidade PSR
> >   * @dev_priv: i915 device
> > @@ -2151,6 +2188,7 @@ void intel_psr_invalidate(struct
> > drm_i915_private *dev_priv,
> >       for_each_intel_encoder_with_psr(&dev_priv->drm, encoder) {
> >               unsigned int pipe_frontbuffer_bits = frontbuffer_bits;
> >               struct intel_dp *intel_dp = enc_to_intel_dp(encoder);
> > +             unsigned int prev_busy_frontbuffer_bits;
> > 
> >               mutex_lock(&intel_dp->psr.lock);
> >               if (!intel_dp->psr.enabled) {
> > @@ -2158,12 +2196,13 @@ void intel_psr_invalidate(struct
> > drm_i915_private *dev_priv,
> >                       continue;
> >               }
> > 
> > +             prev_busy_frontbuffer_bits = intel_dp-
> > > psr.busy_frontbuffer_bits;
> >               pipe_frontbuffer_bits &=
> >                       INTEL_FRONTBUFFER_ALL_MASK(intel_dp->psr.pipe);
> >               intel_dp->psr.busy_frontbuffer_bits |=
> > pipe_frontbuffer_bits;
> > 
> >               if (pipe_frontbuffer_bits)
> > -                     intel_psr_exit(intel_dp);
> > +                     _psr_invalidate_handle(intel_dp,
> > prev_busy_frontbuffer_bits);
> > 
> >               mutex_unlock(&intel_dp->psr.lock);
> >       }
> > @@ -2195,6 +2234,49 @@ tgl_dc3co_flush_locked(struct intel_dp
> > *intel_dp, unsigned int frontbuffer_bits,
> >                        intel_dp->psr.dc3co_exit_delay);
> >  }
> > 
> > +static void _psr_flush_handle(struct intel_dp *intel_dp,
> > +                           unsigned int prev_busy_frontbuffer_bits)
> > +{
> > +     struct drm_i915_private *dev_priv = dp_to_i915(intel_dp);
> > +
> > +     if (intel_dp->psr.psr2_sel_fetch_enabled) {
> > +             /* is continuos full frame enabled? */
> > +             if (prev_busy_frontbuffer_bits) {
> > +                     /* it is, can we turn it off? */
> 
> As you are using this prev_busy_frontbuffer_bits only to check if
> "continuos full frame" is enabled and nothing else: Maybe you could
> just name it as it is e.g. bool cff_enabled or bool
> continuous_full_frame or...same comment in _psr_invalidate_handle. This
> would allow you to drop couple of comments.

prev_busy_frontbuffer_bits or a bool anyway works for me, if you think is easier to understand I can change to bool.

> 
> > +                     if (intel_dp->psr.busy_frontbuffer_bits == 0) {
> > +                             u32 clear =
> > man_trk_ctl_continuos_full_frame(dev_priv);
> > +                             u32 set =
> > man_trk_ctl_single_full_frame_bit_get(dev_priv) |
> > +                                       man_trk_ctl_partial_frame_bit
> > _get(dev_priv);
> 
> 
> > +
> > +                             /*
> > +                              * turn continuos full frame off and do
> > a single
> > +                              * full frame
> > +                              */
> > +                             intel_de_rmw(dev_priv,
> > +                                          PSR2_MAN_TRK_CTL(intel_dp-
> > > psr.transcoder),
> > +                                          clear, set);
> > +                             intel_de_write(dev_priv,
> > CURSURFLIVE(intel_dp->psr.pipe), 0);
> > +                     }
> > +             } else {
> > +                     /*
> > +                      * continuos full frame is disabled, only a
> > single full
> > +                      * frame is required
> > +                      */
> > +                     psr_force_hw_tracking_exit(intel_dp);
> > +             }
> > +     } else {
> > +             /*
> > +              * if prev_busy_frontbuffer_bits is set, it means that
> > PSR is
> > +              * inactive
> > +              */
> > +             if (prev_busy_frontbuffer_bits == 0)
> > +                     psr_force_hw_tracking_exit(intel_dp);
> > +
> > +             if (!intel_dp->psr.active && !intel_dp-
> > > psr.busy_frontbuffer_bits)
> > +                     schedule_work(&intel_dp->psr.work);
> > +     }
> > +}
> > +
> >  /**
> >   * intel_psr_flush - Flush PSR
> >   * @dev_priv: i915 device
> > @@ -2216,6 +2298,7 @@ void intel_psr_flush(struct drm_i915_private
> > *dev_priv,
> >       for_each_intel_encoder_with_psr(&dev_priv->drm, encoder) {
> >               unsigned int pipe_frontbuffer_bits = frontbuffer_bits;
> >               struct intel_dp *intel_dp = enc_to_intel_dp(encoder);
> > +             unsigned int prev_busy_frontbuffer_bits;
> > 
> >               mutex_lock(&intel_dp->psr.lock);
> >               if (!intel_dp->psr.enabled) {
> > @@ -2223,6 +2306,7 @@ void intel_psr_flush(struct drm_i915_private
> > *dev_priv,
> >                       continue;
> >               }
> > 
> > +             prev_busy_frontbuffer_bits = intel_dp-
> > > psr.busy_frontbuffer_bits;
> >               pipe_frontbuffer_bits &=
> >                       INTEL_FRONTBUFFER_ALL_MASK(intel_dp->psr.pipe);
> >               intel_dp->psr.busy_frontbuffer_bits &=
> > ~pipe_frontbuffer_bits;
> > @@ -2232,25 +2316,22 @@ void intel_psr_flush(struct drm_i915_private
> > *dev_priv,
> >                * we have to ensure that the PSR is not activated
> > until
> >                * intel_psr_resume() is called.
> >                */
> > -             if (intel_dp->psr.paused) {
> > -                     mutex_unlock(&intel_dp->psr.lock);
> > -                     continue;
> > -             }
> > +             if (intel_dp->psr.paused)
> > +                     goto exit;
> > 
> >               if (origin == ORIGIN_FLIP ||
> >                   (origin == ORIGIN_CURSOR_UPDATE &&
> >                    !intel_dp->psr.psr2_sel_fetch_enabled)) {
> >                       tgl_dc3co_flush_locked(intel_dp,
> > frontbuffer_bits, origin);
> > -                     mutex_unlock(&intel_dp->psr.lock);
> > -                     continue;
> > +                     goto exit;
> >               }
> > 
> > -             /* By definition flush = invalidate + flush */
> > -             if (pipe_frontbuffer_bits)
> > -                     psr_force_hw_tracking_exit(intel_dp);
> > +             if (pipe_frontbuffer_bits == 0)
> > +                     goto exit;
> > 
> > -             if (!intel_dp->psr.active && !intel_dp-
> > > psr.busy_frontbuffer_bits)
> > -                     schedule_work(&intel_dp->psr.work);
> > +             /* By definition flush = invalidate + flush */
> > +             _psr_flush_handle(intel_dp,
> > prev_busy_frontbuffer_bits);
> > +exit:
> 
> I think you should name it as unlock.
> 
> >               mutex_unlock(&intel_dp->psr.lock);
> >       }
> >  }
> 
> BR,
> 
> Jouni Högander
>