[Intel-gfx] [PATCH 2/2] drm/i915/display/psr: Use continuos full frame to handle frontbuffer invalidations

Wed Mar 30 06:23:36 UTC 2022

Hello Jose,

Thank you for your response. I'm sorry for the delay on my side. See my
comment below.

On Fri, 2022-03-25 at 14:46 +0000, Souza, Jose wrote:
> On Fri, 2022-03-25 at 14:21 +0000, Hogander, Jouni wrote:
> > Hello Jose,
> > 
> > See my comments below.
> > 
> > On Thu, 2022-03-24 at 11:13 -0700, José Roberto de Souza wrote:
> > > Instead of exit PSR when a frontbuffer invalidation happens, we
> > > can
> > > enable the PSR2 selective fetch continuous full frame, that will
> > > keep
> > > the panel updated like PSR was disabled but without keeping PSR
> > > active.
> > 
> > with keeping PSR active? I don't think it's like PSR was disabled.
> > New
> > full frame is updated only via atomic commit. Having PSR disabled
> > new
> > full frame is updated all the time as PSR wasn't existing at all.
> > 
> > > So as soon as the frontbuffer flush happens we can disable the
> > > continuous full frame and start to do selective fetches much
> > > quicker
> > > than the path that would enable PSR, that will wait a few frames
> > > to actually activate PSR.
> > > 
> > > Also this approach has proven to fix some glitches found in
> > > Alderlake-P
> > > when there are a lot of invalidations happening together with
> > > page
> > > flips.
> > > 
> > > Some may ask why it is writing to CURSURFLIVE(), it is because
> > > that is the way that hardware team provided us to poke display to
> > > handle PSR updates, and it is being used since display 9.
> > 
> > Generic comments:
> > 
> > Current logic is to disable psr2 in invalidate callback and start
> > sending fullframe updates on every vblank period. This is done
> > until
> > flush callback where psr2 is re-enabled. Intention is to update
> > possible frontbuffer writes between invalidate/flush instantly.
> > 
> > Now you are changing the logic to update one full frame when
> 
> It is not enabling the one full frame, it is enabling the continuous
> full frame so at every vblank panel will be updated until this bit
> cleared.

There is comment in bspec stating otherwise (55229). I think it is
important to clarify so that we really understand what this WA is doing
and to avoid hiding the issue just for this specific case.

Currently I'm suspecting that "SF Continuous Full Frame" just
configures next selective update to full frame and then write to
CURSURFLIVE triggers flip which is sending that full frame (only one
frame). Updates between invalidate and flush are just depending if
there are flips in between. After getting this clarified we can
properly review this WA patch.

> 
> > frontbuffer write starts (_psr_invalidate_handle) and another one
> > when
> > it stops (_psr_flush_handle) without disabling psr at all. Have I
> > understood your patch correctly?
> > 
> > Propably we wont notice this change as we have these
> > invalidate/flush
> > calls scattered around in the code. Also parallel atomic commits
> > are
> > triggering updates. In theory we could observe latency in updates
> > between invalidate/flush? Do we care? What do you think?
> > 
> > Do we need to send update in invalidate at all? Isn't that usually
> > called before doing any frontbuffer writing? I.e. we would be
> > sending
> > frame that is already in RFB?
> > 
> > > Cc: Khaled Almahallawy <khaled.almahallawy at intel.com>
> > > Cc: Shawn C Lee <shawn.c.lee at intel.com>
> > > Cc: Jouni Högander <jouni.hogander at intel.com>
> > > Cc: Mika Kahola <mika.kahola at intel.com>
> > > Signed-off-by: José Roberto de Souza <jose.souza at intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/display/intel_psr.c | 109
> > > ++++++++++++++++++++-
> > > --
> > >  1 file changed, 95 insertions(+), 14 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/display/intel_psr.c
> > > b/drivers/gpu/drm/i915/display/intel_psr.c
> > > index d87b357806c91..f7b7b374374b1 100644
> > > --- a/drivers/gpu/drm/i915/display/intel_psr.c
> > > +++ b/drivers/gpu/drm/i915/display/intel_psr.c
> > > @@ -1450,6 +1450,22 @@ static inline u32
> > > man_trk_ctl_partial_frame_bit_get(struct drm_i915_private *dev
> > >              PSR2_MAN_TRK_CTL_SF_PARTIAL_FRAME_UPDATE;
> > >  }
> > > 
> > > +static inline u32 man_trk_ctl_continuos_full_frame(struct
> > > drm_i915_private *dev_priv)
> > > +{
> > > +     return IS_ALDERLAKE_P(dev_priv) ?
> > > +            ADLP_PSR2_MAN_TRK_CTL_SF_CONTINUOS_FULL_FRAME :
> > > +            PSR2_MAN_TRK_CTL_SF_CONTINUOS_FULL_FRAME;
> > > +}
> > > +
> > > +static inline u32 man_trk_ctl_su_region_start_end_mask(struct
> > > drm_i915_private *dev_priv)
> > > +{
> > > +     if (IS_ALDERLAKE_P(dev_priv))
> > > +             return
> > > ADLP_PSR2_MAN_TRK_CTL_SU_REGION_START_ADDR_MASK
> > > +                    ADLP_PSR2_MAN_TRK_CTL_SU_REGION_END_ADDR_MAS
> > > K;
> > > +     return PSR2_MAN_TRK_CTL_SU_REGION_START_ADDR_MASK |
> > > +            PSR2_MAN_TRK_CTL_SU_REGION_END_ADDR_MASK;
> > > +}
> > > +
> > >  static void psr_force_hw_tracking_exit(struct intel_dp
> > > *intel_dp)
> > >  {
> > >       struct drm_i915_private *dev_priv = dp_to_i915(intel_dp);
> > > @@ -1546,8 +1562,9 @@ void
> > > intel_psr2_program_trans_man_trk_ctl(const
> > > struct intel_crtc_state *crtc_st
> > >       if (!crtc_state->enable_psr2_sel_fetch)
> > >               return;
> > > 
> > > -     intel_de_write(dev_priv, PSR2_MAN_TRK_CTL(crtc_state-
> > > > cpu_transcoder),
> > > -                    crtc_state->psr2_man_track_ctl);
> > > +     intel_de_rmw(dev_priv, PSR2_MAN_TRK_CTL(crtc_state-
> > > > cpu_transcoder),
> > > +                  man_trk_ctl_su_region_start_end_mask(dev_priv)
> > > ,
> > > +                  crtc_state->psr2_man_track_ctl);
> > 
> > Should we actually now consider taking psr->lock here?
> 
> I don't think we need. mmio writes are syncronized, this will never
> set continuous full frame and will only clear su region start and
> end.
> Also this function is called from a time sensitive section if we
> spend too much time here it will cause vblank evasion warnings.
> 

Ok, how about hitting flush/invalidate in between read/write in
intel_de_rmw? Is that possible?

> > >  }
> > > 
> > >  static void psr2_man_trk_ctl_calc(struct intel_crtc_state
> > > *crtc_state,
> > > @@ -2127,6 +2144,26 @@ static void intel_psr_work(struct
> > > work_struct
> > > *work)
> > >       mutex_unlock(&intel_dp->psr.lock);
> > >  }
> > > 
> > > +static void _psr_invalidate_handle(struct intel_dp *intel_dp,
> > > +                                unsigned int
> > > prev_busy_frontbuffer_bits)
> > > +{
> > > +     struct drm_i915_private *dev_priv = dp_to_i915(intel_dp);
> > > +
> > > +     if (intel_dp->psr.psr2_sel_fetch_enabled) {
> > > +             u32 val =
> > > man_trk_ctl_continuos_full_frame(dev_priv) |
> > > +                       man_trk_ctl_partial_frame_bit_get(dev_pri
> > > v);
> > > +
> > > +             /* continuos full frame is already enabled */
> > > +             if (prev_busy_frontbuffer_bits)
> > > +                     return;
> > 
> > Should we still trigger the update using CURSURFLIVE? Or do we need
> > that at all in the first place?
> 
> *continuos full frame*
> 
> > > +
> > > +             intel_de_rmw(dev_priv, PSR2_MAN_TRK_CTL(intel_dp-
> > > > psr.transcoder), 0, val);
> > > +             intel_de_write(dev_priv, CURSURFLIVE(intel_dp-
> > > > psr.pipe), 0);
> > 
> > So these two register writes here are triggering one full frame
> > update.
> > and leaving full frame update bit set so that coming updates are
> > also
> > full frame. Did I understood it correctly?
> 
> *continuos full frame*
> 
> > As invalidate is called before frontbuffer is writen: Isn't it
> > actually
> > re-updating same frame which is already supposed to be in panel
> > RFB?
> > 
> > > +     } else {
> > > +             intel_psr_exit(intel_dp);
> > > +     }
> > > +}
> > > +
> > >  /**
> > >   * intel_psr_invalidate - Invalidade PSR
> > >   * @dev_priv: i915 device
> > > @@ -2151,6 +2188,7 @@ void intel_psr_invalidate(struct
> > > drm_i915_private *dev_priv,
> > >       for_each_intel_encoder_with_psr(&dev_priv->drm, encoder) {
> > >               unsigned int pipe_frontbuffer_bits =
> > > frontbuffer_bits;
> > >               struct intel_dp *intel_dp =
> > > enc_to_intel_dp(encoder);
> > > +             unsigned int prev_busy_frontbuffer_bits;
> > > 
> > >               mutex_lock(&intel_dp->psr.lock);
> > >               if (!intel_dp->psr.enabled) {
> > > @@ -2158,12 +2196,13 @@ void intel_psr_invalidate(struct
> > > drm_i915_private *dev_priv,
> > >                       continue;
> > >               }
> > > 
> > > +             prev_busy_frontbuffer_bits = intel_dp-
> > > > psr.busy_frontbuffer_bits;
> > >               pipe_frontbuffer_bits &=
> > >                       INTEL_FRONTBUFFER_ALL_MASK(intel_dp-
> > > >psr.pipe);
> > >               intel_dp->psr.busy_frontbuffer_bits |=
> > > pipe_frontbuffer_bits;
> > > 
> > >               if (pipe_frontbuffer_bits)
> > > -                     intel_psr_exit(intel_dp);
> > > +                     _psr_invalidate_handle(intel_dp,
> > > prev_busy_frontbuffer_bits);
> > > 
> > >               mutex_unlock(&intel_dp->psr.lock);
> > >       }
> > > @@ -2195,6 +2234,49 @@ tgl_dc3co_flush_locked(struct intel_dp
> > > *intel_dp, unsigned int frontbuffer_bits,
> > >                        intel_dp->psr.dc3co_exit_delay);
> > >  }
> > > 
> > > +static void _psr_flush_handle(struct intel_dp *intel_dp,
> > > +                           unsigned int
> > > prev_busy_frontbuffer_bits)
> > > +{
> > > +     struct drm_i915_private *dev_priv = dp_to_i915(intel_dp);
> > > +
> > > +     if (intel_dp->psr.psr2_sel_fetch_enabled) {
> > > +             /* is continuos full frame enabled? */
> > > +             if (prev_busy_frontbuffer_bits) {
> > > +                     /* it is, can we turn it off? */
> > 
> > As you are using this prev_busy_frontbuffer_bits only to check if
> > "continuos full frame" is enabled and nothing else: Maybe you could
> > just name it as it is e.g. bool cff_enabled or bool
> > continuous_full_frame or...same comment in _psr_invalidate_handle.
> > This
> > would allow you to drop couple of comments.
> 
> prev_busy_frontbuffer_bits or a bool anyway works for me, if you
> think is easier to understand I can change to bool.

To my opinion it actually helps understanding the code. This was just
idea. I'm not insisting to change if you think current one is better.

> 
> > > +                     if (intel_dp->psr.busy_frontbuffer_bits ==
> > > 0) {
> > > +                             u32 clear =
> > > man_trk_ctl_continuos_full_frame(dev_priv);
> > > +                             u32 set =
> > > man_trk_ctl_single_full_frame_bit_get(dev_priv) |
> > > +                                       man_trk_ctl_partial_frame
> > > _bit
> > > _get(dev_priv);
> > > +
> > > +                             /*
> > > +                              * turn continuos full frame off
> > > and do
> > > a single
> > > +                              * full frame
> > > +                              */
> > > +                             intel_de_rmw(dev_priv,
> > > +                                          PSR2_MAN_TRK_CTL(intel
> > > _dp-
> > > > psr.transcoder),
> > > +                                          clear, set);
> > > +                             intel_de_write(dev_priv,
> > > CURSURFLIVE(intel_dp->psr.pipe), 0);
> > > +                     }
> > > +             } else {
> > > +                     /*
> > > +                      * continuos full frame is disabled, only a
> > > single full
> > > +                      * frame is required
> > > +                      */
> > > +                     psr_force_hw_tracking_exit(intel_dp);
> > > +             }
> > > +     } else {
> > > +             /*
> > > +              * if prev_busy_frontbuffer_bits is set, it means
> > > that
> > > PSR is
> > > +              * inactive
> > > +              */
> > > +             if (prev_busy_frontbuffer_bits == 0)
> > > +                     psr_force_hw_tracking_exit(intel_dp);
> > > +
> > > +             if (!intel_dp->psr.active && !intel_dp-
> > > > psr.busy_frontbuffer_bits)
> > > +                     schedule_work(&intel_dp->psr.work);
> > > +     }
> > > +}
> > > +
> > >  /**
> > >   * intel_psr_flush - Flush PSR
> > >   * @dev_priv: i915 device
> > > @@ -2216,6 +2298,7 @@ void intel_psr_flush(struct
> > > drm_i915_private
> > > *dev_priv,
> > >       for_each_intel_encoder_with_psr(&dev_priv->drm, encoder) {
> > >               unsigned int pipe_frontbuffer_bits =
> > > frontbuffer_bits;
> > >               struct intel_dp *intel_dp =
> > > enc_to_intel_dp(encoder);
> > > +             unsigned int prev_busy_frontbuffer_bits;
> > > 
> > >               mutex_lock(&intel_dp->psr.lock);
> > >               if (!intel_dp->psr.enabled) {
> > > @@ -2223,6 +2306,7 @@ void intel_psr_flush(struct
> > > drm_i915_private
> > > *dev_priv,
> > >                       continue;
> > >               }
> > > 
> > > +             prev_busy_frontbuffer_bits = intel_dp-
> > > > psr.busy_frontbuffer_bits;
> > >               pipe_frontbuffer_bits &=
> > >                       INTEL_FRONTBUFFER_ALL_MASK(intel_dp-
> > > >psr.pipe);
> > >               intel_dp->psr.busy_frontbuffer_bits &=
> > > ~pipe_frontbuffer_bits;
> > > @@ -2232,25 +2316,22 @@ void intel_psr_flush(struct
> > > drm_i915_private
> > > *dev_priv,
> > >                * we have to ensure that the PSR is not activated
> > > until
> > >                * intel_psr_resume() is called.
> > >                */
> > > -             if (intel_dp->psr.paused) {
> > > -                     mutex_unlock(&intel_dp->psr.lock);
> > > -                     continue;
> > > -             }
> > > +             if (intel_dp->psr.paused)
> > > +                     goto exit;
> > > 
> > >               if (origin == ORIGIN_FLIP ||
> > >                   (origin == ORIGIN_CURSOR_UPDATE &&
> > >                    !intel_dp->psr.psr2_sel_fetch_enabled)) {
> > >                       tgl_dc3co_flush_locked(intel_dp,
> > > frontbuffer_bits, origin);
> > > -                     mutex_unlock(&intel_dp->psr.lock);
> > > -                     continue;
> > > +                     goto exit;
> > >               }
> > > 
> > > -             /* By definition flush = invalidate + flush */
> > > -             if (pipe_frontbuffer_bits)
> > > -                     psr_force_hw_tracking_exit(intel_dp);
> > > +             if (pipe_frontbuffer_bits == 0)
> > > +                     goto exit;
> > > 
> > > -             if (!intel_dp->psr.active && !intel_dp-
> > > > psr.busy_frontbuffer_bits)
> > > -                     schedule_work(&intel_dp->psr.work);
> > > +             /* By definition flush = invalidate + flush */
> > > +             _psr_flush_handle(intel_dp,
> > > prev_busy_frontbuffer_bits);
> > > +exit:
> > 
> > I think you should name it as unlock.
> > 
> > >               mutex_unlock(&intel_dp->psr.lock);
> > >       }
> > >  }
> > 
> > BR,
> > 
> > Jouni Högander
> >