[Intel-gfx] [PATCH v12 1/2] drm/i915: Refactor intel_can_enable_sagv

Wed Nov 20 09:58:07 UTC 2019

On Tue, 2019-11-19 at 15:13 -0800, Matt Roper wrote:
> On Fri, Nov 15, 2019 at 04:54:00PM +0200, Stanislav Lisovskiy wrote:
> > Currently intel_can_enable_sagv function contains
> > a mix of workarounds for different platforms
> > some of them are not valid for gens >= 11 already,
> > so lets split it into separate functions.
> > 
> > v2:
> >     - Rework watermark calculation algorithm to
> >       attempt to calculate Level 0 watermark
> >       with added sagv block time latency and
> >       check if it fits in DBuf in order to
> >       determine if SAGV can be enabled already
> >       at this stage, just as BSpec 49325 states.
> >       if that fails rollback to usual Level 0
> >       latency and disable SAGV.
> >     - Remove unneeded tabs(James Ausmus)
> > 
> > v3: Rebased the patch
> > 
> > v4: - Added back interlaced check for Gen12 and
> >       added separate function for TGL SAGV check
> >       (thanks to James Ausmus for spotting)
> >     - Removed unneeded gen check
> >     - Extracted Gen12 SAGV decision making code
> >       to a separate function from skl_compute_wm
> > 
> > v5: - Added SAGV global state to dev_priv, because
> >       we need to track all pipes, not only those
> >       in atomic state. Each pipe has now correspondent
> >       bit mask reflecting, whether it can tolerate
> >       SAGV or not(thanks to Ville Syrjala for suggestions).
> >     - Now using active flag instead of enable in crc
> >       usage check.
> > 
> > v6: - Fixed rebase conflicts
> > 
> > v7: - kms_cursor_legacy seems to get broken because of multiple
> >       memcpy calls when copying level 0 water marks for enabled
> >       SAGV, to fix this now simply using that field right away,
> >       without copying, for that introduced a new wm_level accessor
> >       which decides which wm_level to return based on SAGV state.
> > 
> > v8: - Protect crtc_sagv_mask same way as we do for other global
> >       state changes: i.e check if changes are needed, then grab
> >       all crtc locks to serialize the changes.
> > 
> > Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy at intel.com>
> > Cc: Ville Syrjälä <ville.syrjala at intel.com>
> > Cc: James Ausmus <james.ausmus at intel.com>

Hi Matt,

Thanks for the really valid comments. I should mention that currently
I'm mostly trying to figure out how to do this properly, as the current
way we serialize commits seems to be a bit problematic.

I.e. when I detect that I need to change the mask which stores which
pipes tolerate SAGV, according to Ville's current paradigm I should
grab all the crtcs, thereby locking the global state and
serializing access, which prevents the contention that might occur if
different commits read the global state and modify different crtcs
at the same time.
However, in CI I then get complaints like:

 WARNING: CPU: 6 PID: 1084 at drivers/gpu/drm/drm_modeset_lock.c:228
drm_modeset_drop_locks+0x35/0x40
<4> [369.766202] Modules linked in: vgem snd_hda_codec_hdmi mei_hdcp
i915 x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel snd_hda_intel snd_intel_dspcfg cdc_ether usbnet
snd_hda_codec mii snd_hwdep snd_hda_core snd_pcm mei_me mei
prime_numbers
<4> [369.766225] RIP: 0010:drm_modeset_drop_locks+0x35/0x40
<4> [369.766265]  drm_mode_cursor_common+0xf3/0x230
<4> [369.766273]  ? drm_mode_setplane+0x190/0x190
<4> [369.766275]  drm_mode_cursor_ioctl+0x48/0x70
<4> [369.766280]  drm_ioctl_kernel+0xa7/0xf0
<4> [369.766283]  drm_ioctl+0x2e1/0x390
<4> [369.766287]  ? drm_mode_setplane+0x190/0x190

Which means that the WARN happens on -EDEADLK (a possible deadlock was detected).

This most likely happens when, for example, there are
two racing commits:

Commit 1               Commit 2              Global state in dev_priv:

locked crtc 0          locked crtc 1         00 (SAGV is off for both)

reads global state
  as 00
                       reads global state
                         as 00
figures out
that SAGV is
ok for pipe 0
tries to lock the
global state
(bails out as the other
 crtc is locked;
 according to the WW mutex
 algorithm the commit starts
 over from the beginning)
                          ...

I guess this keeps happening for both commits until one of them manages
to grab all the locks, so basically the current way of serializing
commits seems to be wrong under really intense contention. Or am I
missing something here?
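
To make the race above concrete, here is a minimal userspace sketch of the "lock everything, back off on contention" step. It is only a model under stated assumptions: `struct model`, `lock_all_crtcs()`, `drop_locks()` and `try_update_mask()` are hypothetical names invented for illustration and are not the driver's or DRM core's API, and the real code uses ww mutexes rather than an owner array.

```c
#include <assert.h>
#include <errno.h>

#define NUM_CRTCS 2
#define NO_OWNER  (-1)

/* Hypothetical model: owner[i] is the id of the commit holding crtc i. */
struct model {
	int owner[NUM_CRTCS];
	unsigned int global_sagv_mask;
};

static void model_init(struct model *m)
{
	for (int i = 0; i < NUM_CRTCS; i++)
		m->owner[i] = NO_OWNER;
	m->global_sagv_mask = 0;
}

/* Try to take every crtc lock; report -EDEADLK if another commit holds one. */
static int lock_all_crtcs(struct model *m, int commit)
{
	assert(commit >= 0);
	for (int i = 0; i < NUM_CRTCS; i++) {
		if (m->owner[i] != NO_OWNER && m->owner[i] != commit)
			return -EDEADLK;
	}
	for (int i = 0; i < NUM_CRTCS; i++)
		m->owner[i] = commit;
	return 0;
}

/* Backoff step: release everything this commit holds before retrying. */
static void drop_locks(struct model *m, int commit)
{
	for (int i = 0; i < NUM_CRTCS; i++) {
		if (m->owner[i] == commit)
			m->owner[i] = NO_OWNER;
	}
}

/* One attempt to publish a new mask; the caller restarts on -EDEADLK. */
static int try_update_mask(struct model *m, int commit, unsigned int new_mask)
{
	int ret;

	if (new_mask == m->global_sagv_mask)
		return 0; /* nothing changes, no need to serialize */

	ret = lock_all_crtcs(m, commit);
	if (ret) {
		drop_locks(m, commit);
		return ret;
	}

	m->global_sagv_mask = new_mask;
	drop_locks(m, commit);
	return 0;
}
```

In this model, commit 0 holding crtc 0 while commit 1 holds crtc 1 makes `try_update_mask()` fail with -EDEADLK for commit 0, and it only succeeds once commit 1 has released its lock.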

> > ---
> >  drivers/gpu/drm/i915/display/intel_display.c  |  12 +-
> >  .../drm/i915/display/intel_display_types.h    |  15 +
> >  drivers/gpu/drm/i915/i915_drv.h               |   6 +
> >  drivers/gpu/drm/i915/intel_pm.c               | 418
> > ++++++++++++++++--
> >  drivers/gpu/drm/i915/intel_pm.h               |   1 +
> >  5 files changed, 409 insertions(+), 43 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/display/intel_display.c
> > b/drivers/gpu/drm/i915/display/intel_display.c
> > index adf50c4b38ad..7f31e33d0b16 100644
> > --- a/drivers/gpu/drm/i915/display/intel_display.c
> > +++ b/drivers/gpu/drm/i915/display/intel_display.c
> > @@ -13401,7 +13401,10 @@ static void verify_wm_state(struct
> > intel_crtc *crtc,
> >  		/* Watermarks */
> >  		for (level = 0; level <= max_level; level++) {
> >  			if (skl_wm_level_equals(&hw_plane_wm-
> > >wm[level],
> > -						&sw_plane_wm-
> > >wm[level]))
> > +						&sw_plane_wm-
> > >wm[level]) ||
> > +			   (skl_wm_level_equals(&hw_plane_wm-
> > >wm[level],
> 
> If we cache the result of 'can enable sagv' into the state structure
> (as I suggest farther down the patch) then we can just compare with
> the right value here rather than trying both.

Could be. My concern was that the hw state might not always match
our current state; however, as we seem to do that check after
we commit the values, yes, we can probably optimize it that way.

> 
> > +						&sw_plane_wm->sagv_wm0) 
> > &&
> > +			   (level == 0)))
> >  				continue;
> >  
> >  			DRM_ERROR("mismatch in WM pipe %c plane %d
> > level %d (expected e=%d b=%u l=%u, got e=%d b=%u l=%u)\n",
> > @@ -13453,7 +13456,10 @@ static void verify_wm_state(struct
> > intel_crtc *crtc,
> >  		/* Watermarks */
> >  		for (level = 0; level <= max_level; level++) {
> >  			if (skl_wm_level_equals(&hw_plane_wm-
> > >wm[level],
> > -						&sw_plane_wm-
> > >wm[level]))
> > +						&sw_plane_wm-
> > >wm[level]) ||
> > +			   (skl_wm_level_equals(&hw_plane_wm-
> > >wm[level],
> > +						&sw_plane_wm->sagv_wm0) 
> > &&
> > +			   (level == 0)))
> >  				continue;
> >  
> >  			DRM_ERROR("mismatch in WM pipe %c cursor level
> > %d (expected e=%d b=%u l=%u, got e=%d b=%u l=%u)\n",
> > @@ -14863,6 +14869,8 @@ static void intel_atomic_commit_tail(struct
> > intel_atomic_state *state)
> >  							      new_crtc_
> > state);
> >  	}
> >  
> > +	dev_priv->crtc_sagv_mask = state->crtc_sagv_mask;
> > +
> >  	for_each_oldnew_intel_crtc_in_state(state, crtc,
> > old_crtc_state, new_crtc_state, i) {
> >  		intel_post_plane_update(old_crtc_state);
> >  
> > diff --git a/drivers/gpu/drm/i915/display/intel_display_types.h
> > b/drivers/gpu/drm/i915/display/intel_display_types.h
> > index 83ea04149b77..6a300cac883f 100644
> > --- a/drivers/gpu/drm/i915/display/intel_display_types.h
> > +++ b/drivers/gpu/drm/i915/display/intel_display_types.h
> > @@ -490,6 +490,20 @@ struct intel_atomic_state {
> >  	 */
> >  	u8 active_pipe_changes;
> >  
> > +	/*
> > +	 * Contains a mask which reflects whether correspondent pipe
> > +	 * can tolerate SAGV or not, so that we can make a decision
> > +	 * at atomic_commit_tail stage, whether we enable it or not
> > +	 * based on global state in dev_priv.
> > +	 */
> > +	u32 crtc_sagv_mask;
> 
> I feel like your code might flow a bit more naturally if this were
> inverted and used as a mask of CRTCs that currently prohibit SAGV?

I think it should be fine both ways.
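
For comparison, a small sketch of the two mask polarities being discussed. The helper names and the `num_pipes` parameter are hypothetical and only illustrate that both encodings answer the same question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* "Tolerates" polarity, as in the patch: a set bit means the pipe allows SAGV. */
static bool sagv_ok_tolerate_mask(uint32_t active_pipes, uint32_t tolerate_mask)
{
	/* Every active pipe must have its bit set. */
	return (active_pipes & ~tolerate_mask) == 0;
}

/* "Prohibits" polarity, as suggested in review: a set bit blocks SAGV. */
static bool sagv_ok_prohibit_mask(uint32_t active_pipes, uint32_t prohibit_mask)
{
	/* No active pipe may have its bit set. */
	return (active_pipes & prohibit_mask) == 0;
}

/* Convert one polarity into the other for a given number of pipes. */
static uint32_t invert_mask(uint32_t mask, unsigned int num_pipes)
{
	assert(num_pipes > 0 && num_pipes < 32);
	return ~mask & ((UINT32_C(1) << num_pipes) - 1);
}
```

With the "prohibits" polarity the final decision is a single AND against the active pipes, which is probably why it reads a bit more naturally.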

> 
> > +
> > +	/*
> > +	 * Used to determine if the mask has been already calculated
> > +	 * for this state, to avoid unnecessary calculations.
> > +	 */
> > +	bool crtc_sagv_mask_set;
> 
> I think this field can go away too if we just call the function once
> and cache the result in the state field.

Yes, I think it can be cached. I still need to rule out possible
complications from simultaneous modification of the global state mask
by different commits with different crtcs, as it is in fact the global
state which determines whether SAGV can be enabled or not; the current
state might contain only some of the crtcs, not all.

> 
> > +
> >  	u8 active_pipes;
> >  	/* minimum acceptable cdclk for each pipe */
> >  	int min_cdclk[I915_MAX_PIPES];
> > @@ -670,6 +684,7 @@ struct skl_plane_wm {
> >  	struct skl_wm_level wm[8];
> >  	struct skl_wm_level uv_wm[8];
> >  	struct skl_wm_level trans_wm;
> > +	struct skl_wm_level sagv_wm0;
> >  	bool is_planar;
> >  };
> >  
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h
> > b/drivers/gpu/drm/i915/i915_drv.h
> > index 1779f600fcfb..0ac9d7b006ca 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -1171,6 +1171,12 @@ struct drm_i915_private {
> >  
> >  	u32 sagv_block_time_us;
> >  
> > +	/*
> > +	 * Contains a bit mask, whether correspondent
> > +	 * pipe allows SAGV or not.
> > +	 */
> > +	u32 crtc_sagv_mask;
> > +
> >  	struct {
> >  		/*
> >  		 * Raw watermark latency values:
> > diff --git a/drivers/gpu/drm/i915/intel_pm.c
> > b/drivers/gpu/drm/i915/intel_pm.c
> > index 05ba9e1bd247..c914bd1862ba 100644
> > --- a/drivers/gpu/drm/i915/intel_pm.c
> > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > @@ -3625,13 +3625,9 @@ static bool skl_needs_memory_bw_wa(struct
> > drm_i915_private *dev_priv)
> >  	return IS_GEN9_BC(dev_priv) || IS_BROXTON(dev_priv);
> >  }
> >  
> > -static bool
> > +bool
> >  intel_has_sagv(struct drm_i915_private *dev_priv)
> >  {
> > -	/* HACK! */
> > -	if (IS_GEN(dev_priv, 12))
> > -		return false;
> > -
> 
> The SAGV work you're doing is pretty complicated and this general
> patch touches a lot of different platforms (SKL, ICL, TGL, etc.).  It
> would be great if we could break this up into a few patches, but if
> that's not easy, I'd suggest at least moving this specific change to
> a final patch all of its own so that we "flip the switch" on TGL
> independently of the general rework.  That way if we wind up with TGL
> regressions (but no problems on SKL/ICL) we can just revert a tiny
> 2-line patch rather than reverting _all_ of your work here.

Agree, the number of issues which arise or might arise is skyrocketing.
I probably need to split it into smaller chunks; however, I'd still
prefer that some crucial related stuff is modified in the same patch,
otherwise this gets really hard to track.

> 
> >  	return (IS_GEN9_BC(dev_priv) || INTEL_GEN(dev_priv) >= 10) &&
> >  		dev_priv->sagv_status != I915_SAGV_NOT_CONTROLLED;
> >  }
> > @@ -3748,7 +3744,7 @@ intel_disable_sagv(struct drm_i915_private
> > *dev_priv)
> >  	return 0;
> >  }
> >  
> > -bool intel_can_enable_sagv(struct intel_atomic_state *state)
> > +static void skl_set_sagv_mask(struct intel_atomic_state *state)
> 
> Minor nitpick:  I know there's a lot of inconsistent terminology used
> throughout the driver, but I always expect functions with names like
> 'set,' 'update,' etc. to touch hardware somehow.  I prefer naming
> functions like this with verbs like 'compute' or 'calc' to make it a
> bit more clear (at least in my mind) that we're not doing anything
> here yet except analyzing the state.

Makes sense. Will rename it.

> 
> >  {
> >  	struct drm_device *dev = state->base.dev;
> >  	struct drm_i915_private *dev_priv = to_i915(dev);
> > @@ -3758,29 +3754,35 @@ bool intel_can_enable_sagv(struct
> > intel_atomic_state *state)
> >  	enum pipe pipe;
> >  	int level, latency;
> >  
> > +	if (state->crtc_sagv_mask_set)
> > +		return;
> > +
> >  	if (!intel_has_sagv(dev_priv))
> > -		return false;
> > +		return;
> 
> It seems like this check should just be at the top of
> intel_can_enable_sagv() rather than duplicated in each platform's
> mask-calculating function?

Agree.

> 
> >  
> >  	/*
> >  	 * If there are no active CRTCs, no additional checks need be
> > performed
> >  	 */
> >  	if (hweight8(state->active_pipes) == 0)
> > -		return true;
> > +		return;
> 
> This also appears to be a platform-independent check that can move up
> to the intel_can_enable_sagv() level?  You don't have it in the TGL
> function right now, but it seems like it should apply there as well?

Agree.

> 
> >  
> >  	/*
> >  	 * SKL+ workaround: bspec recommends we disable SAGV when we
> > have
> >  	 * more then one pipe enabled
> >  	 */
> >  	if (hweight8(state->active_pipes) > 1)
> > -		return false;
> > +		return;
> >  
> >  	/* Since we're now guaranteed to only have one active CRTC...
> > */
> >  	pipe = ffs(state->active_pipes) - 1;
> >  	crtc = intel_get_crtc_for_pipe(dev_priv, pipe);
> >  	crtc_state = to_intel_crtc_state(crtc->base.state);
> > +	state->crtc_sagv_mask &= ~BIT(crtc->pipe);
> >  
> > -	if (crtc_state->hw.adjusted_mode.flags &
> > DRM_MODE_FLAG_INTERLACE)
> > -		return false;
> > +	if (crtc_state->hw.adjusted_mode.flags &
> > DRM_MODE_FLAG_INTERLACE) {
> > +		state->crtc_sagv_mask_set = true;
> > +		return;
> > +	}
> >  
> >  	for_each_intel_plane_on_crtc(dev, crtc, plane) {
> >  		struct skl_plane_wm *wm =
> 
> In the pre-existing code for this loop (that doesn't show up in the
> diff here) it looks like we're looking at the already-committed plane
> state to see if the old plane FB was x-tiled...should that be looking
> at the new FB from the current state instead?

Hm.. I just copied that code for skl, need to figure this out.

> 
> > @@ -3807,7 +3809,135 @@ bool intel_can_enable_sagv(struct
> > intel_atomic_state *state)
> >  		 * incur memory latencies higher than
> > sagv_block_time_us we
> >  		 * can't enable SAGV.
> >  		 */
> > -		if (latency < dev_priv->sagv_block_time_us)
> > +		if (latency < dev_priv->sagv_block_time_us) {
> > +			state->crtc_sagv_mask_set = true;
> > +			return;
> > +		}
> > +	}
> > +
> > +	state->crtc_sagv_mask |= BIT(crtc->pipe);
> > +	state->crtc_sagv_mask_set = true;
> > +}
> > +
> > +static void tgl_set_sagv_mask(struct intel_atomic_state *state);
> > +
> > +static void icl_set_sagv_mask(struct intel_atomic_state *state)
> > +{
> > +	struct drm_device *dev = state->base.dev;
> > +	struct drm_i915_private *dev_priv = to_i915(dev);
> > +	struct intel_crtc *crtc;
> > +	struct intel_crtc_state *new_crtc_state;
> > +	int level, latency;
> > +	int i;
> > +	int plane_id;
> > +
> > +	if (state->crtc_sagv_mask_set)
> > +		return;
> > +
> > +	if (!intel_has_sagv(dev_priv))
> > +		return;
> > +
> > +	/*
> > +	 * If there are no active CRTCs, no additional checks need be
> > performed
> > +	 */
> > +	if (hweight8(state->active_pipes) == 0)
> > +		return;
> > +
> > +	for_each_new_intel_crtc_in_state(state, crtc,
> > +					     new_crtc_state, i) {
> > +		unsigned int flags = crtc->base.state-
> > >adjusted_mode.flags;
> > +		bool can_sagv;
> > +
> > +		if (flags & DRM_MODE_FLAG_INTERLACE)
> > +			continue;
> > +
> > +		if (!new_crtc_state->hw.active)
> > +			continue;
> > +
> > +		can_sagv = true;
> > +		for_each_plane_id_on_crtc(crtc, plane_id) {
> > +			struct skl_plane_wm *wm =
> > +				&new_crtc_state-
> > >wm.skl.optimal.planes[plane_id];
> > +
> > +			/* Skip this plane if it's not enabled */
> > +			if (!wm->wm[0].plane_en)
> > +				continue;
> > +
> > +			/* Find the highest enabled wm level for this
> > plane */
> > +			for (level = ilk_wm_max_level(dev_priv);
> > +			     !wm->wm[level].plane_en; --level) {
> > +			}
> > +
> > +			latency = dev_priv->wm.skl_latency[level];
> > +
> > +			/*
> > +			 * If any of the planes on this pipe don't
> > enable
> > +			 * wm levels that incur memory latencies higher
> > than
> > +			 * sagv_block_time_us we can't enable SAGV.
> > +			 */
> > +			if (latency < dev_priv->sagv_block_time_us) {
> > +				can_sagv = false;
> > +				break;
> > +			}
> 
> I still think this test is a bit problematic.  What if our memory
> latency is so low that we can successfully enable all watermark
> levels, and the latency for the highest watermark level is still less
> than ICL's 10us sagv block time?  We might be able to support SAGV
> just fine, but we're giving up without actually checking.
> 
> Or another case:  say our highest enabled watermark level is 3 with a
> latency of 8us.  The next level up, 4, has a latency of 30us which is
> high enough that our driver had to disable level 4.  We still don't
> know whether the plane could have tolerated the latency of 10us (and
> there's a good chance we could have...level 4 in this example was
> only impossible because it was such a big latency jump over level 3).
> 
> BTW, as I mentioned before, I'm still a bit uncomfortable with the
> bspec wording here; I'm going to open a bspec defect to find out for
> sure how we should interpret the directions on gen11.

I agree this seems weird. Let's clarify this first and get it fixed in
BSpec.
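
To illustrate the second case from the review, here is a standalone sketch of the check as the patch currently does it. The function names are hypothetical, and the latency values for levels other than 3 and 4 are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_WM_LEVELS 8

/* Returns the highest enabled watermark level, or -1 if none is enabled. */
static int highest_enabled_level(const bool enabled[MAX_WM_LEVELS])
{
	for (int level = MAX_WM_LEVELS - 1; level >= 0; level--) {
		if (enabled[level])
			return level;
	}
	return -1;
}

/*
 * The check under discussion: compare the latency of the highest
 * enabled level against the SAGV block time.
 */
static bool plane_passes_latency_check(const bool enabled[MAX_WM_LEVELS],
				       const unsigned int latency_us[MAX_WM_LEVELS],
				       unsigned int sagv_block_time_us)
{
	int level = highest_enabled_level(enabled);

	assert(sagv_block_time_us > 0);
	if (level < 0)
		return true; /* plane is not enabled, nothing to check */

	return latency_us[level] >= sagv_block_time_us;
}
```

With levels 0..3 enabled, 8us at level 3, 30us at level 4 and a 10us block time, this returns false, although nothing here tells us whether the plane could actually have tolerated 10us.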

> 
> > +		}
> > +		if (can_sagv)
> > +			state->crtc_sagv_mask |= BIT(crtc->pipe);
> > +		else
> > +			state->crtc_sagv_mask &= ~BIT(crtc->pipe);
> > +	}
> > +	state->crtc_sagv_mask_set = true;
> > +}
> > +
> > +bool intel_can_enable_sagv(struct intel_atomic_state *state)
> > +{
> > +	struct drm_device *dev = state->base.dev;
> > +	struct drm_i915_private *dev_priv = to_i915(dev);
> > +	int ret, i;
> > +	struct intel_crtc *crtc;
> > +	struct intel_crtc_state *new_crtc_state;
> > +
> > +	/*
> > +	 * Make sure we always pick global state first,
> > +	 * there shouldn't be any issue as we hold only locks
> > +	 * to correspondent crtcs in state, however once
> > +	 * we detect that we need to change SAGV mask
> > +	 * in global state, we will grab all the crtc locks
> > +	 * in order to get this serialized, thus other
> > +	 * racing commits having other crtc locks, will have
> > +	 * to start over again, as stated by Wound-Wait
> > +	 * algorithm.
> > +	 */
> > +	state->crtc_sagv_mask = dev_priv->crtc_sagv_mask;

Probably we need to figure out a solution here.

> > +
> > +	if (INTEL_GEN(dev_priv) >= 12)
> > +		tgl_set_sagv_mask(state);
> > +	else if (INTEL_GEN(dev_priv) == 11)
> > +		icl_set_sagv_mask(state);
> > +	else
> > +		skl_set_sagv_mask(state);
> > +
> > +	/*
> > +	 * For SAGV we need to account all the pipes,
> > +	 * not only the ones which are in state currently.
> > +	 * Grab all locks if we detect that we are actually
> > +	 * going to do something.
> > +	 */
> > +	if (state->crtc_sagv_mask != dev_priv->crtc_sagv_mask) {
> > +		ret = intel_atomic_serialize_global_state(state);
> > +		if (ret) {
> > +			DRM_DEBUG_KMS("Could not serialize global
> > state\n");
> > +			return false;
> > +		}
> > +	}
> > +
> > +	for_each_new_intel_crtc_in_state(state, crtc, new_crtc_state,
> > i) {
> > +		u32 mask = BIT(crtc->pipe);
> > +		bool state_sagv_masked = (mask & state->crtc_sagv_mask) 
> > == 0;
> > +
> > +		if (!new_crtc_state->hw.active)
> > +			continue;
> > +
> > +		if (state_sagv_masked)
> >  			return false;
> >  	}
> >  
> > @@ -3933,6 +4063,7 @@ static int skl_compute_wm_params(const struct
> > intel_crtc_state *crtc_state,
> >  				 int color_plane);
> >  static void skl_compute_plane_wm(const struct intel_crtc_state
> > *crtc_state,
> >  				 int level,
> > +				 u32 latency,
> >  				 const struct skl_wm_params *wp,
> >  				 const struct skl_wm_level
> > *result_prev,
> >  				 struct skl_wm_level *result /* out
> > */);
> > @@ -3955,7 +4086,10 @@ skl_cursor_allocation(const struct
> > intel_crtc_state *crtc_state,
> >  	WARN_ON(ret);
> >  
> >  	for (level = 0; level <= max_level; level++) {
> > -		skl_compute_plane_wm(crtc_state, level, &wp, &wm, &wm);
> > +		u32 latency = dev_priv->wm.skl_latency[level];
> > +
> > +		skl_compute_plane_wm(crtc_state, level, latency, &wp,
> > &wm, &wm);
> > +
> >  		if (wm.min_ddb_alloc == U16_MAX)
> >  			break;
> >  
> > @@ -4220,6 +4354,98 @@ icl_get_total_relative_data_rate(struct
> > intel_crtc_state *crtc_state,
> >  	return total_data_rate;
> >  }
> >  
> > +static int
> > +tgl_check_pipe_fits_sagv_wm(struct intel_crtc_state *crtc_state,
> > +			    struct skl_ddb_allocation *ddb /* out */)
> > +{
> > +	struct drm_crtc *crtc = crtc_state->uapi.crtc;
> > +	struct drm_i915_private *dev_priv = to_i915(crtc->dev);
> > +	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
> > +	struct skl_ddb_entry *alloc = &crtc_state->wm.skl.ddb;
> > +	u16 alloc_size;
> > +	u16 total[I915_MAX_PLANES] = {};
> > +	u64 total_data_rate;
> > +	enum plane_id plane_id;
> > +	int num_active;
> > +	u64 plane_data_rate[I915_MAX_PLANES] = {};
> > +	u32 blocks;
> > +
> > +	/*
> > +	 * No need to check gen here, we call this only for gen12
> > +	 */
> > +	total_data_rate =
> > +		icl_get_total_relative_data_rate(crtc_state,
> > +						 plane_data_rate);
> > +
> > +	skl_ddb_get_pipe_allocation_limits(dev_priv, crtc_state,
> > +					   total_data_rate,
> > +					   ddb, alloc, &num_active);
> > +	alloc_size = skl_ddb_entry_size(alloc);
> > +	if (alloc_size == 0)
> > +		return -ENOSPC;
> > +
> > +	/* Allocate fixed number of blocks for cursor. */
> > +	total[PLANE_CURSOR] = skl_cursor_allocation(crtc_state,
> > num_active);
> > +	alloc_size -= total[PLANE_CURSOR];
> > +	crtc_state->wm.skl.plane_ddb_y[PLANE_CURSOR].start =
> > +		alloc->end - total[PLANE_CURSOR];
> > +	crtc_state->wm.skl.plane_ddb_y[PLANE_CURSOR].end = alloc->end;
> 
> All the above is basically a duplication of the pipe's DDB allocation
> we have to figure out again later.  Basically our driver used to
> follow a sequence of:
> 
>         * Calculate DDB (proportional algorithm)
>         * Calculate watermarks
> 
> and then we switched it to:
> 
>         * Calculate watermarks
>         * Calculate DDB (need-based algorithm)
> 
> If I recall correctly, the need-based DDB algorithm only really needs
> the watermark values to divvy up the intra-pipe plane DDB allocations
> so we could still calculate the overall pipe allocations earlier if
> we wanted to.  Doing so would allow you to avoid this duplication of
> logic:
> 
>         * Calculate pipe-level DDB allocations
>         * Calculate watermarks
>         * Calculate plane-level DDB (need-based algorithm)

Yep, I had to take part of the ddb allocation code, as we need to
figure out whether the SAGV watermarks (which are higher) actually
fit. And we need to do that before we do the real DBuf allocation.
Pipe-level allocation simply gives each pipe ddb entries in
proportion to pipe width; currently this is done at the same time
as we check that the plane ddb blocks do not exceed the
alloc_size per pipe.

However, for SAGV we need to know in advance which watermarks we are
fitting, because if the SAGV watermarks do not fit we have to
roll back and do everything from the beginning.
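
The fit check itself boils down to something like the sketch below. It is a simplified standalone version: `sagv_wm_fits()` is a hypothetical name, the per-plane values stand in for `sagv_wm0.min_ddb_alloc`, and the cursor handling from the patch is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * True if the SAGV level 0 minimum allocations of all planes fit into
 * the pipe's share of the DBuf (alloc_size, in blocks).
 */
static bool sagv_wm_fits(const uint16_t *sagv_min_ddb_alloc, size_t num_planes,
			 uint16_t alloc_size)
{
	uint32_t blocks = 0;

	assert(sagv_min_ddb_alloc != NULL || num_planes == 0);
	for (size_t i = 0; i < num_planes; i++) {
		blocks += sagv_min_ddb_alloc[i];
		if (blocks > alloc_size)
			return false; /* fall back to the normal L0 watermarks */
	}
	return true;
}
```

If this returns false we would use the usual level 0 latency and treat the pipe as not tolerating SAGV.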

> 
> > +
> > +	/*
> > +	 * Do check if we can fit L0 + sagv_block_time and
> > +	 * disable SAGV if we can't.
> > +	 */
> > +	blocks = 0;
> > +	for_each_plane_id_on_crtc(intel_crtc, plane_id) {
> > +		const struct skl_plane_wm *wm =
> > +			&crtc_state->wm.skl.optimal.planes[plane_id];
> > +
> > +		if (plane_id == PLANE_CURSOR) {
> > +			if (WARN_ON(wm->sagv_wm0.min_ddb_alloc >
> > +				    total[PLANE_CURSOR])) {
> > +				blocks = U32_MAX;
> > +				break;
> > +			}
> > +			continue;
> > +		}
> > +
> > +		blocks += wm->sagv_wm0.min_ddb_alloc;
> > +		if (blocks > alloc_size)
> > +			return -ENOSPC;
> > +	}
> > +	return 0;
> > +}
> > +
> > +const struct skl_wm_level *
> > +skl_plane_wm_level(struct intel_plane *plane,
> > +		const struct intel_crtc_state *crtc_state,
> > +		int level,
> > +		bool yuv)
> > +{
> > +	struct drm_atomic_state *state = crtc_state->uapi.state;
> > +	enum plane_id plane_id = plane->id;
> > +	const struct skl_plane_wm *wm =
> > +		&crtc_state->wm.skl.optimal.planes[plane_id];
> > +
> > +	/*
> > +	 * Looks ridicilous but need to check if state is not
> > +	 * NULL here as it might be as some cursor plane manipulations
> > +	 * seem to happen when no atomic state is actually present,
> > +	 * despite crtc_state is allocated. Removing state check
> > +	 * from here will result in kernel panic on boot.
> > +	 * However we now need to check whether should be use SAGV
> > +	 * wm levels here.
> > +	 */
> > +	if (state) {
> > +		struct intel_atomic_state *intel_state =
> > +			to_intel_atomic_state(state);
> > +		if (intel_can_enable_sagv(intel_state) && !level)
> 
> I think we should calculate the 'can enable SAGV' value once and
> cache it into a field of the state structure so that you don't have
> to keep re-calling this on every single plane/level combination.
> Also you can then use the proper setting to figure out how to verify
> the hardware readout value as noted earlier.
> 
> Also one thing I don't see (maybe I'm just overlooking it) is that we
> may also need to adjust higher watermark levels upward too:
> 
>         "The latency input to the watermark calculation for each
>         level must be greater than or equal to the lower level. The
>         latency increase to level 0 for SAGV requires the upper
>         levels to be adjusted to meet that requirement. Use
>         MIN(latency for this level, latency for next lower level) to
>         correct the latency."
> 
> Basically it seems like we should just calculate two full sets of
> watermark values for all levels and then choose between them at the
> end?
> 

Exactly, which again means that we need some function to check first
which set we should use (which is basically dictated by whether we can
fit that into the DDB or not).
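
A rough sketch of how the second (SAGV) set of latencies could be derived. This is an interpretation, not something confirmed by BSpec: the quoted text says MIN, but the sketch takes the larger of the two values, since that is what keeps every level greater than or equal to the one below it. `compute_sagv_latencies()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Fill out[] with the latencies to use when SAGV is enabled:
 * level 0 gets the SAGV block time added, upper levels are raised
 * where needed so that the sequence never decreases.
 */
static void compute_sagv_latencies(const uint32_t *latency_us, size_t num_levels,
				   uint32_t sagv_block_time_us, uint32_t *out)
{
	assert(num_levels > 0);

	out[0] = latency_us[0] + sagv_block_time_us;
	for (size_t level = 1; level < num_levels; level++) {
		/* Keep each level at least as large as the one below it. */
		out[level] = latency_us[level] > out[level - 1] ?
			     latency_us[level] : out[level - 1];
	}
}
```

Both sets of watermarks would then be computed from the two latency arrays, and the DDB fit check decides which set is used.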

Stan

> 
> Matt
> 
> > +			return &wm->sagv_wm0;
> > +	}
> > +
> > +	return yuv ? &wm->uv_wm[level] : &wm->wm[level];
> > +}
> > +
> >  static int
> >  skl_allocate_pipe_ddb(struct intel_crtc_state *crtc_state,
> >  		      struct skl_ddb_allocation *ddb /* out */)
> > @@ -4234,6 +4460,9 @@ skl_allocate_pipe_ddb(struct intel_crtc_state
> > *crtc_state,
> >  	u16 uv_total[I915_MAX_PLANES] = {};
> >  	u64 total_data_rate;
> >  	enum plane_id plane_id;
> > +	struct intel_plane *plane;
> > +	const struct skl_wm_level *wm_level;
> > +	const struct skl_wm_level *wm_uv_level;
> >  	int num_active;
> >  	u64 plane_data_rate[I915_MAX_PLANES] = {};
> >  	u64 uv_plane_data_rate[I915_MAX_PLANES] = {};
> > @@ -4285,12 +4514,15 @@ skl_allocate_pipe_ddb(struct
> > intel_crtc_state *crtc_state,
> >  	 */
> >  	for (level = ilk_wm_max_level(dev_priv); level >= 0; level--) {
> >  		blocks = 0;
> > -		for_each_plane_id_on_crtc(intel_crtc, plane_id) {
> > -			const struct skl_plane_wm *wm =
> > -				&crtc_state-
> > >wm.skl.optimal.planes[plane_id];
> > +		for_each_intel_plane_on_crtc(&dev_priv->drm,
> > intel_crtc, plane) {
> > +			plane_id = plane->id;
> > +			wm_level = skl_plane_wm_level(plane,
> > crtc_state,
> > +						      level, false);
> > +			wm_uv_level = skl_plane_wm_level(plane,
> > crtc_state,
> > +							 level, true);
> >  
> >  			if (plane_id == PLANE_CURSOR) {
> > -				if (WARN_ON(wm->wm[level].min_ddb_alloc 
> > >
> > +				if (WARN_ON(wm_level->min_ddb_alloc >
> >  					    total[PLANE_CURSOR])) {
> >  					blocks = U32_MAX;
> >  					break;
> > @@ -4298,8 +4530,8 @@ skl_allocate_pipe_ddb(struct intel_crtc_state
> > *crtc_state,
> >  				continue;
> >  			}
> >  
> > -			blocks += wm->wm[level].min_ddb_alloc;
> > -			blocks += wm->uv_wm[level].min_ddb_alloc;
> > +			blocks += wm_level->min_ddb_alloc;
> > +			blocks += wm_uv_level->min_ddb_alloc;
> >  		}
> >  
> >  		if (blocks <= alloc_size) {
> > @@ -4320,12 +4552,16 @@ skl_allocate_pipe_ddb(struct
> > intel_crtc_state *crtc_state,
> >  	 * watermark level, plus an extra share of the leftover blocks
> >  	 * proportional to its relative data rate.
> >  	 */
> > -	for_each_plane_id_on_crtc(intel_crtc, plane_id) {
> > -		const struct skl_plane_wm *wm =
> > -			&crtc_state->wm.skl.optimal.planes[plane_id];
> > +	for_each_intel_plane_on_crtc(&dev_priv->drm, intel_crtc, plane)
> > {
> >  		u64 rate;
> >  		u16 extra;
> >  
> > +		plane_id = plane->id;
> > +		wm_level = skl_plane_wm_level(plane, crtc_state,
> > +					      level, false);
> > +		wm_uv_level = skl_plane_wm_level(plane, crtc_state,
> > +						 level, true);
> > +
> >  		if (plane_id == PLANE_CURSOR)
> >  			continue;
> >  
> > @@ -4340,7 +4576,7 @@ skl_allocate_pipe_ddb(struct intel_crtc_state
> > *crtc_state,
> >  		extra = min_t(u16, alloc_size,
> >  			      DIV64_U64_ROUND_UP(alloc_size * rate,
> >  						 total_data_rate));
> > -		total[plane_id] = wm->wm[level].min_ddb_alloc + extra;
> > +		total[plane_id] = wm_level->min_ddb_alloc + extra;
> >  		alloc_size -= extra;
> >  		total_data_rate -= rate;
> >  
> > @@ -4351,7 +4587,7 @@ skl_allocate_pipe_ddb(struct intel_crtc_state
> > *crtc_state,
> >  		extra = min_t(u16, alloc_size,
> >  			      DIV64_U64_ROUND_UP(alloc_size * rate,
> >  						 total_data_rate));
> > -		uv_total[plane_id] = wm->uv_wm[level].min_ddb_alloc +
> > extra;
> > +		uv_total[plane_id] = wm_uv_level->min_ddb_alloc +
> > extra;
> >  		alloc_size -= extra;
> >  		total_data_rate -= rate;
> >  	}
> > @@ -4392,9 +4628,14 @@ skl_allocate_pipe_ddb(struct
> > intel_crtc_state *crtc_state,
> >  	 * that aren't actually possible.
> >  	 */
> >  	for (level++; level <= ilk_wm_max_level(dev_priv); level++) {
> > -		for_each_plane_id_on_crtc(intel_crtc, plane_id) {
> > +		for_each_intel_plane_on_crtc(&dev_priv->drm,
> > intel_crtc, plane) {
> >  			struct skl_plane_wm *wm =
> > -				&crtc_state-
> > >wm.skl.optimal.planes[plane_id];
> > +				&crtc_state-
> > >wm.skl.optimal.planes[plane->id];
> > +
> > +			wm_level = skl_plane_wm_level(plane,
> > crtc_state,
> > +						      level, false);
> > +			wm_uv_level = skl_plane_wm_level(plane,
> > crtc_state,
> > +						      level, true);
> >  
> >  			/*
> >  			 * We only disable the watermarks for each
> > plane if
> > @@ -4408,9 +4649,10 @@ skl_allocate_pipe_ddb(struct
> > intel_crtc_state *crtc_state,
> >  			 *  planes must be enabled before the level
> > will be used."
> >  			 * So this is actually safe to do.
> >  			 */
> > -			if (wm->wm[level].min_ddb_alloc >
> > total[plane_id] ||
> > -			    wm->uv_wm[level].min_ddb_alloc >
> > uv_total[plane_id])
> > -				memset(&wm->wm[level], 0, sizeof(wm-
> > >wm[level]));
> > +			if (wm_level->min_ddb_alloc > total[plane->id]
> > ||
> > +			    wm_uv_level->min_ddb_alloc >
> > uv_total[plane->id])
> > +				memset(&wm->wm[level], 0,
> > +				       sizeof(struct skl_wm_level));
> >  
> >  			/*
> >  			 * Wa_1408961008:icl, ehl
> > @@ -4418,9 +4660,14 @@ skl_allocate_pipe_ddb(struct
> > intel_crtc_state *crtc_state,
> >  			 */
> >  			if (IS_GEN(dev_priv, 11) &&
> >  			    level == 1 && wm->wm[0].plane_en) {
> > -				wm->wm[level].plane_res_b = wm-
> > >wm[0].plane_res_b;
> > -				wm->wm[level].plane_res_l = wm-
> > >wm[0].plane_res_l;
> > -				wm->wm[level].ignore_lines = wm-
> > >wm[0].ignore_lines;
> > +				wm_level = skl_plane_wm_level(plane,
> > crtc_state,
> > +							      0,
> > false);
> > +				wm->wm[level].plane_res_b =
> > +					wm_level->plane_res_b;
> > +				wm->wm[level].plane_res_l =
> > +					wm_level->plane_res_l;
> > +				wm->wm[level].ignore_lines =
> > +					wm_level->ignore_lines;
> >  			}
> >  		}
> >  	}
> > @@ -4649,12 +4896,12 @@ static bool skl_wm_has_lines(struct
> > drm_i915_private *dev_priv, int level)
> >  
> >  static void skl_compute_plane_wm(const struct intel_crtc_state
> > *crtc_state,
> >  				 int level,
> > +				 u32 latency,
> >  				 const struct skl_wm_params *wp,
> >  				 const struct skl_wm_level
> > *result_prev,
> >  				 struct skl_wm_level *result /* out */)
> >  {
> >  	struct drm_i915_private *dev_priv = to_i915(crtc_state-
> > >uapi.crtc->dev);
> > -	u32 latency = dev_priv->wm.skl_latency[level];
> >  	uint_fixed_16_16_t method1, method2;
> >  	uint_fixed_16_16_t selected_result;
> >  	u32 res_blocks, res_lines, min_ddb_alloc = 0;
> > @@ -4775,20 +5022,45 @@ static void skl_compute_plane_wm(const
> > struct intel_crtc_state *crtc_state,
> >  static void
> >  skl_compute_wm_levels(const struct intel_crtc_state *crtc_state,
> >  		      const struct skl_wm_params *wm_params,
> > -		      struct skl_wm_level *levels)
> > +		      struct skl_plane_wm *plane_wm,
> > +		      bool yuv)
> >  {
> >  	struct drm_i915_private *dev_priv = to_i915(crtc_state-
> > >uapi.crtc->dev);
> >  	int level, max_level = ilk_wm_max_level(dev_priv);
> > +	/*
> > +	 * Check which kind of plane it is and, based on that, calculate
> > +	 * the corresponding WM levels.
> > +	 */
> > +	struct skl_wm_level *levels = yuv ? plane_wm->uv_wm : plane_wm-
> > >wm;
> >  	struct skl_wm_level *result_prev = &levels[0];
> >  
> >  	for (level = 0; level <= max_level; level++) {
> >  		struct skl_wm_level *result = &levels[level];
> > +		u32 latency = dev_priv->wm.skl_latency[level];
> >  
> > -		skl_compute_plane_wm(crtc_state, level, wm_params,
> > -				     result_prev, result);
> > +		skl_compute_plane_wm(crtc_state, level, latency,
> > +				     wm_params, result_prev, result);
> >  
> >  		result_prev = result;
> >  	}
> > +	/*
> > +	 * For Gen12 if it is an L0 we need to also
> > +	 * consider sagv_block_time when calculating
> > +	 * L0 watermark - we will need that when making
> > +	 * a decision whether to enable SAGV or not.
> > +	 * For older gens we agreed to copy L0 value for
> > +	 * compatibility.
> > +	 */
> > +	if ((INTEL_GEN(dev_priv) >= 12)) {
> > +		u32 latency = dev_priv->wm.skl_latency[0];
> > +
> > +		latency += dev_priv->sagv_block_time_us;
> > +		skl_compute_plane_wm(crtc_state, 0, latency,
> > +		     wm_params, &levels[0],
> > +		    &plane_wm->sagv_wm0);
> > +	} else
> > +		memcpy(&plane_wm->sagv_wm0, &levels[0],
> > +			sizeof(struct skl_wm_level));
> >  }
> >  
> >  static u32
> > @@ -4881,7 +5153,7 @@ static int skl_build_plane_wm_single(struct
> > intel_crtc_state *crtc_state,
> >  	if (ret)
> >  		return ret;
> >  
> > -	skl_compute_wm_levels(crtc_state, &wm_params, wm->wm);
> > +	skl_compute_wm_levels(crtc_state, &wm_params, wm, false);
> >  	skl_compute_transition_wm(crtc_state, &wm_params, wm);
> >  
> >  	return 0;
> > @@ -4903,7 +5175,7 @@ static int skl_build_plane_wm_uv(struct
> > intel_crtc_state *crtc_state,
> >  	if (ret)
> >  		return ret;
> >  
> > -	skl_compute_wm_levels(crtc_state, &wm_params, wm->uv_wm);
> > +	skl_compute_wm_levels(crtc_state, &wm_params, wm, true);
> >  
> >  	return 0;
> >  }
> > @@ -5040,10 +5312,13 @@ void skl_write_plane_wm(struct intel_plane
> > *plane,
> >  		&crtc_state->wm.skl.plane_ddb_y[plane_id];
> >  	const struct skl_ddb_entry *ddb_uv =
> >  		&crtc_state->wm.skl.plane_ddb_uv[plane_id];
> > +	const struct skl_wm_level *wm_level;
> >  
> >  	for (level = 0; level <= max_level; level++) {
> > +		wm_level = skl_plane_wm_level(plane, crtc_state, level,
> > false);
> > +
> >  		skl_write_wm_level(dev_priv, PLANE_WM(pipe, plane_id,
> > level),
> > -				   &wm->wm[level]);
> > +				   wm_level);
> >  	}
> >  	skl_write_wm_level(dev_priv, PLANE_WM_TRANS(pipe, plane_id),
> >  			   &wm->trans_wm);
> > @@ -5074,10 +5349,13 @@ void skl_write_cursor_wm(struct intel_plane
> > *plane,
> >  		&crtc_state->wm.skl.optimal.planes[plane_id];
> >  	const struct skl_ddb_entry *ddb =
> >  		&crtc_state->wm.skl.plane_ddb_y[plane_id];
> > +	const struct skl_wm_level *wm_level;
> >  
> >  	for (level = 0; level <= max_level; level++) {
> > +		wm_level = skl_plane_wm_level(plane, crtc_state, level,
> > false);
> > +
> >  		skl_write_wm_level(dev_priv, CUR_WM(pipe, level),
> > -				   &wm->wm[level]);
> > +				   wm_level);
> >  	}
> >  	skl_write_wm_level(dev_priv, CUR_WM_TRANS(pipe), &wm-
> > >trans_wm);
> >  
> > @@ -5451,18 +5729,73 @@ static int
> > skl_wm_add_affected_planes(struct intel_atomic_state *state,
> >  	return 0;
> >  }
> >  
> > +static void tgl_set_sagv_mask(struct intel_atomic_state *state)
> > +{
> > +	struct drm_i915_private *dev_priv = to_i915(state->base.dev);
> > +	struct intel_crtc *crtc;
> > +	struct intel_crtc_state *new_crtc_state;
> > +	struct intel_crtc_state *old_crtc_state;
> > +	struct skl_ddb_allocation *ddb = &state->wm_results.ddb;
> > +	int ret;
> > +	int i;
> > +	struct intel_plane *plane;
> > +
> > +	if (state->crtc_sagv_mask_set)
> > +		return;
> > +
> > +	for_each_oldnew_intel_crtc_in_state(state, crtc,
> > old_crtc_state,
> > +					    new_crtc_state, i) {
> > +		int pipe_bit = BIT(crtc->pipe);
> > +		bool skip = true;
> > +
> > +		/*
> > +		 * If we had set this mask already once for this state,
> > +		 * no need to waste CPU cycles for doing this again.
> > +		 */
> > +		for_each_intel_plane_on_crtc(&dev_priv->drm, crtc,
> > plane) {
> > +			enum plane_id plane_id = plane->id;
> > +
> > +			if (!skl_plane_wm_equals(dev_priv,
> > +				&old_crtc_state-
> > >wm.skl.optimal.planes[plane_id],
> > +				&new_crtc_state-
> > >wm.skl.optimal.planes[plane_id])) {
> > +				skip = false;
> > +				break;
> > +			}
> > +		}
> > +
> > +		/*
> > +		 * Check if wm levels are actually the same as for
> > previous
> > +		 * state, which means we can just skip doing this long
> > check
> > +		 * and just copy the corresponding bit from previous
> > state.
> > +		 */
> > +		if (skip)
> > +			continue;
> > +
> > +		ret = tgl_check_pipe_fits_sagv_wm(new_crtc_state, ddb);
> > +		if (!ret)
> > +			state->crtc_sagv_mask |= pipe_bit;
> > +		else
> > +			state->crtc_sagv_mask &= ~pipe_bit;
> > +	}
> > +	state->crtc_sagv_mask_set = true;
> > +}
> > +
> >  static int
> >  skl_compute_wm(struct intel_atomic_state *state)
> >  {
> >  	struct intel_crtc *crtc;
> >  	struct intel_crtc_state *new_crtc_state;
> >  	struct intel_crtc_state *old_crtc_state;
> > -	struct skl_ddb_values *results = &state->wm_results;
> >  	int ret, i;
> > +	struct skl_ddb_values *results = &state->wm_results;
> > +	struct drm_i915_private *dev_priv = to_i915(state->base.dev);
> >  
> >  	/* Clear all dirty flags */
> >  	results->dirty_pipes = 0;
> >  
> > +	/* If we exit before check is done */
> > +	state->crtc_sagv_mask = dev_priv->crtc_sagv_mask;
> > +
> >  	ret = skl_ddb_add_affected_pipes(state);
> >  	if (ret)
> >  		return ret;
> > @@ -5638,6 +5971,9 @@ void skl_pipe_wm_get_hw_state(struct
> > intel_crtc *crtc,
> >  				val = I915_READ(CUR_WM(pipe, level));
> >  
> >  			skl_wm_level_from_reg_val(val, &wm->wm[level]);
> > +			if (level == 0)
> > +				memcpy(&wm->sagv_wm0, &wm->wm[level],
> > +					sizeof(struct skl_wm_level));
> >  		}
> >  
> >  		if (plane_id != PLANE_CURSOR)
> > diff --git a/drivers/gpu/drm/i915/intel_pm.h
> > b/drivers/gpu/drm/i915/intel_pm.h
> > index b579c724b915..53275860731a 100644
> > --- a/drivers/gpu/drm/i915/intel_pm.h
> > +++ b/drivers/gpu/drm/i915/intel_pm.h
> > @@ -43,6 +43,7 @@ void skl_pipe_wm_get_hw_state(struct intel_crtc
> > *crtc,
> >  void g4x_wm_sanitize(struct drm_i915_private *dev_priv);
> >  void vlv_wm_sanitize(struct drm_i915_private *dev_priv);
> >  bool intel_can_enable_sagv(struct intel_atomic_state *state);
> > +bool intel_has_sagv(struct drm_i915_private *dev_priv);
> >  int intel_enable_sagv(struct drm_i915_private *dev_priv);
> >  int intel_disable_sagv(struct drm_i915_private *dev_priv);
> >  bool skl_wm_level_equals(const struct skl_wm_level *l1,
> > -- 
> > 2.17.1
> > 
> 
>
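For anyone following the per-pipe mask handling in tgl_set_sagv_mask() quoted above, here is a minimal standalone sketch of the bit update it performs. This is not driver code: pipe_fits_sagv_wm() and update_sagv_mask() are hypothetical names made up for illustration, the block counts are invented inputs, and the 8-bit mask width is an assumption. The only part taken from the patch is the convention that a zero return from the fit check sets the pipe's bit and a non-zero return clears it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the per-pipe fit check: returns 0 when the
 * SAGV level 0 watermark fits into the blocks available to the pipe,
 * non-zero otherwise. The real check in the patch works on DDB
 * allocations per plane; this only mirrors the return convention.
 */
static int pipe_fits_sagv_wm(unsigned int sagv_wm0_blocks,
			     unsigned int ddb_blocks)
{
	return sagv_wm0_blocks <= ddb_blocks ? 0 : -1;
}

/*
 * Set the pipe's bit when the check succeeded (ret == 0), clear it
 * otherwise, and return the updated mask. Assumes at most 8 pipes.
 */
static uint8_t update_sagv_mask(uint8_t mask, int pipe, int ret)
{
	uint8_t pipe_bit;

	assert(pipe >= 0 && pipe < 8);
	pipe_bit = (uint8_t)(1u << pipe);

	if (!ret)
		mask |= pipe_bit;
	else
		mask &= (uint8_t)~pipe_bit;

	return mask;
}
```

Starting from a mask copied from the previous global state, calling update_sagv_mask() once per changed pipe leaves the bits of unchanged pipes untouched, which is the behaviour the skip path in the quoted loop relies on.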