[Intel-gfx] [PATCH v2] drm/i915: Enable second dbuf slice for ICL
Lisovskiy, Stanislav
stanislav.lisovskiy at intel.com
Thu Oct 17 08:05:39 UTC 2019
On Wed, 2019-10-16 at 16:41 -0700, Matt Roper wrote:
> On Wed, Oct 09, 2019 at 10:39:08AM +0300, Stanislav Lisovskiy wrote:
> > Also implemented algorithm for choosing DBuf slice configuration
> > based on active pipes, pipe ratio as stated in BSpec 12716.
> >
> > Now pipe allocation still stays proportional to pipe width as
> > before,
> > however within allowed DBuf slice for this particular
> > configuration.
> >
> > v2: Remove unneeded check from commit as ddb enabled slices might
> > now differ from hw state.
>
> I can't seem to find v1 of this patch in my archives; can you
> elaborate on this?
Hi, thanks for the good points in the review. I initially submitted v1
only to TryBot, then figured out some issues and sent v2 to the
mailing list.
> It looks like a bit of a mismatch in terms of how we're interpreting
> 'ddb.enabled_slices' in different parts of the driver. During hw
> readout we're treating the value as the number of powered up slices
> (which will always be 2 since we always power them both up in
> icl_display_core_init -> icl_dbuf_enable even when we're not going to
> use both), whereas in the atomic check code we're interpreting the
> value as the number of slices we want to distribute our plane
> allocations over. We may want to break these out into two separate
> fields (powered slices vs utilized slices) that we track separately.
I had exactly the same concern. I wanted to always treat this field as
the number of slices we have available; however, if we are going to
enable/disable slices dynamically, probably the only way is to split it.
>
> >
> > Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy at intel.com>
> > ---
> >  drivers/gpu/drm/i915/display/intel_display.c |   6 -
> >  drivers/gpu/drm/i915/intel_pm.c              | 208 ++++++++++++++++++-
> >  2 files changed, 202 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
> > index 1a533ccdb54f..4683731ac1ca 100644
> > --- a/drivers/gpu/drm/i915/display/intel_display.c
> > +++ b/drivers/gpu/drm/i915/display/intel_display.c
> > @@ -12989,12 +12989,6 @@ static void verify_wm_state(struct intel_crtc *crtc,
> >  	skl_ddb_get_hw_state(dev_priv, &hw->ddb);
> >  	sw_ddb = &dev_priv->wm.skl_hw.ddb;
> >  
> > -	if (INTEL_GEN(dev_priv) >= 11 &&
> > -	    hw->ddb.enabled_slices != sw_ddb->enabled_slices)
> > -		DRM_ERROR("mismatch in DBUF Slices (expected %u, got %u)\n",
> > -			  sw_ddb->enabled_slices,
> > -			  hw->ddb.enabled_slices);
> > -
>
> Related to the comment above, we probably do want to make sure that
> the number of powered up dbuf slices matches what we expect (which
> today is always 2, but that might change in the future if we try to
> be more intelligent and power down the second slice when it isn't
> needed)
>
> If we're already confirming that the planes' hw/sw DDB allocations
> themselves match farther down this function, then we're effectively
> already checking the distribution of planes between the two slices.
>
> > /* planes */
> > for_each_universal_plane(dev_priv, pipe, plane) {
> > struct skl_plane_wm *hw_plane_wm, *sw_plane_wm;
> > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > index bfcf03ab5245..0fbeea61299f 100644
> > --- a/drivers/gpu/drm/i915/intel_pm.c
> > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > @@ -3616,7 +3616,7 @@ static u8 intel_enabled_dbuf_slices_num(struct drm_i915_private *dev_priv)
> >  	 * only that 1 slice enabled until we have a proper way for on-demand
> >  	 * toggling of the second slice.
> >  	 */
>
>
> This comment is/was also outdated. Also note that this change is
> going to impact TGL as well even though you haven't added the TGL
> handling for slice assignment yet.
>
> > - if (0 && (num_active > 1 || total_data_bw >= GBps(12))) {
> > + if ((num_active > 1 || total_data_bw >= GBps(12))) {
>
> Where does the 12 GBps number come from? I know the comment above
> this says that's the maximum BW supported by a single DBuf slice, but
> I can't find where this is mentioned in the bspec (and I'm not sure
> if that would apply to all gen11+ platforms or whether that was an
> ICL-specific number from when the comment/code was first written).
I had the same question actually - neither I nor Ville found any
mention of 12 GB/s in bspec, so I decided to leave it as is, at least
until we figure out where it came from.
>
> >  		ddb->enabled_slices = 2;
> >  	} else {
> >  		ddb->enabled_slices = 1;
> > @@ -3831,6 +3831,35 @@ static u16 intel_get_ddb_size(struct drm_i915_private *dev_priv,
> >  	return ddb_size;
> >  }
> >
> > +/*
> > + * Calculate initial DBuf slice offset, based on slice size
> > + * and mask (i.e. if slice size is 1024 and the second slice is
> > + * enabled the offset would be 1024)
> > + */
> > +static u32 skl_get_first_dbuf_slice_offset(u32 dbuf_slice_mask,
> > +					   u32 slice_size, u32 ddb_size)
> > +{
> > +	u32 offset = 0;
> > +
> > +	if (!dbuf_slice_mask)
> > +		return 0;
> > +
> > +	while (!(dbuf_slice_mask & 1)) {
> > +		dbuf_slice_mask >>= 1;
> > +		offset += slice_size;
> > +		if (offset >= ddb_size)
> > +			break;
> > +	}
> > +
> > +	return offset;
> > +}
>
>
> We can probably move this call down into the function below. AFAICS,
> pipe ratio doesn't matter anymore once we get to TGL and doesn't
> matter on earlier platforms that only have one slice, so there's no
> need to calculate it if we aren't going to use it.
Good point, I agree it is probably not commonly needed there. Also
thanks for spotting that I forgot to invert the pipe_ratio value.
>
> > +
> > +	/*
> > +	 * Get allowed DBuf slices for correspondent pipe and platform.
> > +	 */
> > +	dbuf_slice_mask = i915_get_allowed_dbuf_mask(dev_priv, for_pipe,
> > +						     active_pipes, pipe_ratio);
> > +
> > +	DRM_DEBUG_KMS("DBuf slice mask %x pipe %d active pipes %x\n",
> > +		      dbuf_slice_mask, for_pipe, active_pipes);
> > +
> > +	/*
> > +	 * Figure out at which DBuf slice we start, i.e. if we start at
> > +	 * DBuf S2 and slice size is 1024, the offset would be 1024
> > +	 */
> > +	offset = skl_get_first_dbuf_slice_offset(dbuf_slice_mask,
> > +						 slice_size, ddb_size);
> > +
> > +	/*
> > +	 * Figure out total size of allowed DBuf slices, which is basically
> > +	 * a number of allowed slices for that pipe multiplied by slice size.
> > +	 * We might still have some dbuf slices disabled in case if those
> > +	 * are not needed based on bandwidth requirements and num_active pipes,
> > +	 * so stick to real ddb size if it happens to be less. Inside of this
> > +	 * range ddb entries are still allocated in proportion to display width.
> > +	 */
> > +	ddb_range_size = min(hweight8(dbuf_slice_mask) * slice_size,
> > +			     (unsigned int)ddb_size);
> > +
> >  	/*
> >  	 * Watermark/ddb requirement highly depends upon width of the
> >  	 * framebuffer, So instead of allocating DDB equally among pipes
> > @@ -3890,10 +3967,23 @@ skl_ddb_get_pipe_allocation_limits(struct drm_i915_private *dev_priv,
> >  			&crtc_state->base.adjusted_mode;
> >  		enum pipe pipe = crtc->pipe;
> >  		int hdisplay, vdisplay;
> > +		uint_fixed_16_16_t ratio = skl_get_pipe_ratio(crtc_state);
> > +		u32 pipe_dbuf_slice_mask = i915_get_allowed_dbuf_mask(dev_priv,
> > +								      pipe,
> > +								      active_pipes,
> > +								      ratio);
>
> Minor nitpick, but the lines here are a bit over 80 characters. You
> may
> want to break the line after the equals sign.
>
> >
> > if (!crtc_state->base.enable)
> > continue;
> >
> > +		/*
> > +		 * According to BSpec pipe can share one dbuf slice with another pipes or pipe can use
> > +		 * multiple dbufs, in both cases we account for other pipes only if they have
> > +		 * exactly same mask.
> > +		 */
>
> Some more long lines that need to be wrapped.
>
> > + if (dbuf_slice_mask != pipe_dbuf_slice_mask)
> > + continue;
> > +
> >  		drm_mode_get_hv_timing(adjusted_mode, &hdisplay, &vdisplay);
> >  		total_width += hdisplay;
> > 
> > @@ -3903,8 +3993,11 @@ skl_ddb_get_pipe_allocation_limits(struct drm_i915_private *dev_priv,
> >  			pipe_width = hdisplay;
> >  	}
> > 
> > -	alloc->start = ddb_size * width_before_pipe / total_width;
> > -	alloc->end = ddb_size * (width_before_pipe + pipe_width) / total_width;
> > +	alloc->start = offset + ddb_range_size * width_before_pipe / total_width;
> > +	alloc->end = offset + ddb_range_size * (width_before_pipe + pipe_width) / total_width;
> > +
> > +	DRM_DEBUG_KMS("Pipe %d ddb %d-%d\n", for_pipe, alloc->start, alloc->end);
> >  }
> >
> > static int skl_compute_wm_params(const struct intel_crtc_state *crtc_state,
> > @@ -4255,6 +4348,109 @@ skl_get_total_relative_data_rate(struct intel_crtc_state *crtc_state,
> >  	return total_data_rate;
> > }
> >
> > +uint_fixed_16_16_t
> > +skl_get_pipe_ratio(const struct intel_crtc_state *crtc_state)
>
> I think this should be named icl_get_pipe_ratio() given that it comes
> from a gen11-specific page of the bspec?
>
> > +{
> > + struct drm_plane *plane;
> > + const struct drm_plane_state *drm_plane_state;
> > + uint_fixed_16_16_t pipe_downscale;
> > + uint_fixed_16_16_t max_downscale = u32_to_fixed16(1);
> > +
> > + if (!crtc_state->base.enable)
>
> This function only gets called from
> skl_ddb_get_pipe_allocation_limits() which tests
> crtc_state->base.active at the beginning and bails out if the CRTC
> isn't active. active && !enable shouldn't be possible, so I'd add a
> WARN_ON() here to make that assertion clear.
>
> > + return max_downscale;
> > +
> > +	drm_atomic_crtc_state_for_each_plane_state(plane, drm_plane_state, &crtc_state->base) {
> > +		uint_fixed_16_16_t plane_downscale;
> > +		const struct intel_plane_state *plane_state =
> > +			to_intel_plane_state(drm_plane_state);
> > +
> > +		if (!intel_wm_plane_visible(crtc_state, plane_state))
> > +			continue;
> > +
> > +		plane_downscale = skl_plane_downscale_amount(crtc_state, plane_state);
> > +
> > +		max_downscale = max_fixed16(plane_downscale, max_downscale);
> > +	}
> > +
> > +	pipe_downscale = skl_pipe_downscale_amount(crtc_state);
> > +
> > +	pipe_downscale = mul_fixed16(pipe_downscale, max_downscale);
> > +
> > +	return pipe_downscale;
>
> Wouldn't the pipe ratio be 1/pipe_downscale?
>
>
> > +}
> > +
> > +#define DBUF_S1_BIT 1
> > +#define DBUF_S2_BIT 2
> > +
> > +struct dbuf_slice_conf_entry {
> > + u32 dbuf_mask[I915_MAX_PIPES];
> > + u32 active_pipes;
> > +};
> > +
> > +
> > +/*
> > + * Table taken from Bspec 12716
> > + * Pipes do have some preferred DBuf slice affinity,
> > + * plus there are some hardcoded requirements on how
> > + * those should be distributed for multipipe scenarios.
> > + * For more DBuf slices algorithm can get even more messy
> > + * and less readable, so decided to use a table almost
> > + * as is from BSpec itself - that way it is at least easier
> > + * to compare, change and check.
> > + */
> > +struct dbuf_slice_conf_entry icl_allowed_dbufs[] = {
> > +{ { 0, 0, 0, 0 }, 0 },
> > +{ { DBUF_S1_BIT | DBUF_S2_BIT, 0, 0, 0 }, BIT(PIPE_A) },
> > +{ { DBUF_S1_BIT, 0, 0, 0 }, BIT(PIPE_A) },
> > +{ { 0, DBUF_S1_BIT | DBUF_S2_BIT, 0, 0 }, BIT(PIPE_B) },
> > +{ { 0, DBUF_S1_BIT, 0, 0 }, BIT(PIPE_B) },
> > +{ { 0, 0, DBUF_S1_BIT | DBUF_S2_BIT }, BIT(PIPE_C) },
> > +{ { 0, 0, DBUF_S2_BIT, 0 }, BIT(PIPE_C) },
> > +{ { DBUF_S1_BIT, DBUF_S2_BIT, 0, 0 }, BIT(PIPE_A) | BIT(PIPE_B) },
> > +{ { DBUF_S1_BIT, 0, DBUF_S2_BIT, 0 }, BIT(PIPE_A) | BIT(PIPE_C) },
> > +{ { 0, DBUF_S1_BIT, DBUF_S2_BIT, 0 }, BIT(PIPE_B) | BIT(PIPE_C) },
> > +{ { DBUF_S1_BIT, DBUF_S1_BIT, DBUF_S2_BIT, 0 }, BIT(PIPE_A) | BIT(PIPE_B) | BIT(PIPE_C) },
>
> You might want to align some of the columns here to make it slightly
> easier to read. And even though the bspec puts the pipes in the final
> column, I think it would be more natural for readability to move that
> first and/or put it on a line by itself. You've already got a line
> here that exceeds 80 characters by a bit and once you add a TGL table
> with four pipes you're going to have even longer lines.
>
>
> > +};
> > +
> > +/*
> > + * This function finds an entry with same enabled pipe configuration and
> > + * returns correspondent DBuf slice mask as stated in BSpec for particular
> > + * platform.
> > + */
> > +static u32 icl_get_allowed_dbuf_mask(int pipe,
> > +				     u32 active_pipes,
> > +				     uint_fixed_16_16_t pipe_ratio)
> > +{
> > + int i;
> > + for (i = 0; i < ARRAY_SIZE(icl_allowed_dbufs); i++) {
> > + if (icl_allowed_dbufs[i].active_pipes == active_pipes)
> > {
> > + /*
> > + * According to BSpec 12716: if pipe_ratio >=
> > 88.8
> > + * use single pipe, even if two are accessible.
> > + */
> > + if (pipe_ratio.val >= div_fixed16(888, 10).val)
> > + ++i;
> > + return icl_allowed_dbufs[i].dbuf_mask[pipe];
> > + }
> > + }
> > + return 0;
> > +}
> > +
> > +u32 i915_get_allowed_dbuf_mask(struct drm_i915_private *dev_priv,
> > +			       int pipe, u32 active_pipes,
> > +			       uint_fixed_16_16_t pipe_ratio)
> > +{
> > +	if (IS_GEN(dev_priv, 11))
> > +		return icl_get_allowed_dbuf_mask(pipe, active_pipes,
> > +						 pipe_ratio);
> > +	/*
> > +	 * For anything else just return one slice yet.
> > +	 * Should be extended for other platforms.
> > +	 */
>
> Note that you've already adjusted the DDB size for TGL in
> intel_get_ddb_size(), so we probably want to add TGL's table at the
> same time as the gen11 table.
>
>
> Matt
>
> > + return DBUF_S1_BIT;
> > +}
> > +
> > static u64
> > icl_get_total_relative_data_rate(struct intel_crtc_state *crtc_state,
> > 				 u64 *plane_data_rate)
> > --
> > 2.17.1
> >
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx at lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
>