[Intel-gfx] [PATCH] drm/i915: program wm blocks to at least blocks required per line
Lisovskiy, Stanislav
stanislav.lisovskiy at intel.com
Thu Apr 7 06:43:50 UTC 2022
On Wed, Apr 06, 2022 at 09:09:06PM +0300, Ville Syrjälä wrote:
> On Wed, Apr 06, 2022 at 08:14:58PM +0300, Lisovskiy, Stanislav wrote:
> > On Wed, Apr 06, 2022 at 05:01:39PM +0300, Ville Syrjälä wrote:
> > > On Wed, Apr 06, 2022 at 04:45:26PM +0300, Lisovskiy, Stanislav wrote:
> > > > On Wed, Apr 06, 2022 at 03:48:02PM +0300, Ville Syrjälä wrote:
> > > > > On Mon, Apr 04, 2022 at 04:49:18PM +0300, Vinod Govindapillai wrote:
> > > > > > In configurations with single DRAM channel, for usecases like
> > > > > > 4K 60 Hz, FIFO underruns are observed quite frequently. Looks
> > > > > > like the wm0 watermark values need to be bumped up because the wm0
> > > > > > memory latency calculations are probably not taking the DRAM
> > > > > > channel's impact into account.
> > > > > >
> > > > > > As per the Bspec 49325, if the ddb allocation can hold at least
> > > > > > one plane_blocks_per_line we should have selected method2.
> > > > > > Assuming that modern HW versions have enough dbuf to hold
> > > > > > at least one line, set the wm blocks to equivalent to blocks
> > > > > > per line.
> > > > > >
> > > > > > cc: Ville Syrjälä <ville.syrjala at linux.intel.com>
> > > > > > cc: Stanislav Lisovskiy <stanislav.lisovskiy at intel.com>
> > > > > >
> > > > > > Signed-off-by: Vinod Govindapillai <vinod.govindapillai at intel.com>
> > > > > > ---
> > > > > > drivers/gpu/drm/i915/intel_pm.c | 19 ++++++++++++++++++-
> > > > > > 1 file changed, 18 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > > > > > index 8824f269e5f5..ae28a8c63ca4 100644
> > > > > > --- a/drivers/gpu/drm/i915/intel_pm.c
> > > > > > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > > > > > @@ -5474,7 +5474,24 @@ static void skl_compute_plane_wm(const struct intel_crtc_state *crtc_state,
> > > > > > }
> > > > > > }
> > > > > >
> > > > > > - blocks = fixed16_to_u32_round_up(selected_result) + 1;
> > > > > > + /*
> > > > > > + * Lets have blocks at minimum equivalent to plane_blocks_per_line
> > > > > > + * as there will be at minimum one line for lines configuration.
> > > > > > + *
> > > > > > + * As per the Bspec 49325, if the ddb allocation can hold at least
> > > > > > + * one plane_blocks_per_line, we should have selected method2 in
> > > > > > + * the above logic. Assuming that modern versions have enough dbuf
> > > > > > + * and method2 guarantees blocks equivalent to at least 1 line,
> > > > > > + * select the blocks as plane_blocks_per_line.
> > > > > > + *
> > > > > > + * TODO: Revisit the logic when we have better understanding on DRAM
> > > > > > + * channels' impact on the level 0 memory latency and the relevant
> > > > > > + * wm calculations.
> > > > > > + */
> > > > > > + blocks = skl_wm_has_lines(dev_priv, level) ?
> > > > > > + max_t(u32, fixed16_to_u32_round_up(selected_result) + 1,
> > > > > > + fixed16_to_u32_round_up(wp->plane_blocks_per_line)) :
> > > > > > + fixed16_to_u32_round_up(selected_result) + 1;
> > > > >
> > > > > That looks rather convoluted.
> > > > >
> > > > > blocks = fixed16_to_u32_round_up(selected_result) + 1;
> > > > > + /* blah */
> > > > > + if (has_lines)
> > > > > + blocks = max(blocks, fixed16_to_u32_round_up(wp->plane_blocks_per_line));
> > > >
> > > > We probably need to do similar refactoring in the whole function ;-)
> > > >
> > > > >
> > > > > Also since Art said nothing like this should actually be needed
> > > > > I think the comment should make it a bit more clear that this
> > > > > is just a hack to work around the underruns with some single
> > > > > memory channel configurations.
> > > >
> > > > It is actually not quite a hack, because we are missing that condition
> > > > implementation from BSpec 49325, which instructs us to select method2
> > > > when ddb blocks allocation is known and that ratio is >= 1.
> > >
> > > The ddb allocation is not yet known, so we're implementing the
> > > algorithm 100% correctly.
> > >
> > > And this patch does not implement that missing part anyway.
> >
> > Yes, as I understood method2 would just give amount of blocks to be
> > at least as dbuf blocks per line.
> >
> > I wonder whether we should actually fully implement this BSpec clause
> > and add it to the point where ddb allocation is known or are there
> > any obstacles to do that, besides having to reshuffle this function a bit?
>
> We need to calculate the wm to figure out how much ddb to allocate,
> and then we'd need the ddb allocation to figure out how to calculate
> the wm. Very much chicken vs. egg right there. We'd have to do some
> kind of hideous loop where we'd calculate everything twice. I don't
> really want to do that since I'd actually like to move the wm
> calculation to happen already much earlier during .check_plane()
> as that could reduce the amount of redundant wm calculations we
> are currently doing.

I might be missing some details right now, but why do we need a ddb
allocation to calculate the wms?

As I understand it, we usually calculate the wm levels plus the
min_ddb allocation, and then based on that we allocate min_ddb + extra
for each plane.

It is correct that at the moment we calculate the wms only min_ddb is
available, so if this level is enabled at all, we need at least
min_ddb blocks.

I think we could just use that min_ddb value here, because the
condition only checks whether
(plane buffer allocation / plane blocks per line) >= 1. If this wm
level is enabled, the plane buffer allocation will be min_ddb _or
higher_, and that cannot invalidate the condition: with "plane buffer
allocation + some extra" the ratio only grows.

So if the condition holds for min_ddb / plane blocks per line, we can
probably safely state that it will also hold later.
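
A minimal sketch of that argument, assuming plain integer block counts
instead of the driver's fixed16 values. The helper names below are
hypothetical and are not i915 functions; they only illustrate that a
check which passes for min_ddb keeps passing once extra blocks are
added on top.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of i915.
 * Models the condition
 *   (plane buffer allocation / plane blocks per line) >= 1
 * on integer block counts.
 */
static inline bool can_hold_one_line(uint32_t ddb_blocks,
				     uint32_t blocks_per_line)
{
	return blocks_per_line != 0 && ddb_blocks >= blocks_per_line;
}

/*
 * Hypothetical helper, not part of i915.
 * The final allocation is assumed to be min_ddb plus some
 * non-negative extra, so it is never smaller than min_ddb.
 */
static inline bool check_holds_for_final(uint32_t min_ddb, uint32_t extra,
					 uint32_t blocks_per_line)
{
	uint32_t final_ddb = min_ddb + extra;

	return can_hold_one_line(final_ddb, blocks_per_line);
}
```

Note that the implication only goes one way: if the check fails for
min_ddb it may still pass for the final allocation, so evaluating it
on min_ddb is the conservative choice.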
Stan
>
> --
> Ville Syrjälä
> Intel