[Intel-gfx] [PATCH v7] drm/i915: Enable second dbuf slice for ICL and TGL

Lisovskiy, Stanislav stanislav.lisovskiy at intel.com
Wed Nov 20 21:55:58 UTC 2019


On Mon, Nov 20, 2019 at 09:18:48PM +0000, Ville Syrjälä wrote:

> >
> > I'm not reenabling it - this is code is out there right now.
>
> > It's _dead_ code, and it's like that on purpose. You're now attempting
> > to re-enable it without fixing the underlying issues.
>
> Sounds like some sacred thing. Dead code which is not commented, neither
> removed, but which you can't touch.

>You can touch it by either removing it or fixing it. Anything else
>is a bit pointless.

Ok, then instead of pointless arguing, I just need an idea of where to put it.

There is a big difference between "this code is wrong, period" and
a properly explained suggestion of where and why it should go.
Also, yes: to me any code which is not actually working properly, or
should not be used at all, must be removed or commented out.

I don't really get this "dead" code concept. Sounds cool though.

>It's a global resource shared by the pipes. Hence it's quite similar
>to SAGV and cdclk.

>> So all this was simply about that I should put this function call into
>> skl_commit_modeset_disables?

>It should be in intel_atomic_commit_tail() most likely, just like cdclk.

Ok. 
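
To make sure I read you right, the ordering I'll aim for in intel_atomic_commit_tail()
is roughly this (just a sketch; the icl_dbuf_slices_update() placement and the
*_slice_mask names are placeholders, not the final code):

/* 1. Enable new resources: power up the union of old and new dbuf slices. */
icl_dbuf_slices_update(dev_priv, old_slice_mask | new_slice_mask);

/* 2. Reconfigure planes/pipes so they switch over to the new ddb layout. */

/* 3. Wait for vblanks so nothing scans out of the old layout anymore. */

/* 4. Disable unused resources: drop down to exactly the new slice mask. */
icl_dbuf_slices_update(dev_priv, new_slice_mask);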

>I cooked up a small test that I believe should end up ping-ponging
>between one and two slices on icl, and so should end up exploding
>when the second slice gets shut down prematurely:
>git://github.com/vsyrjala/intel-gpu-tools.git dbuf_slice_test

Yep, that helps, because currently no existing test explodes on this.

>>  Some one might come up that it is called "num_slices" or
>>  "dbuf_num_slices". Seriously, may be "supported_num_dbuf_slices" to
>>  keep everyones bloated egos happy??

>Since we have ddb_size in device info already it should be next to that
>actually. And we should probably unify the terminology a bit so we don't
>mix dbuf and ddb in the same sentence.

Maybe we could still use a ddb allocation struct to group things together, similar
to how it is done for the bw info in dev_priv. That would make it easier at least
in the sense that we won't have to constantly prefix everything with dbuf/ddb.
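
To be concrete, something along these lines (just a sketch; the struct and field
names here are made up, not a proposal for the final naming):

/* Sketch: group the dbuf/ddb bits the same way the bw info is grouped in dev_priv. */
struct intel_dbuf_info {
        u8 enabled_slices;      /* mask of slices currently powered up */
        u8 num_slices;          /* how many slices the hw has, next to ddb_size */
        u16 ddb_size;           /* total ddb size, today a separate device info field */
};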

Not sure if this all needs to be done in one go, as it feels more like
cleanup/prep work, which is never-ending anyway.

I will rename the variable anyway, as we have already wasted way too much
time on this.

>It messes up the software state that we computed with something that
>we didn't compute. So going forward the state will be inconsistent.

>The core concept of atomic is that the state is precomputed and
>validated ahead of time. The commit phase is supposed to be dumb and
>isn't allowed to modify the state. This is also why we often pass
>the states as const during the commit phase, though unfortunately
>we can't do that always on account of the states being slightly
>overloaded with other metadata.

Sounds more like a philosophical question. You either have a software state
that is inconsistent with itself, or a software state that is consistent with
itself but not with the hardware.
My idea was always to reflect the hardware state as much as possible;
however, I get your point here.
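
For the readout part I would then expect something like this, purely reading the
hw state into the sw state (a sketch based on the existing DBUF_CTL_S1/S2 power
bits, not necessarily the final code):

/* Sketch: derive the enabled-slices mask from readout only, without writing anything. */
static u8 intel_enabled_dbuf_slices_mask(struct drm_i915_private *dev_priv)
{
        u8 enabled_slices = 0;

        if (I915_READ(DBUF_CTL_S1) & DBUF_POWER_STATE)
                enabled_slices |= DBUF_S1_BIT;

        if (INTEL_GEN(dev_priv) >= 11 &&
            (I915_READ(DBUF_CTL_S2) & DBUF_POWER_STATE))
                enabled_slices |= DBUF_S2_BIT;

        return enabled_slices;
}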

>> >Looking at something else unfortunately. Also it has just kept
>> >bitrotting since because no one has cared.
>>
>> That is what exactly surprises me. While we have issues
>> like commit serialization, sagv, mst, half of display buffer only
>> enabled, bw, and we are arguing about how supported_slices
>> should be named. Or that I shouldn't refer to DBuf slices as bits
>> but define as values and then decode those as bits. Priorities.

>Usually people complain about reviewers not pointing out
>all the issues in one sitting.

It was not about complaining, but more about prioritization.
Striving for perfect code/variable naming/placement is great, but that
process never ends, while some other things might be more important.

I don't believe it makes any difference whether DBUF_S1_BIT is defined as BIT(0),
or DBUF_S1 is defined as 0 and then converted to BIT(0) at the point of use anyway.

We are just arguing whether A + B or B + A is better here. 
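
I mean, these two spellings carry exactly the same information (sketch):

/* Variant A: the define is the bit itself. */
#define DBUF_S1_BIT             BIT(0)
#define DBUF_S2_BIT             BIT(1)

/* Variant B: the define is an index, converted to a bit at the point of use. */
enum dbuf_slice {
        DBUF_S1,
        DBUF_S2,
};
#define dbuf_slice_to_bit(slice)        BIT(slice)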

There are lots of assumptions, "dead" code and other stuff already.

And then there is the really important stuff, like the commit serialization problem
and the actual feature implementations, which have to wait because of that.

>
> > > > >
> > > > > >        for_each_new_intel_crtc_in_state(intel_state, crtc, crtc_state, i) {
> > > > > >                const struct drm_display_mode *adjusted_mode =
> > > > > >                        &crtc_state->hw.adjusted_mode;
> > > > > >                enum pipe pipe = crtc->pipe;
> > > > > >                int hdisplay, vdisplay;
> > > > > > +              u32 pipe_dbuf_slice_mask = \
> > > > > > +                      i915_get_allowed_dbuf_mask(dev_priv,
> > > > > > +                                                 pipe,
> > > > > > +                                                 active_pipes,
> > > > > > +                                                 crtc_state);
> > > > > >
> > > > > >                if (!crtc_state->hw.enable)
> > > > > >                        continue;
> > > > > >
> > > > > > +              /*
> > > > > > +               * According to BSpec pipe can share one dbuf slice with another
> > > > > > +               * pipes or pipe can use multiple dbufs, in both cases we
> > > > > > +               * account for other pipes only if they have exactly same mask.
> > > > > > +               * However we need to account how many slices we should enable
> > > > > > +               * in total.
> > > > > > +               */
> > > > > > +              total_slice_mask |= pipe_dbuf_slice_mask;
> > > > >
> > > > > total_slice_mask will now account only the crtcs in the state. What
> > > > > happens to the other pipes' slices?
> >
>> > Same as happens now btw. Right now it also does the same thing.
>>
>> >Right now it does absolutely nothing. It's all dead code.
>>
>> Right now, this "dead" code exactly iterates through crtcs in state and
>> allocates DBuf S1 in proportion to pipe width.
>> What I do is make it do the same thing but using either range of one
>> or two slices, depending on pipe configuration.

>OK. So how does it deal with my original question?

The original question was that not all crtcs might be in the state.
I said that I know; this is an existing problem and it was not even introduced
by my code.
Currently it works that way, just like many other things related to the commit
serialization issue. But should I really fix all of that in my patch, which was only
supposed to bring proper usage of the DBuf slices?
I think this should be fixed separately, with a patch saying "start using global
state instead of the current atomic state when allocating DBuf for pipes".
Because I don't want to debug issues coming from enabling the second DBuf slice
and issues coming from that problem at the same time.
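
When that separate patch happens, I'd expect it to look roughly like this (sketch,
the helper name is made up): whenever the total slice configuration changes, pull
every crtc into the state so the whole ddb can be redistributed:

/* Sketch: if the dbuf slice configuration changes, all pipes must be in the state. */
static int intel_dbuf_add_affected_crtcs(struct intel_atomic_state *state)
{
        struct drm_i915_private *dev_priv = to_i915(state->base.dev);
        struct intel_crtc *crtc;

        for_each_intel_crtc(&dev_priv->drm, crtc) {
                struct intel_crtc_state *crtc_state;

                crtc_state = intel_atomic_get_crtc_state(&state->base, crtc);
                if (IS_ERR(crtc_state))
                        return PTR_ERR(crtc_state);
        }

        return 0;
}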

>>
>> So should I have thrown away that code and rewritten this as well?

>Not if it works.

So does it work or not? Or is it "dead" code, as you mentioned?
I'm confused now.


>The cdclk table has one redundancy that I'm not 100% fan of:
>it specifies both the cdclk frequency and the cd2x divider.
>We could make do with just either one and calculate the other
>one.

>On the flip side that redundancy would allow us to cross check
>that the values produce the expected results. I actually have
>a patch for that buried in some branch.

Exactly, that makes the cross check easier.
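
i.e. a trivial check over the table is enough; something like this (a sketch, assuming
the table is terminated by an empty entry and that cdclk = refclk * ratio / divider,
which is how the existing ICL/TGL entries work out):

/* Sketch: let the computer double check the human-entered cdclk table. */
static void intel_cdclk_table_check(const struct intel_cdclk_vals *table)
{
        for (; table->refclk; table++)
                WARN_ON(table->cdclk !=
                        DIV_ROUND_CLOSEST(table->refclk * table->ratio,
                                          table->divider));
}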

>Apart from that I don't see what the cdclk table has to do
>with this stuff. We can't derive that cdclk table from anything
>much simpler

See above. Even if you could, that would eliminate the table but
reduce readability.

>Anyways, my preferred approach to this sort of stuff is to
>make the computer do the calculations. That's what it's good
>at. Humans, not so great at computations, or even copy pasting.
>Though as with the cdclk table it may be possible to make
>humans do the calculations/copy paste and then have the
>computer double check it all makes sense.

Calculations are great for something complex, but you will not use a
calculator every time you need to add 2+2.
The table you are proposing will require some kind of minimum spanning
tree algorithm.

I.e. there is a metric for how far each DBuf slice is from
each pipe. You build a graph from that and try to find the configuration
which corresponds to the minimum spanning tree of that graph -
that would be the optimal DBuf assignment.

However this would have to be done at runtime, doing calculations
and lots of compares. E.g. with these distances:

          S1   S2
  Pipe1    1    2
  Pipe2    2    1

You basically then have the totals 1+2, 2+1, 2+2 and 1+1 to compare, where the
minimum is of course 1+1, i.e. Pipe1 gets S1 and Pipe2 gets S2.
Even for this number of pipes you have 4 combinations to check.
You could of course forbid using the same slice, which would leave only 2 of those,
but that would conflict with BSpec, as it allows pipes to use the same slice.
And what about 3 or 4 pipes?
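
Just to make it concrete, even the simplest one-slice-per-pipe version of that
search already looks like this (standalone sketch, not driver code):

/*
 * Standalone sketch: brute-force "minimal total distance" slice assignment
 * for 2 pipes and 2 slices, using the distance table above.
 */
#include <stdio.h>

#define NUM_PIPES 2

/* distance[pipe][slice], same idea as the proposed icl_dbuf_distance[][] */
static const int distance[NUM_PIPES][2] = {
        { 1, 2 },       /* Pipe1: S1 = 1, S2 = 2 */
        { 2, 1 },       /* Pipe2: S1 = 2, S2 = 1 */
};

int main(void)
{
        int best_cost = -1, best = 0;

        /* each pipe picks one of the two slices => 2^NUM_PIPES combinations */
        for (int combo = 0; combo < 1 << NUM_PIPES; combo++) {
                int cost = 0;

                for (int pipe = 0; pipe < NUM_PIPES; pipe++)
                        cost += distance[pipe][(combo >> pipe) & 1];

                if (best_cost < 0 || cost < best_cost) {
                        best_cost = cost;
                        best = combo;
                }
        }

        for (int pipe = 0; pipe < NUM_PIPES; pipe++)
                printf("Pipe%d -> S%d\n", pipe + 1, ((best >> pipe) & 1) + 1);

        return 0;
}

And that still does not cover a pipe using both slices, or 3-4 pipes with the
extra BSpec quirks.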

This sounds almost like implementing a hash function for searching an array
of 10 elements, just as we discussed at the meeting.
Fancy, interesting, but redundant.

So basically we are losing readability of the code and also adding
more complexity to it (which means more possible errors).
What for?

Whereas we can just have a compile-time defined mask in a small lookup table,
which is easy to read and to compare against BSpec. No run-time algorithms,
no redundant complexity.
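
In other words it stays a compile-time table indexed by the active pipe mask,
along the lines of (simplified sketch of what the patch does; the helper and the
exact entry layout are abbreviated here):

/*
 * Simplified sketch: per-pipe slice masks for each combination of active
 * pipes, taken straight from the BSpec 49255 table.
 */
struct dbuf_slice_conf_entry {
        u8 dbuf_mask[I915_MAX_PIPES];   /* slice mask for each pipe */
        u8 active_pipes;                /* pipe combination this entry applies to */
};

static const struct dbuf_slice_conf_entry tgl_allowed_dbufs[] = {
        { .active_pipes = BIT(PIPE_A),
          .dbuf_mask = { [PIPE_A] = DBUF_S1_BIT | DBUF_S2_BIT } },
        { .active_pipes = BIT(PIPE_A) | BIT(PIPE_B),
          .dbuf_mask = { [PIPE_A] = DBUF_S2_BIT, [PIPE_B] = DBUF_S1_BIT } },
        /* ... the remaining BSpec combinations ... */
};

static u8 tgl_pipe_dbuf_mask(enum pipe pipe, u8 active_pipes)
{
        int i;

        for (i = 0; i < ARRAY_SIZE(tgl_allowed_dbufs); i++)
                if (tgl_allowed_dbufs[i].active_pipes == active_pipes)
                        return tgl_allowed_dbufs[i].dbuf_mask[pipe];

        return 0;
}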

I admit my table might not be the best way to do it, OK.
But it makes sense and it works. It can always be improved.

Stan



Best Regards,

Lisovskiy Stanislav

Organization: Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo

________________________________________
From: Ville Syrjälä [ville.syrjala at linux.intel.com]
Sent: Wednesday, November 20, 2019 9:18 PM
To: Lisovskiy, Stanislav
Cc: Saarinen, Jani; intel-gfx at lists.freedesktop.org; Roper, Matthew D; maarten.lankhorst at linux.intel.com; Ausmus, James
Subject: Re: [PATCH v7] drm/i915: Enable second dbuf slice for ICL and TGL

On Mon, Nov 18, 2019 at 08:25:48PM +0000, Lisovskiy, Stanislav wrote:
> Hi,
>
> > > > > >
> > > > > > -     /* If 2nd DBuf slice is no more required disable it */
> > > > > > -     if (INTEL_GEN(dev_priv) >= 11 && required_slices <
> > > > > > hw_enabled_slices)
> > > > > > -             icl_dbuf_slices_update(dev_priv,
> > > > > > required_slices);
> > > > > > +     /*
> > > > > > +      * If some other DBuf slice is not needed, disable it
> > > > > > here,
> > > > > > +      * as here we only remove slices, which are not needed
> > > > > > anymore.
> > > > > > +      */
> > > > > > +     if (INTEL_GEN(dev_priv) >= 11 && required_slices !=
> > > > > > hw_enabled_slices)
> > > > > > +             icl_dbuf_slices_update(dev_priv,
> > > > > > slices_intersection);
> > > > >
> > > > > This is still a totally wrong place for this. We need to be sure
> > > > > the
> > > > > planes have switched over to the new ddb layout before we power
> > > > > off
> > > > > anything.
> > > >
> > > > I have just modified already existing line here, not added by me.
> > > > The point of this patch is starting to use second DBuf slice but
> > > > not fixing all kinds of issues related to DBuf code at the same
> > > > time.
> > >
> > > You're re-enabling code I disabled on purpose. It has to be fixed
> > > properly or not touched at all. In hindsight I should have probably
> > > just ripped it all out to not give people bad ideas.
> >
> > -     /* If 2nd DBuf slice is no more required disable it */
> > > > > > -     if (INTEL_GEN(dev_priv) >= 11 && required_slices <
> > > > > > hw_enabled_slices)
> > > > > > -             icl_dbuf_slices_update(dev_priv,
> > > > > > required_slices);
> >
> > I'm not reenabling it - this is code is out there right now.
>
> > It's _dead_ code, and it's like that on purpose. You're now attempting
> > to re-enable it without fixing the underlying issues.
>
> Sounds like some sacred thing. Dead code which is not commented, neither
> removed, but which you can't touch.

You can touch it by either removing it or fixing it. Anything else
is a bit pointless.

>
> > This is all again about changing all kinds of stuff at the same time.
> >
> > Why we can't just do things step by step? You have a point that
> > this is wrong place ok, but as we see current code causes no
> > regressions. Just tell me your idea where to put it and I will
> > do that, if you don't have it yet - let it stay there, if CI shows
> > no regressions currently.
>
> >The same global state tricks apply here as with the cdclk/sagv stuff.
> >1. enable new resources/bump up clocks
> >2. reconfigure planes/etc.
> >3. wait for vblanks
> >4. disable unused resources/drop the clocks
>
> It is different with SAGV, there we restrict whats needed first and relax
> some limitations afterwards.

It's a global resource shared by the pipes. Hence it's quite similar
to SAGV and cdclk.

> So all this was simply about that I should put this function call into
> skl_commit_modeset_disables?

It should be in intel_atomic_commit_tail() most likely, just like cdclk.

I cooked up a small test that I believe should end up ping-ponging
between one and two slices on icl, and so should end up exploding
when the second slice gets shut down prematurely:
git://github.com/vsyrjala/intel-gpu-tools.git dbuf_slice_test

> I could just have said it straight away to me, without mentioned "dead" code
> and other stuff..

I don't read minds either.

>
> > > > > > +     return INTEL_INFO(dev_priv)->supported_slices;
> > > > > > +}
> > > > >
> > > > > Kinda pointless function now. 'supported_slices' is not a very
> > > > > good
> > > > > name. Should be 'num_dbuf_slices' or something along those lines.
> > > >
> > > > Supported slices really means how much slices we support in
> > > > hardware.
> > > > num_dbuf_slices will not reflect that, it will be more ambigious.
> > > > I don't like this name.
> > >
> > > supported_slices doesn't even say "dbuf" anywhere. So it's even more
> > > ambiguous. Also we already have num_pipes, num_sprites, etc.
> >
>  ..and in current code we have... "enabled_slices", no? Ouch! :) IS this
>  really that important and worth arguing at the very last moment?
>  whether number of
>  supported_slices_in_hardware, should be named "supported slices" or
>  "num_dbuf_slices"? If yes, lets go on, miss some targets and continue
>  discussing this "soooo important" thing.
>
>  Some one might come up that it is called "num_slices" or
>  "dbuf_num_slices". Seriously, may be "supported_num_dbuf_slices" to
>  keep everyones bloated egos happy??

Since we have ddb_size in device info already it should be next to that
actually. And we should probably unify the terminology a bit so we don't
mix dbuf and ddb in the same sentence.

>
> > > > > >  static void icl_dbuf_enable(struct drm_i915_private *dev_priv)
> > > > > >  {
> > > > > > -     I915_WRITE(DBUF_CTL_S1, I915_READ(DBUF_CTL_S1) |
> > > > > > DBUF_POWER_REQUEST);
> > > > > > -     I915_WRITE(DBUF_CTL_S2, I915_READ(DBUF_CTL_S2) |
> > > > > > DBUF_POWER_REQUEST);
> > > > > > -     POSTING_READ(DBUF_CTL_S2);
> > > > > > -
> > > > > > -     udelay(10);
> > > > > > -
> > > > > > -     if (!(I915_READ(DBUF_CTL_S1) & DBUF_POWER_STATE) ||
> > > > > > -         !(I915_READ(DBUF_CTL_S2) & DBUF_POWER_STATE))
> > > > > > -             DRM_ERROR("DBuf power enable timeout\n");
> > > > > > -     else
> > > > > > -             /*
> > > > > > -              * FIXME: for now pretend that we only have 1
> > > > > > slice,
> > > > > > see
> > > > > > -              * intel_enabled_dbuf_slices_num().
> > > > > > -              */
> > > > > > -             dev_priv->wm.skl_hw.ddb.enabled_slices = 1;
> > > > > > +     /*
> > > > > > +      * Just power up 1 slice, we will
> > > > > > +      * figure out later which slices we have and what we
> > > > > > need.
> > > > > > +      */
> > > > > > +     dev_priv->wm.skl_hw.ddb.enabled_slices = DBUF_S1_BIT;
> > > > >
> > > > > That stuff really shouldn't be modified by low level functions.
> > > > > The intial value should come from readout, and after that it
> > > > > should
> > > > > be adjusted as part of the overall atomic state.
> > > >
> > > > Well, as you can see currently this is even worse - has some weird
> > > > hardcoded stuff and is modified like that. Also if we are enabling
> > > > the
> > > > dbuf slices, shouldn't we really determine what value should be
> > > > there?
> > > >
> > > > Also I wouldn't really always trust the readout, as to me it seems
> > > > icelake resets it constantly to 2 after suspend/resume.
> > >
> > > The point of readout is that we syncrhonize our software state with
> > > the
> > > current state of hardware without having to modify the hardware
> > > state.
> > > Once synchronized the driver takes over and tells the hw what to do.
> >
> > That is especially nice after you say "report error and don't care"
> > when I really want to reflect hw state in sw state, by checking whether
> > we could set dbuf slices or not.
> > Now you say that we need to synchronize software state with hw state,
> > however if I fail to modify it - then "who cares".
>
> >Yes. If you fail to modify the hardware state the whole thing is
> >broken anyway. There's is *nothing* we can do at this stage in the
> >commit to fix it.
>
> Ok, but why we can't at least keep the last successful state in sw
> instead of something which is completely wrong. How does that harm?

It messes up the software state that we computed with something that
we didn't compute. So going forward the state will be inconsistent.

The core concept of atomic is that the state is precomputed and
validated ahead of time. The commit phase is supposed to be dumb and
isn't allowed to modify the state. This is also why we often pass
the states as const during the commit phase, though unfortunately
we can't do that always on account of the states being slightly
overloaded with other metadata.

>
> > > > > >
> > > > > >  struct skl_ddb_allocation {
> > > > >
> > > > > This struct is quite pointless. Should be killed off.
> > > >
> > > > Legacy stuff, not added by me. Quite fine with removing it -
> > > > but should we really do all this at the same time and in same
> > > > patch?
> > >
> > > Not the same patch. But clean up the cobwebs first, then add the new
> > > stuff. Much easier that way around.
> >
> > There is a shit ton load of cobwebs to clean. Personally my opinion is
> > that current dbuf controlling code is total garbage - which has lots of
> > hardcoded stuff. Good luck enabling only second dbuf slice, good luck
> > disabling/enabling dbuf properly....
> > Where you were, when this was committed then?
>
> >Looking at something else unfortunately. Also it has just kept
> >bitrotting since because no one has cared.
>
> That is what exactly surprises me. While we have issues
> like commit serialization, sagv, mst, half of display buffer only
> enabled, bw, and we are arguing about how supported_slices
> should be named. Or that I shouldn't refer to DBuf slices as bits
> but define as values and then decode those as bits. Priorities.

Usually people complain about reviewers not pointing out
all the issues in one sitting.

>
> >
> > If we need to fix it - fine. But may be we should do it in stages?
> > Point of this patch is "enable second dbuf slice", next patch "remove
> > unneeded structs" and so on.
> > While you want me to shuffle everything at the same time, which is not
> > good idea, especially considering complexity and amount of legacy stuff
> > we currently have.
>
> >Building new stuff on top of a decomposing garbage pile is not
> >a good idea: a) it's harder than building on clean foundation,
> >b) it'll probably fall over and get buried in the old garbage
> >at which point you're likely going to be right back where you
> >started
> .
> Sure, let's change everything at the same time and then wonder
> what had failed exactly in that huge series with lots of unrelated changes.
> Standard practice.

Series can be split up to smaller chunks. My usual MO is to split
prep stuff out to separate patches, and if it looks like the overall
series is getting too big I even post the prep patches separately.
In fact I often end up with more than one prep series for whatever
actual goal I was trying to reach.

>
> > > > >
> > > > > >       for_each_new_intel_crtc_in_state(intel_state, crtc,
> > > > > > crtc_state,
> > > > > > i) {
> > > > > >               const struct drm_display_mode *adjusted_mode =
> > > > > >                       &crtc_state->hw.adjusted_mode;
> > > > > >               enum pipe pipe = crtc->pipe;
> > > > > >               int hdisplay, vdisplay;
> > > > > > +             u32 pipe_dbuf_slice_mask = \
> > > > > > +                     i915_get_allowed_dbuf_mask(dev_priv,
> > > > > > +                                             pipe,
> > > > > > +                                             active_pipes,
> > > > > > +                                             crtc_state);
> > > > > >
> > > > > >               if (!crtc_state->hw.enable)
> > > > > >                       continue;
> > > > > >
> > > > > > +             /*
> > > > > > +              * According to BSpec pipe can share one dbuf
> > > > > > slice
> > > > > > with another
> > > > > > +              * pipes or pipe can use multiple dbufs, in
> > > > > > both cases
> > > > > > we
> > > > > > +              * account for other pipes only if they have
> > > > > > exactly
> > > > > > same mask.
> > > > > > +              * However we need to account how many slices
> > > > > > we should
> > > > > > enable
> > > > > > +              * in total.
> > > > > > +              */
> > > > > > +             total_slice_mask |= pipe_dbuf_slice_mask;
> > > > >
> > > > > total_slice_mask will now account only the crtcs in the state.
> > > > > What
> > > > > happens to the other pipes' slices?
> >
> > Same as happens now btw. Right now it also does the same thing.
>
> >Right now it does absolutely nothing. It's all dead code.
>
> Right now, this "dead" code exactly iterates through crtcs in state and
> allocates DBuf S1 in proportion to pipe width.
> What I do is make it do the same thing but using either range of one
> or two slices, depending on pipe configuration.

OK. So how does it deal with my original question?

>
> So should I have thrown away that code and rewritten this as well?

Not if it works.

>
> > > > > > +static uint_fixed_16_16_t
> > > > > > +icl_pipe_downscale_amount(const struct intel_crtc_state
> > > > > > *crtc_state)
> > > > > > +{
> > > > > > +     u32 src_w, src_h, dst_w, dst_h;
> > > > > > +     uint_fixed_16_16_t fp_w_ratio, fp_h_ratio;
> > > > > > +     uint_fixed_16_16_t downscale_h, downscale_w;
> > > > > > +     const struct drm_display_mode *adjusted_mode =
> > > > > > &crtc_state-
> > > > > > > hw.adjusted_mode;
> > > > > >
> > > > > > +
> > > > > > +     src_w = crtc_state->pipe_src_w;
> > > > > > +     src_h = crtc_state->pipe_src_h;
> > > > > > +     dst_w = adjusted_mode->crtc_hdisplay;
> > > > > > +     dst_h = adjusted_mode->crtc_vdisplay;
> > > > > > +
> > > > > > +     fp_w_ratio = div_fixed16(src_w, dst_w);
> > > > > > +     fp_h_ratio = div_fixed16(src_h, dst_h);
> > > > > > +     downscale_w = max_fixed16(fp_w_ratio,
> > > > > > u32_to_fixed16(1));
> > > > > > +     downscale_h = max_fixed16(fp_h_ratio,
> > > > > > u32_to_fixed16(1));
> > > > > > +
> > > > > > +     return mul_fixed16(downscale_w, downscale_h);
> > > > > > +}
> > > > > > +
> > > > > > +uint_fixed_16_16_t
> > > > > > +icl_get_pipe_ratio(const struct intel_crtc_state *crtc_state)
> > > > > > +{
> > > > > > +     struct drm_plane *plane;
> > > > > > +     const struct drm_plane_state *drm_plane_state;
> > > > > > +     uint_fixed_16_16_t pipe_downscale, pipe_ratio;
> > > > > > +     uint_fixed_16_16_t max_downscale = u32_to_fixed16(1);
> > > > > > +     struct intel_crtc *crtc = to_intel_crtc(crtc_state-
> > > > > > >uapi.crtc);
> > > > > > +
> > > > > > +     if (!crtc_state->uapi.enable)
> > > > > > +             return max_downscale;
> > > > > > +
> > > > > > +     drm_atomic_crtc_state_for_each_plane_state(plane,
> > > > > > drm_plane_state, &crtc_state->uapi) {
> > > > > > +             uint_fixed_16_16_t plane_downscale;
> > > > > > +             const struct intel_plane_state *plane_state =
> > > > > > +                     to_intel_plane_state(drm_plane_state);
> > > > > > +
> > > > > > +             if (!intel_wm_plane_visible(crtc_state,
> > > > > > plane_state))
> > > > > > +                     continue;
> > > > > > +
> > > > > > +             plane_downscale =
> > > > > > skl_plane_downscale_amount(crtc_state, plane_state);
> > > > > > +
> > > > > > +             DRM_DEBUG_KMS("Plane %d downscale amount
> > > > > > %d.%d\n",
> > > > > > +                           to_intel_plane(plane)->id,
> > > > > > +                           plane_downscale.val >> 16,
> > > > > > +                           plane_downscale.val & 0xffff);
> > > > > > +
> > > > > > +             max_downscale = max_fixed16(plane_downscale,
> > > > > > max_downscale);
> > > > > > +     }
> > > > > > +
> > > > > > +
> > > > > > +     pipe_downscale = icl_pipe_downscale_amount(crtc_state);
> > > > > > +
> > > > > > +     DRM_DEBUG_KMS("Pipe %d downscale amount %d.%d\n",
> > > > > > +                    crtc->pipe, pipe_downscale.val >> 16,
> > > > > > +                    pipe_downscale.val & 0xffff);
> > > > > > +
> > > > > > +     pipe_downscale = mul_fixed16(pipe_downscale,
> > > > > > max_downscale);
> > > > > > +
> > > > > > +     /*
> > > > > > +      * Convert result to percentage: get the actual ratio
> > > > > > instead
> > > > > > of rate,
> > > > > > +      * then multiply by 100 to get percentage.
> > > > > > +      */
> > > > > > +     pipe_ratio = u32_to_fixed16(100 *
> > > > > > div_round_up_u32_fixed16(1,
> > > > > > pipe_downscale));
> > > > > > +
> > > > > > +     DRM_DEBUG_KMS("Pipe %d ratio %d.%d\n", crtc->pipe,
> > > > > > +                   pipe_ratio.val >> 16, pipe_ratio.val &
> > > > > > 0xffff);
> > > > > > +
> > > > > > +     return pipe_ratio;
> > > > > > +}
> > > > >
> > > > > Yuck. I just killed these.
> > > >
> > > > I still need pipe_ratio fro ICL as you know, this is dictated by
> > > > BSpec,
> > > > not by me.
> > >
> > > That part of the spec is pretty crazy IMO. I think we should just
> > > ignore
> > > it for now. Also nothing in there says we must use both slices for a
> > > single pipe so ignoring doesn't even violate the spec in any way.
> >
> > So you are instructing me now to ignore BSpec as "we are smarter"?
> > I don't think this is properly to do it this way. Lets first change the
> > BSpec, if we don't like pipe ratio here and then driver. Currently this
> > is written that way and if we are going to do everything our own way
> > ignoring specs, this is going to be chaos.
>
> >Like I sai you don't have to violate the spec at all. Just don't use
> >two slices for the same pipe and you can avoid the whole strange pipe
> >ratio limitation. We should file a bspec issue to get that clarified
> >though.
>
> That is actually a violation of BSpec, we are not using two slices for same
> pipe while we could. We had already enough of bw issues, why should
> we make our life even harder?

Using more slices is, for whatever reason, harder for the hardware.
You're not allowed to use two slices if you do any significant
downscaling. Which is a bit sad/odd because increasing downscaling
increases bandwidth utilization but at the same time also prevents
us from using two slices.

I think the main benefit we would get from two slices is better
power savings because we may be able to enable more watermark
levels on account of the larger ddb.

There is nothing in the spec that I can see that says we *must* use
two slices (unless we're about to exceed the CDCDLKx64 bandwidth
limit I guess). And since we haven't been using two slices so far
it seems safer to not jump in the deep end and enable two slices
per-pipe at the same time as we do the other things. Ie. there
should be a separate patch to enable the two slices per pipe usage.

>
> > > > > > +/*
> > > > > > + * Table taken from Bspec 49255
> > > > > > + * Pipes do have some preferred DBuf slice affinity,
> > > > > > + * plus there are some hardcoded requirements on how
> > > > > > + * those should be distributed for multipipe scenarios.
> > > > > > + * For more DBuf slices algorithm can get even more messy
> > > > > > + * and less readable, so decided to use a table almost
> > > > > > + * as is from BSpec itself - that way it is at least easier
> > > > > > + * to compare, change and check.
> > > > > > + */
> > > > > > +struct dbuf_slice_conf_entry tgl_allowed_dbufs[] = {
> > > > > > +{ { 0, 0, 0, 0 }, 0 },
> > > > > > +ICL_PIPE_A_DBUF_SLICES(DBUF_S1_BIT | DBUF_S2_BIT),
> > > > > > +ICL_PIPE_B_DBUF_SLICES(DBUF_S1_BIT | DBUF_S2_BIT),
> > > > > > +ICL_PIPE_C_DBUF_SLICES(DBUF_S1_BIT | DBUF_S2_BIT),
> > > > > > +ICL_PIPE_D_DBUF_SLICES(DBUF_S1_BIT | DBUF_S2_BIT),
> > > > > > +ICL_PIPE_AB_DBUF_SLICES(DBUF_S2_BIT, DBUF_S1_BIT),
> > > > > > +ICL_PIPE_AC_DBUF_SLICES(DBUF_S1_BIT, DBUF_S2_BIT),
> > > > > > +ICL_PIPE_BC_DBUF_SLICES(DBUF_S1_BIT, DBUF_S2_BIT),
> > > > > > +ICL_PIPE_AD_DBUF_SLICES(DBUF_S1_BIT, DBUF_S2_BIT),
> > > > > > +ICL_PIPE_BD_DBUF_SLICES(DBUF_S1_BIT, DBUF_S2_BIT),
> > > > > > +ICL_PIPE_CD_DBUF_SLICES(DBUF_S1_BIT, DBUF_S2_BIT),
> > > > > > +ICL_PIPE_ABD_DBUF_SLICES(DBUF_S1_BIT, DBUF_S1_BIT,
> > > > > > DBUF_S2_BIT),
> > > > > > +ICL_PIPE_ABC_DBUF_SLICES(DBUF_S1_BIT, DBUF_S1_BIT,
> > > > > > DBUF_S2_BIT),
> > > > > > +ICL_PIPE_ACD_DBUF_SLICES(DBUF_S1_BIT, DBUF_S2_BIT,
> > > > > > DBUF_S2_BIT),
> > > > > > +ICL_PIPE_BCD_DBUF_SLICES(DBUF_S1_BIT, DBUF_S2_BIT,
> > > > > > DBUF_S2_BIT),
> > > > > > +ICL_PIPE_ABCD_DBUF_SLICES(DBUF_S1_BIT, DBUF_S1_BIT,
> > > > > > DBUF_S2_BIT,
> > > > > > DBUF_S2_BIT),
> > > > > > +};
> > > > >
> > > > > My eyes!
> > > > >
> > > > > I have to think we should be able to reduce all that to a handful
> > > > > of lines of code.
> > > >
> > > > Yeah, then we are going to have a huge function with lots of weird
> > > > definitions, unfortunately BSpec table has quite strange DBuf
> > > > assignments like
> > > > +ICL_PIPE_AB_DBUF_SLICES(DBUF_S2_BIT, DBUF_S1_BIT),
> > > > +ICL_PIPE_AC_DBUF_SLICES(DBUF_S1_BIT, DBUF_S2_BIT),
> > > >
> > > > i.e slices can get mixed in a quite various ways. There is no
> > > > rational pattern in those and could be even dangerous to try to
> > > > optimize it some way, as one day it might change again in some
> > > > unpredictable way.
> > > >
> > > > Would you prefer to have it like
> > > >
> > > > if (pipes are A and B)
> > > >     program S2 to A and S1 to B
> > > > if (pipes are A and C)
> > > >     program S1 to A and S2 to C
> > > > ...
> > > >
> > > > I would prefer at least to see it in a comparable way with the
> > > > table we have in BSpec, rather than having lots of weird looking
> > > > if-else statements, GEN checks and so on.
> > > >
> > > > I knew this table was not going to look like it is done typically
> > > > here - but really my opinion that it is way better than hardcoding
> > > > it into some weird algorithm, which would be hard to compare to
> > > > initial table, which is already strange enough.
> > > >
> > > > If you don't like it - could you please explain why this is exactly
> > > > worse than having long functions with hardcoded checks?
> > >
> > > I think we should be able to encode this simply using the
> > > "Pipe and DBUF ordering" stuff from the spec. Ie. just specify what
> > > is the distance of each pipe from each dbuf slice. Then all we have
> > > to do is minimize the total distance. The following tables I think
> > > should encode that reasonably concisely:
> > >
> > > /* pipeA - DBUF_S1 - pipeB - pipeC - DBUF_S2 */
> > > static const u8 icl_dbuf_distance[][] {
> > >     [PIPE_A] = { [DBUF_S1] = 1, [DBUF_S2] = 4, },
> > >     [PIPE_B] = { [DBUF_S1] = 1, [DBUF_S2] = 2, },
> > >     [PIPE_C] = { [DBUF_S1] = 2, [DBUF_S2] = 1, },
> > > };
> > >
> > > /* pipeD - DBUF_S2 - pipeC - pipeA - DBUF_S1 - pipeB */
> > > static const u8 tgl_dbuf_distance[][] {
> > >     [PIPE_A] = { [DBUF_S1] = 1, [DBUF_S2] = 2, },
> > >     [PIPE_B] = { [DBUF_S1] = 1, [DBUF_S2] = 4, },
> > >     [PIPE_C] = { [DBUF_S1] = 2, [DBUF_S2] = 1, },
> > >     [PIPE_D] = { [DBUF_S1] = 4, [DBUF_S2] = 1, },
> > > };
> >
> > I think you are missing my point. I will explain why.
> > My initial idea was similar, use some kind of distance
> > metrics and encode everything in a smart way.
> > Current BSpec table has sometimes slices swapped for no reason.
> >
> > There are seem to be some hardware constraints we are not aware
> > of. For example for TGL pipes A,B get S2 and S1 slices,
>
> >A/S1 + B/S2 = 1 + 4 = 5
> >A/S2 + B/S1 = 2 + 1 = 3
>
> > but pipes A C get S1 and S2.
>
> >A/S1 + C/S2 = 1 + 1 = 2
> >A/S2 + C/S1 = 2 + 2 = 4
>
> >So it's exactly as my distance table says it should be.
>
> >
> > if you encode all this into some kind of distance algorithm,
> > those differences would be enourmously difficult to spot and
> > compare with actual table. That is why I decided to do it that
> > way.
> >
> > Most likely yes you can solve it by using this metrics,
> > but will it really be then more readable and comparable
> > with reference table?
> > What if some contraints will be added again, like pipe ratio?
> >
> > >
> > > /*
> > >  * pipeA - DBUF_S0 - DBUF_S1 - pipeB
> > >  * pipeC - DBUF_S2 - DBUF_S3 - pipeD
> > >  */
> > > static const u8 future_dbuf_distance[][] {
> > >     [PIPE_A] = { [DBUF_S0] = 1, [DBUF_S1] = 3, },
> > >     [PIPE_B] = { [DBUF_S0] = 3, [DBUF_S1] = 1, },
> > >     [PIPE_C] = { [DBUF_S2] = 1, [DBUF_S3] = 3, },
> > >     [PIPE_D] = { [DBUF_S2] = 3, [DBUF_S3] = 1, },
> > > };
> > >
> > > I suppose we could make that even more minimal by just
> > > directly sticking that "Pipe and DBUF ordering" language
> > > into a single array but and then calculate each distance
> > > from it, but that is perhaps more work, and we'd need
> > > some kind of enum just for that so that we have a single
> > > namespace for the pipes and dbuf slices. So i have a
> > > feeling having those distances pre-computed in a table
> > > like thet is maybe a better way to go.
> >
> > Good luck then spoting differences with the BSpec table once it
> > gets changed.
>
> >Well, I'd rather do that than decode through layers of cpp macros.
> >Also atm the table has no extra restrictions so it's clearly there
> >just because someone thought they'd write out the results of the
> >distance calculations. In general bspec style is somewhat annoying
> >where they try to write out things in the lowest level of absraction
> >possible (they are hardware people after all). I'd much prefer a
> >higher level description of the issue.
>
> I would prefer to see immediately what is different as this table
> makes it possible right away
> Also to me this table simply intuitively more understandable.
>
> > > > > > +struct dbuf_slice_conf_entry tgl_allowed_dbufs[] = {
> > > > > > +{ { 0, 0, 0, 0 }, 0 },
> > > > > > +ICL_PIPE_A_DBUF_SLICES(DBUF_S1_BIT | DBUF_S2_BIT),
> > > > > > +ICL_PIPE_B_DBUF_SLICES(DBUF_S1_BIT | DBUF_S2_BIT),
> > > > > > +ICL_PIPE_C_DBUF_SLICES(DBUF_S1_BIT | DBUF_S2_BIT),
> > > > > > +ICL_PIPE_D_DBUF_SLICES(DBUF_S1_BIT | DBUF_S2_BIT),
> > > > > > +ICL_PIPE_AB_DBUF_SLICES(DBUF_S2_BIT, DBUF_S1_BIT),
> > > > > > +ICL_PIPE_AC_DBUF_SLICES(DBUF_S1_BIT, DBUF_S2_BIT),
> > > > > > +ICL_PIPE_BC_DBUF_SLICES(DBUF_S1_BIT, DBUF_S2_BIT),
> > > > > > +ICL_PIPE_AD_DBUF_SLICES(DBUF_S1_BIT, DBUF_S2_BIT),
> > > > > > +ICL_PIPE_BD_DBUF_SLICES(DBUF_S1_BIT, DBUF_S2_BIT),
> > > > > > +ICL_PIPE_CD_DBUF_SLICES(DBUF_S1_BIT, DBUF_S2_BIT),
> > > > > > +ICL_PIPE_ABD_DBUF_SLICES(DBUF_S1_BIT, DBUF_S1_BIT,
> > > > > > DBUF_S2_BIT),
> > > > > > +ICL_PIPE_ABC_DBUF_SLICES(DBUF_S1_BIT, DBUF_S1_BIT,
> > > > > > DBUF_S2_BIT),
> > > > > > +ICL_PIPE_ACD_DBUF_SLICES(DBUF_S1_BIT, DBUF_S2_BIT,
> > > > > > DBUF_S2_BIT),
> > > > > > +ICL_PIPE_BCD_DBUF_SLICES(DBUF_S1_BIT, DBUF_S2_BIT,
> > > > > > DBUF_S2_BIT),
> > > > > > +ICL_PIPE_ABCD_DBUF_SLICES(DBUF_S1_BIT, DBUF_S1_BIT,
> > > > > > DBUF_S2_BIT,
> > > > > > DBUF_S2_BIT),
> > > > > > +};
>
> We have similar CDCLK table in intel_cdclk, while you could
> also argue that we might had come up with some formula.

The cdclk table has one redundancy that I'm not 100% fan of:
it specifies both the cdclk frequency and the cd2x divider.
We could make do with just either one and calculate the other
one.

On the flip side that redundancy would allow us to cross check
that the values produce the expected results. I actually have
a patch for that buried in some branch.

Apart from that I don't see what the cdclk table has to do
with this stuff. We can't derive that cdclk table from anything
much simpler.

Anyways, my preferred approach to this sort of stuff is to
make the computer do the calculations. That's what it's good
at. Humans, not so great at computations, or even copy pasting.
Though as with the cdclk table it may be possible to make
humans do the calculations/copy paste and then have the
computer double check it all makes sense.

--
Ville Syrjälä
Intel

