[Intel-gfx] [PATCH v7] drm/i915: Enable second dbuf slice for ICL and TGL

Ville Syrjälä ville.syrjala at linux.intel.com
Wed Nov 20 19:18:37 UTC 2019


On Mon, Nov 18, 2019 at 08:25:48PM +0000, Lisovskiy, Stanislav wrote:
> Hi,
> 
> > > > > >
> > > > > > -     /* If 2nd DBuf slice is no more required disable it */
> > > > > > -     if (INTEL_GEN(dev_priv) >= 11 && required_slices < hw_enabled_slices)
> > > > > > -             icl_dbuf_slices_update(dev_priv, required_slices);
> > > > > > +     /*
> > > > > > +      * If some other DBuf slice is not needed, disable it here,
> > > > > > +      * as here we only remove slices, which are not needed anymore.
> > > > > > +      */
> > > > > > +     if (INTEL_GEN(dev_priv) >= 11 && required_slices != hw_enabled_slices)
> > > > > > +             icl_dbuf_slices_update(dev_priv, slices_intersection);
> > > > >
> > > > > This is still a totally wrong place for this. We need to be sure
> > > > > the planes have switched over to the new ddb layout before we
> > > > > power off anything.
> > > >
> > > > I have just modified an already existing line here, not one added by
> > > > me. The point of this patch is to start using the second DBuf slice,
> > > > not to fix all kinds of issues related to the DBuf code at the same
> > > > time.
> > >
> > > You're re-enabling code I disabled on purpose. It has to be fixed
> > > properly or not touched at all. In hindsight I should have probably
> > > just ripped it all out to not give people bad ideas.
> >
> > -     /* If 2nd DBuf slice is no more required disable it */
> > -     if (INTEL_GEN(dev_priv) >= 11 && required_slices < hw_enabled_slices)
> > -             icl_dbuf_slices_update(dev_priv, required_slices);
> >
> > I'm not reenabling it - this code is out there right now.
> 
> > It's _dead_ code, and it's like that on purpose. You're now attempting
> > to re-enable it without fixing the underlying issues.
> 
> Sounds like some sacred thing. Dead code which is neither commented out
> nor removed, but which you can't touch.

You can touch it by either removing it or fixing it. Anything else
is a bit pointless.

> 
> > This is all again about changing all kinds of stuff at the same time.
> >
> > Why can't we just do things step by step? You have a point that
> > this is the wrong place, ok, but as we can see the current code causes no
> > regressions. Just tell me your idea of where to put it and I will
> > do that; if you don't have one yet - let it stay there, as long as CI
> > currently shows no regressions.
> 
> >The same global state tricks apply here as with the cdclk/sagv stuff.
> >1. enable new resources/bump up clocks
> >2. reconfigure planes/etc.
> >3. wait for vblanks
> >4. disable unused resources/drop the clocks
> 
> It is different from SAGV: there we restrict what's needed first and relax
> some limitations afterwards.

It's a global resource shared by the pipes. Hence it's quite similar
to SAGV and cdclk.

> So all this was simply about putting this function call into
> skl_commit_modeset_disables? 

It should be in intel_atomic_commit_tail() most likely, just like cdclk.
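
Ie. roughly the same dance as with cdclk. An untested sketch of the
ordering (the dbuf slice mask fields here are made up, just to show
where things would go):

static void intel_atomic_commit_tail(struct intel_atomic_state *state)
{
	struct drm_i915_private *dev_priv = to_i915(state->base.dev);
	/* made-up fields, just to show where the masks would come from */
	u8 old_slices = state->old_dbuf_slices;
	u8 new_slices = state->new_dbuf_slices;

	/* 1. power up every slice that either the old or new state needs */
	icl_dbuf_slices_update(dev_priv, old_slices | new_slices);

	/* 2. reconfigure the planes to the new ddb layout */
	/* 3. wait for vblanks so the planes have actually switched over */

	/* 4. only now is it safe to power off the slices no one uses */
	icl_dbuf_slices_update(dev_priv, new_slices);
}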

I cooked up a small test that I believe should end up ping-ponging
between one and two slices on icl, and so should end up exploding
when the second slice gets shut down prematurely:
git://github.com/vsyrjala/intel-gpu-tools.git dbuf_slice_test

> You could just have said that to me straight away, without mentioning
> "dead" code and other stuff..

I don't read minds either.

> 
> > > > > > +     return INTEL_INFO(dev_priv)->supported_slices;
> > > > > > +}
> > > > >
> > > > > Kinda pointless function now. 'supported_slices' is not a very good
> > > > > name. Should be 'num_dbuf_slices' or something along those lines.
> > > >
> > > > Supported slices really means how many slices we support in hardware.
> > > > num_dbuf_slices will not reflect that, it will be more ambiguous.
> > > > I don't like this name.
> > >
> > > supported_slices doesn't even say "dbuf" anywhere. So it's even more
> > > ambiguous. Also we already have num_pipes, num_sprites, etc.
> >
>  ..and in the current code we have... "enabled_slices", no? Ouch! :) Is this
>  really that important and worth arguing about at the very last moment,
>  whether the number of slices supported in hardware should be named
>  "supported_slices" or "num_dbuf_slices"? If yes, let's go on, miss some
>  targets and continue discussing this "soooo important" thing.
> 
>  Someone might come up with "num_slices" or
>  "dbuf_num_slices". Seriously, maybe "supported_num_dbuf_slices" to
>  keep everyone's bloated egos happy??

Since we have ddb_size in device info already it should be next to that
actually. And we should probably unify the terminology a bit so we don't
mix dbuf and ddb in the same sentence.
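
Ie. something along these lines in the device info (sketch; the field
name is whatever we settle on in the end):

struct intel_device_info {
	/* ... */
	u16 ddb_size; /* in blocks */
	u8 num_dbuf_slices;
	/* ... */
};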

> 
> > > > > >  static void icl_dbuf_enable(struct drm_i915_private *dev_priv)
> > > > > >  {
> > > > > > -     I915_WRITE(DBUF_CTL_S1, I915_READ(DBUF_CTL_S1) | DBUF_POWER_REQUEST);
> > > > > > -     I915_WRITE(DBUF_CTL_S2, I915_READ(DBUF_CTL_S2) | DBUF_POWER_REQUEST);
> > > > > > -     POSTING_READ(DBUF_CTL_S2);
> > > > > > -
> > > > > > -     udelay(10);
> > > > > > -
> > > > > > -     if (!(I915_READ(DBUF_CTL_S1) & DBUF_POWER_STATE) ||
> > > > > > -         !(I915_READ(DBUF_CTL_S2) & DBUF_POWER_STATE))
> > > > > > -             DRM_ERROR("DBuf power enable timeout\n");
> > > > > > -     else
> > > > > > -             /*
> > > > > > -              * FIXME: for now pretend that we only have 1 slice, see
> > > > > > -              * intel_enabled_dbuf_slices_num().
> > > > > > -              */
> > > > > > -             dev_priv->wm.skl_hw.ddb.enabled_slices = 1;
> > > > > > +     /*
> > > > > > +      * Just power up 1 slice, we will
> > > > > > +      * figure out later which slices we have and what we need.
> > > > > > +      */
> > > > > > +     dev_priv->wm.skl_hw.ddb.enabled_slices = DBUF_S1_BIT;
> > > > >
> > > > > That stuff really shouldn't be modified by low level functions.
> > > > > The initial value should come from readout, and after that it
> > > > > should be adjusted as part of the overall atomic state.
> > > >
> > > > Well, as you can see, currently this is even worse - it has some weird
> > > > hardcoded stuff and is modified like that. Also if we are enabling the
> > > > dbuf slices, shouldn't we really determine what value should be there?
> > > >
> > > > Also I wouldn't always trust the readout, as to me it seems
> > > > Icelake resets it constantly to 2 after suspend/resume.
> > >
> > > The point of readout is that we synchronize our software state with
> > > the current state of the hardware without having to modify the hardware
> > > state.
> > > Once synchronized, the driver takes over and tells the hw what to do.
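
To be concrete about what I mean by readout, an untested sketch built
from the bits already in this patch (the function name is made up):

static u8 intel_enabled_dbuf_slices_mask(struct drm_i915_private *dev_priv)
{
	u8 slices = 0;

	/* derive the mask purely from the power state bits,
	 * without writing anything back to the hardware */
	if (I915_READ(DBUF_CTL_S1) & DBUF_POWER_STATE)
		slices |= DBUF_S1_BIT;

	if (INTEL_GEN(dev_priv) >= 11 &&
	    (I915_READ(DBUF_CTL_S2) & DBUF_POWER_STATE))
		slices |= DBUF_S2_BIT;

	return slices;
}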
> >
> > That is especially nice after you say "report error and don't care"
> > when I really want to reflect hw state in sw state, by checking whether
> > we could set dbuf slices or not.
> > Now you say that we need to synchronize software state with hw state,
> > however if I fail to modify it - then "who cares".
> 
> >Yes. If you fail to modify the hardware state the whole thing is
> >broken anyway. There is *nothing* we can do at this stage in the
> >commit to fix it.
> 
> Ok, but why can't we at least keep the last successful state in sw
> instead of something which is completely wrong? How does that harm?

It messes up the software state that we computed with something that
we didn't compute. So going forward the state will be inconsistent.

The core concept of atomic is that the state is precomputed and
validated ahead of time. The commit phase is supposed to be dumb and
isn't allowed to modify the state. This is also why we often pass
the states as const during the commit phase, though unfortunately
we can't do that always on account of the states being slightly
overloaded with other metadata.
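
In code terms the split is roughly this (pure illustration, all the
helper names and the dbuf field here are made up):

/* compute/check phase: may fail, the only place allowed to
 * modify the state */
static int compute_and_validate(struct intel_atomic_state *state)
{
	state->new_dbuf_slices = compute_dbuf_slices(state);

	return validate(state);
}

/* commit phase: dumb, must not fail, the state is effectively const */
static void commit_to_hw(const struct intel_atomic_state *state)
{
	program_dbuf(state->new_dbuf_slices);
}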

> 
> > > > > >
> > > > > >  struct skl_ddb_allocation {
> > > > >
> > > > > This struct is quite pointless. Should be killed off.
> > > >
> > > > Legacy stuff, not added by me. Quite fine with removing it -
> > > > but should we really do all this at the same time and in the same
> > > > patch?
> > >
> > > Not the same patch. But clean up the cobwebs first, then add the new
> > > stuff. Much easier that way around.
> >
> > There is a shit ton of cobwebs to clean. Personally my opinion is
> > that the current dbuf controlling code is total garbage - it has lots of
> > hardcoded stuff. Good luck enabling only the second dbuf slice, good luck
> > disabling/enabling dbuf properly....
> > Where were you when this was committed then?
> 
> >Looking at something else unfortunately. Also it has just kept
> >bitrotting since because no one has cared.
> 
> That is exactly what surprises me. While we have issues
> like commit serialization, sagv, mst, half of the display buffer only
> enabled, bw, we are arguing about how supported_slices
> should be named. Or that I shouldn't refer to DBuf slices as bits
> but define them as values and then decode those as bits. Priorities.

Usually people complain about reviewers not pointing out
all the issues in one sitting. 

> 
> >
> > If we need to fix it - fine. But maybe we should do it in stages?
> > The point of this patch is "enable second dbuf slice", the next patch
> > "remove unneeded structs" and so on.
> > While you want me to shuffle everything at the same time, which is not a
> > good idea, especially considering the complexity and amount of legacy
> > stuff we currently have.
> 
> >Building new stuff on top of a decomposing garbage pile is not
> >a good idea: a) it's harder than building on a clean foundation,
> >b) it'll probably fall over and get buried in the old garbage,
> >at which point you're likely going to be right back where you
> >started.
> 
> Sure, let's change everything at the same time and then wonder
> what exactly failed in that huge series with lots of unrelated changes. 
> Standard practice.

Series can be split up to smaller chunks. My usual MO is to split
prep stuff out to separate patches, and if it looks like the overall
series is getting too big I even post the prep patches separately.
In fact I often end up with more than one prep series for whatever
actual goal I was trying to reach.

> 
> > > > >
> > > > > >       for_each_new_intel_crtc_in_state(intel_state, crtc,
> > > > > >                                        crtc_state, i) {
> > > > > >               const struct drm_display_mode *adjusted_mode =
> > > > > >                       &crtc_state->hw.adjusted_mode;
> > > > > >               enum pipe pipe = crtc->pipe;
> > > > > >               int hdisplay, vdisplay;
> > > > > > +             u32 pipe_dbuf_slice_mask =
> > > > > > +                     i915_get_allowed_dbuf_mask(dev_priv, pipe,
> > > > > > +                                                active_pipes,
> > > > > > +                                                crtc_state);
> > > > > >
> > > > > >               if (!crtc_state->hw.enable)
> > > > > >                       continue;
> > > > > >
> > > > > > +             /*
> > > > > > +              * According to BSpec a pipe can share one dbuf slice
> > > > > > +              * with other pipes, or a pipe can use multiple dbuf
> > > > > > +              * slices; in both cases we account for other pipes
> > > > > > +              * only if they have exactly the same mask. However
> > > > > > +              * we need to account for how many slices we should
> > > > > > +              * enable in total.
> > > > > > +              */
> > > > > > +             total_slice_mask |= pipe_dbuf_slice_mask;
> > > > >
> > > > > total_slice_mask will now account for only the crtcs in the state.
> > > > > What happens to the other pipes' slices?
> >
> > Same as happens now btw. Right now it also does the same thing.
> 
> >Right now it does absolutely nothing. It's all dead code.
> 
> Right now, this "dead" code iterates through the crtcs in the state and
> allocates DBuf S1 in proportion to pipe width.
> What I do is make it do the same thing, but using a range of either one
> or two slices, depending on the pipe configuration.

OK. So how does it deal with my original question?

> 
> So should I have thrown away that code and rewritten this as well?

Not if it works.

> 
> > > > > > +static uint_fixed_16_16_t
> > > > > > +icl_pipe_downscale_amount(const struct intel_crtc_state *crtc_state)
> > > > > > +{
> > > > > > +     u32 src_w, src_h, dst_w, dst_h;
> > > > > > +     uint_fixed_16_16_t fp_w_ratio, fp_h_ratio;
> > > > > > +     uint_fixed_16_16_t downscale_h, downscale_w;
> > > > > > +     const struct drm_display_mode *adjusted_mode =
> > > > > > +             &crtc_state->hw.adjusted_mode;
> > > > > > +
> > > > > > +     src_w = crtc_state->pipe_src_w;
> > > > > > +     src_h = crtc_state->pipe_src_h;
> > > > > > +     dst_w = adjusted_mode->crtc_hdisplay;
> > > > > > +     dst_h = adjusted_mode->crtc_vdisplay;
> > > > > > +
> > > > > > +     fp_w_ratio = div_fixed16(src_w, dst_w);
> > > > > > +     fp_h_ratio = div_fixed16(src_h, dst_h);
> > > > > > +     downscale_w = max_fixed16(fp_w_ratio, u32_to_fixed16(1));
> > > > > > +     downscale_h = max_fixed16(fp_h_ratio, u32_to_fixed16(1));
> > > > > > +
> > > > > > +     return mul_fixed16(downscale_w, downscale_h);
> > > > > > +}
> > > > > > +
> > > > > > +uint_fixed_16_16_t
> > > > > > +icl_get_pipe_ratio(const struct intel_crtc_state *crtc_state)
> > > > > > +{
> > > > > > +     struct drm_plane *plane;
> > > > > > +     const struct drm_plane_state *drm_plane_state;
> > > > > > +     uint_fixed_16_16_t pipe_downscale, pipe_ratio;
> > > > > > +     uint_fixed_16_16_t max_downscale = u32_to_fixed16(1);
> > > > > > +     struct intel_crtc *crtc = to_intel_crtc(crtc_state->uapi.crtc);
> > > > > > +
> > > > > > +     if (!crtc_state->uapi.enable)
> > > > > > +             return max_downscale;
> > > > > > +
> > > > > > +     drm_atomic_crtc_state_for_each_plane_state(plane, drm_plane_state,
> > > > > > +                                                &crtc_state->uapi) {
> > > > > > +             uint_fixed_16_16_t plane_downscale;
> > > > > > +             const struct intel_plane_state *plane_state =
> > > > > > +                     to_intel_plane_state(drm_plane_state);
> > > > > > +
> > > > > > +             if (!intel_wm_plane_visible(crtc_state, plane_state))
> > > > > > +                     continue;
> > > > > > +
> > > > > > +             plane_downscale = skl_plane_downscale_amount(crtc_state,
> > > > > > +                                                          plane_state);
> > > > > > +
> > > > > > +             DRM_DEBUG_KMS("Plane %d downscale amount %d.%d\n",
> > > > > > +                           to_intel_plane(plane)->id,
> > > > > > +                           plane_downscale.val >> 16,
> > > > > > +                           plane_downscale.val & 0xffff);
> > > > > > +
> > > > > > +             max_downscale = max_fixed16(plane_downscale, max_downscale);
> > > > > > +     }
> > > > > > +
> > > > > > +     pipe_downscale = icl_pipe_downscale_amount(crtc_state);
> > > > > > +
> > > > > > +     DRM_DEBUG_KMS("Pipe %d downscale amount %d.%d\n",
> > > > > > +                   crtc->pipe, pipe_downscale.val >> 16,
> > > > > > +                   pipe_downscale.val & 0xffff);
> > > > > > +
> > > > > > +     pipe_downscale = mul_fixed16(pipe_downscale, max_downscale);
> > > > > > +
> > > > > > +     /*
> > > > > > +      * Convert the result to a percentage: get the actual ratio
> > > > > > +      * instead of the rate, then multiply by 100.
> > > > > > +      */
> > > > > > +     pipe_ratio = u32_to_fixed16(100 *
> > > > > > +                                 div_round_up_u32_fixed16(1, pipe_downscale));
> > > > > > +
> > > > > > +     DRM_DEBUG_KMS("Pipe %d ratio %d.%d\n", crtc->pipe,
> > > > > > +                   pipe_ratio.val >> 16, pipe_ratio.val & 0xffff);
> > > > > > +
> > > > > > +     return pipe_ratio;
> > > > > > +}
> > > > >
> > > > > Yuck. I just killed these.
> > > >
> > > > I still need pipe_ratio for ICL as you know; this is dictated by
> > > > BSpec, not by me.
> > >
> > > That part of the spec is pretty crazy IMO. I think we should just
> > > ignore it for now. Also nothing in there says we must use both slices
> > > for a single pipe, so ignoring it doesn't even violate the spec in any way.
> >
> > So you are instructing me now to ignore BSpec because "we are smarter"?
> > I don't think it is proper to do it this way. Let's first change the
> > BSpec, if we don't like the pipe ratio here, and then the driver. Currently
> > it is written that way, and if we are going to do everything our own way,
> > ignoring the specs, this is going to be chaos.
> 
> >Like I said, you don't have to violate the spec at all. Just don't use
> >two slices for the same pipe and you can avoid the whole strange pipe
> >ratio limitation. We should file a bspec issue to get that clarified
> >though.
> 
> That is actually a violation of BSpec: we are not using two slices for the
> same pipe while we could. We have already had enough bw issues, why should
> we make our life even harder?

Using more slices is, for whatever reason, harder for the hardware.
You're not allowed to use two slices if you do any significant
downscaling. Which is a bit sad/odd because increasing downscaling
increases bandwidth utilization but at the same time also prevents
us from using two slices.

I think the main benefit we would get from two slices is better
power savings because we may be able to enable more watermark
levels on account of the larger ddb.

There is nothing in the spec that I can see that says we *must* use
two slices (unless we're about to exceed the CDCLKx64 bandwidth
limit I guess). And since we haven't been using two slices so far
it seems safer to not jump in the deep end and enable two slices
per-pipe at the same time as we do the other things. Ie. there
should be a separate patch to enable the two slices per pipe usage.
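
If we ever do hit that limit, I'd guess the check reduces to something
like this (hand-waving sketch; the pixel rate plumbing is made up, and
I'm assuming the CDCLKx64 limit applies to a single slice):

static u8 required_dbuf_slices(const struct drm_i915_private *dev_priv,
			       unsigned int total_pixel_rate)
{
	/* assuming one slice can feed up to CDCLK*64, beyond
	 * that we'd have no choice but to enable both */
	if (total_pixel_rate > dev_priv->cdclk.hw.cdclk * 64)
		return DBUF_S1_BIT | DBUF_S2_BIT;

	return DBUF_S1_BIT;
}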

> 
> > > > > > +/*
> > > > > > + * Table taken from BSpec 49255
> > > > > > + * Pipes do have some preferred DBuf slice affinity,
> > > > > > + * plus there are some hardcoded requirements on how
> > > > > > + * those should be distributed for multipipe scenarios.
> > > > > > + * For more DBuf slices the algorithm can get even more messy
> > > > > > + * and less readable, so I decided to use a table almost
> > > > > > + * as-is from BSpec itself - that way it is at least easier
> > > > > > + * to compare, change and check.
> > > > > > + */
> > > > > > +struct dbuf_slice_conf_entry tgl_allowed_dbufs[] = {
> > > > > > +{ { 0, 0, 0, 0 }, 0 },
> > > > > > +ICL_PIPE_A_DBUF_SLICES(DBUF_S1_BIT | DBUF_S2_BIT),
> > > > > > +ICL_PIPE_B_DBUF_SLICES(DBUF_S1_BIT | DBUF_S2_BIT),
> > > > > > +ICL_PIPE_C_DBUF_SLICES(DBUF_S1_BIT | DBUF_S2_BIT),
> > > > > > +ICL_PIPE_D_DBUF_SLICES(DBUF_S1_BIT | DBUF_S2_BIT),
> > > > > > +ICL_PIPE_AB_DBUF_SLICES(DBUF_S2_BIT, DBUF_S1_BIT),
> > > > > > +ICL_PIPE_AC_DBUF_SLICES(DBUF_S1_BIT, DBUF_S2_BIT),
> > > > > > +ICL_PIPE_BC_DBUF_SLICES(DBUF_S1_BIT, DBUF_S2_BIT),
> > > > > > +ICL_PIPE_AD_DBUF_SLICES(DBUF_S1_BIT, DBUF_S2_BIT),
> > > > > > +ICL_PIPE_BD_DBUF_SLICES(DBUF_S1_BIT, DBUF_S2_BIT),
> > > > > > +ICL_PIPE_CD_DBUF_SLICES(DBUF_S1_BIT, DBUF_S2_BIT),
> > > > > > +ICL_PIPE_ABD_DBUF_SLICES(DBUF_S1_BIT, DBUF_S1_BIT, DBUF_S2_BIT),
> > > > > > +ICL_PIPE_ABC_DBUF_SLICES(DBUF_S1_BIT, DBUF_S1_BIT, DBUF_S2_BIT),
> > > > > > +ICL_PIPE_ACD_DBUF_SLICES(DBUF_S1_BIT, DBUF_S2_BIT, DBUF_S2_BIT),
> > > > > > +ICL_PIPE_BCD_DBUF_SLICES(DBUF_S1_BIT, DBUF_S2_BIT, DBUF_S2_BIT),
> > > > > > +ICL_PIPE_ABCD_DBUF_SLICES(DBUF_S1_BIT, DBUF_S1_BIT, DBUF_S2_BIT, DBUF_S2_BIT),
> > > > > > +};
> > > > >
> > > > > My eyes!
> > > > >
> > > > > I have to think we should be able to reduce all that to a handful
> > > > > of lines of code.
> > > >
> > > > Yeah, then we are going to have a huge function with lots of weird
> > > > definitions; unfortunately the BSpec table has quite strange DBuf
> > > > assignments like
> > > > +ICL_PIPE_AB_DBUF_SLICES(DBUF_S2_BIT, DBUF_S1_BIT),
> > > > +ICL_PIPE_AC_DBUF_SLICES(DBUF_S1_BIT, DBUF_S2_BIT),
> > > >
> > > > i.e. slices can get mixed in quite various ways. There is no
> > > > rational pattern in those, and it could even be dangerous to try to
> > > > optimize it in some way, as one day it might change again in some
> > > > unpredictable way.
> > > >
> > > > Would you prefer to have it like
> > > >
> > > > if (pipes are A and B)
> > > >     program S2 to A and S1 to B
> > > > if (pipes are A and C)
> > > >     program S1 to A and S2 to C
> > > > ...
> > > >
> > > > I would prefer at least to see it in a way comparable with the
> > > > table we have in BSpec, rather than having lots of weird-looking
> > > > if-else statements, GEN checks and so on.
> > > >
> > > > I knew this table was not going to look like what is typically done
> > > > here - but really my opinion is that it is way better than hardcoding
> > > > it into some weird algorithm, which would be hard to compare to the
> > > > initial table, which is already strange enough.
> > > >
> > > > If you don't like it - could you please explain why exactly this is
> > > > worse than having long functions with hardcoded checks?
> > >
> > > I think we should be able to encode this simply using the
> > > "Pipe and DBUF ordering" stuff from the spec. Ie. just specify what
> > > is the distance of each pipe from each dbuf slice. Then all we have
> > > to do is minimize the total distance. The following tables I think
> > > should encode that reasonably concisely:
> > >
> > > /* pipeA - DBUF_S1 - pipeB - pipeC - DBUF_S2 */
> > > static const u8 icl_dbuf_distance[][2] = {
> > >     [PIPE_A] = { [DBUF_S1] = 1, [DBUF_S2] = 4, },
> > >     [PIPE_B] = { [DBUF_S1] = 1, [DBUF_S2] = 2, },
> > >     [PIPE_C] = { [DBUF_S1] = 2, [DBUF_S2] = 1, },
> > > };
> > >
> > > /* pipeD - DBUF_S2 - pipeC - pipeA - DBUF_S1 - pipeB */
> > > static const u8 tgl_dbuf_distance[][2] = {
> > >     [PIPE_A] = { [DBUF_S1] = 1, [DBUF_S2] = 2, },
> > >     [PIPE_B] = { [DBUF_S1] = 1, [DBUF_S2] = 4, },
> > >     [PIPE_C] = { [DBUF_S1] = 2, [DBUF_S2] = 1, },
> > >     [PIPE_D] = { [DBUF_S1] = 4, [DBUF_S2] = 1, },
> > > };
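
And to spell out the "minimize the total distance" part, something
like this should do (untested; it ignores the single-pipe-using-both-
slices case, and the function name is made up):

static void compute_dbuf_assignment(u8 active_pipes,
				    const u8 dist[][2],
				    u8 slice_for_pipe[I915_MAX_PIPES])
{
	unsigned int comb, best_comb = 0, best_cost = ~0u;
	unsigned int pipe;

	/* bit 'pipe' of 'comb' selects the slice for that pipe;
	 * with <=4 pipes x 2 slices that's at most 16 combinations */
	for (comb = 0; comb < (1u << I915_MAX_PIPES); comb++) {
		unsigned int cost = 0;

		for (pipe = 0; pipe < I915_MAX_PIPES; pipe++) {
			if (!(active_pipes & BIT(pipe)))
				continue;
			cost += dist[pipe][(comb >> pipe) & 1];
		}

		if (cost < best_cost) {
			best_cost = cost;
			best_comb = comb;
		}
	}

	for (pipe = 0; pipe < I915_MAX_PIPES; pipe++)
		slice_for_pipe[pipe] = (best_comb >> pipe) & 1;
}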
> >
> > I think you are missing my point. I will explain why.
> > My initial idea was similar: use some kind of distance
> > metric and encode everything in a smart way.
> > The current BSpec table sometimes has slices swapped for no reason.
> >
> > There seem to be some hardware constraints we are not aware
> > of. For example for TGL pipes A,B get S2 and S1 slices,
> 
> >A/S1 + B/S2 = 1 + 4 = 5
> >A/S2 + B/S1 = 2 + 1 = 3
> 
> > but pipes A C get S1 and S2.
> 
> >A/S1 + C/S2 = 1 + 1 = 2
> >A/S2 + C/S1 = 2 + 2 = 4
> 
> >So it's exactly as my distance table says it should be.
> 
> >
> > If you encode all this into some kind of distance algorithm,
> > those differences would be enormously difficult to spot and
> > compare with the actual table. That is why I decided to do it that
> > way.
> >
> > Most likely yes, you can solve it by using this metric,
> > but will it really be more readable and comparable
> > with the reference table then?
> > What if some constraints are added again, like pipe ratio?
> >
> > >
> > > /*
> > >  * pipeA - DBUF_S0 - DBUF_S1 - pipeB
> > >  * pipeC - DBUF_S2 - DBUF_S3 - pipeD
> > >  */
> > > static const u8 future_dbuf_distance[][4] = {
> > >     [PIPE_A] = { [DBUF_S0] = 1, [DBUF_S1] = 3, },
> > >     [PIPE_B] = { [DBUF_S0] = 3, [DBUF_S1] = 1, },
> > >     [PIPE_C] = { [DBUF_S2] = 1, [DBUF_S3] = 3, },
> > >     [PIPE_D] = { [DBUF_S2] = 3, [DBUF_S3] = 1, },
> > > };
> > >
> > > I suppose we could make that even more minimal by just
> > > directly sticking that "Pipe and DBUF ordering" language
> > > into a single array and then calculating each distance
> > > from it, but that is perhaps more work, and we'd need
> > > some kind of enum just for that so that we have a single
> > > namespace for the pipes and dbuf slices. So I have a
> > > feeling that having those distances pre-computed in a table
> > > like that is maybe a better way to go.
> >
> > Good luck then spotting differences with the BSpec table once it
> > gets changed.
> 
> >Well, I'd rather do that than decode through layers of cpp macros.
> >Also atm the table has no extra restrictions so it's clearly there
> >just because someone thought they'd write out the results of the
> >distance calculations. In general bspec style is somewhat annoying
> >in that they try to write out things at the lowest level of abstraction
> >possible (they are hardware people after all). I'd much prefer a
> >higher level description of the issue.
> 
> I would prefer to see immediately what is different, and this table
> makes that possible right away.
> Also to me this table is simply more intuitively understandable.
> 
> > > > > > +struct dbuf_slice_conf_entry tgl_allowed_dbufs[] = {
> > > > > > +{ { 0, 0, 0, 0 }, 0 },
> > > > > > +ICL_PIPE_A_DBUF_SLICES(DBUF_S1_BIT | DBUF_S2_BIT),
> > > > > > +ICL_PIPE_B_DBUF_SLICES(DBUF_S1_BIT | DBUF_S2_BIT),
> > > > > > +ICL_PIPE_C_DBUF_SLICES(DBUF_S1_BIT | DBUF_S2_BIT),
> > > > > > +ICL_PIPE_D_DBUF_SLICES(DBUF_S1_BIT | DBUF_S2_BIT),
> > > > > > +ICL_PIPE_AB_DBUF_SLICES(DBUF_S2_BIT, DBUF_S1_BIT),
> > > > > > +ICL_PIPE_AC_DBUF_SLICES(DBUF_S1_BIT, DBUF_S2_BIT),
> > > > > > +ICL_PIPE_BC_DBUF_SLICES(DBUF_S1_BIT, DBUF_S2_BIT),
> > > > > > +ICL_PIPE_AD_DBUF_SLICES(DBUF_S1_BIT, DBUF_S2_BIT),
> > > > > > +ICL_PIPE_BD_DBUF_SLICES(DBUF_S1_BIT, DBUF_S2_BIT),
> > > > > > +ICL_PIPE_CD_DBUF_SLICES(DBUF_S1_BIT, DBUF_S2_BIT),
> > > > > > +ICL_PIPE_ABD_DBUF_SLICES(DBUF_S1_BIT, DBUF_S1_BIT, DBUF_S2_BIT),
> > > > > > +ICL_PIPE_ABC_DBUF_SLICES(DBUF_S1_BIT, DBUF_S1_BIT, DBUF_S2_BIT),
> > > > > > +ICL_PIPE_ACD_DBUF_SLICES(DBUF_S1_BIT, DBUF_S2_BIT, DBUF_S2_BIT),
> > > > > > +ICL_PIPE_BCD_DBUF_SLICES(DBUF_S1_BIT, DBUF_S2_BIT, DBUF_S2_BIT),
> > > > > > +ICL_PIPE_ABCD_DBUF_SLICES(DBUF_S1_BIT, DBUF_S1_BIT, DBUF_S2_BIT, DBUF_S2_BIT),
> > > > > > +};
> 
> We have a similar CDCLK table in intel_cdclk, while you could 
> also argue that we might have come up with some formula.

The cdclk table has one redundancy that I'm not a 100% fan of:
it specifies both the cdclk frequency and the cd2x divider.
We could make do with just one of them and calculate the other.

On the flip side that redundancy would allow us to cross check
that the values produce the expected results. I actually have
a patch for that buried in some branch.

Apart from that I don't see what the cdclk table has to do
with this stuff. We can't derive that cdclk table from anything
much simpler.

Anyways, my preferred approach to this sort of stuff is to
make the computer do the calculations. That's what it's good
at. Humans, not so great at computations, or even copy pasting.
Though as with the cdclk table it may be possible to make
humans do the calculations/copy paste and then have the
computer double check it all makes sense.
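
Eg. for the cdclk table, something along these lines (sketch from
memory, assuming cdclk = refclk * ratio / divider holds for every
intel_cdclk_vals entry):

static void intel_cdclk_table_check(const struct intel_cdclk_vals *vals)
{
	/* the table is terminated by a zeroed entry */
	for (; vals->cdclk; vals++)
		WARN_ON(vals->cdclk !=
			DIV_ROUND_CLOSEST(vals->refclk * vals->ratio,
					  vals->divider));
}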

-- 
Ville Syrjälä
Intel

