[Mesa-dev] [PATCH 05/22] i965/fs: consider execsize can be duplicated in lower_simd_with
Juan A. Suarez Romero
jasuarez at igalia.com
Wed Jan 11 08:50:31 UTC 2017
On Tue, 2017-01-10 at 14:31 -0800, Francisco Jerez wrote:
> "Juan A. Suarez Romero" <jasuarez at igalia.com> writes:
>
> > On Mon, 2017-01-09 at 15:41 -0800, Francisco Jerez wrote:
> > > Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
> > >
> > > > From: "Juan A. Suarez Romero" <jasuarez at igalia.com>
> > > >
> > > > In IVB/VLV, for instructions dealing with DF, execsize will be
> > > > duplicated in the final code.
> > > >
> > > > So take this in account when checking if instructions should be
> > > > split.
> > > > ---
> > > > src/mesa/drivers/dri/i965/brw_fs.cpp | 13 ++++++++++++-
> > > > 1 file changed, 12 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
> > > > b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > > > index 78f2124..cfce364 100644
> > > > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> > > > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > > > @@ -4551,8 +4551,15 @@ static unsigned
> > > > get_fpu_lowered_simd_width(const struct gen_device_info *devinfo,
> > > > const fs_inst *inst)
> > > > {
> > > > + /* Note that in IVB/VLV for instructions that handles DF, we
> > > > will duplicate
> > > > + * the exec_size. So take this value for calculus purposes.
> > > > + */
> > > > + unsigned exec_size = inst->exec_size;
> > > > + if (devinfo->gen == 7 && !devinfo->is_haswell && inst-
> > > > > exec_data_size() == 8)
> > > >
> > > > + exec_size *= 2;
> > > > +
> > >
> > > NAK. This won't have any useful effect because the max_width value
> > > doesn't actually have any influence in the regioning restriction
> > > calculations below, it's only used to accumulate the intermediate
> > > execution size limits derived from each hardware restriction. I
> > > don't
> > > think you need to change any of the restrictions below for the result
> > > to
> > > be correct in presence of exec size doubling.
> >
> >
> >
> > Yes, you're right.
> >
> > Actually the motivation for the patch was to avoid ending up with a
> > final exec_size > 8 (after doubling it in the generator) when we are
> > generating a SIMD8 shader (or exec_size > 16 in SIMD16 shader).
> >
> > So if we have a exec_size == 8, and we are generating SIMD8 code, then
> > we want to split in two instructions of exec_size==4, because at the
> > end we will double the exec_size to 8. Same reasoning with SIMD16 code.
> >
>
> You're fine with such execution sizes as long asyou don't violate the 2
> GRFs per source or destination region hardware limit (and a few other
> crazy restrictions) which are only affected by the *effective* execution
> size of the instruction and don't care at all about the units the
> execution size is expressed in at the machine code level (e.g. the total
> amount of data accessed by a source region of the instruction is the
> same regardless of whether you consider it to be a sequence of N
> elements of size S or a sequence of 2*N elements of size S/2).
>
OK, I get it. Then, let's drop this patch.
Thanks for the feedback.
> >
> > This could be done as:
> >
> > @@ -5119,7 +5119,11 @@ fs_visitor::lower_simd_width()
> > bool progress = false;
> >
> > foreach_block_and_inst_safe(block, fs_inst, inst, cfg) {
> > - const unsigned lower_width = get_lowered_simd_width(devinfo,
> > inst);
> > + unsigned lower_width = get_lowered_simd_width(devinfo, inst);
> > +
> > + if (devinfo->gen == 7 && !devinfo->is_haswell &&
> > + (get_exec_type_size(inst) == 8 || type_sz(inst->dst.type) ==
> > 8))
> > + lower_width = MIN2(lower_width, dispatch_width / 2);
> >
>
> That wouldn't do any good... Let's just drop this patch.
>
OK, let's drop the entire patch.
> >
> > Now, this is all in theory. So far, no test has exposed this
> > situation with this change of top of our branch, and the lower_width
> > returned by get_lowered_simd_width() is always less than 8. But maybe
> > this could happen in future, for instance, if we emit an instruction
> > that writes more than 4 DF and has force_writemask_all == TRUE.
> >
> > >
> > > > /* Maximum execution size representable in the instruction controls. */
> > > > - unsigned max_width = MIN2(32, inst->exec_size);
> > > > + unsigned max_width = MIN2(32, exec_size);
> > > >
> > > > /* According to the PRMs:
> > > > * "A. In Direct Addressing mode, a source cannot span more than 2
> > > > @@ -4656,6 +4663,10 @@ get_fpu_lowered_simd_width(const struct gen_device_info *devinfo,
> > > > max_width = MIN2(max_width, channels_per_grf);
> > > > }
> > > >
> > > > + /* If we have duplicated exec_size, then readjust max_width if required. */
> > > > + if (exec_size != inst->exec_size && max_width == exec_size)
> > > > + max_width = inst->exec_size;
> > > > +
> > > > /* Only power-of-two execution sizes are representable in the instruction
> > > > * control fields.
> > > > */
> > > > --
> > > > 2.9.3
> > > >
More information about the mesa-dev
mailing list