[Mesa-dev] [PATCH 05/22] i965/fs: consider execsize can be duplicated in lower_simd_with

Francisco Jerez currojerez at riseup.net
Tue Jan 10 22:31:12 UTC 2017


"Juan A. Suarez Romero" <jasuarez at igalia.com> writes:

> On Mon, 2017-01-09 at 15:41 -0800, Francisco Jerez wrote:
>> Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
>> 
>> > From: "Juan A. Suarez Romero" <jasuarez at igalia.com>
>> > 
>> > In IVB/VLV, for instructions dealing with DF, execsize will be
>> > duplicated in the final code.
>> > 
>> > So take this in account when checking if instructions should be
>> > split.
>> > ---
>> >  src/mesa/drivers/dri/i965/brw_fs.cpp | 13 ++++++++++++-
>> >  1 file changed, 12 insertions(+), 1 deletion(-)
>> > 
>> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > index 78f2124..cfce364 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > @@ -4551,8 +4551,15 @@ static unsigned
>> >  get_fpu_lowered_simd_width(const struct gen_device_info *devinfo,
>> >                             const fs_inst *inst)
>> >  {
>> > +   /* Note that in IVB/VLV for instructions that handles DF, we
>> > will duplicate
>> > +    * the exec_size. So take this value for calculus purposes.
>> > +    */
>> > +   unsigned exec_size = inst->exec_size;
>> > +   if (devinfo->gen == 7 && !devinfo->is_haswell && inst-
>> > >exec_data_size() == 8)
>> > +      exec_size *= 2;
>> > +
>> 
>> NAK.  This won't have any useful effect because the max_width value
>> doesn't actually have any influence in the regioning restriction
>> calculations below, it's only used to accumulate the intermediate
>> execution size limits derived from each hardware restriction.  I
>> don't
>> think you need to change any of the restrictions below for the result
>> to
>> be correct in presence of exec size doubling.
>
>
>
> Yes, you're right.
>
> Actually the motivation for the patch was to avoid ending up with a
> final exec_size > 8  (after doubling it in the generator) when we are
> generating a SIMD8 shader (or exec_size > 16 in SIMD16 shader).
>
> So if we have a exec_size == 8, and we are generating SIMD8 code, then
> we want to split in two instructions of exec_size==4, because at the
> end we will double the exec_size to 8. Same reasoning with SIMD16 code.
>
You're fine with such execution sizes as long asyou don't violate the 2
GRFs per source or destination region hardware limit (and a few other
crazy restrictions) which are only affected by the *effective* execution
size of the instruction and don't care at all about the units the
execution size is expressed in at the machine code level (e.g. the total
amount of data accessed by a source region of the instruction is the
same regardless of whether you consider it to be a sequence of N
elements of size S or a sequence of 2*N elements of size S/2).

>
> This could be done as:
>
> @@ -5119,7 +5119,11 @@ fs_visitor::lower_simd_width()
>     bool progress = false;
>  
>     foreach_block_and_inst_safe(block, fs_inst, inst, cfg) {
> -      const unsigned lower_width = get_lowered_simd_width(devinfo,
> inst);
> +      unsigned lower_width = get_lowered_simd_width(devinfo, inst);
> +
> +      if (devinfo->gen == 7 && !devinfo->is_haswell &&
> +          (get_exec_type_size(inst) == 8 || type_sz(inst->dst.type) ==
> 8))
> +         lower_width = MIN2(lower_width, dispatch_width / 2);
>

That wouldn't do any good...  Let's just drop this patch.

>
> Now, this is all in theory. So far, no test has exposed this
> situation with this change of top of our branch, and the lower_width
> returned by get_lowered_simd_width() is always less than 8. But maybe
> this could happen in future, for instance, if we emit an instruction
> that writes more than 4 DF and has force_writemask_all == TRUE.
>
>> 
>> >     /* Maximum execution size representable in the instruction controls. */
>> > -   unsigned max_width = MIN2(32, inst->exec_size);
>> > +   unsigned max_width = MIN2(32, exec_size);
>> >  
>> >     /* According to the PRMs:
>> >      *  "A. In Direct Addressing mode, a source cannot span more than 2
>> > @@ -4656,6 +4663,10 @@ get_fpu_lowered_simd_width(const struct gen_device_info *devinfo,
>> >           max_width = MIN2(max_width, channels_per_grf);
>> >     }
>> >  
>> > +   /* If we have duplicated exec_size, then readjust max_width if required. */
>> > +   if (exec_size != inst->exec_size && max_width == exec_size)
>> > +      max_width = inst->exec_size;
>> > +
>> >     /* Only power-of-two execution sizes are representable in the instruction
>> >      * control fields.
>> >      */
>> > -- 
>> > 2.9.3
>> > 
>> > _______________________________________________
>> > mesa-dev mailing list
>> > mesa-dev at lists.freedesktop.org
>> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>> 
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170110/4a1f1e6f/attachment.sig>


More information about the mesa-dev mailing list