[Mesa-dev] [PATCH] intel/fs: Don't apply the des stride alignment rule to accumulators

Thu Jan 17 10:15:28 UTC 2019

Jason Ekstrand <jason at jlekstrand.net> writes:

> The pass was discovered to cause problems with the MUL+MACH combination
> we emit for nir_[iu]mul_high.  In an experimental branch of mine, I ran
> into issues where the MUL+MACH ended up using a strided source due to
> working on half of a uint64_t and the new lowering pass helpfully tried
> to fix the multiply which wrote to an unstriated accumulator.

> Not only did the multiply not need to be fixed

That's far from clear, and inconsistent with what this patch is doing,
since the fix is still being applied (Wouldn't it make sense to clarify
that in the commit message since it's slightly misleading about it?).

The original instruction was technically violating the first CHV/BXT
double-precision regioning restriction before the pass was introduced,
that's why it made any changes in the first place.  The integer
multiplication lowering code was just lucky enough that violating the
restriction didn't matter in this case, but I doubt that the reason for
that had anything to do with the accumulator being the explicit
destination...

> but the "fix" ended up breaking it because a MOV to the accumulator is
> not the same as using it as a multiply destination due to the magic
> way the 33/64 bits of the

Technically it has 66 bits (it wasn't a typo when I said that to you
earlier on IRC).  That's how it can t hold the result of a SIMD16
16x16-bit integer multiplication with 33-bit signed precision per scalar
component.

> accumulator are handled for different instruction types.
>
> Fixes: efa4e4bc5fc "intel/fs: Introduce regioning lowering pass"
> Cc: Francisco Jerez <currojerez at riseup.net>
> ---
>  src/intel/compiler/brw_fs_lower_regioning.cpp | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/src/intel/compiler/brw_fs_lower_regioning.cpp b/src/intel/compiler/brw_fs_lower_regioning.cpp
> index cc4163b4c2c..b8a89e82272 100644
> --- a/src/intel/compiler/brw_fs_lower_regioning.cpp
> +++ b/src/intel/compiler/brw_fs_lower_regioning.cpp
> @@ -53,7 +53,13 @@ namespace {
>     unsigned
>     required_dst_byte_stride(const fs_inst *inst)
>     {
> -      if (type_sz(inst->dst.type) < get_exec_type_size(inst) &&
> +      if (inst->dst.is_accumulator()) {
> +         /* Even though it's not explicitly documented in the PRMs or the
> +          * BSpec, writes to the accumulator appear to not need any special
> +          * treatment with respect too their destination stride alignment.
> +          */

The code is not really doing what the comment says.  The
destination/source stride alignment restriction will still be honored
for this instruction.  It's just that the destination *has* to be left
untouched while doing that in the case of an integer MUL/MACH
instruction (that's the only reason I asked you to return the original
byte stride of the destination), because splitting off the region into a
MOV would lead to data loss due to the inconsistent semantics of the
accumulator destination for integer MUL/MACH (which update the whole 66
bits) and every other integer arithmetic instruction (which update the
bottom 33 bits and *apparently* leave the top 33 bits uninitialized) --
IOW this is only here so that the assert below doesn't fire.

> +         return inst->dst.stride * type_sz(inst->dst.type);
> +      } else if (type_sz(inst->dst.type) < get_exec_type_size(inst) &&

The code changes themselves are just as I wished, so this gets my:

Reviewed-by: Francisco Jerez <currojerez at riseup.net>

assuming that you clarify the commit message and comment above.

>            !is_byte_raw_mov(inst)) {
>           return get_exec_type_size(inst);
>        } else {
> @@ -316,6 +322,14 @@ namespace {
>     bool
>     lower_dst_region(fs_visitor *v, bblock_t *block, fs_inst *inst)
>     {
> +      /* We cannot replace the result of an integer multiply which writes the
> +       * accumulator because MUL+MACH pairs act on the accumulator as a 64-bit
> +       * value whereas the MOV will act on only 32 or 33 bits of the
> +       * accumulator.
> +       */
> +      assert(inst->opcode != BRW_OPCODE_MUL || !inst->dst.is_accumulator() ||
> +             brw_reg_type_is_floating_point(inst->dst.type));
> +
>        const fs_builder ibld(v, block, inst);
>        const unsigned stride = required_dst_byte_stride(inst) /
>                                type_sz(inst->dst.type);
> -- 
> 2.20.1
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 227 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20190117/54a5423f/attachment.sig>