[Mesa-dev] [PATCH v3 20/24] i965/vec4: Fix exec size for MOVs SET_{HIGH, LOW}_32BIT.

Tue Mar 7 21:38:58 UTC 2017

Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:

> On 04/03/17 01:26, Francisco Jerez wrote:
>> Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
>> 
>>> From: Matt Turner <mattst88 at gmail.com>
>>>
>>> Otherwise for a pack_double_2x32_split opcode, we emit:
>>>
>>>    vec1 64 ssa_135 = pack_double_2x32_split ssa_133, ssa_134
>>> mov(8)          g5<1>UD         g5<4>.xUD                       { align16 1Q compacted };
>>> mov(8)          g7<2>UD         g5<4,4,1>UD                     { align1 1Q };
>>>         ERROR: When the destination spans two registers, the source must span two registers
>>>                (exceptions for scalar source and packed-word to packed-dword expansion)
>>> mov(8)          g8<2>UD         g5.4<4,4,1>UD                   { align1 2N };
>>>         ERROR: The offset from the two source registers must be the same
>>> mov(8)          g5<1>UD         g6<4>.xUD                       { align16 1Q compacted };
>>> mov(8)          g7.1<2>UD       g5<4,4,1>UD                     { align1 1Q };
>>>         ERROR: When the destination spans two registers, the source must span two registers
>>>                (exceptions for scalar source and packed-word to packed-dword expansion)
>>> mov(8)          g8.1<2>UD       g5.4<4,4,1>UD                   { align1 2N };
>>>         ERROR: The offset from the two source registers must be the same
>>>
>>> The intention was to emit mov(4)s for the instructions that have ERROR
>>> annotations.
>>>
>>> See tests/spec/arb_gpu_shader_fp64/execution/vs-isinf-dvec.shader_test
>>> for example.
>>>
>>> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias at igalia.com>
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
>>> index b570792badd..f6034bc8b76 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
>>> @@ -2025,6 +2025,7 @@ generate_code(struct brw_codegen *p,
>>>           assert(type_sz(dst.type) == 8);
>>>  
>>>           brw_set_default_access_mode(p, BRW_ALIGN_1);
>>> +         brw_set_default_exec_size(p, BRW_EXECUTE_4);
>>>  
>> 
>> NAK, we're missing a bug elsewhere if the exec_size coming in from the
>> IR is not accurate.  You don't happen to be doubling the execution size
>> of this single-precision instruction, do you?
>> 
>
> The exec_size from the IR is correct and Matt's patch too.
>
> The problem is that the original instruction
> (VEC4_OPCODE_SET_{HIGH,LOW}_32BIT) is a double-precision one because dst
> is DF, so we doubled the exec_size at the beginning of this function
> -generate_code()-.
>
> However, this opcode reads 32-bits elements from source and write them
> into the respective 32-bit position (low or high) of the dst, so we
> retyped the destination to UD to do it (see below), hence this
> instruction is actually a 32-bit one.
>
> We can either use Matt patch or add an exception to the doubling
> exec_size condition which is at the beginning of this function. In
> either case, I will add a comment explaining this, something like:
>
> /* Don't double exec_size for VEC4_OPCODE_SET_{HIGH,LOW}_32BIT
>  * because dst is retyped to a 32-bit type to copy source's values,
>  * so the instruction is effectively a 32-bit one.
>  */
>
> Which solution do you prefer?
>

Hardcoding an execution size in the generator is not a real solution if
you're exposing execution sizes to the IR.  I think the ultimate problem
here is that the execution type you deduce for this instruction while
doubling the exec size doesn't match the real execution type of the
instruction, because you pretend that it has a DF destination at the IR
level which is kind of a lie.

This doesn't seem to be the only case where execution doubling may be a
problem BTW, you're likely to run into a similar issue with at least
VEC4_OPCODE_PICK_LOW_32BIT and VEC4_OPCODE_PICK_LOW_32BIT -- There may
be more.

I think you could either fix the exec_size doubling logic to make it
aware of the real execution type of these single-precision instructions
that have fake DF types, use fake Q types at the IR level, or do
something more along the lines of this patch, but if you do the latter
you shouldn't be ignoring the incoming exec_size, and there are other
instructions you may have to fix case by case.

> Sam
>
>>>           dst = retype(dst, BRW_REGISTER_TYPE_UD);
>>>           if (inst->opcode == VEC4_OPCODE_SET_HIGH_32BIT)
>>> @@ -2037,6 +2038,7 @@ generate_code(struct brw_codegen *p,
>>>           src[0].hstride = BRW_HORIZONTAL_STRIDE_1;
>>>           brw_MOV(p, dst, src[0]);
>>>  
>>> +         brw_set_default_exec_size(p, BRW_EXECUTE_8);
>>>           brw_set_default_access_mode(p, BRW_ALIGN_16);
>>>           break;
>>>        }
>>> -- 
>>> 2.11.0
>>>
>>> _______________________________________________
>>> mesa-dev mailing list
>>> mesa-dev at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170307/012e26fb/attachment.sig>