[Mesa-dev] [PATCH v3 20/24] i965/vec4: Fix exec size for MOVs SET_{HIGH, LOW}_32BIT.

Wed Mar 8 11:51:47 UTC 2017

On 07/03/17 22:38, Francisco Jerez wrote:
> Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
> 
>> On 04/03/17 01:26, Francisco Jerez wrote:
>>> Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
>>>
>>>> From: Matt Turner <mattst88 at gmail.com>
>>>>
>>>> Otherwise for a pack_double_2x32_split opcode, we emit:
>>>>
>>>>    vec1 64 ssa_135 = pack_double_2x32_split ssa_133, ssa_134
>>>> mov(8)          g5<1>UD         g5<4>.xUD                       { align16 1Q compacted };
>>>> mov(8)          g7<2>UD         g5<4,4,1>UD                     { align1 1Q };
>>>>         ERROR: When the destination spans two registers, the source must span two registers
>>>>                (exceptions for scalar source and packed-word to packed-dword expansion)
>>>> mov(8)          g8<2>UD         g5.4<4,4,1>UD                   { align1 2N };
>>>>         ERROR: The offset from the two source registers must be the same
>>>> mov(8)          g5<1>UD         g6<4>.xUD                       { align16 1Q compacted };
>>>> mov(8)          g7.1<2>UD       g5<4,4,1>UD                     { align1 1Q };
>>>>         ERROR: When the destination spans two registers, the source must span two registers
>>>>                (exceptions for scalar source and packed-word to packed-dword expansion)
>>>> mov(8)          g8.1<2>UD       g5.4<4,4,1>UD                   { align1 2N };
>>>>         ERROR: The offset from the two source registers must be the same
>>>>
>>>> The intention was to emit mov(4)s for the instructions that have ERROR
>>>> annotations.
>>>>
>>>> See tests/spec/arb_gpu_shader_fp64/execution/vs-isinf-dvec.shader_test
>>>> for example.
>>>>
>>>> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias at igalia.com>
>>>> ---
>>>>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 2 ++
>>>>  1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
>>>> index b570792badd..f6034bc8b76 100644
>>>> --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
>>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
>>>> @@ -2025,6 +2025,7 @@ generate_code(struct brw_codegen *p,
>>>>           assert(type_sz(dst.type) == 8);
>>>>  
>>>>           brw_set_default_access_mode(p, BRW_ALIGN_1);
>>>> +         brw_set_default_exec_size(p, BRW_EXECUTE_4);
>>>>  
>>>
>>> NAK, we're missing a bug elsewhere if the exec_size coming in from the
>>> IR is not accurate.  You don't happen to be doubling the execution size
>>> of this single-precision instruction, do you?
>>>
>>
>> The exec_size from the IR is correct and Matt's patch too.
>>
>> The problem is that the original instruction
>> (VEC4_OPCODE_SET_{HIGH,LOW}_32BIT) is a double-precision one because dst
>> is DF, so we doubled the exec_size at the beginning of this function
>> -generate_code()-.
>>
>> However, this opcode reads 32-bits elements from source and write them
>> into the respective 32-bit position (low or high) of the dst, so we
>> retyped the destination to UD to do it (see below), hence this
>> instruction is actually a 32-bit one.
>>
>> We can either use Matt patch or add an exception to the doubling
>> exec_size condition which is at the beginning of this function. In
>> either case, I will add a comment explaining this, something like:
>>
>> /* Don't double exec_size for VEC4_OPCODE_SET_{HIGH,LOW}_32BIT
>>  * because dst is retyped to a 32-bit type to copy source's values,
>>  * so the instruction is effectively a 32-bit one.
>>  */
>>
>> Which solution do you prefer?
>>
> 
> Hardcoding an execution size in the generator is not a real solution if
> you're exposing execution sizes to the IR.  I think the ultimate problem
> here is that the execution type you deduce for this instruction while
> doubling the exec size doesn't match the real execution type of the
> instruction, because you pretend that it has a DF destination at the IR
> level which is kind of a lie.
> 
> This doesn't seem to be the only case where execution doubling may be a
> problem BTW, you're likely to run into a similar issue with at least
> VEC4_OPCODE_PICK_LOW_32BIT and VEC4_OPCODE_PICK_LOW_32BIT -- There may
> be more.
> 
> I think you could either fix the exec_size doubling logic to make it
> aware of the real execution type of these single-precision instructions
> that have fake DF types, use fake Q types at the IR level, or do
> something more along the lines of this patch, but if you do the latter
> you shouldn't be ignoring the incoming exec_size, and there are other
> instructions you may have to fix case by case.
> 

I'm going to do the fix in the exec_size doubling logic.

Thanks,

Sam

>> Sam
>>
>>>>           dst = retype(dst, BRW_REGISTER_TYPE_UD);
>>>>           if (inst->opcode == VEC4_OPCODE_SET_HIGH_32BIT)
>>>> @@ -2037,6 +2038,7 @@ generate_code(struct brw_codegen *p,
>>>>           src[0].hstride = BRW_HORIZONTAL_STRIDE_1;
>>>>           brw_MOV(p, dst, src[0]);
>>>>  
>>>> +         brw_set_default_exec_size(p, BRW_EXECUTE_8);
>>>>           brw_set_default_access_mode(p, BRW_ALIGN_16);
>>>>           break;
>>>>        }
>>>> -- 
>>>> 2.11.0
>>>>
>>>> _______________________________________________
>>>> mesa-dev mailing list
>>>> mesa-dev at lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 862 bytes
Desc: OpenPGP digital signature
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170308/f059fe79/attachment-0001.sig>