[Mesa-dev] [PATCH v3 16/24] i965/vec4: fix SIMD-width lowering for VEC4_OPCODE_FROM_DOUBLE in IVB/BYT

Samuel Iglesias Gonsálvez siglesias at igalia.com
Tue Mar 7 13:09:43 UTC 2017



On 04/03/17 01:04, Francisco Jerez wrote:
> Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
> 
>> From: "Juan A. Suarez Romero" <jasuarez at igalia.com>
>>
>> When splitting VEC4_OPCODE_FROM_DOUBLE in Ivybridge/Baytrail, the second
>> part should use a temporal register, and then move the values to the
>> second half of the original destination, so we get all the results in the
>> same register.
>>
>> v2:
>> - Fix typos (Matt).
>> ---
>>  src/mesa/drivers/dri/i965/brw_vec4.cpp           | 17 +++++++++++++----
>>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |  1 +
>>  2 files changed, 14 insertions(+), 4 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> index 64b435f3ec4..adcde085305 100644
>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> @@ -2191,9 +2191,15 @@ vec4_visitor::lower_simd_width()
>>           linst->group = channel_offset;
>>           linst->size_written = size_written;
>>  
>> +         /* When splitting VEC4_OPCODE_FROM_DOUBLE on Ivybridge, the second part
>> +          * should use in a temporal register. Later we will move the values
>> +          * to the second half of the original destination, so we get all the
>> +          * results in the same register. We use d2f_pass to detect this case.
>> +          */
>> +         bool d2f_pass = (inst->opcode == VEC4_OPCODE_FROM_DOUBLE && n > 0);
>>           /* Compute split dst region */
>>           dst_reg dst;
>> -         if (needs_temp) {
>> +         if (needs_temp || d2f_pass) {
>>              unsigned num_regs = DIV_ROUND_UP(size_written, REG_SIZE);
>>              dst = retype(dst_reg(VGRF, alloc.allocate(num_regs)),
>>                           inst->dst.type);
>> @@ -2226,9 +2232,12 @@ vec4_visitor::lower_simd_width()
>>           /* If we used a temporary to store the result of the split
>>            * instruction, copy the result to the original destination
>>            */
>> -         if (needs_temp) {
>> -            vec4_instruction *mov =
>> -               MOV(offset(inst->dst, lowered_width, n), src_reg(dst));
>> +         if (needs_temp || d2f_pass) {
>> +            vec4_instruction *mov;
>> +            if (d2f_pass)
>> +               mov = MOV(horiz_offset(inst->dst, n * type_sz(inst->dst.type)), src_reg(dst));
> 
> I have no idea how this could possibly work...  horiz_offset() expects a
> number of scalar components, not bytes.  Anyway I have a hunch this is
> trying to workaround the bug I pointed out in PATCH 15...
> 

This worked by luck. We want an horizontal offset of 'lowered_width'
components per split. As inst->dst.type is F, type_sz(inst->dst.type) =
4 and we set the maximum 'lowered_width' value to 4, it matches in the
cases we tested... however, this is a bug.

Actually it works too if we use offset() instead, so this hunk is not
needed.

Sam

>> +            else
>> +               mov = MOV(offset(inst->dst, lowered_width, n), src_reg(dst));
>>              mov->exec_size = lowered_width;
>>              mov->group = channel_offset;
>>              mov->size_written = size_written;
>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
>> index 7fa1afc9073..b570792badd 100644
>> --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
>> @@ -1532,6 +1532,7 @@ generate_code(struct brw_codegen *p,
>>               is_ivb_df);
>>  
>>        assert(inst->group % 8 == 0 ||
>> +             (inst->exec_size == 4 && inst->group % 4 == 0) ||
>>               inst->dst.type == BRW_REGISTER_TYPE_DF ||
>>               inst->src[0].type == BRW_REGISTER_TYPE_DF ||
>>               inst->src[1].type == BRW_REGISTER_TYPE_DF ||
>> -- 
>> 2.11.0
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev





-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 862 bytes
Desc: OpenPGP digital signature
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170307/ea8f0141/attachment.sig>


More information about the mesa-dev mailing list