[Mesa-dev] [PATCH 2/2] i965/fs: indirect addressing with doubles is not supported in CHV/BSW

Fri Jun 17 07:14:08 UTC 2016

On 17/06/16 08:57, Kenneth Graunke wrote:
> On Wednesday, June 15, 2016 9:25:45 AM PDT Samuel Iglesias Gonsálvez wrote:
>> From the Cherryview's PRM, Volume 7, 3D Media GPGPU Engine, Register Region
>> Restrictions, page 844:
>>
>>   "When source or destination datatype is 64b or operation is integer DWord
>>    multiply, indirect addressing must not be used."
>>
>> Signed-off-by: Samuel Iglesias Gonsálvez <siglesias at igalia.com>
>> Cc: "12.0" <mesa-stable at lists.freedesktop.org>
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95462
>> ---
>>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 46 +++++++++++++++++++++++++++++---
>>  1 file changed, 42 insertions(+), 4 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> index c271e64..be162ff 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> @@ -3602,6 +3602,7 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
>>        } else {
>>           fs_reg indirect = retype(get_nir_src(instr->src[0]),
>>                                    BRW_REGISTER_TYPE_UD);
>> +         int num_components = instr->num_components;
>>  
>>           /* We need to pass a size to the MOV_INDIRECT but we don't want it to
>>            * go past the end of the uniform.  In order to keep the n'th
>> @@ -3609,14 +3610,51 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
>>            * one component of the vector.
>>            */
>>           assert(instr->const_index[1] >=
>> -                instr->num_components * (int) type_sz(dest.type));
>> +                num_components * (int) type_sz(dest.type));
>> +
>>           unsigned read_size = instr->const_index[1] -
>> -            (instr->num_components - 1) * type_sz(dest.type);
>> +            (num_components - 1) * type_sz(dest.type);
>> +
>> +         fs_reg temp = dest;
>> +         fs_reg indirect_chv_high_32bit;
>> +         bool is_cherryview_64bit =
>> +            devinfo->is_cherryview && type_sz(dest.type) == 8;
>> +         if (is_cherryview_64bit) {
>> +            /* Duplicate the number of components because we will read
>> +             * each 64-bit component with two 32-bit reads.
>> +             */
>> +            num_components *= 2;
>> +            /* Temporary destination to save the data. We will do a shuffle
>> +             * later.
>> +             */
>> +            temp = bld.vgrf(BRW_REGISTER_TYPE_UD, num_components);
> 
> I think if you just used inst->num_components * 2 here...
> 
>> +            indirect_chv_high_32bit = vgrf(glsl_type::uint_type);
>> +            /* Calculate indirect address to read high 32 bits */
>> +            bld.ADD(indirect_chv_high_32bit, indirect, brw_imm_ud(4));
>> +         }
>>  
>> -         for (unsigned j = 0; j < instr->num_components; j++) {
>> +         int i, j;
>> +         for (j = 0, i = 0; i < num_components; j++, i++) {
> 
> and then left it as j < instr->num_components rather than i...
> then you wouldn't have to make a num_components temporary nor
> multiply it by 2.  Might be a little simpler?
> 

Yes, it is simpler. I am going to do it.

>>              bld.emit(SHADER_OPCODE_MOV_INDIRECT,
>> -                     offset(dest, bld, j), offset(src, bld, j),
>> +                     offset(temp, bld, i), offset(src, bld, j),
>>                       indirect, brw_imm_ud(read_size));
>> +
>> +            if (is_cherryview_64bit) {
>> +               /* Read higher 32 bits, increase 'i' to save the
>> +                * data in the right destination register's offset.
>> +                */
>> +               i++;
>> +               bld.emit(SHADER_OPCODE_MOV_INDIRECT,
>> +                        offset(temp, bld, i), offset(src, bld, j),
>> +                        indirect_chv_high_32bit, brw_imm_ud(read_size));
>> +            }
>> +         }
>> +
>> +         if (is_cherryview_64bit) {
>> +            shuffle_32bit_load_result_to_64bit_data(bld,
>> +                                                    dest,
>> +                                                    temp,
>> +                                                    instr->num_components);
> 
> I suspect you could do this without a shuffle by subscripting the
> destinations directly...plus, you have byte-offsets per channel on the
> source data, so you can definitely have it come in however you like
> (unlike URB reads).
> 

Right. I am going to test this solution. If it works, I will send a
patch implementing it.

> Still, this should work as is, and fixes a problem, so:
> Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
> 
> I think it could probably be done in a simpler fashion, though.
> 

OK, thanks.

Sam

>>           }
>>        }
>>        break;
>>
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: OpenPGP digital signature
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20160617/95b16e9b/attachment-0001.sig>