[Mesa-dev] [PATCH v3 29/43] i965/fs: Unpack 16-bit from 32-bit components in VS load_input

Tue Oct 24 11:32:59 UTC 2017

El 15/10/17 a las 12:59, Pohjolainen, Topi escribió:
> On Thu, Oct 12, 2017 at 08:38:18PM +0200, Jose Maria Casanova Crespo wrote:
>> The VS load input for 16-bit values receives pairs of 16-bit values
>> packed in 32-bit values. Because of the adjusted format used at:
>>
>>  anv/pipeline: Use 32-bit surface formats for 16-bit formats
>>
>> v2: Removed use of stride = 2 on 16-bit sources (Jason Ekstrand)
>> ---
>>  src/intel/compiler/brw_fs_nir.cpp | 21 +++++++++++++++++++--
>>  1 file changed, 19 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp
>> index d2f2e17b70..83ff0607a7 100644
>> --- a/src/intel/compiler/brw_fs_nir.cpp
>> +++ b/src/intel/compiler/brw_fs_nir.cpp
>> @@ -2322,8 +2322,25 @@ fs_visitor::nir_emit_vs_intrinsic(const fs_builder &bld,
>>        assert(const_offset && "Indirect input loads not allowed");
>>        src = offset(src, bld, const_offset->u32[0]);
>>  
>> -      for (unsigned j = 0; j < num_components; j++) {
>> -         bld.MOV(offset(dest, bld, j), offset(src, bld, j + first_component));
>> +      /* The VS load input for 16-bit values receives pairs of 16-bit values
>> +       * packed in 32-bit values. This is an example on SIMD8:
>> +       *
>> +       * xy xy xy xy xy xy xy xy
>> +       * zw zw zw zw yw zw zw xw
>                         zw
Fixed locally.
>
>> +       *
>> +       * We need to format it to something like:
>> +       *
>> +       * xx xx xx xx yy yy yy yy
>> +       * zz zz zz zz ww ww ww ww
>> +       */
>> +      if (type_sz(type) == 2) {
>> +         for (unsigned j = 0; j < num_components; j++)
>> +            bld.MOV(offset(dest, bld, j),
>> +                    subscript(retype(offset(src,bld, (j / 2) * 2 + first_component),
> Space missing before 'bld'. I went thru the math and it looks correct to me.
Fixed locally. About the math, it is based in the same principles that
the one at shuffle_32bit_load_result_to_16bit_data function available at
the following patch of the series, but it is simpler as in this case
source and destination are different that we can not assume for the
general case.
>
>> +                                     BRW_REGISTER_TYPE_F), type, j % 2));
>> +      } else {
>> +         for (unsigned j = 0; j < num_components; j++)
>> +            bld.MOV(offset(dest, bld, j), offset(src, bld, j + first_component));
>>        }
>>  
>>        if (type == BRW_REGISTER_TYPE_DF) {
>> -- 
>> 2.13.6
>>
>> _______________________________________________
>