[Mesa-dev] [PATCH v2 11/11] nir/lower_vec_to_movs: Coalesce into destinations of fdot instructions

Tue Sep 15 12:20:51 PDT 2015

On Tue, Sep 15, 2015 at 12:02 PM, Eduardo Lima Mitev <elima at igalia.com> wrote:
> Patches 10.1, 10.2, and this one are:
>
> Reviewed-by: Eduardo Lima Mitev <elima at igalia.com>
>
> Patches 10.1 and 10.2 look good to me, but I would let Connor Abbott give
> his OK as well, since he is CC'd on those and my experience with NIR
> constant expressions is rather limited.

He already did on IRC if not on e-mail.  Thanks for reviewing!
--Jason

> Eduardo
>
> On 09/11/2015 05:53 PM, Jason Ekstrand wrote:
>> Now that we have a replicating fdot instruction, we can actually coalesce
>> into the destinations of vec4 instructions.  We couldn't really do this
>> before because, if the destination had to end up in .z, we couldn't
>> reswizzle the instruction.  With a replicated destination, the result ends
>> up in all channels so we can just set the writemask and we're done.
>>
>> Shader-db results for vec4 programs on Haswell:
>>
>>    total instructions in shared programs: 1778849 -> 1751223 (-1.55%)
>>    instructions in affected programs:     763104 -> 735478 (-3.62%)
>>    helped:                                7067
>>    HURT:                                  26
>>
>> It turns out that dot-products matter...
>>
>> Cc: Eduardo Lima Mitev <elima at igalia.com>
>> ---
>>  src/glsl/nir/nir_lower_vec_to_movs.c | 49 ++++++++++++++++++++++++++----------
>>  1 file changed, 36 insertions(+), 13 deletions(-)
>>
>> diff --git a/src/glsl/nir/nir_lower_vec_to_movs.c b/src/glsl/nir/nir_lower_vec_to_movs.c
>> index 9ff86ea..2cb0457 100644
>> --- a/src/glsl/nir/nir_lower_vec_to_movs.c
>> +++ b/src/glsl/nir/nir_lower_vec_to_movs.c
>> @@ -79,6 +79,14 @@ insert_mov(nir_alu_instr *vec, unsigned start_idx, nir_shader *shader)
>>     return mov->dest.write_mask;
>>  }
>>
>> +static bool
>> +has_replicated_dest(nir_alu_instr *alu)
>> +{
>> +   return alu->op == nir_op_fdot_replicated2 ||
>> +          alu->op == nir_op_fdot_replicated3 ||
>> +          alu->op == nir_op_fdot_replicated4;
>> +}
>> +
>>  /* Attempts to coalesce the "move" from the given source of the vec to the
>>   * destination of the instruction generating the value. If, for whatever
>>   * reason, we cannot coalesce the mmove, it does nothing and returns 0.  We
>> @@ -116,19 +124,28 @@ try_coalesce(nir_alu_instr *vec, unsigned start_idx, nir_shader *shader)
>>     nir_alu_instr *src_alu =
>>        nir_instr_as_alu(vec->src[start_idx].src.ssa->parent_instr);
>>
>> -   /* We only care about being able to re-swizzle the instruction if it is
>> -    * something that we can reswizzle.  It must be per-component.
>> -    */
>> -   if (nir_op_infos[src_alu->op].output_size != 0)
>> -      return 0;
>> -
>> -   /* If we are going to reswizzle the instruction, we can't have any
>> -    * non-per-component sources either.
>> -    */
>> -   for (unsigned j = 0; j < nir_op_infos[src_alu->op].num_inputs; j++)
>> -      if (nir_op_infos[src_alu->op].input_sizes[j] != 0)
>> +   if (has_replicated_dest(src_alu)) {
>> +      /* The fdot instruction is special: It replicates its result to all
>> +       * components.  This means that we can always rewrite its destination
>> +       * and we don't need to swizzle anything.
>> +       */
>> +   } else {
>> +      /* We only care about being able to re-swizzle the instruction if it is
>> +       * something that we can reswizzle.  It must be per-component.  The one
>> +       * exception is the fdot_replicatedN instructions (handled above), which
>> +       * implicitly splat their result out to all channels.
>> +       */
>> +      if (nir_op_infos[src_alu->op].output_size != 0)
>>           return 0;
>>
>> +      /* If we are going to reswizzle the instruction, we can't have any
>> +       * non-per-component sources either.
>> +       */
>> +      for (unsigned j = 0; j < nir_op_infos[src_alu->op].num_inputs; j++)
>> +         if (nir_op_infos[src_alu->op].input_sizes[j] != 0)
>> +            return 0;
>> +   }
>> +
>>     /* Stash off all of the ALU instruction's swizzles. */
>>     uint8_t swizzles[4][4];
>>     for (unsigned j = 0; j < nir_op_infos[src_alu->op].num_inputs; j++)
>> @@ -148,8 +165,14 @@ try_coalesce(nir_alu_instr *vec, unsigned start_idx, nir_shader *shader)
>>         * instruction so we can re-swizzle that component to match.
>>         */
>>        write_mask |= 1 << i;
>> -      for (unsigned j = 0; j < nir_op_infos[src_alu->op].num_inputs; j++)
>> -         src_alu->src[j].swizzle[i] = swizzles[j][vec->src[i].swizzle[0]];
>> +      if (has_replicated_dest(src_alu)) {
>> +         /* Since the destination is a single replicated value, we don't need
>> +          * to do any reswizzling
>> +          */
>> +      } else {
>> +         for (unsigned j = 0; j < nir_op_infos[src_alu->op].num_inputs; j++)
>> +            src_alu->src[j].swizzle[i] = swizzles[j][vec->src[i].swizzle[0]];
>> +      }
>>
>>        /* Clear the no longer needed vec source */
>>        nir_instr_rewrite_src(&vec->instr, &vec->src[i].src, NIR_SRC_INIT);
>>
>
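
For readers following the thread without the NIR sources at hand, here is a
minimal standalone sketch of the idea in the patch. It is not the Mesa code:
struct toy_alu and toy_coalesce() are hypothetical stand-ins for nir_alu_instr
and try_coalesce(), and the model assumes at most four channels and no
register or SSA bookkeeping. It only illustrates why a replicated destination
needs a write-mask update and nothing else, while a per-component instruction
also needs its source swizzles rewritten.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model of an ALU instruction (not the real nir_alu_instr). */
struct toy_alu {
   bool replicated_dest;   /* true for fdot_replicated-style ops */
   unsigned num_inputs;
   uint8_t swizzle[4][4];  /* swizzle[source][destination channel] */
   unsigned write_mask;
};

/* Coalesce the vec channels selected by channel_mask into alu's destination.
 * vec_swizzle[i] is the component of alu's result that vec channel i reads.
 * Returns the resulting write mask.
 */
static unsigned
toy_coalesce(struct toy_alu *alu, const uint8_t vec_swizzle[4],
             unsigned channel_mask)
{
   /* Stash the original swizzles so that rewriting one channel does not
    * disturb the lookup for a later one.
    */
   uint8_t saved[4][4];
   memcpy(saved, alu->swizzle, sizeof(saved));

   unsigned write_mask = 0;
   for (unsigned i = 0; i < 4; i++) {
      if (!(channel_mask & (1u << i)))
         continue;

      assert(vec_swizzle[i] < 4);
      write_mask |= 1u << i;

      /* A replicated result is identical in every channel, so the source
       * swizzles can stay as they are.
       */
      if (alu->replicated_dest)
         continue;

      /* Per-component op: make channel i compute what the channel named by
       * vec_swizzle[i] used to compute.
       */
      for (unsigned j = 0; j < alu->num_inputs; j++)
         alu->swizzle[j][i] = saved[j][vec_swizzle[i]];
   }

   alu->write_mask = write_mask;
   return write_mask;
}
```

With a replicated op, asking for the result in .z only changes the write mask.
With a per-component op, the same request also rewrites the .z swizzle of
every source.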