[Mesa-dev] [PATCH v2 11/11] nir/lower_vec_to_movs: Coalesce into destinations of fdot instructions
Eduardo Lima Mitev
elima at igalia.com
Tue Sep 15 12:02:53 PDT 2015
Patches 10.1, 10.2, and this one are:
Reviewed-by: Eduardo Lima Mitev <elima at igalia.com>
Patches 10.1 and 10.2 look good to me, but I would let Connor Abbott give
his OK, since he is on CC for those and my experience with NIR constant
expressions is rather limited.
Eduardo
On 09/11/2015 05:53 PM, Jason Ekstrand wrote:
> Now that we have a replicating fdot instruction, we can actually coalesce
> into the destinations of vec4 instructions. We couldn't really do this
> before because, if the destination had to end up in .z, we couldn't
> reswizzle the instruction. With a replicated destination, the result ends
> up in all channels so we can just set the writemask and we're done.
>
> Shader-db results for vec4 programs on Haswell:
>
> total instructions in shared programs: 1778849 -> 1751223 (-1.55%)
> instructions in affected programs: 763104 -> 735478 (-3.62%)
> helped: 7067
> HURT: 26
>
> It turns out that dot-products matter...
>
> Cc: Eduardo Lima Mitev <elima at igalia.com>
> ---
> src/glsl/nir/nir_lower_vec_to_movs.c | 49 ++++++++++++++++++++++++++----------
> 1 file changed, 36 insertions(+), 13 deletions(-)
>
> diff --git a/src/glsl/nir/nir_lower_vec_to_movs.c b/src/glsl/nir/nir_lower_vec_to_movs.c
> index 9ff86ea..2cb0457 100644
> --- a/src/glsl/nir/nir_lower_vec_to_movs.c
> +++ b/src/glsl/nir/nir_lower_vec_to_movs.c
> @@ -79,6 +79,14 @@ insert_mov(nir_alu_instr *vec, unsigned start_idx, nir_shader *shader)
> return mov->dest.write_mask;
> }
>
> +static bool
> +has_replicated_dest(nir_alu_instr *alu)
> +{
> + return alu->op == nir_op_fdot_replicated2 ||
> + alu->op == nir_op_fdot_replicated3 ||
> + alu->op == nir_op_fdot_replicated4;
> +}
> +
> /* Attempts to coalesce the "move" from the given source of the vec to the
> * destination of the instruction generating the value. If, for whatever
> * reason, we cannot coalesce the mmove, it does nothing and returns 0. We
> @@ -116,19 +124,28 @@ try_coalesce(nir_alu_instr *vec, unsigned start_idx, nir_shader *shader)
> nir_alu_instr *src_alu =
> nir_instr_as_alu(vec->src[start_idx].src.ssa->parent_instr);
>
> - /* We only care about being able to re-swizzle the instruction if it is
> - * something that we can reswizzle. It must be per-component.
> - */
> - if (nir_op_infos[src_alu->op].output_size != 0)
> - return 0;
> -
> - /* If we are going to reswizzle the instruction, we can't have any
> - * non-per-component sources either.
> - */
> - for (unsigned j = 0; j < nir_op_infos[src_alu->op].num_inputs; j++)
> - if (nir_op_infos[src_alu->op].input_sizes[j] != 0)
> + if (has_replicated_dest(src_alu)) {
> + /* The fdot instruction is special: It replicates its result to all
> + * components. This means that we can always rewrite its destination
> + * and we don't need to swizzle anything.
> + */
> + } else {
> + /* We only care about being able to re-swizzle the instruction if it is
> + * something that we can reswizzle. It must be per-component. The one
> + * exception to this is the fdotN instructions which implicitly splat
> + * their result out to all channels.
> + */
> + if (nir_op_infos[src_alu->op].output_size != 0)
> return 0;
>
> + /* If we are going to reswizzle the instruction, we can't have any
> + * non-per-component sources either.
> + */
> + for (unsigned j = 0; j < nir_op_infos[src_alu->op].num_inputs; j++)
> + if (nir_op_infos[src_alu->op].input_sizes[j] != 0)
> + return 0;
> + }
> +
> /* Stash off all of the ALU instruction's swizzles. */
> uint8_t swizzles[4][4];
> for (unsigned j = 0; j < nir_op_infos[src_alu->op].num_inputs; j++)
> @@ -148,8 +165,14 @@ try_coalesce(nir_alu_instr *vec, unsigned start_idx, nir_shader *shader)
> * instruction so we can re-swizzle that component to match.
> */
> write_mask |= 1 << i;
> - for (unsigned j = 0; j < nir_op_infos[src_alu->op].num_inputs; j++)
> - src_alu->src[j].swizzle[i] = swizzles[j][vec->src[i].swizzle[0]];
> + if (has_replicated_dest(src_alu)) {
> + /* Since the destination is a single replicated value, we don't need
> + * to do any reswizzling
> + */
> + } else {
> + for (unsigned j = 0; j < nir_op_infos[src_alu->op].num_inputs; j++)
> + src_alu->src[j].swizzle[i] = swizzles[j][vec->src[i].swizzle[0]];
> + }
>
> /* Clear the no longer needed vec source */
> nir_instr_rewrite_src(&vec->instr, &vec->src[i].src, NIR_SRC_INIT);
>
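
As a rough illustration of the commit message's point, here is a minimal
standalone sketch of why a replicated destination needs only a writemask.
It does not use NIR: vec4_reg, dot3 and fdot_replicated3_model are
hypothetical names made up for this example, and the register model is an
assumption, not how any backend actually stores values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 4-channel register (x, y, z, w); not a NIR type. */
typedef struct {
   float c[4];
} vec4_reg;

/* Plain scalar 3-component dot product. */
static float
dot3(const float a[3], const float b[3])
{
   return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

/* Models a replicated fdot3: the result is conceptually available in
 * every channel, and write_mask alone selects which channels are stored.
 * The sources are never reswizzled to steer the result into a channel.
 */
static void
fdot_replicated3_model(vec4_reg *dest, unsigned write_mask,
                       const float a[3], const float b[3])
{
   assert(dest != NULL);

   const float result = dot3(a, b);
   for (unsigned i = 0; i < 4; i++) {
      if (write_mask & (1u << i))
         dest->c[i] = result;
   }
}
```

Calling fdot_replicated3_model(&reg, 1u << 2, a, b) places the dot product
in .z and leaves the other channels alone, which is the case the commit
message says could not be handled before the replicated opcodes existed.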