[Mesa-dev] [PATCH 3/3] intel/compiler: implement more algebraic optimizations
Ian Romanick
idr at freedesktop.org
Thu Feb 28 01:04:46 UTC 2019
On 2/27/19 4:45 AM, Iago Toral Quiroga wrote:
> Now that we propagate constants to the first source of 2src instructions we
> see more opportunities of constant folding in the backend.
>
> Shader-db results on KBL:
>
> total instructions in shared programs: 14965607 -> 14855983 (-0.73%)
> instructions in affected programs: 3988102 -> 3878478 (-2.75%)
> helped: 14292
> HURT: 59
>
> total cycles in shared programs: 344324295 -> 340656008 (-1.07%)
> cycles in affected programs: 247527740 -> 243859453 (-1.48%)
> helped: 14056
> HURT: 3314
>
> total loops in shared programs: 4283 -> 4283 (0.00%)
> loops in affected programs: 0 -> 0
> helped: 0
> HURT: 0
>
> total spills in shared programs: 27812 -> 24350 (-12.45%)
> spills in affected programs: 24921 -> 21459 (-13.89%)
> helped: 345
> HURT: 19
>
> total fills in shared programs: 24173 -> 22032 (-8.86%)
> fills in affected programs: 21124 -> 18983 (-10.14%)
> helped: 355
> HURT: 25
Ignore my previous questions about nir_opt_constant_folding after
nir_opt_algebraic_late. I had done that because I added a bunch of
things to nir_opt_algebraic_late that created my constant folding
opportunities.
This is the combined changes for this patch and the previous patch. For
this patch alone, I got:
total instructions in shared programs: 15306213 -> 15221518 (-0.55%)
instructions in affected programs: 2911451 -> 2826756 (-2.91%)
helped: 13121
HURT: 44
helped stats (abs) min: 1 max: 51 x̄: 6.66 x̃: 6
helped stats (rel) min: <.01% max: 16.67% x̄: 4.27% x̃: 3.30%
HURT stats (abs) min: 3 max: 453 x̄: 61.16 x̃: 5
HURT stats (rel) min: 0.20% max: 151.00% x̄: 31.57% x̃: 19.23%
95% mean confidence interval for instructions value: -6.61 -6.26
95% mean confidence interval for instructions %-change: -4.23% -4.07%
Instructions are helped.
total cycles in shared programs: 375419164 -> 372829148 (-0.69%)
cycles in affected programs: 146769299 -> 144179283 (-1.76%)
helped: 10992
HURT: 1833
helped stats (abs) min: 1 max: 56127 x̄: 250.29 x̃: 18
helped stats (rel) min: <.01% max: 40.52% x̄: 3.11% x̃: 2.58%
HURT stats (abs) min: 1 max: 1718 x̄: 87.93 x̃: 42
HURT stats (rel) min: <.01% max: 139.33% x̄: 7.74% x̃: 3.08%
95% mean confidence interval for cycles value: -248.21 -155.69
95% mean confidence interval for cycles %-change: -1.67% -1.44%
Cycles are helped.
total spills in shared programs: 28828 -> 28888 (0.21%)
spills in affected programs: 2037 -> 2097 (2.95%)
helped: 0
HURT: 24
total fills in shared programs: 35542 -> 35639 (0.27%)
fills in affected programs: 3078 -> 3175 (3.15%)
helped: 2
HURT: 26
I decided to look at some of the hurt shaders... it looks like some of
the Unigine geometry shaders really took a beating (+150% instructions).
Note the "max" in the "instructions in affected programs" above.
More comments below by SHL...
> LOST: 0
> GAINED: 5
> ---
> src/intel/compiler/brw_fs.cpp | 203 ++++++++++++++++++++++++++++++++--
> 1 file changed, 195 insertions(+), 8 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> index 2358acbeb59..b2b60237c82 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -2583,9 +2583,55 @@ fs_visitor::opt_algebraic()
> break;
>
> case BRW_OPCODE_MUL:
> - if (inst->src[1].file != IMM)
> + if (inst->src[0].file != IMM && inst->src[1].file != IMM)
> continue;
>
> + /* Constant folding */
> + if (inst->src[0].file == IMM && inst->src[1].file == IMM) {
> + assert(inst->src[0].type == inst->src[1].type);
> + bool local_progress = true;
> + switch (inst->src[0].type) {
> + case BRW_REGISTER_TYPE_HF: {
> + float v1 = _mesa_half_to_float(inst->src[0].ud & 0xffffu);
> + float v2 = _mesa_half_to_float(inst->src[1].ud & 0xffffu);
> + inst->src[0] = brw_imm_w(_mesa_float_to_half(v1 * v2));
> + break;
> + }
> + case BRW_REGISTER_TYPE_W: {
> + int16_t v1 = inst->src[0].ud & 0xffffu;
> + int16_t v2 = inst->src[1].ud & 0xffffu;
> + inst->src[0] = brw_imm_w(v1 * v2);
> + break;
> + }
> + case BRW_REGISTER_TYPE_UW: {
> + uint16_t v1 = inst->src[0].ud & 0xffffu;
> + uint16_t v2 = inst->src[1].ud & 0xffffu;
> + inst->src[0] = brw_imm_uw(v1 * v2);
> + break;
> + }
> + case BRW_REGISTER_TYPE_F:
> + inst->src[0].f *= inst->src[1].f;
> + break;
> + case BRW_REGISTER_TYPE_D:
> + inst->src[0].d *= inst->src[1].d;
> + break;
> + case BRW_REGISTER_TYPE_UD:
> + inst->src[0].ud *= inst->src[1].ud;
> + break;
> + default:
> + local_progress = false;
> + break;
> + };
> +
> + if (local_progress) {
> + inst->opcode = BRW_OPCODE_MOV;
> + inst->src[1] = reg_undef;
> + progress = true;
> + break;
> + }
> + }
> +
> +
> /* a * 1.0 = a */
> if (inst->src[1].is_one()) {
> inst->opcode = BRW_OPCODE_MOV;
> @@ -2594,6 +2640,14 @@ fs_visitor::opt_algebraic()
> break;
> }
>
> + if (inst->src[0].is_one()) {
> + inst->opcode = BRW_OPCODE_MOV;
> + inst->src[0] = inst->src[1];
> + inst->src[1] = reg_undef;
> + progress = true;
> + break;
> + }
> +
> /* a * -1.0 = -a */
> if (inst->src[1].is_negative_one()) {
> inst->opcode = BRW_OPCODE_MOV;
> @@ -2603,27 +2657,160 @@ fs_visitor::opt_algebraic()
> break;
> }
>
> - if (inst->src[0].file == IMM) {
> - assert(inst->src[0].type == BRW_REGISTER_TYPE_F);
> + if (inst->src[0].is_negative_one()) {
> + inst->opcode = BRW_OPCODE_MOV;
> + inst->src[0] = inst->src[1];
> + inst->src[0].negate = !inst->src[1].negate;
> + inst->src[1] = reg_undef;
> + progress = true;
> + break;
> + }
> +
> + /* a * 0 = 0 (this is not exact for floating point) */
> + if (inst->src[1].is_zero() &&
> + brw_reg_type_is_integer(inst->src[1].type)) {
> + inst->opcode = BRW_OPCODE_MOV;
> + inst->src[0] = inst->src[1];
> + inst->src[1] = reg_undef;
> + progress = true;
> + break;
> + }
> +
> + if (inst->src[0].is_zero() &&
> + brw_reg_type_is_integer(inst->src[0].type)) {
> inst->opcode = BRW_OPCODE_MOV;
> - inst->src[0].f *= inst->src[1].f;
> inst->src[1] = reg_undef;
> progress = true;
> break;
> }
> break;
> case BRW_OPCODE_ADD:
> - if (inst->src[1].file != IMM)
> + if (inst->src[0].file != IMM && inst->src[1].file != IMM)
> continue;
>
> - if (inst->src[0].file == IMM) {
> - assert(inst->src[0].type == BRW_REGISTER_TYPE_F);
> + /* Constant folding */
> + if (inst->src[0].file == IMM && inst->src[1].file == IMM) {
> + assert(inst->src[0].type == inst->src[1].type);
> + bool local_progress = true;
> + switch (inst->src[0].type) {
> + case BRW_REGISTER_TYPE_HF: {
> + float v1 = _mesa_half_to_float(inst->src[0].ud & 0xffffu);
> + float v2 = _mesa_half_to_float(inst->src[1].ud & 0xffffu);
> + inst->src[0] = brw_imm_w(_mesa_float_to_half(v1 + v2));
> + break;
> + }
> + case BRW_REGISTER_TYPE_W: {
> + int16_t v1 = inst->src[0].ud & 0xffffu;
> + int16_t v2 = inst->src[1].ud & 0xffffu;
> + inst->src[0] = brw_imm_w(v1 + v2);
> + break;
> + }
> + case BRW_REGISTER_TYPE_UW: {
> + uint16_t v1 = inst->src[0].ud & 0xffffu;
> + uint16_t v2 = inst->src[1].ud & 0xffffu;
> + inst->src[0] = brw_imm_uw(v1 + v2);
> + break;
> + }
> + case BRW_REGISTER_TYPE_F:
> + inst->src[0].f += inst->src[1].f;
> + break;
> + case BRW_REGISTER_TYPE_D:
> + inst->src[0].d += inst->src[1].d;
> + break;
> + case BRW_REGISTER_TYPE_UD:
> + inst->src[0].ud += inst->src[1].ud;
> + break;
> + default:
> + local_progress = false;
> + break;
> + };
> +
> + if (local_progress) {
> + inst->opcode = BRW_OPCODE_MOV;
> + inst->src[1] = reg_undef;
> + progress = true;
> + break;
> + }
> + }
> +
> + /* a + 0 = a (this is not exact for floating point) */
> + if (inst->src[1].is_zero() &&
> + brw_reg_type_is_integer(inst->src[1].type)) {
> inst->opcode = BRW_OPCODE_MOV;
> - inst->src[0].f += inst->src[1].f;
> inst->src[1] = reg_undef;
> progress = true;
> break;
> }
> +
> + if (inst->src[0].is_zero() &&
> + brw_reg_type_is_integer(inst->src[0].type)) {
> + inst->opcode = BRW_OPCODE_MOV;
> + inst->src[0] = inst->src[1];
> + inst->src[1] = reg_undef;
> + progress = true;
> + break;
> + }
> + break;
> + case BRW_OPCODE_SHL:
> + if (inst->src[0].file == IMM && inst->src[1].file == IMM) {
> + bool local_progress = true;
> + switch (inst->src[0].type) {
> + case BRW_REGISTER_TYPE_D:
> + case BRW_REGISTER_TYPE_UD:
> + inst->src[0].ud <<= inst->src[1].ud;
> + break;
There's something fishy going on here. In one of the less hurt Unigine
shaders, the before code was:
mov(1) g2<1>UD 0x00000001UD { align1 WE_all 1N compacted };
mov(8) g126<1>UD g1<8,8,1>UD { align1 WE_all 1Q compacted };
shl(8) g127<1>UD g2<0,1,0>UD 0xffffffffUD { align1 1Q compacted };
and the after code is
mov(8) g126<1>UD g1<8,8,1>UD { align1 WE_all 1Q compacted };
mov(8) g127<1>UD 0x00000008UD { align1 WE_all 1Q compacted };
That doesn't seem right. I thought 1U << 0xffffffffU should be 1u << 31
= 0x80000000. Since Gen explicitly only uses the low 5 bits of src1,
maybe we need "& 0x1f" around all the shift counts? I thought the CPU
did the same thing...
Does this series pass the CI?
> + case BRW_REGISTER_TYPE_W:
> + case BRW_REGISTER_TYPE_UW: {
> + uint16_t v1 = inst->src[0].ud & 0xffffu;
> + uint16_t v2 = inst->src[1].ud & 0xffffu;
> + inst->src[0] = retype(brw_imm_uw(v1 << v2), inst->src[0].type);
> + break;
> + }
> + default:
> + local_progress = false;
> + break;
> + }
> + if (local_progress) {
> + inst->opcode = BRW_OPCODE_MOV;
> + inst->src[1] = reg_undef;
> + progress = true;
> + break;
> + }
> + }
> + break;
> + case BRW_OPCODE_SHR:
> + if (inst->src[0].file == IMM && inst->src[1].file == IMM) {
> + bool local_progress = true;
> + switch (inst->src[0].type) {
> + case BRW_REGISTER_TYPE_D:
> + inst->src[0].d >>= inst->src[1].ud;
> + break;
> + case BRW_REGISTER_TYPE_UD:
> + inst->src[0].ud >>= inst->src[1].ud;
> + break;
> + case BRW_REGISTER_TYPE_W: {
> + int16_t v1 = inst->src[0].ud & 0xffffu;
> + uint16_t v2 = inst->src[1].ud & 0xffffu;
> + inst->src[0] = brw_imm_w(v1 >> v2);
> + break;
> + }
> + case BRW_REGISTER_TYPE_UW: {
> + uint16_t v1 = inst->src[0].ud & 0xffffu;
> + uint16_t v2 = inst->src[1].ud & 0xffffu;
> + inst->src[0] = brw_imm_uw(v1 >> v2);
> + break;
> + }
> + default:
> + local_progress = false;
> + break;
> + }
> + if (local_progress) {
> + inst->opcode = BRW_OPCODE_MOV;
> + inst->src[1] = reg_undef;
> + progress = true;
> + break;
> + }
> + }
> break;
> case BRW_OPCODE_OR:
> if (inst->src[0].equals(inst->src[1]) ||
>
More information about the mesa-dev
mailing list