[Mesa-dev] [PATCH 12/15] i965: Add support for ir_triop_cond_sel.
Matt Turner
mattst88 at gmail.com
Thu Aug 29 11:52:36 PDT 2013
On Fri, Aug 23, 2013 at 9:24 AM, Paul Berry <stereotype441 at gmail.com> wrote:
> On 22 August 2013 16:08, Matt Turner <mattst88 at gmail.com> wrote:
>>
>> ---
>> src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp | 1 +
>> src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 6 ++++++
>> src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 6 ++++++
>> 3 files changed, 13 insertions(+)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp
>> b/src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp
>> index 6ee6d01..34dbc90 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp
>> @@ -362,6 +362,7 @@
>> ir_channel_expressions_visitor::visit_leave(ir_assignment *ir)
>>
>> case ir_triop_fma:
>> case ir_triop_lrp:
>> + case ir_triop_cond_sel:
>> case ir_triop_bitfield_extract:
>> for (i = 0; i < vector_elements; i++) {
>> ir_rvalue *op0 = get_element(op_var[0], i);
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>> b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>> index 4b54bee..27887d6 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>> @@ -735,6 +735,12 @@ fs_visitor::visit(ir_expression *ir)
>> case ir_triop_lrp:
>> emit_lrp(this->result, op[0], op[1], op[2]);
>> break;
>> +
>> + case ir_triop_cond_sel:
>> + emit(CMP(reg_null_d, op[0], fs_reg(0), BRW_CONDITIONAL_NZ));
>> + inst = emit(BRW_OPCODE_SEL, this->result, op[1], op[2]);
>> + inst->predicate = BRW_PREDICATE_NORMAL;
>> + break;
>> }
>> }
>
>
> For the uses of ir_triop_cond_sel we have currently (the lowering passes in
> patch 14), I believe this will generate efficient code. But if we adapt the
> mix() functions to use it, then there are probably going to be a lot of uses
> like this:
>
> x = mix(y, z, a < b);
>
> Which will compile down to this silly assembly (please excuse the
> pseudocode--I can't remember the assembly syntax exactly):
>
> CMP.lt tmp a b
> CMP.nz null tmp 0
> SEL.f0 x y z
>
> What if we modify the loop that calls ir->operands[operand]->accept(this) on
> each operand (near the top of the visitor) so that it skips operand 0 when
> the expression is ir_triop_cond_sel. Then, in the switch statement, we can
> do something like this:
>
> emit_bool_to_cond_code(ir->operands[0]);
>
> inst = emit(BRW_OPCODE_SEL, this->result, op[1], op[2]);
> inst->predicate = BRW_PREDICATE_NORMAL;
>
> That should produce the assembly we want, which is:
>
> CMP.lt null a b
> SEL.fo x y z
I agree. I tried to do this and kept running into problems that would
be solved by considering the flags registers better in various
optimization passes.
Using the code from fs-mix-bvec4-infnan.shader_test for example
(gl_FragColor = mix(arg0, arg1, selector);). We generate something
like the following with some local changes:
cmp.e.f0(8) null g2.4<0,1,0>D 0D
(+f0) sel(8) m1<1>F g2<0,1,0>F g4<8,8,1>F
cmp.e.f0(8) null g2.5<0,1,0>D 0D
(+f0) sel(8) m2<1>F g2.1<0,1,0>F g7<8,8,1>F
cmp.e.f0(8) null g2.6<0,1,0>D 0D
(+f0) sel(8) m3<1>F g4<8,8,1>F g2<0,1,0>F
cmp.e.f0(8) null g2.7<0,1,0>D 0D
(+f0) sel(8) m4<1>F g7<8,8,1>F g2.1<0,1,0>F
which would benefit from using the multiple flag registers available
on IVB and newer (or even the subregisters, not sure how hardware
dependency tracking works for them).
And from the code generated for frexp is stupid:
cmp.ne.f0(8) g5<1>D (abs)g2<0,1,0>F 0F
...
cmp.e.f0(8) null g5<8,8,1>D 0D
(-f0) sel(8) g9<1>D g8<8,8,1>D -126D
...
cmp.e.f0(8) null g5<8,8,1>D 0D
(-f0) sel(8) g12<1>UD g11<8,8,1>UD 0x3f000000UD
where we could just evaluate the abs(x) != 0.0f and use that flag
register for the two sels.
Basically, I think what I'm saying is yes, the generated code is
silly, but that I think our backend really needs to understand about
flags register to make it significantly better.
More information about the mesa-dev
mailing list