[Mesa-dev] [PATCH 33/95] i965/vec4: implement d2b

Wed Aug 17 22:15:02 UTC 2016

Iago Toral <itoral at igalia.com> writes:

> On Tue, 2016-08-02 at 18:27 -0700, Francisco Jerez wrote:
>> Iago Toral Quiroga <itoral at igalia.com> writes:
>> 
>> > 
>> > ---
>> >  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 18 ++++++++++++++++++
>> >  1 file changed, 18 insertions(+)
>> > 
>> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>> > b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>> > index 1525a3d..4014020 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>> > +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>> > @@ -1497,6 +1497,24 @@ vec4_visitor::nir_emit_alu(nir_alu_instr
>> > *instr)
>> >        emit(CMP(dst, op[0], brw_imm_f(0.0f), BRW_CONDITIONAL_NZ));
>> >        break;
>> >  
>> > +   case nir_op_d2b: {
>> > +      /* two-argument instructions can't take 64-bit immediates */
>> > +      dst_reg zero = dst_reg(this, glsl_type::dvec4_type);
>> > +      emit(MOV(zero, brw_imm_df(0.0)));
>> > +
>> > +      dst_reg tmp = dst_reg(this, glsl_type::dvec4_type);
>> > +      emit(CMP(tmp, op[0], src_reg(zero), BRW_CONDITIONAL_NZ));
>> > +
>> > +      /* Convert the double CMP result to a single boolean result.
>> > For that
>> > +       * we take the low 32-bit chunk of each DF component in the
>> > result.
>> > +       * and do a final MOV to honor the original writemask
>> > +       */
>> > +      dst_reg result = dst_reg(this, glsl_type::bvec4_type);
>> > +      emit(VEC4_OPCODE_PICK_LOW_32BIT, result, src_reg(tmp));
>> > +      emit(MOV(dst, src_reg(result)));
>> Couldn't you just do a single CMP instruction of the double-precision
>> argument and a single-precision 0.0 immediate? 
>
> It never occurred to me that we could do that but it seems to work just
> fine.
>
>>  I think you could
>> potentially also use a 32bit destination type on the CMP instruction
>> so
>> you don't need to emit the PICK_LOW+MOV instructions afterwards.
>
> This does not seem to work though.
>
>>   It may
>> hit an instruction decompression bug but it could be cleaned up by
>> the
>> SIMD lowering pass afterwards, AFAICT the result would be two
>> uncompressed instructions instead of the two uncompressed plus two
>> compressed instructions above.
>
> I tried splitting the instruction just in case. Notice that doing this
> requires that we improve the splitting pass to support splitting
> instructions that write only one register (the original series did not
> implement support for that because so far the only cared to split DF
> instructions that wrote more than one register). I implemented a couple
> of quick hacks to get this done so I could test this but the result is
> still incorrect.
>
> For reference, if we don't split, the resulting CMP looks like this
> (after full scalarization):
>
> cmp.nz.f0(8)  g8<1>.xD  g6<2,2,1>.xyxyDF 0F     { align16 1Q };
> cmp.nz.f0(8)  g8<1>.yD  g6<2,2,1>.zwzwDF 0F     { align16 1Q };
> cmp.nz.f0(8)  g8<1>.zD  g6.2<0,2,1>.xyxyDF 0F   { align16 1Q };
> cmp.nz.f0(8)  g8<1>.wD  g6.2<0,2,1>.zwzwDF 0F   { align16 1Q };
>
> And if we split the instruction in two we produce this:
>
> cmp.nz.f0(4)  g6<1>.xD     g1<0,2,1>.xyxyDF 0F    { align16 1N };
> cmp.nz.f0(4)  g6<1>.yD     g1<0,2,1>.zwzwDF 0F    { align16 1N };
> cmp.nz.f0(4)  g6<1>.zD     g1.2<0,2,1>.xyxyDF 0F  { align16 1N };
> cmp.nz.f0(4)  g6<1>.wD     g1.2<0,2,1>.zwzwDF 0F  { align16 1N };
> cmp.nz.f0(4)  g6.4<1>.xD   g1<0,2,1>.xyxyDF 0F    { align16 2N };
> cmp.nz.f0(4)  g6.4<1>.yD   g1<0,2,1>.zwzwDF 0F    { align16 2N };
> cmp.nz.f0(4)  g6.4<1>.zD   g1.2<0,2,1>.xyxyDF 0F  { align16 2N };
> cmp.nz.f0(4)  g6.4<1>.wD   g1.2<0,2,1>.zwzwDF 0F  { align16 2N };
>
> Both sequences of instructions look correct to me as far as the
> splitting and DF regioning go, so I guess the 32-bit CMP result is not
> doing what we want.
>
> I am not too surprised about this. Even if we use a 32b dst type I
> suppose the instruction execution type is still 64b so I guess there
> should be a 64b to 32b conversion involved in writing to a 32b result.
> Seeing how 64b to 32b conversions require that we emit specific code
> using ALIGN1 mode to deal with alignment requirements I guess it is not
> too surprising that this does not work, right?
>

Yeah, that's not too surprising, don't worry about it.  Another idea
would be to do something along the lines of 'MOV.nz null, <df source
goes here>' to check whether the source is non-zero (so you avoid
loading the immediate which is a non-trivial operation on Gen7), and
then use a single-precision SEL instruction (or two predicated MOV
instructions) to write true or false to the destination (so you don't
need the PICK_LOW+MOV sequence).

> I can send you a trace if you want to have a look at it.
>
> Iago
>
>> > 
>> > +      break;
>> > +   }
>> > +
>> >     case nir_op_i2b:
>> >        emit(CMP(dst, op[0], brw_imm_d(0), BRW_CONDITIONAL_NZ));
>> >        break;
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20160817/cff7a1de/attachment.sig>