[Mesa-dev] [PATCH 33/95] i965/vec4: implement d2b

Thu Aug 18 19:08:44 UTC 2016

Iago Toral <itoral at igalia.com> writes:

> On Wed, 2016-08-17 at 15:15 -0700, Francisco Jerez wrote:
>> Iago Toral <itoral at igalia.com> writes:
>> 
>> > 
>> > On Tue, 2016-08-02 at 18:27 -0700, Francisco Jerez wrote:
>> > > 
>> > > Iago Toral Quiroga <itoral at igalia.com> writes:
>> > > 
>> > > > 
>> > > > 
>> > > > ---
>> > > >  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 18
>> > > > ++++++++++++++++++
>> > > >  1 file changed, 18 insertions(+)
>> > > > 
>> > > > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>> > > > b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>> > > > index 1525a3d..4014020 100644
>> > > > --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>> > > > +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>> > > > @@ -1497,6 +1497,24 @@ vec4_visitor::nir_emit_alu(nir_alu_instr
>> > > > *instr)
>> > > >        emit(CMP(dst, op[0], brw_imm_f(0.0f),
>> > > > BRW_CONDITIONAL_NZ));
>> > > >        break;
>> > > >  
>> > > > +   case nir_op_d2b: {
>> > > > +      /* two-argument instructions can't take 64-bit
>> > > > immediates */
>> > > > +      dst_reg zero = dst_reg(this, glsl_type::dvec4_type);
>> > > > +      emit(MOV(zero, brw_imm_df(0.0)));
>> > > > +
>> > > > +      dst_reg tmp = dst_reg(this, glsl_type::dvec4_type);
>> > > > +      emit(CMP(tmp, op[0], src_reg(zero),
>> > > > BRW_CONDITIONAL_NZ));
>> > > > +
>> > > > +      /* Convert the double CMP result to a single boolean
>> > > > result.
>> > > > For that
>> > > > +       * we take the low 32-bit chunk of each DF component in
>> > > > the
>> > > > result.
>> > > > +       * and do a final MOV to honor the original writemask
>> > > > +       */
>> > > > +      dst_reg result = dst_reg(this, glsl_type::bvec4_type);
>> > > > +      emit(VEC4_OPCODE_PICK_LOW_32BIT, result, src_reg(tmp));
>> > > > +      emit(MOV(dst, src_reg(result)));
>> > > Couldn't you just do a single CMP instruction of the double-
>> > > precision
>> > > argument and a single-precision 0.0 immediate? 
>> > It never occurred to me that we could do that but it seems to work
>> > just
>> > fine.
>> > 
>> > > 
>> > >  I think you could
>> > > potentially also use a 32bit destination type on the CMP
>> > > instruction
>> > > so
>> > > you don't need to emit the PICK_LOW+MOV instructions afterwards.
>> > This does not seem to work though.
>> > 
>> > > 
>> > >   It may
>> > > hit an instruction decompression bug but it could be cleaned up
>> > > by
>> > > the
>> > > SIMD lowering pass afterwards, AFAICT the result would be two
>> > > uncompressed instructions instead of the two uncompressed plus
>> > > two
>> > > compressed instructions above.
>> > I tried splitting the instruction just in case. Notice that doing
>> > this
>> > requires that we improve the splitting pass to support splitting
>> > instructions that write only one register (the original series did
>> > not
>> > implement support for that because so far the only cared to split
>> > DF
>> > instructions that wrote more than one register). I implemented a
>> > couple
>> > of quick hacks to get this done so I could test this but the result
>> > is
>> > still incorrect.
>> > 
>> > For reference, if we don't split, the resulting CMP looks like this
>> > (after full scalarization):
>> > 
>> > cmp.nz.f0(8)  g8<1>.xD  g6<2,2,1>.xyxyDF 0F     { align16 1Q };
>> > cmp.nz.f0(8)  g8<1>.yD  g6<2,2,1>.zwzwDF 0F     { align16 1Q };
>> > cmp.nz.f0(8)  g8<1>.zD  g6.2<0,2,1>.xyxyDF 0F   { align16 1Q };
>> > cmp.nz.f0(8)  g8<1>.wD  g6.2<0,2,1>.zwzwDF 0F   { align16 1Q };
>> > 
>> > And if we split the instruction in two we produce this:
>> > 
>> > cmp.nz.f0(4)  g6<1>.xD     g1<0,2,1>.xyxyDF 0F    { align16 1N };
>> > cmp.nz.f0(4)  g6<1>.yD     g1<0,2,1>.zwzwDF 0F    { align16 1N };
>> > cmp.nz.f0(4)  g6<1>.zD     g1.2<0,2,1>.xyxyDF 0F  { align16 1N };
>> > cmp.nz.f0(4)  g6<1>.wD     g1.2<0,2,1>.zwzwDF 0F  { align16 1N };
>> > cmp.nz.f0(4)  g6.4<1>.xD   g1<0,2,1>.xyxyDF 0F    { align16 2N };
>> > cmp.nz.f0(4)  g6.4<1>.yD   g1<0,2,1>.zwzwDF 0F    { align16 2N };
>> > cmp.nz.f0(4)  g6.4<1>.zD   g1.2<0,2,1>.xyxyDF 0F  { align16 2N };
>> > cmp.nz.f0(4)  g6.4<1>.wD   g1.2<0,2,1>.zwzwDF 0F  { align16 2N };
>> > 
>> > Both sequences of instructions look correct to me as far as the
>> > splitting and DF regioning go, so I guess the 32-bit CMP result is
>> > not
>> > doing what we want.
>> > 
>> > I am not too surprised about this. Even if we use a 32b dst type I
>> > suppose the instruction execution type is still 64b so I guess
>> > there
>> > should be a 64b to 32b conversion involved in writing to a 32b
>> > result.
>> > Seeing how 64b to 32b conversions require that we emit specific
>> > code
>> > using ALIGN1 mode to deal with alignment requirements I guess it is
>> > not
>> > too surprising that this does not work, right?
>> > 
>> Yeah, that's not too surprising, don't worry about it.  Another idea
>> would be to do something along the lines of 'MOV.nz null, <df source
>> goes here>' to check whether the source is non-zero (so you avoid
>> loading the immediate which is a non-trivial operation on Gen7), and
>> then use a single-precision SEL instruction (or two predicated MOV
>> instructions) to write true or false to the destination (so you don't
>> need the PICK_LOW+MOV sequence).
>
> Using MOV.nz seems to have a different behavior than CMP.nz. The former
> will consider anything other than 0.0 to be non-zero, whereas the CMP
> version will consider small enough DF values to be zero as well, which
> I think is the behavior we want (I think I remember discussing this
> here when we were implementing this for broadwell).
>
What do you mean by small enough?  Denormalized?  MOV instructions can
flush them to zero as well, only raw moves won't, you can make sure that
the MOV is not raw by applying a source modifier to the source which
shouldn't affect the flag result for normalized sources.

> Fortunately, we can do something like this:
>
>    src_reg one = src_reg(this, glsl_type::ivec4_type);
>    emit(MOV(dst_reg(one), brw_imm_d(~0)));
>
>    vec4_instruction *inst =
>       emit(CMP(dst_null_df(), op[0], brw_imm_f(0.0f),
> BRW_CONDITIONAL_NZ));
>    inst->regs_written = 1;
>
>    inst = emit(BRW_OPCODE_SEL, dst, one, brw_imm_d(0));
>    inst->predicate = BRW_PREDICATE_NORMAL;
>
> which is pretty much the same idea (notice that we use a float constant
> for the CMP) and keeps the original behavior.
>
Hm, I thought you said that using an F immediate on a double-precision
instruction doesn't work?

> If you are curious about the inst->regs_written=1 in the CMP it is
> just because the vec4_constructor will figure out that because it has
> a DF destination it writes two registers, but it really doesn't
> because it is a null dst register.  If we leave it at 2, then the simd
> lowering pass will split the CMP instruction because the 0.0f constant
> argument only reads one register.

That seems like a bug.  There is no need to lower an instruction with a
mismatch in the number of registers read and written if the source is
uniform:

|  When destination spans two registers, the source MUST span two
|  registers. The exception to the above rule:
|
|   - When source is scalar, the source registers are not incremented.
|   - When source is packed integer Word and destination is
|     packed integer DWord, the source register is not incremented
|     but the source sub register is incremented.

The FS back-end implements both exceptions, but there's probably no need
to implement the second one in the VEC4 back-end.

> Maybe we should make vec4_instruction set only 1 register
> written for null dsts or just make the simd lowering pass ignore the
> number of registers written for instructions with a null dst?
>
> Iago
>
>> > 
>> > I can send you a trace if you want to have a look at it.
>> > 
>> > Iago
>> > 
>> > > 
>> > > > 
>> > > > 
>> > > > +      break;
>> > > > +   }
>> > > > +
>> > > >     case nir_op_i2b:
>> > > >        emit(CMP(dst, op[0], brw_imm_d(0), BRW_CONDITIONAL_NZ));
>> > > >        break;
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20160818/06c1065d/attachment.sig>