[Mesa-dev] [PATCH 3/3] gallium: Add tgsi_to_nir to get a nir_shader for a TGSI shader.
Eric Anholt
eric at anholt.net
Sun Mar 29 23:11:05 PDT 2015
Kenneth Graunke <kenneth at whitecape.org> writes:
> On Friday, March 27, 2015 01:54:32 PM Eric Anholt wrote:
>> This will be used by the VC4 driver for doing device-independent
>> optimization, and hopefully eventually replacing its whole IR. It also
>> may be useful to other drivers for the same reason.
> Hi Eric!
>
> I have a bunch of comments below, but overall this looks great.
>
> You should probably have someone who knows TGSI better than I do review
> it, but for what it's worth, this is:
>
> Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
Thanks! There was definitely useful feedback in here, and I've taken
most of it.
>> +/* LOG - Approximate Logarithm Base 2
>> + * dst.x = \lfloor\log_2{|src.x|}\rfloor
>> + * dst.y = \frac{|src.x|}{2^{\lfloor\log_2{|src.x|}\rfloor}}
>> + * dst.z = \log_2{|src.x|}
>> + * dst.w = 1.0
>> + */
>> +static void
>> +ttn_log(nir_builder *b, nir_op op, nir_alu_dest dest, nir_ssa_def **src)
>> +{
>> + nir_ssa_def *abs_srcx = nir_fabs(b, ttn_channel(b, src[0], X));
>> + nir_ssa_def *log2 = nir_flog2(b, abs_srcx);
>> +
>> + ttn_move_dest_masked(b, dest, nir_ffloor(b, log2), TGSI_WRITEMASK_X);
>> + ttn_move_dest_masked(b, dest,
>> + nir_fdiv(b, abs_srcx, nir_fexp2(b, nir_ffloor(b,
> log2))),
>
> You're generating two copies of floor(log2) here, which will have to be
> CSE'd later. In prog_to_nir, I created a temporary and used it in both
> places:
>
> nir_ssa_def *floor_log2 = nir_ffloor(b, log2);
>
> We're generating tons of rubbish for NIR to optimize anyway, so it's not
> a big deal...but...may as well do the trivial improvement.
I much more expect the whole mess except for the dst.z computation to
get DCEed away, so it's just one extra DCE out of so many.
(and we generate lots of copy prop to avoid all throughout this mess,
which I've considered short-circuting in nir_builder some day).
>> +static void
>> +ttn_sle(nir_builder *b, nir_op op, nir_alu_dest dest, nir_ssa_def **src)
>> +{
>> + ttn_move_dest(b, dest, nir_sge(b, src[1], src[0]));
>> +}
>
> I've got code here to generate b2f(fge(...)) instead of sge(...) since I
> didn't want to bother implementing it in my driver, and figured the b2fs
> might be able to get optimized away.
>
> That said, I suppose we could probably just add lowering transformations
> that turn sge -> b2f(fge(...)) when options->native_integers is set, and
> delete my code...
For me an SGE in hardware is:
fsub.sf(null, src0, src1)
mov.nc(dest, 1.0)
mov.ns(dest, 0)
while your plan would be... oh wait. I didn't even have a b2f
implementation because TGSI doesn't do that (they just AND the bool with
1.0). But an FGE is:
fsub.sf(null, src0, src1)
mov.nc(dest, ~0)
mov.ns(dest, 0)
so any more instructions would be worse.
>> +static void
>> +ttn_xpd(nir_builder *b, nir_op op, nir_alu_dest dest, nir_ssa_def **src)
>> +{
>> + ttn_move_dest_masked(b, dest,
>> + nir_fsub(b,
>> + nir_fmul(b,
>> + ttn_swizzle(b, src[0], Y, Z, X,
> X),
>> + ttn_swizzle(b, src[1], Z, X, Y,
> X)),
>> + nir_fmul(b,
>> + ttn_swizzle(b, src[1], Y, Z, X,
> X),
>> + ttn_swizzle(b, src[0], Z, X, Y,
> X))),
>> + TGSI_WRITEMASK_XYZ);
>> + ttn_move_dest_masked(b, dest, nir_imm_float(b, 1.0), TGSI_WRITEMASK_W);
>> +}
>> +
>> +static void
>> +ttn_dp2a(nir_builder *b, nir_op op, nir_alu_dest dest, nir_ssa_def **src)
>> +{
>> + ttn_move_dest(b, dest,
>> + ttn_channel(b, nir_fadd(b,
>> + ttn_channel(b, nir_fdot2(b,
> src[0],
>> + src[1]),
>> + X),
>
> Do you really need to do ttn_channel(b, ..., X) on a fdot2 result? It's
> already a scalar value. Same comment applies to the below four.
>
> I should probably delete that from prog_to_nir as well.
Good catch, the cleanups for scalar in the builder have obsoleted this,
I think.
>
>> + src[2]),
>> + X));
>> +}
>> +static void
>> +ttn_ucmp(nir_builder *b, nir_op op, nir_alu_dest dest, nir_ssa_def **src)
>> +{
>> + ttn_move_dest(b, dest, nir_bcsel(b,
>> + nir_ine(b, src[0], nir_imm_float(b,
> 0.0)),
>
> Doing nir_imm_int(b, 0) here would make more sense.
Yeah, now that I have it :)
>> +static void
>> +ttn_kill_if(nir_builder *b, nir_op op, nir_alu_dest dest, nir_ssa_def
> **src)
>> +{
>> + nir_if *if_stmt = nir_if_create(b->shader);
>> + if_stmt->condition =
>> + nir_src_for_ssa(nir_bany4(b, nir_flt(b, src[0], nir_imm_float(b,
> 0.0))));
>> + nir_cf_node_insert_end(b->cf_node_list, &if_stmt->cf_node);
>> +
>> + nir_intrinsic_instr *discard =
>> + nir_intrinsic_instr_create(b->shader, nir_intrinsic_discard);
>> + nir_instr_insert_after_cf_list(&if_stmt->then_list, &discard->instr);
>
> A while back I created a discard_if intrinsic for exactly this case.
> You just want:
>
> nir_ssa_def *cmp = nir_bany4(b, nir_flt(b, src[0], nir_imm_float(b, 0.0)));
> nir_intrinsic_instr *discard =
> nir_intrinsic_instr_create(b->shader, nir_intrinsic_discard_if);
> discard->src[0] = nir_src_for_ssa(cmp);
> nir_instr_insert_after_cf_list(b->cf_node_list, &discard->instr);
Oh, yeah, I forgot to make use of it.
And in the process I noticed that I was doing an extra compare in NIR
(one to make the if condition, and one to turn the if condition into a
select between 0 and 1.0, even though the discard value I track is just
"discard the pixel if nonzero"). Benefit of NIR goes from .5% to .6% in
instruction count.
Oh, and looking at this there's another obvious stupid pattern to
recover from TGSI, which gets us another .08%.
>> +}
>> +
>> +static void
>> +ttn_if(struct ttn_compile *c, nir_ssa_def *src, bool is_uint)
>> +{
>> + nir_builder *b = &c->build;
>> +
>> + /* Save the outside-of-the-if-statement node list. */
>> + c->if_stack[c->if_stack_pos] = b->cf_node_list;
>> + c->if_stack_pos++;
>> +
>> + src = ttn_channel(b, src, X);
>> +
>> + nir_if *if_stmt = nir_if_create(b->shader);
>> + if (is_uint) {
>> + if_stmt->condition = nir_src_for_ssa(nir_ine(b, src, nir_imm_int(b,
> 0)));
>> + } else {
>> + if_stmt->condition = nir_src_for_ssa(nir_fne(b, src, nir_imm_int(b,
> 0)));
>> + }
>> + nir_cf_node_insert_end(b->cf_node_list, &if_stmt->cf_node);
>> +
>> + nir_builder_insert_after_cf_list(b, &if_stmt->then_list);
>> +
>> + c->if_stack[c->if_stack_pos] = &if_stmt->else_list;
>> + c->if_stack_pos++;
>> +}
>> +
>> +static void
>> +ttn_else(struct ttn_compile *c)
>> +{
>> + nir_builder *b = &c->build;
>> +
>> + nir_builder_insert_after_cf_list(b, c->if_stack[c->if_stack_pos - 1]);
>> +}
>> +
>> +static void
>> +ttn_endif(struct ttn_compile *c)
>> +{
>> + nir_builder *b = &c->build;
>> +
>> + c->if_stack_pos -= 2;
>
> Why 2? I don't see ELSE or ENDIF being put on the stack, so aren't you
> just trying to pop off the single IF?
The IF stack has two entries per IF -- the outside-of-the-statement cf
list to save, and the else block's cf list.
>> + case TGSI_TEXTURE_CUBE:
>> + instr->sampler_dim = GLSL_SAMPLER_DIM_CUBE;
>> + break;
>> + case TGSI_TEXTURE_CUBE_ARRAY:
>> + instr->sampler_dim = GLSL_SAMPLER_DIM_CUBE;
>> + instr->is_array = true;
>> + break;
>
> FWIW, I don't think CUBE_ARRAY and SHADOWCUBE_ARRAY will work with
> this pass...I'm pretty sure you need to implement TXB2/TXL2. You call
> this function for those, but they fall through to "unknown TGSI tex op".
>
> I suspect you don't care for your hardware, so punting and making
> someone else do it seems reasonable. :)
Yeah, I tried to fill in obvious bits, but getting those to work seemed
unlikely just by typing on my own.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 818 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20150329/3835dce6/attachment-0001.sig>
More information about the mesa-dev
mailing list