[Mesa-dev] [PATCH] tgsi: Add a conditional move inststruction

Jose Fonseca jfonseca at vmware.com
Thu Apr 4 08:23:22 PDT 2013



----- Original Message -----
> On 04.04.2013 17:01, Jose Fonseca wrote:
> >
> > ----- Original Message -----
> >>> On 04.04.2013 03:45, Zack Rusin wrote:
> >>>> It's part of SM4 (http://goo.gl/4IpeK). It's also fairly
> >>>> painful to emulate without branching. Most hardware
> >>>> supports it natively and even llvm has a 'select' opcode
> >>>> which can handle it without too much hassle.
> >>>>
> >>>> diff --git a/src/gallium/docs/source/tgsi.rst
> >>>> b/src/gallium/docs/source/tgsi.rst
> >>>> index 28308cb..6c5a02b 100644
> >>>> --- a/src/gallium/docs/source/tgsi.rst
> >>>> +++ b/src/gallium/docs/source/tgsi.rst
> >>>> @@ -72,6 +72,17 @@ used.
> >>>>  
> >>>>    dst.w = src.w
> >>>>  
> >>>> +.. opcode:: MOVC - Conditional move
> >>>> +
> >>>> +.. math::
> >>>> +
> >>>> +  dst.x = src0.x ? src1.x : src2.x
> >>>> +
> >>>> +  dst.y = src0.y ? src1.y : src2.y
> >>>> +
> >>>> +  dst.z = src0.z ? src1.z : src2.z
> >>>> +
> >>>> +  dst.w = src0.w ? src1.w : src2.w
> >>>>  
> >>> I think we already have that:
> >>>
> >>> .. opcode:: UCMP - Integer Conditional Move
> >>>
> >>> .. math::
> >>>
> >>>   dst.x = src0.x ? src1.x : src2.x
> >>>
> >>>   dst.y = src0.y ? src1.y : src2.y
> >>>
> >>>   dst.z = src0.z ? src1.z : src2.z
> >>>
> >>>   dst.w = src0.w ? src1.w : src2.w
> >>>
> >>>
> >>> No difference apart from the source ordering (the "integer" just implies
> >>> that any non-zero value counts as true, i.e. also inf, nan and -0).
> >> That's really broken. UCMP needs to be a an unsigned version of the CMP
> >> instruction which does
> >> dst.chan = (src0.chan < 0) ? src1.chan : src2.chan
> >> not a whole new instruction. It's what everyone implements anyway. So if
> >> st_glsl_to_tgsi needs
> >> a conditional move we need to add the above patch and change it to use it.
> > Yes, it doesn't seem that any of the TGSI_OPCODE_UCMP implementation does
> > that the spec says it supposedly does -- it seems everybody implements it
> > as an unsigned version of CMP. That is, it seems UCMP's description needs
> > to be fixed.
> 
> Erm, unsigned < 0 doesn't make sense.

Ah indeed!

> Definitely what the description says:
> static void
> micro_ucmp(union tgsi_exec_channel *dst,
>            const union tgsi_exec_channel *src0,
>            const union tgsi_exec_channel *src1,
>            const union tgsi_exec_channel *src2)
> {
>    dst->u[0] = src0->u[0] ? src1->u[0] : src2->u[0];
>    dst->u[1] = src0->u[1] ? src1->u[1] : src2->u[1];
>    dst->u[2] = src0->u[2] ? src1->u[2] : src2->u[2];
>    dst->u[3] = src0->u[3] ? src1->u[3] : src2->u[3];
> }
> 
> or
> 
>    case TGSI_OPCODE_UCMP:
>    case TGSI_OPCODE_CMP:
>       FOR_EACH_DST_ENABLED_CHANNEL(0, c, tgsi) {
>          src0 = fetchSrc(0, c);
>          src1 = fetchSrc(1, c);
>          src2 = fetchSrc(2, c);
>          if (src1 == src2)
>             mkMov(dst0[c], src1);
>          else
>             mkCmp(OP_SLCT, (srcTy == TYPE_F32) ? CC_LT(less than 0) :
> CC_NE(not equal 0),
>                   srcTy, dst0[c], src1, src2, src0);
>       }
> 

But odd enough, the implementations I happend to look at seemed to do "foo >= 0":

src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c has:

static void emit_ucmp(
                const struct lp_build_tgsi_action * action,
                struct lp_build_tgsi_context * bld_base,
                struct lp_build_emit_data * emit_data)
{
        LLVMBuilderRef builder = bld_base->base.gallivm->builder;

        LLVMValueRef v = LLVMBuildFCmp(builder, LLVMRealUGE,
                        emit_data->args[0], lp_build_const_float(bld_base->base.gallivm, 0.), "");

        emit_data->output[emit_data->chan] = LLVMBuildSelect(builder, v, emit_data->args[2], emit_data->args[1], "");
}

(it doesn't even seem to do integers at all)

src/gallium/drivers/r600/r600_shader.c:

static int tgsi_ucmp(struct r600_shader_ctx *ctx)
{
	struct tgsi_full_instruction *inst = &ctx->parse.FullToken.FullInstruction;
	struct r600_bytecode_alu alu;
	int i, r;
	int lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask);

	for (i = 0; i < lasti + 1; i++) {
		if (!(inst->Dst[0].Register.WriteMask & (1 << i)))
			continue;

		memset(&alu, 0, sizeof(struct r600_bytecode_alu));
		alu.op = ALU_OP3_CNDGE_INT;
		r600_bytecode_src(&alu.src[0], &ctx->src[0], i);
		r600_bytecode_src(&alu.src[1], &ctx->src[2], i);
		r600_bytecode_src(&alu.src[2], &ctx->src[1], i);
		tgsi_dst(ctx, &inst->Dst[0], i, &alu.dst);
		alu.dst.chan = i;
		alu.dst.write = 1;
		alu.is_op3 = 1;
		if (i == lasti)
			alu.last = 1;
		r = r600_bytecode_add_alu(ctx->bc, &alu);
		if (r)
			return r;
	}
	return 0;
}



More information about the mesa-dev mailing list