[Mesa-dev] [PATCH 3/3] gallivm: add integer and unsigned mod arit functions.

Sat Feb 18 01:20:02 PST 2012

----- Original Message -----
> On Fri, Feb 17, 2012 at 9:46 PM, Jose Fonseca <jfonseca at vmware.com>
> wrote:
> > Dave,
> >
> > Ideally there should be only one lp_build_mod() which will invoke
> > LLVMBuildSRem or LLVMBuildURem depending on the value of
> > bld->type.sign.  The point being that this allows the same code
> > generation logic to seemingly target any type without having to
> > worry too much which target it is targeting.
> 
> Yeah I agree with this for now, but I'm starting to think a lot of
> this stuff is redunant once I looked at what Tom has done.
> 
> The thing is TGSI doesn't have that many crazy options where you are
> going to be targetting instructions at the wrong type, and wrapping
> all the basic llvm interfaces with an extra type layer seems to me
> long term like a waste of time.

So far llvmpipe's TGSI->LLVM IR has only been targetting floating point SIMD instructions.

But truth is that many simple fragment shaders can be partially done with 8bit and 16bit SIMD integers, if values are represented in 8bit unorm and 16 bit unorms. The throughput for these will be much higher, as not only we can squeeze more elements, they take less cycles, and the hardware has several arithmetic units.

The point of those lp_build_xxx functions is to handle this transparently. See, e.g., how lp_build_mul handles fixed point. Currently this is only used for blending, but the hope is to eventually use it on TGSI translation of simple fragment shaders.

Maybe not the case for the desktop GPUs, but I also heard that some low powered devices have shader engines w/ 8bit unorms.

But of course, not all opcodes can be done correctly: and URem/SRem might not be one we care.

> I'm happy for now to finish the integer support in the same style as
> the current code, but I think moving forward afterwards it might be
> worth investigating a more direct instruction emission scheme.

If you wanna invoke LLVMBuildURem/LLVMBuildSRem directly from tgsi translation I'm fine with it. We can always generalize 

> Perhaps
> Tom can comment also from his experience.

BTW, Tom, I just now noticed that there are two action versions for add:

/* TGSI_OPCODE_ADD (CPU Only) */
static void
add_emit_cpu(
   const struct lp_build_tgsi_action * action,
   struct lp_build_tgsi_context * bld_base,
   struct lp_build_emit_data * emit_data)
{
   emit_data->output[emit_data->chan] = lp_build_add(&bld_base->base,
                                   emit_data->args[0], emit_data->args[1]);
}

/* TGSI_OPCODE_ADD */
static void
add_emit(
   const struct lp_build_tgsi_action * action,
   struct lp_build_tgsi_context * bld_base,
   struct lp_build_emit_data * emit_data)
{
   emit_data->output[emit_data->chan] = LLVMBuildFAdd(
                                bld_base->base.gallivm->builder,
                                emit_data->args[0], emit_data->args[1], "");
}

Why is this necessary? lp_build_add will already call LLVMBuildFAdd internally as appropriate.

Is this because some of the functions in lp_bld_arit.c will emit x86 intrinsics? If so then a "no-x86-intrinsic" flag in the build context would achieve the same effect with less code duplication.

If possible I'd prefer a single version of these actions. If not, then I'd prefer have them split: lp_build_action_cpu.c and lp_build_action_gpu.

Jose