[Mesa-dev] [PATCH 3/3] gallivm: add integer and unsigned mod arit functions.

Mon Feb 27 12:26:39 PST 2012

----- Original Message -----
> On Mon, Feb 20, 2012 at 01:50:43PM -0800, Jose Fonseca wrote:
> > 
> > 
> > ----- Original Message -----
> > > 
> > > 
> > > ----- Original Message -----
> > > > On Sat, Feb 18, 2012 at 4:20 AM, Jose Fonseca
> > > > <jfonseca at vmware.com>
> > > > wrote:
> > > > > ----- Original Message -----
> > > > >> On Fri, Feb 17, 2012 at 9:46 PM, Jose Fonseca
> > > > >> <jfonseca at vmware.com>
> > > > >> wrote:
> > > > >> > Dave,
> > > > >> >
> > > > >> > Ideally there should be only one lp_build_mod() which will
> > > > >> > invoke
> > > > >> > LLVMBuildSRem or LLVMBuildURem depending on the value of
> > > > >> > bld->type.sign.  The point being that this allows the same
> > > > >> > code
> > > > >> > generation logic to seemingly target any type without
> > > > >> > having
> > > > >> > to
> > > > >> > worry too much which target it is targeting.
> > > > >>
> > > > >> Yeah I agree with this for now, but I'm starting to think a
> > > > >> lot
> > > > >> of
> > > > >> this stuff is redunant once I looked at what Tom has done.
> > > > >>
> > > > >> The thing is TGSI doesn't have that many crazy options where
> > > > >> you
> > > > >> are
> > > > >> going to be targetting instructions at the wrong type, and
> > > > >> wrapping
> > > > >> all the basic llvm interfaces with an extra type layer seems
> > > > >> to
> > > > >> me
> > > > >> long term like a waste of time.
> > > > >
> > > > > So far llvmpipe's TGSI->LLVM IR has only been targetting
> > > > > floating
> > > > > point SIMD instructions.
> > > > >
> > > > > But truth is that many simple fragment shaders can be
> > > > > partially
> > > > > done with 8bit and 16bit SIMD integers, if values are
> > > > > represented
> > > > > in 8bit unorm and 16 bit unorms. The throughput for these
> > > > > will be
> > > > > much higher, as not only we can squeeze more elements, they
> > > > > take
> > > > > less cycles, and the hardware has several arithmetic units.
> > > > >
> > > > > The point of those lp_build_xxx functions is to handle this
> > > > > transparently. See, e.g., how lp_build_mul handles fixed
> > > > > point.
> > > > > Currently this is only used for blending, but the hope is to
> > > > > eventually use it on TGSI translation of simple fragment
> > > > > shaders.
> > > > >
> > > > > Maybe not the case for the desktop GPUs, but I also heard
> > > > > that
> > > > > some
> > > > > low powered devices have shader engines w/ 8bit unorms.
> > > > >
> > > > > But of course, not all opcodes can be done correctly: and
> > > > > URem/SRem
> > > > > might not be one we care.
> > > > >
> > > > >> I'm happy for now to finish the integer support in the same
> > > > >> style
> > > > >> as
> > > > >> the current code, but I think moving forward afterwards it
> > > > >> might
> > > > >> be
> > > > >> worth investigating a more direct instruction emission
> > > > >> scheme.
> > > > >
> > > > > If you wanna invoke LLVMBuildURem/LLVMBuildSRem directly from
> > > > > tgsi
> > > > > translation I'm fine with it. We can always generalize
> > > > >
> > > > >> Perhaps
> > > > >> Tom can comment also from his experience.
> > > > >
> > > > > BTW, Tom, I just now noticed that there are two action
> > > > > versions
> > > > > for
> > > > > add:
> > > > >
> > > > > /* TGSI_OPCODE_ADD (CPU Only) */
> > > > > static void
> > > > > add_emit_cpu(
> > > > >   const struct lp_build_tgsi_action * action,
> > > > >   struct lp_build_tgsi_context * bld_base,
> > > > >   struct lp_build_emit_data * emit_data)
> > > > > {
> > > > >   emit_data->output[emit_data->chan] =
> > > > >   lp_build_add(&bld_base->base,
> > > > >                                   emit_data->args[0],
> > > > >                                   emit_data->args[1]);
> > > > > }
> > > > >
> > > > > /* TGSI_OPCODE_ADD */
> > > > > static void
> > > > > add_emit(
> > > > >   const struct lp_build_tgsi_action * action,
> > > > >   struct lp_build_tgsi_context * bld_base,
> > > > >   struct lp_build_emit_data * emit_data)
> > > > > {
> > > > >   emit_data->output[emit_data->chan] = LLVMBuildFAdd(
> > > > >                                bld_base->base.gallivm->builder,
> > > > >                                emit_data->args[0],
> > > > >                                emit_data->args[1], "");
> > > > > }
> > > > >
> > > > > Why is this necessary? lp_build_add will already call
> > > > > LLVMBuildFAdd
> > > > > internally as appropriate.
> > > > >
> > > > > Is this because some of the functions in lp_bld_arit.c will
> > > > > emit
> > > > > x86 intrinsics? If so then a "no-x86-intrinsic" flag in the
> > > > > build
> > > > > context would achieve the same effect with less code
> > > > > duplication.
> > > > >
> > > > > If possible I'd prefer a single version of these actions. If
> > > > > not,
> > > > > then I'd prefer have them split: lp_build_action_cpu.c and
> > > > > lp_build_action_gpu.
> > > > 
> > > > Yes, this is why a split them up.  I can add that flag and
> > > > merge
> > > > the
> > > > actions together.
> > > 
> > > That would be nice. Thanks.
> > 
> > Tom, actually I've been looking more at the code, thinking about
> > this, and I'm not so sure what's best anymore.
> > 
> > I'd appreciate your honest answer: do you think the stuff in
> > lp_bld_arit.[ch] of any use for GPUs in general (or AMD's in
> > particular), or is it just an hinderance?
> > 
> > As I said before, for CPUs, this abstraction is useful, to allow to
> > convert TGSI (and other fixed function state) -> fixed point SIMD
> > instructions, which yield the highest throughput on CPUs. Because
> > LLVM native types are not expressive enough for fixed function,
> > etc.
> > 
> > But if this is useless for GPUs (i.e, if LLVM's native types are
> > sufficient), then we can make this abstraction a CPU only thing.
> > 
> 
> I don't think the lp_bld_arit.c functions are really useful for GPUs,
> and I don't rely on any of them in the R600 backend.  Also, I was
> looking
> through those functions again and the problem is more than just x86
> intrinsics.  Some of them assume vector types, which I don't use at
> all.

Does that mean that the R600 generates/consumes only scalar expressions?

> So, maybe it is best to keep them separate.

Yes, it seems so.

Does anybody else working or planning in writting TGSI -> LLVM translation pass has other thoughts?

If not, I'll eventually split the helpers in two kinds:
- generic helpers that operate directly with LLVMTypeRef and LLVMValueRef
- native SIMD abstraction, that operates with lp_build_type, for fixed point, etc.

For the record, this is what you seem to use so far:

$ git grep '#include.*gallivm' src/gallium/drivers/r600/
src/gallium/drivers/r600/r600_llvm.c:#include "gallivm/lp_bld_const.h"
src/gallium/drivers/r600/r600_llvm.c:#include "gallivm/lp_bld_intr.h"
src/gallium/drivers/r600/r600_llvm.c:#include "gallivm/lp_bld_gather.h"
src/gallium/drivers/r600/r600_llvm.h:#include "gallivm/lp_bld_tgsi.h"

Jose