[Mesa-dev] [PATCH 3/3] gallivm: add integer and unsigned mod arit functions.

Mon Feb 20 13:50:43 PST 2012

----- Original Message -----
> 
> 
> ----- Original Message -----
> > On Sat, Feb 18, 2012 at 4:20 AM, Jose Fonseca <jfonseca at vmware.com>
> > wrote:
> > > ----- Original Message -----
> > >> On Fri, Feb 17, 2012 at 9:46 PM, Jose Fonseca
> > >> <jfonseca at vmware.com>
> > >> wrote:
> > >> > Dave,
> > >> >
> > >> > Ideally there should be only one lp_build_mod() which will
> > >> > invoke
> > >> > LLVMBuildSRem or LLVMBuildURem depending on the value of
> > >> > bld->type.sign.  The point being that this allows the same
> > >> > code
> > >> > generation logic to seemingly target any type without having
> > >> > to
> > >> > worry too much which target it is targeting.
> > >>
> > >> Yeah I agree with this for now, but I'm starting to think a lot
> > >> of
> > >> this stuff is redunant once I looked at what Tom has done.
> > >>
> > >> The thing is TGSI doesn't have that many crazy options where you
> > >> are
> > >> going to be targetting instructions at the wrong type, and
> > >> wrapping
> > >> all the basic llvm interfaces with an extra type layer seems to
> > >> me
> > >> long term like a waste of time.
> > >
> > > So far llvmpipe's TGSI->LLVM IR has only been targetting floating
> > > point SIMD instructions.
> > >
> > > But truth is that many simple fragment shaders can be partially
> > > done with 8bit and 16bit SIMD integers, if values are represented
> > > in 8bit unorm and 16 bit unorms. The throughput for these will be
> > > much higher, as not only we can squeeze more elements, they take
> > > less cycles, and the hardware has several arithmetic units.
> > >
> > > The point of those lp_build_xxx functions is to handle this
> > > transparently. See, e.g., how lp_build_mul handles fixed point.
> > > Currently this is only used for blending, but the hope is to
> > > eventually use it on TGSI translation of simple fragment shaders.
> > >
> > > Maybe not the case for the desktop GPUs, but I also heard that
> > > some
> > > low powered devices have shader engines w/ 8bit unorms.
> > >
> > > But of course, not all opcodes can be done correctly: and
> > > URem/SRem
> > > might not be one we care.
> > >
> > >> I'm happy for now to finish the integer support in the same
> > >> style
> > >> as
> > >> the current code, but I think moving forward afterwards it might
> > >> be
> > >> worth investigating a more direct instruction emission scheme.
> > >
> > > If you wanna invoke LLVMBuildURem/LLVMBuildSRem directly from
> > > tgsi
> > > translation I'm fine with it. We can always generalize
> > >
> > >> Perhaps
> > >> Tom can comment also from his experience.
> > >
> > > BTW, Tom, I just now noticed that there are two action versions
> > > for
> > > add:
> > >
> > > /* TGSI_OPCODE_ADD (CPU Only) */
> > > static void
> > > add_emit_cpu(
> > >   const struct lp_build_tgsi_action * action,
> > >   struct lp_build_tgsi_context * bld_base,
> > >   struct lp_build_emit_data * emit_data)
> > > {
> > >   emit_data->output[emit_data->chan] =
> > >   lp_build_add(&bld_base->base,
> > >                                   emit_data->args[0],
> > >                                   emit_data->args[1]);
> > > }
> > >
> > > /* TGSI_OPCODE_ADD */
> > > static void
> > > add_emit(
> > >   const struct lp_build_tgsi_action * action,
> > >   struct lp_build_tgsi_context * bld_base,
> > >   struct lp_build_emit_data * emit_data)
> > > {
> > >   emit_data->output[emit_data->chan] = LLVMBuildFAdd(
> > >                                bld_base->base.gallivm->builder,
> > >                                emit_data->args[0],
> > >                                emit_data->args[1], "");
> > > }
> > >
> > > Why is this necessary? lp_build_add will already call
> > > LLVMBuildFAdd
> > > internally as appropriate.
> > >
> > > Is this because some of the functions in lp_bld_arit.c will emit
> > > x86 intrinsics? If so then a "no-x86-intrinsic" flag in the build
> > > context would achieve the same effect with less code duplication.
> > >
> > > If possible I'd prefer a single version of these actions. If not,
> > > then I'd prefer have them split: lp_build_action_cpu.c and
> > > lp_build_action_gpu.
> > 
> > Yes, this is why a split them up.  I can add that flag and merge
> > the
> > actions together.
> 
> That would be nice. Thanks.

Tom, actually I've been looking more at the code, thinking about this, and I'm not so sure what's best anymore.

I'd appreciate your honest answer: do you think the stuff in lp_bld_arit.[ch] of any use for GPUs in general (or AMD's in particular), or is it just an hinderance?

As I said before, for CPUs, this abstraction is useful, to allow to convert TGSI (and other fixed function state) -> fixed point SIMD instructions, which yield the highest throughput on CPUs. Because LLVM native types are not expressive enough for fixed function, etc.

But if this is useless for GPUs (i.e, if LLVM's native types are sufficient), then we can make this abstraction a CPU only thing.

Jose