[Mesa-dev] [PATCH 3/3] gallivm: add integer and unsigned mod arit functions.

Mon Feb 20 07:57:05 PST 2012

----- Original Message -----
> On Sat, Feb 18, 2012 at 4:20 AM, Jose Fonseca <jfonseca at vmware.com>
> wrote:
> > ----- Original Message -----
> >> On Fri, Feb 17, 2012 at 9:46 PM, Jose Fonseca
> >> <jfonseca at vmware.com>
> >> wrote:
> >> > Dave,
> >> >
> >> > Ideally there should be only one lp_build_mod() which will
> >> > invoke
> >> > LLVMBuildSRem or LLVMBuildURem depending on the value of
> >> > bld->type.sign.  The point being that this allows the same code
> >> > generation logic to seemingly target any type without having to
> >> > worry too much which target it is targeting.
> >>
> >> Yeah I agree with this for now, but I'm starting to think a lot of
> >> this stuff is redunant once I looked at what Tom has done.
> >>
> >> The thing is TGSI doesn't have that many crazy options where you
> >> are
> >> going to be targetting instructions at the wrong type, and
> >> wrapping
> >> all the basic llvm interfaces with an extra type layer seems to me
> >> long term like a waste of time.
> >
> > So far llvmpipe's TGSI->LLVM IR has only been targetting floating
> > point SIMD instructions.
> >
> > But truth is that many simple fragment shaders can be partially
> > done with 8bit and 16bit SIMD integers, if values are represented
> > in 8bit unorm and 16 bit unorms. The throughput for these will be
> > much higher, as not only we can squeeze more elements, they take
> > less cycles, and the hardware has several arithmetic units.
> >
> > The point of those lp_build_xxx functions is to handle this
> > transparently. See, e.g., how lp_build_mul handles fixed point.
> > Currently this is only used for blending, but the hope is to
> > eventually use it on TGSI translation of simple fragment shaders.
> >
> > Maybe not the case for the desktop GPUs, but I also heard that some
> > low powered devices have shader engines w/ 8bit unorms.
> >
> > But of course, not all opcodes can be done correctly: and URem/SRem
> > might not be one we care.
> >
> >> I'm happy for now to finish the integer support in the same style
> >> as
> >> the current code, but I think moving forward afterwards it might
> >> be
> >> worth investigating a more direct instruction emission scheme.
> >
> > If you wanna invoke LLVMBuildURem/LLVMBuildSRem directly from tgsi
> > translation I'm fine with it. We can always generalize
> >
> >> Perhaps
> >> Tom can comment also from his experience.
> >
> > BTW, Tom, I just now noticed that there are two action versions for
> > add:
> >
> > /* TGSI_OPCODE_ADD (CPU Only) */
> > static void
> > add_emit_cpu(
> >   const struct lp_build_tgsi_action * action,
> >   struct lp_build_tgsi_context * bld_base,
> >   struct lp_build_emit_data * emit_data)
> > {
> >   emit_data->output[emit_data->chan] =
> >   lp_build_add(&bld_base->base,
> >                                   emit_data->args[0],
> >                                   emit_data->args[1]);
> > }
> >
> > /* TGSI_OPCODE_ADD */
> > static void
> > add_emit(
> >   const struct lp_build_tgsi_action * action,
> >   struct lp_build_tgsi_context * bld_base,
> >   struct lp_build_emit_data * emit_data)
> > {
> >   emit_data->output[emit_data->chan] = LLVMBuildFAdd(
> >                                bld_base->base.gallivm->builder,
> >                                emit_data->args[0],
> >                                emit_data->args[1], "");
> > }
> >
> > Why is this necessary? lp_build_add will already call LLVMBuildFAdd
> > internally as appropriate.
> >
> > Is this because some of the functions in lp_bld_arit.c will emit
> > x86 intrinsics? If so then a "no-x86-intrinsic" flag in the build
> > context would achieve the same effect with less code duplication.
> >
> > If possible I'd prefer a single version of these actions. If not,
> > then I'd prefer have them split: lp_build_action_cpu.c and
> > lp_build_action_gpu.
> 
> Yes, this is why a split them up.  I can add that flag and merge the
> actions together.

That would be nice. Thanks.

Jose