[Mesa-dev] [PATCH 7/9] radeonsi: add support for easy opcodes from ARB_gpu_shader5

Tue Mar 10 09:00:32 PDT 2015

It seems overkill to use LLVM IR for BFE. There are several ways to
express BFE(value, offset, bits) exactly, for example:

# if bits == 0:
#     0
# else if offset + bits < 32:
#     (value << (32 - offset - bits)) >> (32 - bits)
# else:
#     value >> offset

Which can be simplified to either:

# both = offset + bits
# if bits == 0:
#     0
# else if both < 32:
#     (value << (32 - both)) >> (32 - bits)
# else:
#     value >> offset

Or:

# if bits == 0:
#     0
# else if offset + bits < 32:
#     inv_bits = 32 - bits
#     (value << (inv_bits - offset)) >> inv_bits
# else:
#     value >> offset

All 3 variants should use ASHR if the signed version is required.
Alternatively, one can use this as well:

# if bits == 32:
#     value
# else:
#     (value >> offset) & ((1 << bits) - 1)

And an explicit sign extension should be used for the signed version.

The first three are so complex that I doubt it would be easy to
recognize them and match them precisely. They are also very unlikely
to appear open-coded in real applications, therefore, expanding them
and then matching them again seems like a huge waste of time. Apps
will likely just use LSHR+AND, which extracts a bitfield, but it's not
the same as BFE, so BFE can't be used for that. (if the second operand
of AND is a constant expression, then BFE can indeed be used)

I agree that we could use LLVM IR for simple instructions like BFM and
BFI, but I doubt it's a good idea for complex instructions like BFE,
and any small optimization that changes the expression can break the
matching.

Marek

On Tue, Mar 10, 2015 at 3:30 PM, Tom Stellard <tom at stellard.net> wrote:
> On Tue, Mar 10, 2015 at 12:42:38PM +0100, Marek Olšák wrote:
>> OK. What about patches 8 an 9?
>>
>
> I think the intrinsics in 9 are OK, but 8 should be using LLVM IR.
>
> -Tom
>
>> Marek
>>
>> On Thu, Mar 5, 2015 at 8:30 PM, Tom Stellard <tom at stellard.net> wrote:
>> > On Mon, Mar 02, 2015 at 12:54:21PM +0100, Marek Olšák wrote:
>> >> From: Marek Olšák <marek.olsak at amd.com>
>> >>
>> >
>> > Hi Marek,
>> >
>> > After discussing with Matt, I think we should use LLVM IR rather than
>> > intrinsics for IBFE and UBFE and then add patterns for them either in
>> > the TableGen Files or AMDGPUISelDAGToDAG.cpp.
>> >
>> > Using intrinsics for BREV and POPC is fine though.
>> >
>> > -Tom
>> >
>> >> ---
>> >>  src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 8 ++++++++
>> >>  1 file changed, 8 insertions(+)
>> >>
>> >> diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
>> >> index 385d3ad..034095f 100644
>> >> --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
>> >> +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
>> >> @@ -1293,6 +1293,8 @@ void radeon_llvm_context_init(struct radeon_llvm_context * ctx)
>> >>       bld_base->op_actions[TGSI_OPCODE_AND].emit = emit_and;
>> >>       bld_base->op_actions[TGSI_OPCODE_ARL].emit = emit_arl;
>> >>       bld_base->op_actions[TGSI_OPCODE_BGNLOOP].emit = bgnloop_emit;
>> >> +     bld_base->op_actions[TGSI_OPCODE_BREV].emit = build_tgsi_intrinsic_nomem;
>> >> +     bld_base->op_actions[TGSI_OPCODE_BREV].intr_name = "llvm.AMDGPU.brev";
>> >>       bld_base->op_actions[TGSI_OPCODE_BRK].emit = brk_emit;
>> >>       bld_base->op_actions[TGSI_OPCODE_CEIL].emit = build_tgsi_intrinsic_nomem;
>> >>       bld_base->op_actions[TGSI_OPCODE_CEIL].intr_name = "ceil";
>> >> @@ -1326,6 +1328,8 @@ void radeon_llvm_context_init(struct radeon_llvm_context * ctx)
>> >>       bld_base->op_actions[TGSI_OPCODE_FSNE].emit = emit_fcmp;
>> >>       bld_base->op_actions[TGSI_OPCODE_IABS].emit = build_tgsi_intrinsic_nomem;
>> >>       bld_base->op_actions[TGSI_OPCODE_IABS].intr_name = "llvm.AMDIL.abs.";
>> >> +     bld_base->op_actions[TGSI_OPCODE_IBFE].emit = build_tgsi_intrinsic_nomem;
>> >> +     bld_base->op_actions[TGSI_OPCODE_IBFE].intr_name = "llvm.AMDGPU.bfe.i32";
>> >>       bld_base->op_actions[TGSI_OPCODE_IDIV].emit = emit_idiv;
>> >>       bld_base->op_actions[TGSI_OPCODE_IF].emit = if_emit;
>> >>       bld_base->op_actions[TGSI_OPCODE_UIF].emit = uif_emit;
>> >> @@ -1350,6 +1354,8 @@ void radeon_llvm_context_init(struct radeon_llvm_context * ctx)
>> >>       bld_base->op_actions[TGSI_OPCODE_MOD].emit = emit_mod;
>> >>       bld_base->op_actions[TGSI_OPCODE_NOT].emit = emit_not;
>> >>       bld_base->op_actions[TGSI_OPCODE_OR].emit = emit_or;
>> >> +     bld_base->op_actions[TGSI_OPCODE_POPC].emit = build_tgsi_intrinsic_nomem;
>> >> +     bld_base->op_actions[TGSI_OPCODE_POPC].intr_name = "llvm.ctpop.i32";
>> >>       bld_base->op_actions[TGSI_OPCODE_POW].emit = build_tgsi_intrinsic_nomem;
>> >>       bld_base->op_actions[TGSI_OPCODE_POW].intr_name = "llvm.pow.f32";
>> >>       bld_base->op_actions[TGSI_OPCODE_ROUND].emit = build_tgsi_intrinsic_nomem;
>> >> @@ -1389,6 +1395,8 @@ void radeon_llvm_context_init(struct radeon_llvm_context * ctx)
>> >>       bld_base->op_actions[TGSI_OPCODE_TRUNC].emit = build_tgsi_intrinsic_nomem;
>> >>       bld_base->op_actions[TGSI_OPCODE_TRUNC].intr_name = "llvm.AMDGPU.trunc";
>> >>       bld_base->op_actions[TGSI_OPCODE_UADD].emit = emit_uadd;
>> >> +     bld_base->op_actions[TGSI_OPCODE_UBFE].emit = build_tgsi_intrinsic_nomem;
>> >> +     bld_base->op_actions[TGSI_OPCODE_UBFE].intr_name = "llvm.AMDGPU.bfe.u32";
>> >>       bld_base->op_actions[TGSI_OPCODE_UDIV].emit = emit_udiv;
>> >>       bld_base->op_actions[TGSI_OPCODE_UMAX].emit = build_tgsi_intrinsic_nomem;
>> >>       bld_base->op_actions[TGSI_OPCODE_UMAX].intr_name = "llvm.AMDGPU.umax";
>> >> --
>> >> 2.1.0
>> >>
>> >> _______________________________________________
>> >> mesa-dev mailing list
>> >> mesa-dev at lists.freedesktop.org
>> >> http://lists.freedesktop.org/mailman/listinfo/mesa-dev