[Mesa-dev] [PATCH 7/9] radeonsi: add support for easy opcodes from ARB_gpu_shader5

Tom Stellard tom at stellard.net
Tue Mar 10 16:09:15 PDT 2015


On Tue, Mar 10, 2015 at 11:01:21PM +0100, Marek Olšák wrote:
> I've looked into how to recognize BFM and BFI and discovered that if
> TGSI_OPCODE_BFI is expanded, it's _impossible_ to recognize the
> pattern in the backend due to LLVM transformations. The reason it's
> impossible is that one particular simplification of the expanded IR
> can always be done and it always changes the IR in a way that BFI
> can't be recognized anymore.
> 
> The ideal transformation from TGSI to ISA is (note that this is also
> how GLSL defines the opcode):
> 
> TGSI_OPCODE_BFI(base, insert, offset, bits)
> = BFI(BFM(bits, offset), SHL(insert, offset), base) =
>     s_lshl_b32 s1, s4, s6
>     s_bfm_b32 s0, s0, s6
>     v_mov_b32_e32 v0, s5
>     v_mov_b32_e32 v1, s1
>     v_bfi_b32 v0, s0, v1, v0
> Ideally 3 instructions if all sources are vector registers.
> 
> However, if TGSI_OPCODE_BFI is expanded into basic bitwise operations
> (BTW the result of BFM has 2 uses in BFI), LLVM applies this
> transformation:
>     (X << S) & (Y << S) ----> (X & Y) << S
> Which breaks recognition of BFI and also the second use of BFM.
> Therefore this version calculates the same BFM expression twice. The
> first BFM is recognized by pattern matching, but the second BFM as
> well as BFI is unrecognizable due to the transformation. The result
> is:
>     s_lshl_b32 s1, 1, s0
>     s_bfm_b32 s0, s0, s6
>     s_add_i32 s1, s1, -1
>     s_not_b32 s0, s0
>     s_and_b32 s1, s1, s4
>     s_and_b32 s0, s0, s5
>     s_lshl_b32 s1, s1, s6
>     s_or_b32 s0, s1, s0
> 
> There are 2 ways out of this:
> 
> 1) Use BFM and BFI intrinsics in Mesa. Simple and unlikely to break in
> the future.
> 
> 2) Try to recognize the expression tree seen by the backend. Changes
> in LLVM core can break it. More complicated shaders with more
> opportunities for transformations can break it too:
> 
> def : Pat <
>   (i32 (or (i32 (shl (i32 (and (i32 (add (i32 (shl 1, i32:$a)),
>             -1)), i32:$b)), i32:$c)),
>            (i32 (and (i32 (xor (i32 (shl (i32 (add (i32 (shl 1, i32:$a)),
>             -1)), i32:$c)), -1)), i32:$d)))),
>   (V_BFI_B32 (S_BFM_B32 $a, $c),
>              (S_LSHL_B32 $b, $c),
>              $d)
> >;

I don't want to waste a lot of time discussing this, because it probably
doesn't matter too much in the long run.  I'm fine with using
intrinsics, but I just wanted to point out a few things in case you or
someone else wants to get this working using LLVM IR.

1. Running the instruction combining pass should help with pattern
matching.  This transforms common sequence into canonical forms which
make them easier to match in the backend.  We should be running this
pass anyway as it has some good optimization.


diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index dce5b55..45c9eb8 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -1444,6 +1444,7 @@ void radeon_llvm_finalize_module(struct
radeon_llvm_context * ctx)
        LLVMAddPromoteMemoryToRegisterPass(gallivm->passmgr);
 
        /* Add some optimization passes */
+       LLVMAddInstructionCombiningPass(gallivm->passmgr);
        LLVMAddScalarReplAggregatesPass(gallivm->passmgr);
        LLVMAddLICMPass(gallivm->passmgr);
        LLVMAddAggressiveDCEPass(gallivm->passmgr);




2. We have a number of patterns for BFI already in AMDGPUInstructions.td
If we are modeling BFI using LLVM IR, we should be using one of those
patterns.

3. The BFI, BFM, LSHL sequence does not need to be matched in a single
pattern.  There should be one pattern for BFI, one for BFM, and one for
LSHL.

4. Using intrinsics means no constant folding will be done, which could
lead to worse code in some cases.


-Tom
> 
> The pattern is ugly, so I choose 1. We can still have separate
> patterns for BFI and BFM, but they would be mostly useless for GLSL.
> 
> Marek
> 
> 
> 
> On Tue, Mar 10, 2015 at 5:00 PM, Marek Olšák <maraeo at gmail.com> wrote:
> > It seems overkill to use LLVM IR for BFE. There are several ways to
> > express BFE(value, offset, bits) exactly, for example:
> >
> > # if bits == 0:
> > #     0
> > # else if offset + bits < 32:
> > #     (value << (32 - offset - bits)) >> (32 - bits)
> > # else:
> > #     value >> offset
> >
> > Which can be simplified to either:
> >
> > # both = offset + bits
> > # if bits == 0:
> > #     0
> > # else if both < 32:
> > #     (value << (32 - both)) >> (32 - bits)
> > # else:
> > #     value >> offset
> >
> > Or:
> >
> > # if bits == 0:
> > #     0
> > # else if offset + bits < 32:
> > #     inv_bits = 32 - bits
> > #     (value << (inv_bits - offset)) >> inv_bits
> > # else:
> > #     value >> offset
> >
> > All 3 variants should use ASHR if the signed version is required.
> > Alternatively, one can use this as well:
> >
> > # if bits == 32:
> > #     value
> > # else:
> > #     (value >> offset) & ((1 << bits) - 1)
> >
> > And an explicit sign extension should be used for the signed version.
> >
> > The first three are so complex that I doubt it would be easy to
> > recognize them and match them precisely. They are also very unlikely
> > to appear open-coded in real applications, therefore, expanding them
> > and then matching them again seems like a huge waste of time. Apps
> > will likely just use LSHR+AND, which extracts a bitfield, but it's not
> > the same as BFE, so BFE can't be used for that. (if the second operand
> > of AND is a constant expression, then BFE can indeed be used)
> >
> > I agree that we could use LLVM IR for simple instructions like BFM and
> > BFI, but I doubt it's a good idea for complex instructions like BFE,
> > and any small optimization that changes the expression can break the
> > matching.
> >
> > Marek
> >
> >
> > On Tue, Mar 10, 2015 at 3:30 PM, Tom Stellard <tom at stellard.net> wrote:
> >> On Tue, Mar 10, 2015 at 12:42:38PM +0100, Marek Olšák wrote:
> >>> OK. What about patches 8 an 9?
> >>>
> >>
> >> I think the intrinsics in 9 are OK, but 8 should be using LLVM IR.
> >>
> >> -Tom
> >>
> >>> Marek
> >>>
> >>> On Thu, Mar 5, 2015 at 8:30 PM, Tom Stellard <tom at stellard.net> wrote:
> >>> > On Mon, Mar 02, 2015 at 12:54:21PM +0100, Marek Olšák wrote:
> >>> >> From: Marek Olšák <marek.olsak at amd.com>
> >>> >>
> >>> >
> >>> > Hi Marek,
> >>> >
> >>> > After discussing with Matt, I think we should use LLVM IR rather than
> >>> > intrinsics for IBFE and UBFE and then add patterns for them either in
> >>> > the TableGen Files or AMDGPUISelDAGToDAG.cpp.
> >>> >
> >>> > Using intrinsics for BREV and POPC is fine though.
> >>> >
> >>> > -Tom
> >>> >
> >>> >> ---
> >>> >>  src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 8 ++++++++
> >>> >>  1 file changed, 8 insertions(+)
> >>> >>
> >>> >> diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> >>> >> index 385d3ad..034095f 100644
> >>> >> --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> >>> >> +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> >>> >> @@ -1293,6 +1293,8 @@ void radeon_llvm_context_init(struct radeon_llvm_context * ctx)
> >>> >>       bld_base->op_actions[TGSI_OPCODE_AND].emit = emit_and;
> >>> >>       bld_base->op_actions[TGSI_OPCODE_ARL].emit = emit_arl;
> >>> >>       bld_base->op_actions[TGSI_OPCODE_BGNLOOP].emit = bgnloop_emit;
> >>> >> +     bld_base->op_actions[TGSI_OPCODE_BREV].emit = build_tgsi_intrinsic_nomem;
> >>> >> +     bld_base->op_actions[TGSI_OPCODE_BREV].intr_name = "llvm.AMDGPU.brev";
> >>> >>       bld_base->op_actions[TGSI_OPCODE_BRK].emit = brk_emit;
> >>> >>       bld_base->op_actions[TGSI_OPCODE_CEIL].emit = build_tgsi_intrinsic_nomem;
> >>> >>       bld_base->op_actions[TGSI_OPCODE_CEIL].intr_name = "ceil";
> >>> >> @@ -1326,6 +1328,8 @@ void radeon_llvm_context_init(struct radeon_llvm_context * ctx)
> >>> >>       bld_base->op_actions[TGSI_OPCODE_FSNE].emit = emit_fcmp;
> >>> >>       bld_base->op_actions[TGSI_OPCODE_IABS].emit = build_tgsi_intrinsic_nomem;
> >>> >>       bld_base->op_actions[TGSI_OPCODE_IABS].intr_name = "llvm.AMDIL.abs.";
> >>> >> +     bld_base->op_actions[TGSI_OPCODE_IBFE].emit = build_tgsi_intrinsic_nomem;
> >>> >> +     bld_base->op_actions[TGSI_OPCODE_IBFE].intr_name = "llvm.AMDGPU.bfe.i32";
> >>> >>       bld_base->op_actions[TGSI_OPCODE_IDIV].emit = emit_idiv;
> >>> >>       bld_base->op_actions[TGSI_OPCODE_IF].emit = if_emit;
> >>> >>       bld_base->op_actions[TGSI_OPCODE_UIF].emit = uif_emit;
> >>> >> @@ -1350,6 +1354,8 @@ void radeon_llvm_context_init(struct radeon_llvm_context * ctx)
> >>> >>       bld_base->op_actions[TGSI_OPCODE_MOD].emit = emit_mod;
> >>> >>       bld_base->op_actions[TGSI_OPCODE_NOT].emit = emit_not;
> >>> >>       bld_base->op_actions[TGSI_OPCODE_OR].emit = emit_or;
> >>> >> +     bld_base->op_actions[TGSI_OPCODE_POPC].emit = build_tgsi_intrinsic_nomem;
> >>> >> +     bld_base->op_actions[TGSI_OPCODE_POPC].intr_name = "llvm.ctpop.i32";
> >>> >>       bld_base->op_actions[TGSI_OPCODE_POW].emit = build_tgsi_intrinsic_nomem;
> >>> >>       bld_base->op_actions[TGSI_OPCODE_POW].intr_name = "llvm.pow.f32";
> >>> >>       bld_base->op_actions[TGSI_OPCODE_ROUND].emit = build_tgsi_intrinsic_nomem;
> >>> >> @@ -1389,6 +1395,8 @@ void radeon_llvm_context_init(struct radeon_llvm_context * ctx)
> >>> >>       bld_base->op_actions[TGSI_OPCODE_TRUNC].emit = build_tgsi_intrinsic_nomem;
> >>> >>       bld_base->op_actions[TGSI_OPCODE_TRUNC].intr_name = "llvm.AMDGPU.trunc";
> >>> >>       bld_base->op_actions[TGSI_OPCODE_UADD].emit = emit_uadd;
> >>> >> +     bld_base->op_actions[TGSI_OPCODE_UBFE].emit = build_tgsi_intrinsic_nomem;
> >>> >> +     bld_base->op_actions[TGSI_OPCODE_UBFE].intr_name = "llvm.AMDGPU.bfe.u32";
> >>> >>       bld_base->op_actions[TGSI_OPCODE_UDIV].emit = emit_udiv;
> >>> >>       bld_base->op_actions[TGSI_OPCODE_UMAX].emit = build_tgsi_intrinsic_nomem;
> >>> >>       bld_base->op_actions[TGSI_OPCODE_UMAX].intr_name = "llvm.AMDGPU.umax";
> >>> >> --
> >>> >> 2.1.0
> >>> >>
> >>> >> _______________________________________________
> >>> >> mesa-dev mailing list
> >>> >> mesa-dev at lists.freedesktop.org
> >>> >> http://lists.freedesktop.org/mailman/listinfo/mesa-dev


More information about the mesa-dev mailing list