[Mesa-dev] [PATCH 7/9] radeonsi: add support for easy opcodes from ARB_gpu_shader5

Tue Mar 10 15:01:21 PDT 2015

I've looked into how to recognize BFM and BFI and discovered that if
TGSI_OPCODE_BFI is expanded, it's _impossible_ to recognize the
pattern in the backend due to LLVM transformations. The reason it's
impossible is that one particular simplification of the expanded IR
can always be done and it always changes the IR in a way that BFI
can't be recognized anymore.

The ideal transformation from TGSI to ISA is (note that this is also
how GLSL defines the opcode):

TGSI_OPCODE_BFI(base, insert, offset, bits)
= BFI(BFM(bits, offset), SHL(insert, offset), base) =
    s_lshl_b32 s1, s4, s6
    s_bfm_b32 s0, s0, s6
    v_mov_b32_e32 v0, s5
    v_mov_b32_e32 v1, s1
    v_bfi_b32 v0, s0, v1, v0
Ideally 3 instructions if all sources are vector registers.

However, if TGSI_OPCODE_BFI is expanded into basic bitwise operations
(BTW the result of BFM has 2 uses in BFI), LLVM applies this
transformation:
    (X << S) & (Y << S) ----> (X & Y) << S
Which breaks recognition of BFI and also the second use of BFM.
Therefore this version calculates the same BFM expression twice. The
first BFM is recognized by pattern matching, but the second BFM as
well as BFI is unrecognizable due to the transformation. The result
is:
    s_lshl_b32 s1, 1, s0
    s_bfm_b32 s0, s0, s6
    s_add_i32 s1, s1, -1
    s_not_b32 s0, s0
    s_and_b32 s1, s1, s4
    s_and_b32 s0, s0, s5
    s_lshl_b32 s1, s1, s6
    s_or_b32 s0, s1, s0

There are 2 ways out of this:

1) Use BFM and BFI intrinsics in Mesa. Simple and unlikely to break in
the future.

2) Try to recognize the expression tree seen by the backend. Changes
in LLVM core can break it. More complicated shaders with more
opportunities for transformations can break it too:

def : Pat <
  (i32 (or (i32 (shl (i32 (and (i32 (add (i32 (shl 1, i32:$a)),
            -1)), i32:$b)), i32:$c)),
           (i32 (and (i32 (xor (i32 (shl (i32 (add (i32 (shl 1, i32:$a)),
            -1)), i32:$c)), -1)), i32:$d)))),
  (V_BFI_B32 (S_BFM_B32 $a, $c),
             (S_LSHL_B32 $b, $c),
             $d)
>;

The pattern is ugly, so I choose 1. We can still have separate
patterns for BFI and BFM, but they would be mostly useless for GLSL.

Marek

On Tue, Mar 10, 2015 at 5:00 PM, Marek Olšák <maraeo at gmail.com> wrote:
> It seems overkill to use LLVM IR for BFE. There are several ways to
> express BFE(value, offset, bits) exactly, for example:
>
> # if bits == 0:
> #     0
> # else if offset + bits < 32:
> #     (value << (32 - offset - bits)) >> (32 - bits)
> # else:
> #     value >> offset
>
> Which can be simplified to either:
>
> # both = offset + bits
> # if bits == 0:
> #     0
> # else if both < 32:
> #     (value << (32 - both)) >> (32 - bits)
> # else:
> #     value >> offset
>
> Or:
>
> # if bits == 0:
> #     0
> # else if offset + bits < 32:
> #     inv_bits = 32 - bits
> #     (value << (inv_bits - offset)) >> inv_bits
> # else:
> #     value >> offset
>
> All 3 variants should use ASHR if the signed version is required.
> Alternatively, one can use this as well:
>
> # if bits == 32:
> #     value
> # else:
> #     (value >> offset) & ((1 << bits) - 1)
>
> And an explicit sign extension should be used for the signed version.
>
> The first three are so complex that I doubt it would be easy to
> recognize them and match them precisely. They are also very unlikely
> to appear open-coded in real applications, therefore, expanding them
> and then matching them again seems like a huge waste of time. Apps
> will likely just use LSHR+AND, which extracts a bitfield, but it's not
> the same as BFE, so BFE can't be used for that. (if the second operand
> of AND is a constant expression, then BFE can indeed be used)
>
> I agree that we could use LLVM IR for simple instructions like BFM and
> BFI, but I doubt it's a good idea for complex instructions like BFE,
> and any small optimization that changes the expression can break the
> matching.
>
> Marek
>
>
> On Tue, Mar 10, 2015 at 3:30 PM, Tom Stellard <tom at stellard.net> wrote:
>> On Tue, Mar 10, 2015 at 12:42:38PM +0100, Marek Olšák wrote:
>>> OK. What about patches 8 an 9?
>>>
>>
>> I think the intrinsics in 9 are OK, but 8 should be using LLVM IR.
>>
>> -Tom
>>
>>> Marek
>>>
>>> On Thu, Mar 5, 2015 at 8:30 PM, Tom Stellard <tom at stellard.net> wrote:
>>> > On Mon, Mar 02, 2015 at 12:54:21PM +0100, Marek Olšák wrote:
>>> >> From: Marek Olšák <marek.olsak at amd.com>
>>> >>
>>> >
>>> > Hi Marek,
>>> >
>>> > After discussing with Matt, I think we should use LLVM IR rather than
>>> > intrinsics for IBFE and UBFE and then add patterns for them either in
>>> > the TableGen Files or AMDGPUISelDAGToDAG.cpp.
>>> >
>>> > Using intrinsics for BREV and POPC is fine though.
>>> >
>>> > -Tom
>>> >
>>> >> ---
>>> >>  src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 8 ++++++++
>>> >>  1 file changed, 8 insertions(+)
>>> >>
>>> >> diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
>>> >> index 385d3ad..034095f 100644
>>> >> --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
>>> >> +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
>>> >> @@ -1293,6 +1293,8 @@ void radeon_llvm_context_init(struct radeon_llvm_context * ctx)
>>> >>       bld_base->op_actions[TGSI_OPCODE_AND].emit = emit_and;
>>> >>       bld_base->op_actions[TGSI_OPCODE_ARL].emit = emit_arl;
>>> >>       bld_base->op_actions[TGSI_OPCODE_BGNLOOP].emit = bgnloop_emit;
>>> >> +     bld_base->op_actions[TGSI_OPCODE_BREV].emit = build_tgsi_intrinsic_nomem;
>>> >> +     bld_base->op_actions[TGSI_OPCODE_BREV].intr_name = "llvm.AMDGPU.brev";
>>> >>       bld_base->op_actions[TGSI_OPCODE_BRK].emit = brk_emit;
>>> >>       bld_base->op_actions[TGSI_OPCODE_CEIL].emit = build_tgsi_intrinsic_nomem;
>>> >>       bld_base->op_actions[TGSI_OPCODE_CEIL].intr_name = "ceil";
>>> >> @@ -1326,6 +1328,8 @@ void radeon_llvm_context_init(struct radeon_llvm_context * ctx)
>>> >>       bld_base->op_actions[TGSI_OPCODE_FSNE].emit = emit_fcmp;
>>> >>       bld_base->op_actions[TGSI_OPCODE_IABS].emit = build_tgsi_intrinsic_nomem;
>>> >>       bld_base->op_actions[TGSI_OPCODE_IABS].intr_name = "llvm.AMDIL.abs.";
>>> >> +     bld_base->op_actions[TGSI_OPCODE_IBFE].emit = build_tgsi_intrinsic_nomem;
>>> >> +     bld_base->op_actions[TGSI_OPCODE_IBFE].intr_name = "llvm.AMDGPU.bfe.i32";
>>> >>       bld_base->op_actions[TGSI_OPCODE_IDIV].emit = emit_idiv;
>>> >>       bld_base->op_actions[TGSI_OPCODE_IF].emit = if_emit;
>>> >>       bld_base->op_actions[TGSI_OPCODE_UIF].emit = uif_emit;
>>> >> @@ -1350,6 +1354,8 @@ void radeon_llvm_context_init(struct radeon_llvm_context * ctx)
>>> >>       bld_base->op_actions[TGSI_OPCODE_MOD].emit = emit_mod;
>>> >>       bld_base->op_actions[TGSI_OPCODE_NOT].emit = emit_not;
>>> >>       bld_base->op_actions[TGSI_OPCODE_OR].emit = emit_or;
>>> >> +     bld_base->op_actions[TGSI_OPCODE_POPC].emit = build_tgsi_intrinsic_nomem;
>>> >> +     bld_base->op_actions[TGSI_OPCODE_POPC].intr_name = "llvm.ctpop.i32";
>>> >>       bld_base->op_actions[TGSI_OPCODE_POW].emit = build_tgsi_intrinsic_nomem;
>>> >>       bld_base->op_actions[TGSI_OPCODE_POW].intr_name = "llvm.pow.f32";
>>> >>       bld_base->op_actions[TGSI_OPCODE_ROUND].emit = build_tgsi_intrinsic_nomem;
>>> >> @@ -1389,6 +1395,8 @@ void radeon_llvm_context_init(struct radeon_llvm_context * ctx)
>>> >>       bld_base->op_actions[TGSI_OPCODE_TRUNC].emit = build_tgsi_intrinsic_nomem;
>>> >>       bld_base->op_actions[TGSI_OPCODE_TRUNC].intr_name = "llvm.AMDGPU.trunc";
>>> >>       bld_base->op_actions[TGSI_OPCODE_UADD].emit = emit_uadd;
>>> >> +     bld_base->op_actions[TGSI_OPCODE_UBFE].emit = build_tgsi_intrinsic_nomem;
>>> >> +     bld_base->op_actions[TGSI_OPCODE_UBFE].intr_name = "llvm.AMDGPU.bfe.u32";
>>> >>       bld_base->op_actions[TGSI_OPCODE_UDIV].emit = emit_udiv;
>>> >>       bld_base->op_actions[TGSI_OPCODE_UMAX].emit = build_tgsi_intrinsic_nomem;
>>> >>       bld_base->op_actions[TGSI_OPCODE_UMAX].intr_name = "llvm.AMDGPU.umax";
>>> >> --
>>> >> 2.1.0
>>> >>
>>> >> _______________________________________________
>>> >> mesa-dev mailing list
>>> >> mesa-dev at lists.freedesktop.org
>>> >> http://lists.freedesktop.org/mailman/listinfo/mesa-dev