[Mesa-dev] [PATCH 12/15] ac: add support for SPV_AMD_shader_ballot

Connor Abbott cwabbott0 at gmail.com
Tue Oct 31 15:36:18 UTC 2017


On Tue, Oct 31, 2017 at 2:08 AM, Dave Airlie <airlied at gmail.com> wrote:
>> +LLVMValueRef
>> +ac_build_subgroup_inclusive_scan(struct ac_llvm_context *ctx,
>> +                                LLVMValueRef src,
>> +                                ac_reduce_op reduce,
>> +                                LLVMValueRef identity)
>> +{
>> +       /* See http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/
>> +        *
>> +        * Note that each dpp/reduce pair is supposed to be compiled down to
>> +        * one instruction by LLVM, at least for 32-bit values.
>> +        *
>> +        * TODO: use @llvm.amdgcn.ds.swizzle on SI and CI
>> +        */
>> +       LLVMValueRef value = src;
>> +       value = reduce(ctx, value,
>> +                      ac_build_dpp(ctx, identity, src,
>> +                                   dpp_row_sr(1), 0xf, 0xf, false));
>> +       value = reduce(ctx, value,
>> +                      ac_build_dpp(ctx, identity, src,
>> +                                   dpp_row_sr(2), 0xf, 0xf, false));
>> +       value = reduce(ctx, value,
>> +                      ac_build_dpp(ctx, identity, src,
>> +                                   dpp_row_sr(3), 0xf, 0xf, false));
>> +       value = reduce(ctx, value,
>> +                      ac_build_dpp(ctx, identity, value,
>> +                                   dpp_row_sr(4), 0xf, 0xe, false));
>> +       value = reduce(ctx, value,
>> +                      ac_build_dpp(ctx, identity, value,
>> +                                   dpp_row_sr(8), 0xf, 0xc, false));
>> +       value = reduce(ctx, value,
>> +                      ac_build_dpp(ctx, identity, value,
>> +                                   dpp_row_bcast15, 0xa, 0xf, false));
>> +       value = reduce(ctx, value,
>> +                      ac_build_dpp(ctx, identity, value,
>> +                                   dpp_row_bcast31, 0xc, 0xf, false));
>
> btw I dumped some shaders from doom on pro,
>
> it looked like it ended up with
>
> 1, 0xf, 0xf,
> 2, 0xf, 0xf,
> 4, 0xf, 0xf
> 8, 0xf, 0xf
> bcast15 0xa, 0xf
> bcast31 0xc, 0xf
>
> It also seems to apply these direct to instructions like
> /*000000002b80*/ s_nop           0x0
> /*000000002b84*/ v_min_u32       v83, v83, v83 row_shr:1 bank_mask:15 row_mask:15
> /*000000002b8c*/ s_nop           0x1
> /*000000002b90*/ v_min_u32       v83, v83, v83 row_shr:2 bank_mask:15 row_mask:15
> /*000000002b98*/ s_nop           0x1
> /*000000002b9c*/ v_min_u32       v83, v83, v83 row_shr:4 bank_mask:15 row_mask:15
> /*000000002ba4*/ s_nop           0x1
> /*000000002ba8*/ v_min_u32       v83, v83, v83 row_shr:8 bank_mask:15 row_mask:15
> /*000000002bb0*/ s_nop           0x1
> /*000000002bb4*/ v_min_u32       v83, v83, v83 row_bcast15 bank_mask:15 row_mask:10
> /*000000002bbc*/ s_nop           0x1
> /*000000002bc0*/ v_min_u32       v83, v83, v83 row_bcast31 bank_mask:15 row_mask:12
>
> I think the instruction combining is probably an llvm job, but I
> wonder if the different row_shr
> etc is what we should use as well.

Yeah, LLVM should be combining the move and the min into one
instruction (hence the comment here), but it doesn't do that yet. That
shouldn't be too hard to add once we get this working.

Also, I've seen that way of doing it before, and IIRC it's one
instruction slower than the sequence in the blog post I cited. Even
though it has one fewer instruction, there's an extra two-cycle stall
between the first two instructions, because v83 is both the destination
of the first instruction and the DPP source of the second (hence the
s_nop 0x1). So once we combine instructions, this should be better than
what -pro does :)
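
For anyone who wants to convince themselves that the two sequences compute
the same thing, here is a minimal CPU-side sketch in C. It is not Mesa or
LLVM code. The names sim_dpp, step, scan_patch, scan_pro and
scans_match_reference are made up for illustration, and the DPP model is an
assumption: 64 lanes, rows of 16, banks of 4, bound_ctrl off, lanes that are
masked off or have no valid source receive the identity, and the two mask
arguments are taken in the order row_mask, bank_mask. The reduce op is
integer add with identity 0.

```c
#include <stdint.h>
#include <string.h>

#define WAVE_SIZE 64
#define ROW_SIZE  16
#define BANK_SIZE 4

/* Hypothetical control codes, used only by this simulation. */
enum dpp_kind { DPP_ROW_SHR, DPP_ROW_BCAST15, DPP_ROW_BCAST31 };

/* Simulated DPP move with bound_ctrl off: a lane that is masked off or
 * has no valid source lane receives `old`. dst must not alias src. */
static void sim_dpp(uint32_t *dst, uint32_t old, const uint32_t *src,
                    enum dpp_kind kind, int shift,
                    unsigned row_mask, unsigned bank_mask)
{
    for (int lane = 0; lane < WAVE_SIZE; lane++) {
        int row = lane / ROW_SIZE;
        int bank = (lane % ROW_SIZE) / BANK_SIZE;
        int from = -1; /* -1 means "no valid source lane" */

        if (kind == DPP_ROW_SHR) {
            if (lane % ROW_SIZE >= shift)
                from = lane - shift;       /* shift right within the row */
        } else if (kind == DPP_ROW_BCAST15) {
            if (row > 0)
                from = row * ROW_SIZE - 1; /* last lane of previous row */
        } else {
            if (row >= 2)
                from = 31;                 /* lane 31 feeds rows 2 and 3 */
        }

        int enabled = ((row_mask >> row) & 1) && ((bank_mask >> bank) & 1);
        dst[lane] = (enabled && from >= 0) ? src[from] : old;
    }
}

/* value = value + dpp(src), with add as the reduce op and 0 as identity. */
static void step(uint32_t *value, const uint32_t *src, enum dpp_kind kind,
                 int shift, unsigned row_mask, unsigned bank_mask)
{
    uint32_t moved[WAVE_SIZE];
    sim_dpp(moved, 0, src, kind, shift, row_mask, bank_mask);
    for (int lane = 0; lane < WAVE_SIZE; lane++)
        value[lane] += moved[lane];
}

/* Sequence from the patch: the first three shifts read the original src. */
static void scan_patch(uint32_t *value, const uint32_t *src)
{
    memcpy(value, src, sizeof(uint32_t) * WAVE_SIZE);
    step(value, src,   DPP_ROW_SHR, 1, 0xf, 0xf);
    step(value, src,   DPP_ROW_SHR, 2, 0xf, 0xf);
    step(value, src,   DPP_ROW_SHR, 3, 0xf, 0xf);
    step(value, value, DPP_ROW_SHR, 4, 0xf, 0xe);
    step(value, value, DPP_ROW_SHR, 8, 0xf, 0xc);
    step(value, value, DPP_ROW_BCAST15, 0, 0xa, 0xf);
    step(value, value, DPP_ROW_BCAST31, 0, 0xc, 0xf);
}

/* Sequence from the quoted -pro dump: every step reads the running value. */
static void scan_pro(uint32_t *value, const uint32_t *src)
{
    memcpy(value, src, sizeof(uint32_t) * WAVE_SIZE);
    step(value, value, DPP_ROW_SHR, 1, 0xf, 0xf);
    step(value, value, DPP_ROW_SHR, 2, 0xf, 0xf);
    step(value, value, DPP_ROW_SHR, 4, 0xf, 0xf);
    step(value, value, DPP_ROW_SHR, 8, 0xf, 0xf);
    step(value, value, DPP_ROW_BCAST15, 0, 0xa, 0xf);
    step(value, value, DPP_ROW_BCAST31, 0, 0xc, 0xf);
}

/* Returns 1 if both sequences equal a serial inclusive prefix sum. */
static int scans_match_reference(void)
{
    uint32_t src[WAVE_SIZE], a[WAVE_SIZE], b[WAVE_SIZE], sum = 0;

    for (int i = 0; i < WAVE_SIZE; i++)
        src[i] = 3u * (uint32_t)i + 1u;
    scan_patch(a, src);
    scan_pro(b, src);
    for (int i = 0; i < WAVE_SIZE; i++) {
        sum += src[i];
        if (a[i] != sum || b[i] != sum)
            return 0;
    }
    return 1;
}
```

Calling scans_match_reference() from a small main() is enough to check both
orderings. This only models the results, not the timing: the dependency
stall discussed above comes from each step in the -pro ordering reading the
register the previous step just wrote, which a CPU simulation like this
does not capture.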

>
> Dave.

