[Mesa-dev] [PATCH 12/15] ac: add support for SPV_AMD_shader_ballot
Nicolai Hähnle
nhaehnle at gmail.com
Thu Nov 2 16:10:22 UTC 2017
On 31.10.2017 16:36, Connor Abbott wrote:
> On Tue, Oct 31, 2017 at 2:08 AM, Dave Airlie <airlied at gmail.com> wrote:
>>> +LLVMValueRef
>>> +ac_build_subgroup_inclusive_scan(struct ac_llvm_context *ctx,
>>> +                                 LLVMValueRef src,
>>> +                                 ac_reduce_op reduce,
>>> +                                 LLVMValueRef identity)
>>> +{
>>> +        /* See http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/
>>> +         *
>>> +         * Note that each dpp/reduce pair is supposed to be compiled down to
>>> +         * one instruction by LLVM, at least for 32-bit values.
>>> +         *
>>> +         * TODO: use @llvm.amdgcn.ds.swizzle on SI and CI
>>> +         */
>>> +        LLVMValueRef value = src;
>>> +        value = reduce(ctx, value,
>>> +                       ac_build_dpp(ctx, identity, src,
>>> +                                    dpp_row_sr(1), 0xf, 0xf, false));
>>> +        value = reduce(ctx, value,
>>> +                       ac_build_dpp(ctx, identity, src,
>>> +                                    dpp_row_sr(2), 0xf, 0xf, false));
>>> +        value = reduce(ctx, value,
>>> +                       ac_build_dpp(ctx, identity, src,
>>> +                                    dpp_row_sr(3), 0xf, 0xf, false));
>>> +        value = reduce(ctx, value,
>>> +                       ac_build_dpp(ctx, identity, value,
>>> +                                    dpp_row_sr(4), 0xf, 0xe, false));
>>> +        value = reduce(ctx, value,
>>> +                       ac_build_dpp(ctx, identity, value,
>>> +                                    dpp_row_sr(8), 0xf, 0xc, false));
>>> +        value = reduce(ctx, value,
>>> +                       ac_build_dpp(ctx, identity, value,
>>> +                                    dpp_row_bcast15, 0xa, 0xf, false));
>>> +        value = reduce(ctx, value,
>>> +                       ac_build_dpp(ctx, identity, value,
>>> +                                    dpp_row_bcast31, 0xc, 0xf, false));
>>
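>> By the way, I dumped some shaders from Doom on -pro.
>>
>> It looked like it ended up with the following sequence
>> (DPP control, row_mask, bank_mask):
>>
>> row_shr:1, 0xf, 0xf
>> row_shr:2, 0xf, 0xf
>> row_shr:4, 0xf, 0xf
>> row_shr:8, 0xf, 0xf
>> bcast15, 0xa, 0xf
>> bcast31, 0xc, 0xf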
>>
>> It also seems to apply these directly to the instructions, like:
>> /*000000002b80*/ s_nop 0x0
>> /*000000002b84*/ v_min_u32 v83, v83, v83 row_shr:1 bank_mask:15 row_mask:15
>> /*000000002b8c*/ s_nop 0x1
>> /*000000002b90*/ v_min_u32 v83, v83, v83 row_shr:2 bank_mask:15 row_mask:15
>> /*000000002b98*/ s_nop 0x1
>> /*000000002b9c*/ v_min_u32 v83, v83, v83 row_shr:4 bank_mask:15 row_mask:15
>> /*000000002ba4*/ s_nop 0x1
>> /*000000002ba8*/ v_min_u32 v83, v83, v83 row_shr:8 bank_mask:15 row_mask:15
>> /*000000002bb0*/ s_nop 0x1
>> /*000000002bb4*/ v_min_u32 v83, v83, v83 row_bcast15 bank_mask:15 row_mask:10
>> /*000000002bbc*/ s_nop 0x1
>> /*000000002bc0*/ v_min_u32 v83, v83, v83 row_bcast31 bank_mask:15 row_mask:12
>>
>> I think the instruction combining is probably an LLVM job, but I
>> wonder whether we should use the same row_shr sequence (1, 2, 4, 8)
>> as well.
>
> Yeah, LLVM should be combining the move and the min (hence the
> comment here), but it doesn't yet. That shouldn't be too hard to do
> once we get it working. Also, I've seen that way of doing it before,
> and IIRC it's one instruction slower than the sequence in the blog
> post I cited. Even though it has one fewer instruction, it needs an
> extra two-cycle stall between the first two instructions, because v83
> is both the destination of the first instruction and the DPP source of
> the second (hence the s_nop 0x1). So once we combine instructions,
> this should be better than what -pro does :)
Agreed, though even more ideally, LLVM would be able to fill those gaps
with other instructions ;)
Anyway, the combining of instructions is really the important task.
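
For anyone who wants to convince themselves that the two sequences compute the same thing, here is a rough CPU-side sketch that models them for a 64-lane wave. It is only an illustration under stated assumptions, not Mesa or LLVM code: step(), scan_patch(), scan_pro(), count_mismatches() and the dpp_ctrl enum are hypothetical names, masked-off or out-of-row lanes are modelled as reading the identity value (as with the "old" operand passed to ac_build_dpp in the patch), and s_nop/stall timing is not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define WAVE_SIZE 64 /* lanes per wave */
#define ROW_SIZE  16 /* lanes per DPP row */

typedef uint32_t (*reduce_fn)(uint32_t a, uint32_t b);

uint32_t reduce_add(uint32_t a, uint32_t b) { return a + b; }
uint32_t reduce_min(uint32_t a, uint32_t b) { return a < b ? a : b; }

/* Hypothetical stand-ins for the DPP controls used in the patch. */
enum dpp_ctrl { ROW_SHR, ROW_BCAST15, ROW_BCAST31 };

/* One dpp/reduce pair: value = reduce(value, dpp(identity, dpp_src, ...)).
 * Lanes that are masked off, or whose source lane falls outside the row,
 * see `identity` instead of a neighbour's value. */
static void step(uint32_t value[WAVE_SIZE], const uint32_t dpp_src[WAVE_SIZE],
                 reduce_fn reduce, uint32_t identity, enum dpp_ctrl ctrl,
                 int shift, unsigned row_mask, unsigned bank_mask)
{
    uint32_t moved[WAVE_SIZE];

    assert(shift >= 0 && shift < ROW_SIZE);
    /* Read all source lanes first, so dpp_src may alias value. */
    for (int lane = 0; lane < WAVE_SIZE; lane++) {
        int row = lane / ROW_SIZE, pos = lane % ROW_SIZE, from = -1;

        if (ctrl == ROW_SHR && pos >= shift)
            from = lane - shift;
        else if (ctrl == ROW_BCAST15 && row > 0)
            from = row * ROW_SIZE - 1; /* last lane of the previous row */
        else if (ctrl == ROW_BCAST31 && row >= 2)
            from = 31;

        bool enabled = ((row_mask >> row) & 1) && ((bank_mask >> (pos / 4)) & 1);
        moved[lane] = (enabled && from >= 0) ? dpp_src[from] : identity;
    }
    for (int lane = 0; lane < WAVE_SIZE; lane++)
        value[lane] = reduce(value[lane], moved[lane]);
}

/* Sequence from the patch: shifts 1, 2 and 3 read the original src. */
void scan_patch(uint32_t value[WAVE_SIZE], const uint32_t src[WAVE_SIZE],
                reduce_fn reduce, uint32_t identity)
{
    memcpy(value, src, WAVE_SIZE * sizeof(uint32_t));
    step(value, src, reduce, identity, ROW_SHR, 1, 0xf, 0xf);
    step(value, src, reduce, identity, ROW_SHR, 2, 0xf, 0xf);
    step(value, src, reduce, identity, ROW_SHR, 3, 0xf, 0xf);
    step(value, value, reduce, identity, ROW_SHR, 4, 0xf, 0xe);
    step(value, value, reduce, identity, ROW_SHR, 8, 0xf, 0xc);
    step(value, value, reduce, identity, ROW_BCAST15, 0, 0xa, 0xf);
    step(value, value, reduce, identity, ROW_BCAST31, 0, 0xc, 0xf);
}

/* Sequence from the -pro dump: shifts 1, 2, 4 and 8 applied in place. */
void scan_pro(uint32_t value[WAVE_SIZE], const uint32_t src[WAVE_SIZE],
              reduce_fn reduce, uint32_t identity)
{
    memcpy(value, src, WAVE_SIZE * sizeof(uint32_t));
    for (int shift = 1; shift <= 8; shift *= 2)
        step(value, value, reduce, identity, ROW_SHR, shift, 0xf, 0xf);
    step(value, value, reduce, identity, ROW_BCAST15, 0, 0xa, 0xf);
    step(value, value, reduce, identity, ROW_BCAST31, 0, 0xc, 0xf);
}

/* Counts lanes where either sequence disagrees with a serial inclusive scan. */
int count_mismatches(reduce_fn reduce, uint32_t identity)
{
    uint32_t src[WAVE_SIZE], a[WAVE_SIZE], b[WAVE_SIZE];
    uint32_t running = identity;
    int bad = 0;

    for (int i = 0; i < WAVE_SIZE; i++)
        src[i] = (uint32_t)(i * 37 + 11) % 101; /* arbitrary test data */
    scan_patch(a, src, reduce, identity);
    scan_pro(b, src, reduce, identity);
    for (int i = 0; i < WAVE_SIZE; i++) {
        running = reduce(running, src[i]);
        bad += (a[i] != running) + (b[i] != running);
    }
    return bad;
}
```

Calling count_mismatches(reduce_add, 0) or count_mismatches(reduce_min, UINT32_MAX) compares both sequences against a plain serial scan. The only difference between the two is in the row-local part: the patch spends an extra dpp/reduce pair (shifts 1, 2, 3 all reading src) so that the first three pairs do not depend on each other's results.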
Cheers,
Nicolai
>
>>
>> Dave.
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
--
Learn how the world really is,
But never forget how it should be.