[Mesa-dev] [PATCH 12/15] ac: add support for SPV_AMD_shader_ballot
Dave Airlie
airlied at gmail.com
Tue Oct 31 06:08:28 UTC 2017
> +LLVMValueRef
> +ac_build_subgroup_inclusive_scan(struct ac_llvm_context *ctx,
> +				 LLVMValueRef src,
> +				 ac_reduce_op reduce,
> +				 LLVMValueRef identity)
> +{
> +	/* See http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/
> +	 *
> +	 * Note that each dpp/reduce pair is supposed to be compiled down to
> +	 * one instruction by LLVM, at least for 32-bit values.
> +	 *
> +	 * TODO: use @llvm.amdgcn.ds.swizzle on SI and CI
> +	 */
> +	LLVMValueRef value = src;
> +	value = reduce(ctx, value,
> +		       ac_build_dpp(ctx, identity, src,
> +				    dpp_row_sr(1), 0xf, 0xf, false));
> +	value = reduce(ctx, value,
> +		       ac_build_dpp(ctx, identity, src,
> +				    dpp_row_sr(2), 0xf, 0xf, false));
> +	value = reduce(ctx, value,
> +		       ac_build_dpp(ctx, identity, src,
> +				    dpp_row_sr(3), 0xf, 0xf, false));
> +	value = reduce(ctx, value,
> +		       ac_build_dpp(ctx, identity, value,
> +				    dpp_row_sr(4), 0xf, 0xe, false));
> +	value = reduce(ctx, value,
> +		       ac_build_dpp(ctx, identity, value,
> +				    dpp_row_sr(8), 0xf, 0xc, false));
> +	value = reduce(ctx, value,
> +		       ac_build_dpp(ctx, identity, value,
> +				    dpp_row_bcast15, 0xa, 0xf, false));
> +	value = reduce(ctx, value,
> +		       ac_build_dpp(ctx, identity, value,
> +				    dpp_row_bcast31, 0xc, 0xf, false));

Btw, I dumped some shaders from Doom on pro, and it looked like it
ended up with (shift, row_mask, bank_mask):

row_shr 1, 0xf, 0xf
row_shr 2, 0xf, 0xf
row_shr 4, 0xf, 0xf
row_shr 8, 0xf, 0xf
bcast15, 0xa, 0xf
bcast31, 0xc, 0xf

It also seems to apply these directly to the instructions, like:

/*000000002b80*/ s_nop 0x0
/*000000002b84*/ v_min_u32 v83, v83, v83 row_shr:1 bank_mask:15 row_mask:15
/*000000002b8c*/ s_nop 0x1
/*000000002b90*/ v_min_u32 v83, v83, v83 row_shr:2 bank_mask:15 row_mask:15
/*000000002b98*/ s_nop 0x1
/*000000002b9c*/ v_min_u32 v83, v83, v83 row_shr:4 bank_mask:15 row_mask:15
/*000000002ba4*/ s_nop 0x1
/*000000002ba8*/ v_min_u32 v83, v83, v83 row_shr:8 bank_mask:15 row_mask:15
/*000000002bb0*/ s_nop 0x1
/*000000002bb4*/ v_min_u32 v83, v83, v83 row_bcast15 bank_mask:15 row_mask:10
/*000000002bbc*/ s_nop 0x1
/*000000002bc0*/ v_min_u32 v83, v83, v83 row_bcast31 bank_mask:15 row_mask:12

I think combining the DPP move with the ALU instruction is probably an
LLVM job, but I wonder if we should use the same row_shr sequence
(1, 2, 4, 8 with bank_mask 0xf) as well.
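
To make the comparison concrete, here is a rough CPU-side sketch that runs both sequences over a simulated 64-lane wave and checks them against a plain prefix sum. It is not Mesa or LLVM code: the helper names (dpp_source, scan_step, scan_patch, scan_observed, scans_agree) are made up for illustration, the reduce op is fixed to integer add with identity 0, and the DPP model is a simplified assumption (rows of 16 lanes, banks of 4 lanes, a lane that is masked off or has no valid source contributes the identity).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WAVE 64 /* lanes per wavefront */
#define ROW  16 /* lanes per DPP row */
#define BANK 4  /* lanes per bank within a row */

enum dpp_kind { ROW_SHR, ROW_BCAST15, ROW_BCAST31 };

/* Assumed DPP model: returns 1 and sets *from when `lane` is enabled by both
 * masks and has a valid source lane; returns 0 otherwise. */
static int dpp_source(enum dpp_kind kind, int n, unsigned row_mask,
                      unsigned bank_mask, int lane, int *from)
{
    int row = lane / ROW, pos = lane % ROW;

    if (!((row_mask >> row) & 1) || !((bank_mask >> (pos / BANK)) & 1))
        return 0;
    switch (kind) {
    case ROW_SHR:     /* read n lanes to the left, same row only */
        if (pos < n) return 0;
        *from = lane - n;
        return 1;
    case ROW_BCAST15: /* last lane of the previous row */
        if (row < 1) return 0;
        *from = row * ROW - 1;
        return 1;
    case ROW_BCAST31: /* lane 31, for rows 2 and 3 */
        if (row < 2) return 0;
        *from = 31;
        return 1;
    }
    return 0;
}

/* value[i] += dpp(src)[i]; lanes without a source add the identity (0).
 * All sources are read before any lane is written, as in a real wave. */
static void scan_step(uint32_t *value, const uint32_t *src, enum dpp_kind kind,
                      int n, unsigned row_mask, unsigned bank_mask)
{
    uint32_t moved[WAVE];
    int lane, from = 0;

    assert(n >= 0 && n < ROW);
    for (lane = 0; lane < WAVE; lane++)
        moved[lane] = dpp_source(kind, n, row_mask, bank_mask, lane, &from)
                          ? src[from] : 0;
    for (lane = 0; lane < WAVE; lane++)
        value[lane] += moved[lane];
}

/* Sequence from the quoted patch: shifts 1, 2, 3 read the original src. */
static void scan_patch(const uint32_t *src, uint32_t *value)
{
    memcpy(value, src, WAVE * sizeof(*src));
    scan_step(value, src,   ROW_SHR, 1, 0xf, 0xf);
    scan_step(value, src,   ROW_SHR, 2, 0xf, 0xf);
    scan_step(value, src,   ROW_SHR, 3, 0xf, 0xf);
    scan_step(value, value, ROW_SHR, 4, 0xf, 0xe);
    scan_step(value, value, ROW_SHR, 8, 0xf, 0xc);
    scan_step(value, value, ROW_BCAST15, 0, 0xa, 0xf);
    scan_step(value, value, ROW_BCAST31, 0, 0xc, 0xf);
}

/* Sequence seen in the dumped shader: shifts 1, 2, 4, 8 applied in place. */
static void scan_observed(const uint32_t *src, uint32_t *value)
{
    int n;

    memcpy(value, src, WAVE * sizeof(*src));
    for (n = 1; n < ROW; n *= 2)
        scan_step(value, value, ROW_SHR, n, 0xf, 0xf);
    scan_step(value, value, ROW_BCAST15, 0, 0xa, 0xf);
    scan_step(value, value, ROW_BCAST31, 0, 0xc, 0xf);
}

/* Returns 1 if both sequences match a serial inclusive prefix sum. */
static int scans_agree(void)
{
    uint32_t src[WAVE], ref[WAVE], a[WAVE], b[WAVE], sum = 0;
    int i;

    for (i = 0; i < WAVE; i++) {
        src[i] = (uint32_t)(i * i + 1);
        ref[i] = sum += src[i];
    }
    scan_patch(src, a);
    scan_observed(src, b);
    return !memcmp(a, ref, sizeof(ref)) && !memcmp(b, ref, sizeof(ref));
}
```

Under this model both sequences give the same result, and the observed one needs one fewer step for the in-row part (four shifts instead of five). Whether that carries over to hardware depends on the real out-of-bounds and bound_ctrl behaviour, which this sketch only approximates.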
Dave.