[Mesa-dev] [PATCH 6/6] draw: use SoA fetch, not AoS one
Jose Fonseca
jfonseca at vmware.com
Tue Dec 20 14:20:20 UTC 2016
On 12/12/16 00:12, sroland at vmware.com wrote:
> From: Roland Scheidegger <sroland at vmware.com>
>
> Now that there's an SoA fetch which never falls back, we should usually get
> results which are better or at least not worse (something like rgba32f will
> stay the same). It might be worse, though, in some cases where the format
> doesn't require conversion (e.g. rg32f) and goes straight to output. If llvm
> was able to see through all the shuffles, it might have been able to do away
> with the aos->soa->aos transpose entirely. That can no longer work, except
> possibly for 4-channel formats (due to replacing the undef channels with 0/1
> before the second transpose and not the first; llvm will definitely not be
> able to figure that out). That case might actually be quite common, but I'm
> not sure llvm really could optimize it in the first place, and if it's a
> problem we should just special-case such inputs (though note that if
> conversion is needed, it isn't obvious whether it's better to skip the
> transpose or do the conversion AoS-style).
>
> For cases which get way better, think something like R16_UNORM with 8-wide
> vectors: this was 8 sign-extend fetches, 8 cvt, 8 muls, followed by
> a couple of shuffles to stitch things together (if it is smart enough,
> 6 unpacks) and then a (8-wide) transpose (not sure if llvm could even
> optimize the shuffles + transpose, since the 16bit values were actually
> sign-extended to 128bit before being cast to a float vec, so that would be
> another 8 unpacks). Now that is just 8 fetches (directly inserted into
> vector, albeit there's one 128bit insert needed), 1 cvt, 1 mul.
> ---
> src/gallium/auxiliary/draw/draw_llvm.c | 54 +++++++++++++++++++++++++---------
> 1 file changed, 40 insertions(+), 14 deletions(-)
>
> diff --git a/src/gallium/auxiliary/draw/draw_llvm.c b/src/gallium/auxiliary/draw/draw_llvm.c
> index 19b75a5..f895b76 100644
> --- a/src/gallium/auxiliary/draw/draw_llvm.c
> +++ b/src/gallium/auxiliary/draw/draw_llvm.c
> @@ -755,11 +755,9 @@ fetch_vector(struct gallivm_state *gallivm,
> LLVMValueRef *inputs,
> LLVMValueRef indices)
> {
> - LLVMValueRef zero = LLVMConstNull(LLVMInt32TypeInContext(gallivm->context));
> LLVMBuilderRef builder = gallivm->builder;
> struct lp_build_context blduivec;
> LLVMValueRef offset, valid_mask;
> - LLVMValueRef aos_fetch[LP_MAX_VECTOR_WIDTH / 32];
> unsigned i;
>
> lp_build_context_init(&blduivec, gallivm, lp_uint_type(vs_type));
> @@ -783,21 +781,49 @@ fetch_vector(struct gallivm_state *gallivm,
> }
>
> /*
> - * Note: we probably really want to use SoA fetch, not AoS one (albeit
> - * for most formats it will amount to the same as this isn't very
> - * optimized). But looks dangerous since it assumes alignment.
> + * Use SoA fetch. This should produce better code usually.
> + * Albeit it's possible there's exceptions (in particular if the fetched
> + * value is going directly to output if it's something like RG32F).
> */
> - for (i = 0; i < vs_type.length; i++) {
> - LLVMValueRef offset1, elem;
> - elem = lp_build_const_int32(gallivm, i);
> - offset1 = LLVMBuildExtractElement(builder, offset, elem, "");
> + if (1) {
> + struct lp_type res_type = vs_type;
> + /* The type handling is annoying here... */
> + if (format_desc->colorspace == UTIL_FORMAT_COLORSPACE_RGB &&
> + format_desc->channel[0].pure_integer) {
> + if (format_desc->channel[0].type == UTIL_FORMAT_TYPE_SIGNED) {
> + res_type = lp_type_int_vec(vs_type.width, vs_type.width * vs_type.length);
> + }
> + else if (format_desc->channel[0].type == UTIL_FORMAT_TYPE_UNSIGNED) {
> + res_type = lp_type_uint_vec(vs_type.width, vs_type.width * vs_type.length);
> + }
> + }
>
> - aos_fetch[i] = lp_build_fetch_rgba_aos(gallivm, format_desc,
> - lp_float32_vec4_type(),
> - FALSE, map_ptr, offset1,
> - zero, zero, NULL);
> + lp_build_fetch_rgba_soa(gallivm, format_desc,
> + res_type, FALSE, map_ptr, offset,
> + blduivec.zero, blduivec.zero,
> + NULL, inputs);
> +
> + for (i = 0; i < TGSI_NUM_CHANNELS; i++) {
> + inputs[i] = LLVMBuildBitCast(builder, inputs[i],
> + lp_build_vec_type(gallivm, vs_type), "");
> + }
> +
> + }
> + else {

Let's kill the old code path. The multitude of live code paths is more
than enough. No point in keeping additional dead code paths around.

> + LLVMValueRef zero = LLVMConstNull(LLVMInt32TypeInContext(gallivm->context));
> + LLVMValueRef aos_fetch[LP_MAX_VECTOR_WIDTH / 32];
> + for (i = 0; i < vs_type.length; i++) {
> + LLVMValueRef offset1, elem;
> + elem = lp_build_const_int32(gallivm, i);
> + offset1 = LLVMBuildExtractElement(builder, offset, elem, "");
> +
> + aos_fetch[i] = lp_build_fetch_rgba_aos(gallivm, format_desc,
> + lp_float32_vec4_type(),
> + FALSE, map_ptr, offset1,
> + zero, zero, NULL);
> + }
> + convert_to_soa(gallivm, aos_fetch, inputs, vs_type);
> }
> - convert_to_soa(gallivm, aos_fetch, inputs, vs_type);
>
> for (i = 0; i < TGSI_NUM_CHANNELS; i++) {
> inputs[i] = LLVMBuildBitCast(builder, inputs[i], blduivec.vec_type, "");
>
Reviewed-by: Jose Fonseca <jfonseca at vmware.com>
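
For readers following the R16_UNORM example in the commit message, here is a minimal sketch in plain C of the two fetch strategies. It is not the gallivm code: the function names, the `VEC_LEN` constant and the scalar loops are hypothetical stand-ins for the LLVM IR that draw would actually generate, and the final loop in the SoA variant only models what would be a single vector convert and a single vector multiply.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VEC_LEN 8 /* illustrative width, matching the 8-wide example */

/*
 * Per-element style (hypothetical): each vertex is fetched, converted and
 * scaled on its own, so the convert and multiply run VEC_LEN times.
 */
static void
fetch_r16_unorm_per_element(const uint8_t *map,
                            const uint32_t offsets[VEC_LEN],
                            float out_r[VEC_LEN])
{
   for (size_t i = 0; i < VEC_LEN; i++) {
      uint16_t raw;
      memcpy(&raw, map + offsets[i], sizeof raw); /* one fetch */
      out_r[i] = (float)raw * (1.0f / 65535.0f);  /* one cvt + one mul */
   }
}

/*
 * SoA style (hypothetical): gather the raw 16-bit values into one integer
 * vector first, then convert and scale the whole vector in a single step.
 */
static void
fetch_r16_unorm_soa(const uint8_t *map,
                    const uint32_t offsets[VEC_LEN],
                    float out_r[VEC_LEN])
{
   uint32_t raw[VEC_LEN];

   /* VEC_LEN fetches, each inserted directly into its vector lane. */
   for (size_t i = 0; i < VEC_LEN; i++) {
      uint16_t v;
      memcpy(&v, map + offsets[i], sizeof v);
      raw[i] = v;
   }

   /* Stands in for one vector int-to-float convert and one vector multiply. */
   for (size_t i = 0; i < VEC_LEN; i++) {
      out_r[i] = (float)raw[i] * (1.0f / 65535.0f);
   }
}
```

Both functions take a buffer pointer and `VEC_LEN` byte offsets and write one float per vertex for the red channel. They should produce the same values; the difference is only in how much conversion work is done per vector, which is the point of the commit message's comparison.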