[Mesa-dev] [PATCH] radeonsi: add a workaround for weird s_buffer_load_dword behavior on SI

Nicolai Hähnle nhaehnle at gmail.com
Tue Oct 24 09:12:56 UTC 2017


On 22.10.2017 23:19, Marek Olšák wrote:
> From: Marek Olšák <marek.olsak at amd.com>
> 
> See my LLVM patch which fixes the root cause.
> 
> Users have to apply this patch and then they have 2 choices:
> - Downgrade to LLVM 5.0
> - Update to LLVM git after my LLVM patch is pushed.
> 
> It won't be possible to use the current or any earlier development
> version of LLVM 6.0.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle at amd.com>



> ---
>   src/gallium/drivers/radeonsi/si_shader.c | 13 ++++++++++---
>   1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
> index 1320c6f..a248cea 100644
> --- a/src/gallium/drivers/radeonsi/si_shader.c
> +++ b/src/gallium/drivers/radeonsi/si_shader.c
> @@ -2008,28 +2008,35 @@ static LLVMValueRef fetch_constant(
>   	if (sel->info.const_buffers_declared == 1 &&
>   	    sel->info.shader_buffers_declared == 0) {
>   		LLVMValueRef ptr =
>   			LLVMGetParam(ctx->main_fn, ctx->param_const_and_shader_buffers);
>   
>   		/* This enables use of s_load_dword and flat_load_dword for const buffer 0
>   		 * loads, and up to x4 load opcode merging. However, it leads to horrible
>   		 * code reducing SIMD wave occupancy from 8 to 2 in many cases.
>   		 *
>   		 * Using s_buffer_load_dword (x1) seems to be the best option right now.
> +		 *
> +		 * LLVM 5.0 on SI doesn't insert a required s_nop between SALU setting
> +		 * a descriptor and s_buffer_load_dword using it, so we can't expand
> +		 * the pointer into a full descriptor like below. We have to use
> +		 * s_load_dword instead. The only case when LLVM 5.0 would select
> +		 * s_buffer_load_dword (that we have to prevent) is when we use
> +		 * a literal offset where we don't need bounds checking.
>   		 */
> -#if 0 /* keep this codepath disabled */
> -		if (!reg->Register.Indirect) {
> +		if (ctx->screen->b.chip_class == SI &&
> +                    HAVE_LLVM < 0x0600 &&
> +                    !reg->Register.Indirect) {
>   			addr = LLVMBuildLShr(ctx->ac.builder, addr, LLVMConstInt(ctx->i32, 2, 0), "");
>   			LLVMValueRef result = ac_build_load_invariant(&ctx->ac, ptr, addr);
>   			return bitcast(bld_base, type, result);
>   		}
> -#endif
>   
>   		/* Do the bounds checking with a descriptor, because
>   		 * doing computation and manual bounds checking of 64-bit
>   		 * addresses generates horrible VALU code with very high
>   		 * VGPR usage and very low SIMD occupancy.
>   		 */
>   		ptr = LLVMBuildPtrToInt(ctx->ac.builder, ptr, ctx->i64, "");
>   		ptr = LLVMBuildBitCast(ctx->ac.builder, ptr, ctx->v2i32, "");
>   
>   		LLVMValueRef desc_elems[] = {
> 
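For anyone skimming the hunk above, the new condition can be read as a small predicate. The following is only a sketch, not code from the patch: the enum and both function names are hypothetical stand-ins, and it assumes HAVE_LLVM encodes the version as 0xMMmm (so 0x0500 is LLVM 5.0 and 0x0600 is LLVM 6.0).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver's chip class enum;
 * only the distinction "SI vs. anything newer" matters here. */
enum sketch_chip_class { SKETCH_SI, SKETCH_CIK, SKETCH_VI };

/* Returns true when the s_load_dword path should be taken.
 * Assumption: have_llvm is encoded as 0xMMmm (major, minor),
 * mirroring the "HAVE_LLVM < 0x0600" comparison in the patch. */
static bool sketch_use_s_load_dword(enum sketch_chip_class chip,
                                    unsigned have_llvm,
                                    bool indirect)
{
	return chip == SKETCH_SI &&
	       have_llvm < 0x0600 &&
	       !indirect;
}

/* The patch shifts the byte address right by 2 before the load,
 * i.e. it turns a byte offset into a dword index. */
static unsigned sketch_byte_offset_to_dword_index(unsigned byte_offset)
{
	return byte_offset >> 2;
}
```

With these assumptions, SI with LLVM 5.0 and a literal (non-indirect) offset takes the s_load_dword path, while any other chip, LLVM 6.0 or later, or an indirect register falls through to the descriptor-based load.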


-- 
Learn how the world really is,
But never forget how it should be.

