[Mesa-dev] [PATCH 2/7] radeonsi: compute shaders w/out LDS/barriers don't have lower register limit

Marek Olšák maraeo at gmail.com
Thu May 25 17:04:42 UTC 2017


From: Marek Olšák <marek.olsak at amd.com>

Or do they? This doesn't hang, so it seems right, but I'm not 100% sure.
Setting VGPRS=256 (i.e. above the limit) with big threadgroups works fine.

shader-db: Spilled VGPRs: 107 -> 50 (-53.27 %)

DiRT Showdown and GRID Autosport have 100% reduction in VGPR spilling.
There are no other changes for shader-db.
---
 src/gallium/drivers/radeonsi/si_shader.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 61f1384..0ffe402 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -4040,20 +4040,29 @@ static unsigned si_get_max_workgroup_size(const struct si_shader *shader)
 	               properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH] *
 	               properties[TGSI_PROPERTY_CS_FIXED_BLOCK_HEIGHT] *
 	               properties[TGSI_PROPERTY_CS_FIXED_BLOCK_DEPTH];
 
 	if (!max_work_group_size) {
 		/* This is a variable group size compute shader,
 		 * compile it for the maximum possible group size.
 		 */
 		max_work_group_size = SI_MAX_VARIABLE_THREADS_PER_BLOCK;
 	}
+
+	/* Compute shader threadgroups without LDS usage and barriers don't
+	 * have to be stuck on the same compute unit, and so register usage
+	 * doesn't have to be limited.
+	 */
+	if (!shader->selector->local_size &&
+	    !shader->selector->info.uses_barrier)
+		return MIN2(64, max_work_group_size);
+
 	return max_work_group_size;
 }
 
 static void declare_per_stage_desc_pointers(struct si_shader_context *ctx,
 					    LLVMTypeRef *params,
 					    unsigned *num_params,
 					    bool assign_params)
 {
 	params[(*num_params)++] = si_const_array(ctx->v4i32,
 						 SI_NUM_SHADER_BUFFERS + SI_NUM_CONST_BUFFERS);
-- 
2.7.4



More information about the mesa-dev mailing list