[Mesa-dev] [PATCH v2] r600g: Workaround for a harware bug with nested loops on Cayman

Vadim Girlin vadimgirlin at gmail.com
Mon Apr 15 23:46:31 PDT 2013


On 04/15/2013 11:22 PM, Martin Andersson wrote:
> There is a hardware bug on Cayman where a BREAK/CONTINUE followed by
> LOOP_STARTxxx for nested loops may put the branch stack into a state
> such that ALU_PUSH_BEFORE doesn't work as expected. Workaround this
> by replacing the ALU_PUSH_BEFORE with a PUSH + ALU
>
> Fixes piglit tests EXT_transform_feedback/order*
>
> v2: Use existing loop count and improve comment
> ---
>   src/gallium/drivers/r600/r600_shader.c | 17 ++++++++++++++---
>   1 file changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
> index 6dbca50..f4398fd 100644
> --- a/src/gallium/drivers/r600/r600_shader.c
> +++ b/src/gallium/drivers/r600/r600_shader.c
> @@ -5490,7 +5490,7 @@ static int tgsi_opdst(struct r600_shader_ctx *ctx)
>   	return 0;
>   }
>
> -static int emit_logic_pred(struct r600_shader_ctx *ctx, int opcode)
> +static int emit_logic_pred(struct r600_shader_ctx *ctx, int opcode, int alu_type)
>   {
>   	struct r600_bytecode_alu alu;
>   	int r;
> @@ -5510,7 +5510,7 @@ static int emit_logic_pred(struct r600_shader_ctx *ctx, int opcode)
>
>   	alu.last = 1;
>
> -	r = r600_bytecode_add_alu_type(ctx->bc, &alu, CF_OP_ALU_PUSH_BEFORE);
> +	r = r600_bytecode_add_alu_type(ctx->bc, &alu, alu_type);
>   	if (r)
>   		return r;
>   	return 0;
> @@ -5730,7 +5730,18 @@ static void break_loop_on_flag(struct r600_shader_ctx *ctx, unsigned fc_sp)
>
>   static int tgsi_if(struct r600_shader_ctx *ctx)
>   {
> -	emit_logic_pred(ctx, ALU_OP2_PRED_SETNE_INT);
> +	int alu_type = CF_OP_ALU_PUSH_BEFORE;
> +
> +	/* There is a hardware bug on Cayman where a BREAK/CONTINUE followed by
> +	 * LOOP_STARTxxx for nested loops may put the branch stack into a state
> +	 * such that ALU_PUSH_BEFORE doesn't work as expected. Workaround this
> +	 * by replacing the ALU_PUSH_BEFORE with a PUSH + ALU */
> +	if (ctx->bc->chip_class == CAYMAN && ctx->bc->stack.loop > 1) {
> +		r600_bytecode_add_cfinst(ctx->bc, CF_OP_PUSH);

Oh, it seems I overlooked potential issue here: jump address for PUSH is 
not set properly, so I guess there will be GPU lockups in case of a 
jump. Ideally we could set it to jump over the whole IF-ENDIF block if 
there are no active threads, but I think it's a rare case, so simplest 
fix is to avoid computation of the address and set jump address for PUSH 
to the next instruction, like this:

		ctx->bc->cf_last->cf_addr = ctx->bc->cf_last->id + 2;

We can improve it later but anyway ALU_PUSH_BEFORE never jumped at all 
so I think at least we won't have any serious performance regressions.

Everything else looks ok, so I think I'll commit your patch with this 
change soon if there are no objections.

Vadim

> +		alu_type = CF_OP_ALU;
> +	}
> +
> +	emit_logic_pred(ctx, ALU_OP2_PRED_SETNE_INT, alu_type);
>
>   	r600_bytecode_add_cfinst(ctx->bc, CF_OP_JUMP);
>
>



More information about the mesa-dev mailing list