[Mesa-dev] [PATCH v2] r600g: Workaround for a harware bug with nested loops on Cayman

Alex Deucher alexdeucher at gmail.com
Tue Apr 16 05:50:18 PDT 2013


On Tue, Apr 16, 2013 at 2:46 AM, Vadim Girlin <vadimgirlin at gmail.com> wrote:
> On 04/15/2013 11:22 PM, Martin Andersson wrote:
>>
>> There is a hardware bug on Cayman where a BREAK/CONTINUE followed by
>> LOOP_STARTxxx for nested loops may put the branch stack into a state
>> such that ALU_PUSH_BEFORE doesn't work as expected. Workaround this
>> by replacing the ALU_PUSH_BEFORE with a PUSH + ALU
>>
>> Fixes piglit tests EXT_transform_feedback/order*
>>
>> v2: Use existing loop count and improve comment
>> ---
>>   src/gallium/drivers/r600/r600_shader.c | 17 ++++++++++++++---
>>   1 file changed, 14 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/gallium/drivers/r600/r600_shader.c
>> b/src/gallium/drivers/r600/r600_shader.c
>> index 6dbca50..f4398fd 100644
>> --- a/src/gallium/drivers/r600/r600_shader.c
>> +++ b/src/gallium/drivers/r600/r600_shader.c
>> @@ -5490,7 +5490,7 @@ static int tgsi_opdst(struct r600_shader_ctx *ctx)
>>         return 0;
>>   }
>>
>> -static int emit_logic_pred(struct r600_shader_ctx *ctx, int opcode)
>> +static int emit_logic_pred(struct r600_shader_ctx *ctx, int opcode, int
>> alu_type)
>>   {
>>         struct r600_bytecode_alu alu;
>>         int r;
>> @@ -5510,7 +5510,7 @@ static int emit_logic_pred(struct r600_shader_ctx
>> *ctx, int opcode)
>>
>>         alu.last = 1;
>>
>> -       r = r600_bytecode_add_alu_type(ctx->bc, &alu,
>> CF_OP_ALU_PUSH_BEFORE);
>> +       r = r600_bytecode_add_alu_type(ctx->bc, &alu, alu_type);
>>         if (r)
>>                 return r;
>>         return 0;
>> @@ -5730,7 +5730,18 @@ static void break_loop_on_flag(struct
>> r600_shader_ctx *ctx, unsigned fc_sp)
>>
>>   static int tgsi_if(struct r600_shader_ctx *ctx)
>>   {
>> -       emit_logic_pred(ctx, ALU_OP2_PRED_SETNE_INT);
>> +       int alu_type = CF_OP_ALU_PUSH_BEFORE;
>> +
>> +       /* There is a hardware bug on Cayman where a BREAK/CONTINUE
>> followed by
>> +        * LOOP_STARTxxx for nested loops may put the branch stack into a
>> state
>> +        * such that ALU_PUSH_BEFORE doesn't work as expected. Workaround
>> this
>> +        * by replacing the ALU_PUSH_BEFORE with a PUSH + ALU */
>> +       if (ctx->bc->chip_class == CAYMAN && ctx->bc->stack.loop > 1) {
>> +               r600_bytecode_add_cfinst(ctx->bc, CF_OP_PUSH);
>
>
> Oh, it seems I overlooked potential issue here: jump address for PUSH is not
> set properly, so I guess there will be GPU lockups in case of a jump.
> Ideally we could set it to jump over the whole IF-ENDIF block if there are
> no active threads, but I think it's a rare case, so simplest fix is to avoid
> computation of the address and set jump address for PUSH to the next
> instruction, like this:
>
>                 ctx->bc->cf_last->cf_addr = ctx->bc->cf_last->id + 2;
>
> We can improve it later but anyway ALU_PUSH_BEFORE never jumped at all so I
> think at least we won't have any serious performance regressions.
>
> Everything else looks ok, so I think I'll commit your patch with this change
> soon if there are no objections.

Sounds good.  Please add a note that this patch is a candidate for the
9.1 branch when you apply it.

Alex

>
> Vadim
>
>
>> +               alu_type = CF_OP_ALU;
>> +       }
>> +
>> +       emit_logic_pred(ctx, ALU_OP2_PRED_SETNE_INT, alu_type);
>>
>>         r600_bytecode_add_cfinst(ctx->bc, CF_OP_JUMP);
>>
>>
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev


More information about the mesa-dev mailing list