[Mesa-dev] [PATCH] r600g: Workaround for a nested loop bug on Cayman

Vadim Girlin vadimgirlin at gmail.com
Mon Apr 15 02:02:17 PDT 2013


On 04/15/2013 10:52 AM, Martin Andersson wrote:
> On Mon, Apr 15, 2013 at 1:09 AM, Vadim Girlin <vadimgirlin at gmail.com> wrote:
>> On 04/15/2013 01:05 AM, Martin Andersson wrote:
>>>
>>> There is a bug where a BREAK/CONTINUE followed by LOOP_STARTxxx for nested
>>> loops may put the branch stack into a state such that ALU_PUSH_BEFORE
>>> doesn't work as expected. Workaround this by replacing the ALU_PUSH_BEFORE
>>> with a PUSH + ALU for nested loops.
>>>
>>> Fixes piglit tests:
>>> spec/!OpenGL 1.1/read-front
>>> spec/EXT_transform_feedback/order*
>>> spec/glsl-1.40/uniform_buffer/fs-struct-pad
>>>
>>> No piglit regressions.
>>> ---
>>>    src/gallium/drivers/r600/r600_shader.c | 33
>>> ++++++++++++++++++++++++++++++---
>>>    1 file changed, 30 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/src/gallium/drivers/r600/r600_shader.c
>>> b/src/gallium/drivers/r600/r600_shader.c
>>> index 6dbca50..aee011e 100644
>>> --- a/src/gallium/drivers/r600/r600_shader.c
>>> +++ b/src/gallium/drivers/r600/r600_shader.c
>>> @@ -252,6 +252,7 @@ static int tgsi_endif(struct r600_shader_ctx *ctx);
>>>    static int tgsi_bgnloop(struct r600_shader_ctx *ctx);
>>>    static int tgsi_endloop(struct r600_shader_ctx *ctx);
>>>    static int tgsi_loop_brk_cont(struct r600_shader_ctx *ctx);
>>> +static bool need_cayman_loop_bug_workaround(struct r600_shader_ctx *ctx);
>>>
>>>    /*
>>>     * bytestream -> r600 shader
>>> @@ -5490,7 +5491,7 @@ static int tgsi_opdst(struct r600_shader_ctx *ctx)
>>>          return 0;
>>>    }
>>>
>>> -static int emit_logic_pred(struct r600_shader_ctx *ctx, int opcode)
>>> +static int emit_logic_pred(struct r600_shader_ctx *ctx, int opcode, int
>>> alu_type)
>>>    {
>>>          struct r600_bytecode_alu alu;
>>>          int r;
>>> @@ -5510,7 +5511,7 @@ static int emit_logic_pred(struct r600_shader_ctx
>>> *ctx, int opcode)
>>>
>>>          alu.last = 1;
>>>
>>> -       r = r600_bytecode_add_alu_type(ctx->bc, &alu,
>>> CF_OP_ALU_PUSH_BEFORE);
>>> +       r = r600_bytecode_add_alu_type(ctx->bc, &alu, alu_type);
>>>          if (r)
>>>                  return r;
>>>          return 0;
>>> @@ -5730,7 +5731,20 @@ static void break_loop_on_flag(struct
>>> r600_shader_ctx *ctx, unsigned fc_sp)
>>>
>>>    static int tgsi_if(struct r600_shader_ctx *ctx)
>>>    {
>>> -       emit_logic_pred(ctx, ALU_OP2_PRED_SETNE_INT);
>>> +       int alu_type = CF_OP_ALU_PUSH_BEFORE;
>>> +
>>> +       /*
>>> +          There is a bug where a BREAK/CONTINUE followed by LOOP_STARTxxx
>>> for nested
>>> +          loops may put the branch stack into a state such that
>>> ALU_PUSH_BEFORE
>>> +          doesn't work as expected. Workaround this by replacing the
>>> ALU_PUSH_BEFORE
>>> +          with a PUSH + ALU for nested loops.
>>> +        */
>>> +       if (ctx->bc->chip_class == CAYMAN &&
>>> need_cayman_loop_bug_workaround(ctx)) {
>>
>>
>> We already have current loop level for the stack size computation, see
>> r600_bytecode::stack, so I think need_cayman_loop_bug_workaround call may be
>> replaced with "ctx->bc->stack.loop > 1", if I'm not missing something.
>
> Ok, will try that tonight. Should I add a comment that it is a hardware bug?

Yes, you might want to clarify that it's a hw bug on cayman, though I 
think it's OK either way. Also git complains about some trailing spaces 
in your patch.

With that change for condition (and removal of the need_cayman_... 
function that becomes unused) and fixed whitespace issues, the patch 
looks good to me.

Vadim

>
>> Vadim
>>
>>
>>> +               r600_bytecode_add_cfinst(ctx->bc, CF_OP_PUSH);
>>> +               alu_type = CF_OP_ALU;
>>> +       }
>>> +
>>> +       emit_logic_pred(ctx, ALU_OP2_PRED_SETNE_INT, alu_type);
>>>
>>>          r600_bytecode_add_cfinst(ctx->bc, CF_OP_JUMP);
>>>
>>> @@ -5834,6 +5848,19 @@ static int tgsi_loop_brk_cont(struct
>>> r600_shader_ctx *ctx)
>>>          return 0;
>>>    }
>>>
>>> +static bool need_cayman_loop_bug_workaround(struct r600_shader_ctx *ctx)
>>> +{
>>> +       unsigned int fscp;
>>> +       int num_loops = 0;
>>> +       for (fscp = ctx->bc->fc_sp; fscp > 0; fscp--)
>>> +       {
>>> +               if (FC_LOOP == ctx->bc->fc_stack[fscp].type)
>>> +                       ++num_loops;
>>> +       }
>>> +
>>> +       return num_loops >= 2;
>>> +}
>>> +
>>>    static int tgsi_umad(struct r600_shader_ctx *ctx)
>>>    {
>>>          struct tgsi_full_instruction *inst =
>>> &ctx->parse.FullToken.FullInstruction;
>>>
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev



More information about the mesa-dev mailing list