[Mesa-dev] [PATCH] [rfc] r600: set vpm bit for loop start clause

Fri Aug 3 05:00:03 UTC 2018

On 3 August 2018 at 14:02, Roland Scheidegger <sroland at vmware.com> wrote:
> Am 03.08.2018 um 05:10 schrieb Dave Airlie:
>> From: Dave Airlie <airlied at redhat.com>
>>
>> This fixes some hangs with the arb_shader_image_load_store-atomicity tests
>> on evergreen/cayman GPUs.
>>
>> I'm not 100% sure why (VPM hurts my brain), I'm running some piglit
>> runs to see if it has any bad side effects.
>> ---
>>  src/gallium/drivers/r600/r600_shader.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
>> index 2229dc8fab3..1341bf2c48c 100644
>> --- a/src/gallium/drivers/r600/r600_shader.c
>> +++ b/src/gallium/drivers/r600/r600_shader.c
>> @@ -10587,7 +10587,7 @@ static int tgsi_bgnloop(struct r600_shader_ctx *ctx)
>>       /* LOOP_START_DX10 ignores the LOOP_CONFIG* registers, so it is not
>>        * limited to 4096 iterations, like the other LOOP_* instructions. */
>>       r600_bytecode_add_cfinst(ctx->bc, CF_OP_LOOP_START_DX10);
>> -
>> +     ctx->bc->cf_last->vpm = 1;
>>       fc_pushlevel(ctx, FC_LOOP);
>>
>>       /* check stack depth */
>>
>
> I think you might get incorrect derivatives afterwards (if some pixels
> in a quad were invalid)? Or are the lanes in the alu still active (I
> don't really understand all the vpm stuff...) afterwards?

I'm not sure about that. I think you end up pushing a different mask
onto the stack for the subsequent ALU operations, but I've never
wrapped my brain around this well enough to retain it.

>
> I'm wondering if the tests are just bad? You can get lockups rather
> easily if you're not extra careful with atomics in loops (because the
> result of the atomic is undefined for helper invocations, and the hw
> will never store the result).
> There was recently some discussion about this on mesa-dev, also see
> https://bugs.freedesktop.org/show_bug.cgi?id=106902

I expect it's in this category: we get garbage back from the helper
invocation atomics and the loop spins forever. Setting vpm
presumably makes the helper invocations skip the loop entirely.
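
To make the suspected failure mode concrete, here is a rough toy model in C.
It is not r600 code and not how the hardware is known to behave: the function
name simulate_loop, the GARBAGE value, and the idea that vpm simply drops
helper lanes from the loop's execution mask are all assumptions made for
illustration. Each lane of a quad spins on a compare-and-swap style lock;
helper lanes never touch memory and get back a value that never satisfies
the exit condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define QUAD_LANES 4
#define GARBAGE    1u  /* assumed "undefined" atomic result seen by helpers */

/* Hypothetical model: returns the number of loop iterations until every lane
 * in the execution mask has exited, or -1 if max_iters is reached ("hang").
 * valid_mask has one bit per lane; a clear bit marks a helper invocation. */
int simulate_loop(uint8_t valid_mask, bool vpm, int max_iters)
{
    assert(max_iters > 0);
    const uint8_t all = (uint8_t)((1u << QUAD_LANES) - 1);
    /* Assumption: with vpm set only valid pixels enter the loop. */
    uint8_t active = vpm ? (uint8_t)(valid_mask & all) : all;
    uint32_t lock = 0;  /* memory word the atomic operates on */
    int iters = 0;

    while (active) {
        if (iters == max_iters)
            return -1;  /* some lane never met the exit condition */
        iters++;
        for (int lane = 0; lane < QUAD_LANES; lane++) {
            uint8_t bit = (uint8_t)(1u << lane);
            uint32_t old;
            if (!(active & bit))
                continue;
            if (valid_mask & bit) {
                old = lock;        /* compare-and-swap(lock, 0, 1) */
                if (old == 0)
                    lock = 1;
            } else {
                old = GARBAGE;     /* helper: no memory op, junk result */
            }
            if (old == 0) {        /* lock acquired: release and leave loop */
                lock = 0;
                active &= (uint8_t)~bit;
            }
        }
    }
    return iters;
}
```

In this model simulate_loop(0x9, false, 1000) hits the iteration cap because
the two helper lanes never exit, while simulate_loop(0x9, true, 1000)
finishes after one iteration. Whether vpm on LOOP_START_DX10 really behaves
like this is exactly the open question in this thread.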

Maybe I can limit the vpm setting to only cases where we have
atomic operations in the loop to avoid any unexpected fallout.

At the moment fglrx passes the test fine and doesn't hang, which is
where I got the idea for setting vpm here. fglrx might just be avoiding
the problem, or it may have a good reason for setting vpm on the loop,
but I can't really tell.

Dave.