[Mesa-dev] [PATCH] [rfc] r600: set vpm bit for loop start clause

Roland Scheidegger sroland at vmware.com
Mon Aug 6 15:00:27 UTC 2018

Am 05.08.2018 um 11:47 schrieb Gert Wollny:
> Am Freitag, den 03.08.2018, 06:02 +0200 schrieb Roland Scheidegger:
>> Am 03.08.2018 um 05:10 schrieb Dave Airlie:
>>> From: Dave Airlie <airlied at redhat.com>
>>> This fixes some hangs with the arb_shader_image_load_store-atomicity
>>> tests on evergreen/cayman GPUs.
>>> I'm not 100% sure why (VPM hurts my brain); I'm running some piglit
>>> runs to see if it has any bad side effects.
>>> ---
>>>  src/gallium/drivers/r600/r600_shader.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>> diff --git a/src/gallium/drivers/r600/r600_shader.c
>>> b/src/gallium/drivers/r600/r600_shader.c
>>> index 2229dc8fab3..1341bf2c48c 100644
>>> --- a/src/gallium/drivers/r600/r600_shader.c
>>> +++ b/src/gallium/drivers/r600/r600_shader.c
>>> @@ -10587,7 +10587,7 @@ static int tgsi_bgnloop(struct
>>> r600_shader_ctx *ctx)
>>>  	/* LOOP_START_DX10 ignores the LOOP_CONFIG* registers, so
>>> it is not
>>>  	 * limited to 4096 iterations, like the other LOOP_*
>>> instructions. */
>>>  	r600_bytecode_add_cfinst(ctx->bc, CF_OP_LOOP_START_DX10);
>>> -
>>> +	ctx->bc->cf_last->vpm = 1;
>>>  	fc_pushlevel(ctx, FC_LOOP);
>>>  	/* check stack depth */
>> I think you might get incorrect derivatives afterwards (if some
>> pixels in a quad were invalid)? Or are the lanes in the alu still
>> active afterwards (I don't really understand all the vpm stuff...)?
>> I'm wondering if the tests are just bad? You can get lockups rather
>> easily if you're not extra careful with atomics in loops (because the
>> result of the atomic is undefined for helper invocations, and the hw
>> will never store the result).
>> There was recently some discussion about this on mesa-dev, also see
>> https://bugs.freedesktop.org/show_bug.cgi?id=106902
> I might add that I've also played around with getting rid of these
> lockups, at least enough to be able to run the gpu test set
> completely. One thing I did was limit the number of loop iterations
> with an extra counter, and with that this particular test passed,
> which means that some instance of the shader really is running the
> loop with bogus data. This patch fixed the unmodified piglit for me.
> Another piglit, 
> arb_shader_storage_buffer_object/
>    execution/ssbo-atomicCompSwap-int.shader_test
> still hangs for some time (on barts), but it has extra protection
> against helper invocations. This test also fails when I put in such a
> safety counter to break the loop after 1000 iterations, so I guess the
> failure is genuine, but failing this test should not result in a GPU
> lockup. Tests that might lock up a GPU are needed to be able to debug
> problems, but IMHO they should not be part of these general test sets.
> Because of these lockups I have no data to compare against older
> runs. With this patch and the ssbo test fixed up I got
> [26236/26236] skip: 10688, pass: 15118, warn: 13, fail: 412, crash: 5
> for the full gpu set on BARTS in single-thread mode; on CEDAR I got
> some other, unrelated GPU lockups.
> Looking only at the texture piglits that failed, there were no
> regressions with respect to this patch. 
My guess would be that there are possibly no texturing tests (using
implicit derivatives) which exercise texturing after (or inside) loops,
so the absence of regressions wouldn't necessarily mean much, but it's
definitely something that could happen in practice.
(Writing such a test, if there isn't one, shouldn't be too difficult.)


> When running piglit multithreaded I still get lockups that bring the
> graphics card down so that it is no longer fully usable and a cold
> restart is needed. In these cases various piglits running tests with
> atomics execute in parallel, and notably the process running
> arb_compute_shader-local-id could not be killed at all.
> In any case, this patch looks like an enhancement to me, because it
> makes lockups less likely, and I didn't see any regressions, so
>   Tested-By: Gert Wollny <gw.fossdev at gmail.com>
> best,
> Gert
