[Mesa-dev] [PATCH] [rfc] r600: set vpm bit for loop start clause

Gert Wollny gw.fossdev at gmail.com
Sun Aug 5 09:47:12 UTC 2018


Am Freitag, den 03.08.2018, 06:02 +0200 schrieb Roland Scheidegger:
> Am 03.08.2018 um 05:10 schrieb Dave Airlie:
> > From: Dave Airlie <airlied at redhat.com>
> > 
> > This fixes some hangs with the arb_shader_image_load_store-
> > atomicity tests
> > on evergreen/cayman GPUs.
> > 
> > I'm not 100% sure why (VPM hurts my brain), I'm running some piglit
> > runs to see if it has any bad side effects.
> > ---
> >  src/gallium/drivers/r600/r600_shader.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/src/gallium/drivers/r600/r600_shader.c
> > b/src/gallium/drivers/r600/r600_shader.c
> > index 2229dc8fab3..1341bf2c48c 100644
> > --- a/src/gallium/drivers/r600/r600_shader.c
> > +++ b/src/gallium/drivers/r600/r600_shader.c
> > @@ -10587,7 +10587,7 @@ static int tgsi_bgnloop(struct
> > r600_shader_ctx *ctx)
> >  	/* LOOP_START_DX10 ignores the LOOP_CONFIG* registers, so
> > it is not
> >  	 * limited to 4096 iterations, like the other LOOP_*
> > instructions. */
> >  	r600_bytecode_add_cfinst(ctx->bc, CF_OP_LOOP_START_DX10);
> > -
> > +	ctx->bc->cf_last->vpm = 1;
> >  	fc_pushlevel(ctx, FC_LOOP);
> >  
> >  	/* check stack depth */
> > 
> 
> I think you might get incorrect derivatives afterwards (if some
> pixels in a quad were invalid)? Or are the lanes in the alu still
> active (I
> don't really understand all the vpm stuff...) afterwards?
> 
> I'm wondering if the tests are just bad? You can get lockups rather
> easily if you're not extra careful with atomics in loops (because the
> result of the atomic is undefined for helper invocations, and the hw
> will never store the result).
> There was recently some discussion about this on mesa-dev, also see
> https://bugs.freedesktop.org/show_bug.cgi?id=106902

I might add that I've also played around with getting rid of these
lockups, at least to be able to run the gpu test set completely. One
thing I did was limit the number of iterations with an extra counter,
and with that this particular test passed, which suggests that some
instance of the shader really is running the loop with bogus data.
This patch fixed the unmodified piglit test for me.

Another piglit, 

arb_shader_storage_buffer_object/
   execution/ssbo-atomicCompSwap-int.shader_test

still hangs for some time (on BARTS), even though it has extra
protection against helper invocations. The test also fails when I put
in such a safety counter to break the loop after 1000 iterations, so I
guess the failure is genuine, but failing this test should not result
in a GPU lockup. Tests that might lock up a GPU are needed to be able
to debug problems, but IMHO they should not be part of these general
test sets.

Because of these lockups I have no data from older runs to compare
against. With this patch and the ssbo test fixed up I got

[26236/26236] skip: 10688, pass: 15118, warn: 13, fail: 412, crash: 5

for the full gpu set on BARTS in single-thread mode; on CEDAR I got
some other, unrelated GPU lockups.

Looking only at the texture piglits that failed, this patch introduced
no regressions.

When running piglit multithreaded I still get lockups that bring down
the graphics card so that it is no longer fully usable and a cold
restart is needed. In these cases various piglit tests that use
atomics run in parallel, and notably the process running
arb_compute_shader-local-id could not be killed at all.

In any case, this patch looks like an enhancement to me, because it
makes lockups less likely, and I didn't see any regressions, so
  Tested-By: Gert Wollny <gw.fossdev at gmail.com>

best,
Gert
