[Mesa-dev] [PATCH 2/2] ac, radeonsi: reduce optimizations for complex compute shaders on older APUs

Fri Jul 20 23:56:00 UTC 2018

On 20/07/18 14:53, Dave Airlie wrote:
> On 20 July 2018 at 13:12, Marek Olšák <maraeo at gmail.com> wrote:
>> From: Marek Olšák <marek.olsak at amd.com>
>>
>> To make dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.23
>> finish sooner on the older CPUs. (otherwise it gets killed and we fail
>> the test)
> 
> I think this is possibly a bad idea, since it's clear LLVM has some pathological
> behaviour in the AMDGPU backend for this shader and we are just papering over it.
> 
> A quick dig into LLVM shows horrible misuse of a SmallVector data structure
> for what ends up having 2000 entries in it.
> 
> I'm not going to outright NAK this, but it would be nice to have it accompanied
> by a pointer to an llvm bug against the amdgpu backend for the
> pathological case.
> 

Ignoring the potential for LLVM to be slow on really ugly shaders, I 
think the real issue with this CTS test is in GLSL IR. The 2000+ ssbo 
store calls are for the second buffer block:

struct sA
{
	highp mat4 mA;
	bvec2 mB;
	highp mat4x3 mC[6];
	highp ivec3 mD;
};

layout(packed, row_major, binding = 1) buffer BlockC
{
	highp uint f;
	mediump vec3 g;
	highp ivec3 h[5];
	sA i[4][7];
};

The layout is row_major and we basically have two large arrays of matrices:

	highp mat4 mA[4][7];
	highp mat4x3 mC[4][7][6];

The code in lower_buffer_access::emit_access() will create 448 ssbo 
stores for mA and 2016 ssbo stores for mC because it breaks the stores 
down into scalars when the layout is row_major.
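
For reference, the scalar counts above fall straight out of the array and 
matrix dimensions in BlockC. A quick sketch of the arithmetic (the helper 
name is made up for illustration, it is not anything in Mesa):

```python
# Scalar store counts for the two matrix members of sA inside "sA i[4][7]".
# scalar_stores() is a hypothetical helper, not a Mesa function.
OUTER_ELEMS = 4 * 7  # number of sA structs in i[4][7]


def scalar_stores(cols, rows, inner_array_len=1):
    """One store per matrix component, per matrix, per struct."""
    return OUTER_ELEMS * inner_array_len * cols * rows


mA_stores = scalar_stores(4, 4)      # highp mat4 mA
mC_stores = scalar_stores(4, 3, 6)   # highp mat4x3 mC[6]
print(mA_stores, mC_stores)
```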

If we instead store the values to temps before calling ssbo store, we 
should end up with only 112 and 672 (although the layout is also packed, 
so we could even get this down to 504) ssbo stores. And the store 
calls should be in location order, so presumably LLVM won't spend time 
rearranging them either.
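
A rough sketch of where those reduced counts come from, assuming one store 
per vector instead of one per scalar. The vectors-per-matrix values below 
are my assumption, picked to reproduce the figures quoted above (4 vectors 
per matrix in the general case, 3 per mat4x3 when packed), and the helper 
name is hypothetical:

```python
# Vector store counts if each matrix is written a whole vector at a time.
# vector_stores() is a hypothetical helper, not a Mesa function.
OUTER_ELEMS = 4 * 7  # number of sA structs in i[4][7]


def vector_stores(vectors_per_matrix, inner_array_len=1):
    """One store per vector, per matrix, per struct."""
    return OUTER_ELEMS * inner_array_len * vectors_per_matrix


mA_vec = vector_stores(4)          # mat4 mA: assumed 4 vector stores each
mC_vec = vector_stores(4, 6)       # mat4x3 mC[6]: assumed 4 vector stores each
mC_packed = vector_stores(3, 6)    # packed: assumed 3 vector stores each
print(mA_vec, mC_vec, mC_packed)
```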