[Mesa-dev] [PATCH 2/2] ac, radeonsi: reduce optimizations for complex compute shaders on older APUs
tarceri at itsqueeze.com
Fri Jul 20 23:56:00 UTC 2018
On 20/07/18 14:53, Dave Airlie wrote:
> On 20 July 2018 at 13:12, Marek Olšák <maraeo at gmail.com> wrote:
>> From: Marek Olšák <marek.olsak at amd.com>
>> To make dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.23
>> finish sooner on the older CPUs. (otherwise it gets killed and we fail
>> the test)
> I think this is possibly a bad idea, since it's clear LLVM has some pathological
> behaviour in the AMDGPU backend for this shader and we are just papering over it.
> A quick dig into LLVM shows horrible misuse of a SmallVector data structure
> for what ends up having 2000 entries in it.
> I'm not going to outright NAK this, but it would be nice to have it accompanied
> by a pointer to an llvm bug against the amdgpu backend for the
> pathological case.
Ignoring the potential for LLVM to be slow on really ugly shaders, I
think the real issue with this CTS test is in GLSL IR. The 2000+ ssbo
store calls are for the second buffer block:
highp mat4 mA;
highp mat4x3 mC;
highp ivec3 mD;
layout(packed, row_major, binding = 1) buffer BlockC
highp uint f;
mediump vec3 g;
highp ivec3 h;
The layout is row_major and we basically have two large arrays of matrices:
highp mat4 imA;
highp mat4x3 mC;
The code in lower_buffer_access::emit_access() will create 448 ssbo
stores for i and 2016 ssbo stores for mC because it breaks the stores
down into scalars when the layout is row_major.
If we instead store the values to temps before calling ssbo store we
should end up with only 112 and 672 ssbo stores (although the layout is
also packed, so we could even get the latter down to 504). And the store
calls should be in location order, so presumably LLVM won't spend time
rearranging them either.
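
As a rough sanity check of those numbers, here is a small sketch of the arithmetic. It assumes one scalar store per matrix component today, and that going through temps would let each store write a whole vec4 (for the mat4 case) or a vec3 or vec4 (for the mat4x3 case). The helper name vector_stores is made up for illustration and is not anything in Mesa.

```python
# Store counts quoted above for the two row_major members.
SCALAR_STORES_I = 448    # scalar ssbo stores emitted for "i"
SCALAR_STORES_MC = 2016  # scalar ssbo stores emitted for "mC"


def vector_stores(scalar_stores, width):
    """Hypothetical helper: number of stores if each one writes
    `width` components instead of a single scalar."""
    if scalar_stores % width != 0:
        raise ValueError("scalar store count is not a multiple of the width")
    return scalar_stores // width


# mat4 data written as vec4s.
stores_i_vec4 = vector_stores(SCALAR_STORES_I, 4)
# mat4x3 data written as vec3s.
stores_mc_vec3 = vector_stores(SCALAR_STORES_MC, 3)
# mat4x3 data written as vec4s (assumption: the packed layout allows this).
stores_mc_vec4 = vector_stores(SCALAR_STORES_MC, 4)

print(stores_i_vec4, stores_mc_vec3, stores_mc_vec4)
```

The three results match the 112, 672 and 504 figures, i.e. roughly a 3x to 4x reduction in the number of store calls handed to LLVM.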