[Mesa-dev] [PATCH 2/2] ac, radeonsi: reduce optimizations for complex compute shaders on older APUs

Tue Jul 24 03:33:39 UTC 2018

On 24/07/18 11:15, Marek Olšák wrote:
> On Fri, Jul 20, 2018 at 12:53 AM, Dave Airlie <airlied at gmail.com> wrote:
>> On 20 July 2018 at 13:12, Marek Olšák <maraeo at gmail.com> wrote:
>>> From: Marek Olšák <marek.olsak at amd.com>
>>>
>>> To make dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.23
>>> finish sooner on the older CPUs. (otherwise it gets killed and we fail
>>> the test)
>>
>> I think this is possibly a bad idea, since it's clear LLVM has some pathological
>> behaviour in the AMDGPU backend for this shader and we are just papering over it.
>>
>> A quick dig into LLVM shows horrible misuse of a SmallVector data structure
>> for what ends up having 2000 entries in it.
>>
>> I'm not going to outright NAK this, but it would be nice to have it accompanied
>> by a pointer to an LLVM bug against the amdgpu backend for the
>> pathological case.
> 
> Even if I comment out the push_back call in LLVM, it's still too slow
> (the dEQP test times out and fails). LLVMCodeGenLevelLess is faster,
> but I don't know yet if it's enough for the test.

I hard-coded the second buffer block to column_major rather than
row_major, which reduced total run time from 15 to 9 seconds on my
machine. So it seems temps would definitely help. Proper packing support
would also likely help a little more, but not as much.
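
To illustrate why the layout qualifier can matter here, below is a rough
sketch in C of the byte offsets involved. It is only an illustration, not
code from Mesa or LLVM. It assumes a mat4 of floats with a matrix stride of
16 bytes (as in std140), and the helper names are made up. In GLSL, m[i]
selects a column, so with column_major its four components are contiguous,
while with row_major they sit one matrix stride apart and plausibly need
separate loads.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: mat4 of 32-bit floats, matrix stride of 16 bytes (std140-like). */
enum { MATRIX_STRIDE = 16, FLOAT_SIZE = 4 };

/* Byte offset of element (col, row) when the block is column_major. */
static size_t column_major_offset(unsigned col, unsigned row)
{
    assert(col < 4 && row < 4);
    return (size_t)col * MATRIX_STRIDE + (size_t)row * FLOAT_SIZE;
}

/* Byte offset of element (col, row) when the block is row_major. */
static size_t row_major_offset(unsigned col, unsigned row)
{
    assert(col < 4 && row < 4);
    return (size_t)row * MATRIX_STRIDE + (size_t)col * FLOAT_SIZE;
}

/* Count the contiguous runs needed to fetch one column (hypothetical helper):
 * a new run starts whenever a component does not directly follow the
 * previous one in memory. */
static unsigned loads_for_column(size_t (*offset)(unsigned, unsigned),
                                 unsigned col)
{
    unsigned loads = 1;
    for (unsigned row = 1; row < 4; row++) {
        if (offset(col, row) != offset(col, row - 1) + FLOAT_SIZE)
            loads++;
    }
    return loads;
}
```

Under these assumptions, loads_for_column() reports one contiguous run per
column for column_major and four for row_major. That is one plausible
reason the row_major block produces more code for the compiler to chew on.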