[Mesa-dev] [PATCH 2/2] ac, radeonsi: reduce optimizations for complex compute shaders on older APUs

Wed Aug 1 18:42:19 UTC 2018

On Mon, Jul 23, 2018 at 11:33 PM, Timothy Arceri <tarceri at itsqueeze.com> wrote:
> On 24/07/18 11:15, Marek Olšák wrote:
>>
>> On Fri, Jul 20, 2018 at 12:53 AM, Dave Airlie <airlied at gmail.com> wrote:
>>>
>>> On 20 July 2018 at 13:12, Marek Olšák <maraeo at gmail.com> wrote:
>>>>
>>>> From: Marek Olšák <marek.olsak at amd.com>
>>>>
>>>> To make dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.23
>>>> finish sooner on the older CPUs. (otherwise it gets killed and we fail
>>>> the test)
>>>
>>>
>>> I think this is possibly a bad idea, since it's clear LLVM has some
>>> pathological behaviour in the AMDGPU backend for this shader and we are
>>> just papering over it.
>>>
>>> A quick dig into LLVM shows horrible misuse of a SmallVector data
>>> structure for what ends up having 2000 entries in it.
>>>
>>> I'm not going to outright NAK this, but it would be nice to have it
>>> accompanied by a pointer to an LLVM bug against the amdgpu backend for the
>>> pathological case.
>>
>>
>> Even if I comment out the push_back call in LLVM, it's still too slow
>> (the dEQP test times out and fails). LLVMCodeGenLevelLess is faster,
>> but I don't know yet if it's enough for the test.
>
>
> I hard-coded the second buffer block to column_major rather than row_major
> which reduced total run time from 15 -> 9 seconds on my machine. So it seems
> temps would definitely help. Proper packing support would also likely help a
> little more but not as much.

15 -> 9 seconds is only a 40% reduction, which is not enough. We need to
decrease the compile time by 60% or more.

For Dave: Commenting out the "push_back" call in LLVM is also not enough.

Only LLVMCodeGenLevelLess gives the desired improvement (~60%), though
the test is dangerously close to timing out and getting killed.
LLVMCodeGenLevelNone is fastest, but the generated code is horrible (live
variables between blocks are always spilled).
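
To illustrate the kind of selection being discussed, here is a minimal
sketch in C. It is not the code from the patch. The struct, its fields, the
threshold of 1000 and all sketch_* names are hypothetical. The enum is a
local stand-in for part of LLVMCodeGenOptLevel so the example builds without
the LLVM headers; in a real driver the chosen level would be passed as the
Level argument of LLVMCreateTargetMachine().

```c
#include <stdbool.h>

/* Local stand-in for three values of LLVMCodeGenOptLevel
 * (llvm-c/TargetMachine.h), so the sketch compiles without LLVM. */
enum sketch_opt_level {
    SKETCH_OPT_NONE,    /* corresponds to LLVMCodeGenLevelNone */
    SKETCH_OPT_LESS,    /* corresponds to LLVMCodeGenLevelLess */
    SKETCH_OPT_DEFAULT  /* corresponds to LLVMCodeGenLevelDefault */
};

/* Hypothetical summary of a shader; these fields are assumptions. */
struct sketch_shader_info {
    bool is_compute;                  /* compute shader? */
    bool is_older_apu;                /* running on an older APU? */
    unsigned num_memory_instructions; /* rough complexity measure */
};

/* Assumed cut-off for "complex"; not a value taken from the patch. */
#define SKETCH_COMPLEX_THRESHOLD 1000u

/* Use the cheaper level only for complex compute shaders on older APUs.
 * SKETCH_OPT_NONE is never chosen because of the spilling noted above. */
static enum sketch_opt_level
sketch_pick_opt_level(const struct sketch_shader_info *info)
{
    if (info->is_compute && info->is_older_apu &&
        info->num_memory_instructions > SKETCH_COMPLEX_THRESHOLD)
        return SKETCH_OPT_LESS;

    return SKETCH_OPT_DEFAULT;
}
```

Keeping the default level for everything else limits the code-quality cost
to the shaders that would otherwise time out.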

If there is no straightforward way to improve compile times (I think
there isn't), I'll have to push this.

Marek