[Mesa-dev] [PATCH 2/2] ac, radeonsi: reduce optimizations for complex compute shaders on older APUs

Tue Jul 24 05:00:42 UTC 2018

On 24/07/18 14:24, Marek Olšák wrote:
> On Mon, Jul 23, 2018 at 11:33 PM, Timothy Arceri <tarceri at itsqueeze.com> wrote:
>> On 24/07/18 11:15, Marek Olšák wrote:
>>>
>>> On Fri, Jul 20, 2018 at 12:53 AM, Dave Airlie <airlied at gmail.com> wrote:
>>>>
>>>> On 20 July 2018 at 13:12, Marek Olšák <maraeo at gmail.com> wrote:
>>>>>
>>>>> From: Marek Olšák <marek.olsak at amd.com>
>>>>>
>>>>> To make dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.23
>>>>> finish sooner on the older CPUs. (otherwise it gets killed and we fail
>>>>> the test)
>>>>
>>>>
>>>> I think this is possibly a bad idea, since it's clear LLVM has some
>>>> pathological behaviour in the AMDGPU backend for this shader and we are
>>>> just papering over it.
>>>>
>>>> A quick dig into LLVM shows horrible misuse of a SmallVector data
>>>> structure for what ends up having 2000 entries in it.
>>>>
>>>> I'm not going to outright NAK this, but it would be nice to have it
>>>> accompanied by a pointer to an LLVM bug against the AMDGPU backend for
>>>> the pathological case.
>>>
>>>
>>> Even if I comment out the push_back call in LLVM, it's still too slow
>>> (the dEQP test times out and fails). LLVMCodeGenLevelLess is faster,
>>> but I don't know yet if it's enough for the test.
>>
>>
>> I hard-coded the second buffer block to column_major rather than row_major
>> which reduced total run time from 15 -> 9 seconds on my machine. So it seems
>> temps would definitely help. Proper packing support would also likely help a
>> little more but not as much.
> 
> Can you please describe how temps would help?

I already have :)

https://lists.freedesktop.org/archives/mesa-dev/2018-July/200710.html

> 
> Thanks,
> Marek
>