[Mesa-dev] [PATCH 2/2] ac, radeonsi: reduce optimizations for complex compute shaders on older APUs

Tue Jul 24 04:24:02 UTC 2018

On Mon, Jul 23, 2018 at 11:33 PM, Timothy Arceri <tarceri at itsqueeze.com> wrote:
> On 24/07/18 11:15, Marek Olšák wrote:
>>
>> On Fri, Jul 20, 2018 at 12:53 AM, Dave Airlie <airlied at gmail.com> wrote:
>>>
>>> On 20 July 2018 at 13:12, Marek Olšák <maraeo at gmail.com> wrote:
>>>>
>>>> From: Marek Olšák <marek.olsak at amd.com>
>>>>
>>>> To make dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.23
>>>> finish sooner on older CPUs. (Otherwise it gets killed and we fail
>>>> the test.)
>>>
>>>
>>> I think this is possibly a bad idea, since it's clear LLVM has some
>>> pathological behaviour in the AMDGPU backend for this shader and we
>>> are just papering over it.
>>>
>>> A quick dig into LLVM shows horrible misuse of a SmallVector data
>>> structure for what ends up having 2000 entries in it.
>>>
>>> I'm not going to outright NAK this, but it would be nice to have it
>>> accompanied by a pointer to an LLVM bug against the AMDGPU backend for
>>> the pathological case.
>>
>>
>> Even if I comment out the push_back call in LLVM, it's still too slow
>> (the dEQP test times out and fails). LLVMCodeGenLevelLess is faster,
>> but I don't know yet if it's enough for the test.
>
>
> I hard-coded the second buffer block to column_major rather than row_major,
> which reduced the total run time from 15 to 9 seconds on my machine. So it
> seems temps would definitely help. Proper packing support would also likely
> help a little, but not as much.

Can you please describe how temps would help?

Thanks,
Marek