[Mesa-dev] [PATCH 2/2] ac, radeonsi: reduce optimizations for complex compute shaders on older APUs

Wed Aug 1 19:24:23 UTC 2018

On Wed, Aug 1, 2018 at 3:20 PM, Dave Airlie <airlied at gmail.com> wrote:
> Sounds like a major project for someone to fix llvm, doesn't AMD have
> compiler devs?

Yes, but they are from entirely different teams.

Marek

>
> Acked-by: Dave Airlie <airlied at gmail.com>
>
> Dave.
>
> On Thu., 2 Aug. 2018, 04:43 Marek Olšák, <maraeo at gmail.com> wrote:
>>
>> On Mon, Jul 23, 2018 at 11:33 PM, Timothy Arceri <tarceri at itsqueeze.com>
>> wrote:
>> > On 24/07/18 11:15, Marek Olšák wrote:
>> >>
>> >> On Fri, Jul 20, 2018 at 12:53 AM, Dave Airlie <airlied at gmail.com>
>> >> wrote:
>> >>>
>> >>> On 20 July 2018 at 13:12, Marek Olšák <maraeo at gmail.com> wrote:
>> >>>>
>> >>>> From: Marek Olšák <marek.olsak at amd.com>
>> >>>>
>> >>>> To make
>> >>>> dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.23
>> >>>> finish sooner on the older CPUs. (otherwise it gets killed and we
>> >>>> fail
>> >>>> the test)
>> >>>
>> >>>
>> >>> I think this is possibly a bad idea, since it's clear LLVM has some
>> >>> pathological
>> >>> behaviour in the AMDGPU backend for this shader and we are just papering
>> >>> over it.
>> >>>
>> >>> A quick dig into LLVM shows horrible misuse of a SmallVector data
>> >>> structure
>> >>> for what ends up having 2000 entries in it.
>> >>>
>> >>> I'm not going to outright NAK this, but it would be nice to have it
>> >>> accompanied
>> >>> by a pointer to an llvm bug against the amdgpu backend for the
>> >>> pathological case.
>> >>
>> >>
>> >> Even if I comment out the push_back call in LLVM, it's still too slow.
>> >> (the dEQP test times out and fails) LLVMCodeGenLevelLess is faster,
>> >> but I don't know yet if it's enough for the test.
>> >
>> >
>> > I hard-coded the second buffer block to column_major rather than
>> > row_major
>> > which reduced total run time from 15 -> 9 seconds on my machine. So it
>> > seems
>> > temps would definitely help. Proper packing support would also likely
>> > help a
>> > little more but not as much.
>>
>> 15 -> 9 is not enough. We need to decrease the compile time by 60% or
>> more.
>>
>> For Dave: Commenting out the "push_back" call in LLVM is also not enough.
>>
>> Only LLVMCodeGenLevelLess gives the desired improvement (~60%), though
>> the test is dangerously close to timing out and getting killed.
>> LLVMCodeGenLevelNone is fastest, but the bytecode is horrible (live
>> variables between blocks are always spilled).
>>
>> If there is no straightforward way to improve compile times (I think
>> there isn't), I'll have to push this.
>>
>> Marek
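
For readers following the thread, here is a rough sketch of the kind of
selection the patch subject describes: fall back to a lower LLVM codegen
level only for complex compute shaders on slow hardware. It is not the code
from the patch. The enum is a local stand-in for LLVMCodeGenOptLevel (so the
sketch builds without LLVM), and the struct, the threshold, and the function
name are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in for LLVMCodeGenOptLevel from llvm-c/TargetMachine.h
 * (None / Less / Default / Aggressive), so this builds without LLVM. */
enum opt_level { OPT_NONE, OPT_LESS, OPT_DEFAULT, OPT_AGGRESSIVE };

/* Hypothetical summary of a shader; not a real Mesa structure. */
struct shader_info {
    bool is_compute;
    unsigned num_instructions;
};

/* Assumed cut-off for "complex"; the thread does not give a number. */
#define COMPLEX_SHADER_THRESHOLD 10000u

/* Pick the codegen level: "less" trades code quality for compile time,
 * so use it only where compile time is the problem. "None" is avoided
 * because, per the thread, it spills live values between blocks. */
static enum opt_level pick_opt_level(const struct shader_info *s, bool slow_cpu)
{
    assert(s != NULL);

    if (slow_cpu && s->is_compute &&
        s->num_instructions > COMPLEX_SHADER_THRESHOLD)
        return OPT_LESS;

    return OPT_DEFAULT;
}
```

In a real driver the returned value would be mapped to the matching
LLVMCodeGenOptLevel and passed when creating the target machine; how the
"slow CPU" and "complex shader" conditions are detected is left open here.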