<div dir="auto">Sounds like a major project for someone to fix LLVM; doesn't AMD have compiler devs?<div dir="auto"><br></div><div dir="auto">Acked-by: Dave Airlie <<a href="mailto:airlied@gmail.com">airlied@gmail.com</a>></div><div dir="auto"><br></div><div dir="auto">Dave.</div></div><br><div class="gmail_quote"><div dir="ltr">On Thu., 2 Aug. 2018, 04:43 Marek Olšák, <<a href="mailto:maraeo@gmail.com">maraeo@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Mon, Jul 23, 2018 at 11:33 PM, Timothy Arceri <<a href="mailto:tarceri@itsqueeze.com" target="_blank" rel="noreferrer">tarceri@itsqueeze.com</a>> wrote:<br> > On 24/07/18 11:15, Marek Olšák wrote:<br> >><br> >> On Fri, Jul 20, 2018 at 12:53 AM, Dave Airlie <<a href="mailto:airlied@gmail.com" target="_blank" rel="noreferrer">airlied@gmail.com</a>> wrote:<br> >>><br> >>> On 20 July 2018 at 13:12, Marek Olšák <<a href="mailto:maraeo@gmail.com" target="_blank" rel="noreferrer">maraeo@gmail.com</a>> wrote:<br> >>>><br> >>>> From: Marek Olšák <<a href="mailto:marek.olsak@amd.com" target="_blank" rel="noreferrer">marek.olsak@amd.com</a>><br> >>>><br> >>>> To make dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.23<br> >>>> finish sooner on the older CPUs. 
(otherwise it gets killed and we fail<br> >>>> the test)<br> >>><br> >>><br> >>> I think this is possibly a bad idea, since it's clear LLVM has some<br> >>> pathological<br> >>> behaviour in the AMDGPU backend for this shader and we are just papering<br> >>> over it.<br> >>><br> >>> A quick dig into LLVM shows horrible misuse of a SmallVector data<br> >>> structure<br> >>> for what ends up having 2000 entries in it.<br> >>><br> >>> I'm not going to outright NAK this, but it would be nice to have it<br> >>> accompanied<br> >>> by a pointer to an llvm bug against the amdgpu backend for the<br> >>> pathological case.<br> >><br> >><br> >> Even if I comment out the push_back call in LLVM, it's still too slow.<br> >> (the dEQP test times out and fails) LLVMCodeGenLevelLess is faster,<br> >> but I don't know yet if it's enough for the test.<br> ><br> ><br> > I hard-coded the second buffer block to column_major rather than row_major<br> > which reduced total run time from 15 -> 9 seconds on my machine. So it seems<br> > temps would definitely help. Proper packing support would also likely help a<br> > little more but not as much.<br> <br> 15 -> 9 is not enough. We need to decrease the compile time by 60% or more.<br> <br> For Dave: Commenting out the "push_back" call in LLVM is also not enough.<br> <br> Only LLVMCodeGenLevelLess gives the desired improvement (~60%), though<br> the test is dangerously close to timing out and getting killed.<br> LLVMCodeGenLevelNone is fastest, but the bytecode is horrible (live<br> variables between blocks are always spilled).<br> <br> If there is no straightforward way to improve compile times (I think<br> there isn't), I'll have to push this.<br> <br> Marek<br> </blockquote></div>