[Mesa-dev] [PATCH 0/6] RadeonSI: Let's just stop spilling SGPRs

Marek Olšák maraeo at gmail.com
Tue Sep 13 18:28:30 UTC 2016


On Tue, Sep 13, 2016 at 8:16 PM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
>
> On 13.09.2016 19:13, Marek Olšák wrote:
>>
>> This is quite easy because we just have to get rid of all of
>> the preloading at the beginning of shaders.
>>
>> I also removed preloading of PS inputs with literal indexing, which
>> has almost the same effect as sinking interp instructions.
>>
>> I'm slightly concerned that LICM won't move interps because they are
>> not considered speculatively-executable (=movable) by LLVM, but
>> the shader-db stats show that it doesn't matter.
>>
>> LLVM is smart enough to do CSE where needed for both descriptor loads
>> and interps. In fact, it's the CSE which is responsible for some of
>> the remaining SGPR spills. (It makes sense if you think about it)
>>
>> The compile time increased by 6% because CSE has a lot more work,
>> but it's certainly worth it.
>>
>>
>> shader-db stats:
>>
>> [PATCH 4/6] radeonsi: get rid of img/buf/sampler descriptor
>>     https://people.freedesktop.org/~mareko/no_preload1.html
>> [PATCH 5/6] radeonsi: get rid of constant buffer preloading
>>     https://people.freedesktop.org/~mareko/no_preload2.html
>> [PATCH 6/6] radeonsi: reload PS inputs with direct indexing at each
>>     https://people.freedesktop.org/~mareko/no_preload3_ps.html
>>
>> Total diff:
>>     https://people.freedesktop.org/~mareko/no_preload_total.html
>
>
> Those numbers are impressive.
>
> We do have to be slightly careful, though: with the earlier preload removal, I noticed that LLVM didn't lift some constant loads out of loops, even in shaders where SGPR pressure wasn't an issue at all.
>
> I think the right way to deal with this is to improve heuristics in LLVM, so I'm fine with changing Mesa in this way.


Yeah. The problem is that the LICM (moving stuff out of loops) and Sink
(moving stuff forward) passes are no-ops for intrinsics, because
intrinsics fail the "isSafeToSpeculativelyExecute" check. The
trivial fix would be to add a new "movable" flag for intrinsics and
honor it in that function.
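
To illustrate the idea, here is a rough toy model in Python. It is not
LLVM code: the Instr class, the opcode names, the "movable" field and the
honor_movable switch are all made up for this sketch, and it ignores the
other conditions real LICM checks (for example, whether the instruction
is guaranteed to execute anyway). It only shows how a speculation check
that rejects every intrinsic blocks hoisting, and how a flag could relax
it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str               # value defined by this instruction
    op: str                 # opcode or intrinsic name (hypothetical)
    operands: tuple = ()    # names of the values this instruction reads
    is_intrinsic: bool = False
    movable: bool = False   # the proposed flag; an assumption, not an LLVM attribute


def is_safe_to_speculate(instr, honor_movable):
    """Toy stand-in for LLVM's isSafeToSpeculativelyExecute."""
    if not instr.is_intrinsic:
        return True  # simplification: plain arithmetic is always safe here
    # Intrinsics are rejected unless the flag exists and is honored.
    return honor_movable and instr.movable


def hoist_invariants(loop_body, defined_outside, honor_movable):
    """Split a single-block loop body into (hoisted, remaining)."""
    invariant = set(defined_outside)
    hoisted, remaining = [], []
    for instr in loop_body:  # assumes definitions come before uses
        operands_invariant = all(o in invariant for o in instr.operands)
        if operands_invariant and is_safe_to_speculate(instr, honor_movable):
            hoisted.append(instr)
            invariant.add(instr.name)  # its result is now loop-invariant too
        else:
            remaining.append(instr)
    return hoisted, remaining


# A constant load whose operands are defined outside the loop, followed by
# a multiply that depends on the loop counter "i".
body = [
    Instr("c", "load.const", ("desc", "offset"), is_intrinsic=True, movable=True),
    Instr("x", "fmul", ("c", "i")),
]
outside = {"desc", "offset"}

for honor in (False, True):
    hoisted, remaining = hoist_invariants(body, outside, honor)
    print(honor, [i.name for i in hoisted], [i.name for i in remaining])
```

With honor_movable=False the load stays in the loop even though both of
its operands are invariant; with honor_movable=True it is hoisted and
only the multiply remains.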

Marek

