[Mesa-dev] V2 radeonsi use STD430 packing of UBOs by default

Nicolai Hähnle nhaehnle at gmail.com
Thu Aug 24 08:12:56 UTC 2017


On 24.08.2017 09:45, Timothy Arceri wrote:
> 
> 
> On 22/08/17 22:14, Timothy Arceri wrote:
>> I'm a little unsure what to do with this now. Below are my shader-db
>> results; the majority of the negative changes are from Natural
>> Selection 2.
>>
>> I looked at some dumps of the worst Natural Selection 2 shaders and
>> it seems to just be scheduling differences causing the regressions.
>>
>> I tested with sisched but that just made things even worse.
>>
>> Obviously we should be aiming to improve the scheduler, but since
>> this regresses things and I have no evidence of it helping anything,
>> the case for adding it is pretty weak.
>>
>> Thoughts??
>>
>> PERCENTAGE DELTAS    Shaders     SGPRs     VGPRs SpillSGPR  MaxWaves
>> --------------------------------------------------------------------
>> All affected            5797    2.92 %    3.05 %    5.04 %   -2.94 %
>> --------------------------------------------------------------------
>> Total                  72287    0.28 %    0.34 %    0.33 %   -0.21 %
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
> 
> 
> As far as I can tell, this is because after this change we end up with
> large sections of consecutive loads. Any thoughts on avoiding this?

Odd. Do you see the same change in TGSI?

This is one of those things that ideally LLVM would be smart about, but 
unfortunately it isn't really.

Cheers,
Nicolai

> 
>   e.g.
> 
>    %234 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 0)
>    %235 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 4)
>    %236 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 8)
>    %237 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 12)
>    %238 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 16)
>    %239 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 20)
>    %240 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 24)
>    %241 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 28)
>    %242 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 32)
>    %243 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 36)
>    %244 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 40)
>    %245 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 44)
>    %246 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 48)
>    %247 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 52)
>    %248 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 56)
>    %249 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 60)
>    %250 = fmul nsz float %227, %234
>    %251 = fmul nsz float %229, %235
>    %252 = fadd nsz float %250, %251
>    %253 = fmul nsz float %231, %236
>    %254 = fadd nsz float %252, %253
>    %255 = fadd nsz float %254, %237
>    %256 = fmul nsz float %227, %238
>    %257 = fmul nsz float %229, %239
>    %258 = fadd nsz float %256, %257
>    %259 = fmul nsz float %231, %240
>    %260 = fadd nsz float %258, %259
>    %261 = fadd nsz float %260, %241
>    %262 = fmul nsz float %227, %242
>    %263 = fmul nsz float %229, %243
>    %264 = fadd nsz float %262, %263
>    %265 = fmul nsz float %231, %244
>    %266 = fadd nsz float %264, %265
>    %267 = fadd nsz float %266, %245
>    %268 = fmul nsz float %227, %246
>    %269 = fmul nsz float %229, %247
>    %270 = fadd nsz float %268, %269
>    %271 = fmul nsz float %231, %248
>    %272 = fadd nsz float %270, %271
>    %273 = fadd nsz float %272, %249
> 
> 
> vs
> 
> 
>    %234 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 0)
>    %235 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 4)
>    %236 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 8)
>    %237 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 12)
>    %238 = fmul nsz float %227, %234
>    %239 = fmul nsz float %229, %235
>    %240 = fadd nsz float %238, %239
>    %241 = fmul nsz float %231, %236
>    %242 = fadd nsz float %240, %241
>    %243 = fadd nsz float %242, %237
>    %244 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 16)
>    %245 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 20)
>    %246 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 24)
>    %247 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 28)
>    %248 = fmul nsz float %227, %244
>    %249 = fmul nsz float %229, %245
>    %250 = fadd nsz float %248, %249
>    %251 = fmul nsz float %231, %246
>    %252 = fadd nsz float %250, %251
>    %253 = fadd nsz float %252, %247
>    %254 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 32)
>    %255 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 36)
>    %256 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 40)
>    %257 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 44)
>    %258 = fmul nsz float %227, %254
>    %259 = fmul nsz float %229, %255
>    %260 = fadd nsz float %258, %259
>    %261 = fmul nsz float %231, %256
>    %262 = fadd nsz float %260, %261
>    %263 = fadd nsz float %262, %257
>    %264 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 48)
>    %265 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 52)
>    %266 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 56)
>    %267 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 60)
>    %268 = fmul nsz float %227, %264
>    %269 = fmul nsz float %229, %265
>    %270 = fadd nsz float %268, %269
>    %271 = fmul nsz float %231, %266
>    %272 = fadd nsz float %270, %271
>    %273 = fadd nsz float %272, %267
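
To make the difference between the two quoted orderings concrete, here is a
rough sketch that counts how many values are live at once in each. It is a
toy model and not how LLVM or shader-db measures anything: the names
build_schedule and peak_live are made up for illustration, it counts abstract
values rather than SGPRs or VGPRs, and it assumes the instructions execute
exactly in the listed order.

```python
def build_schedule(interleave):
    """Return (dest, operands) tuples for a 4x4 constant matrix times (x, y, z, 1).

    interleave=False: all 16 constant loads first, then the arithmetic.
    interleave=True:  4 loads followed by the 6 ops that consume them, per row.
    """
    loads, math, mixed = [], [], []
    for row in range(4):
        c = [f"c{row * 4 + i}" for i in range(4)]
        row_loads = [(name, ()) for name in c]  # loads read no tracked values
        t = f"r{row}"
        row_math = [
            (f"{t}_m0", ("x", c[0])),
            (f"{t}_m1", ("y", c[1])),
            (f"{t}_a0", (f"{t}_m0", f"{t}_m1")),
            (f"{t}_m2", ("z", c[2])),
            (f"{t}_a1", (f"{t}_a0", f"{t}_m2")),
            (f"{t}_out", (f"{t}_a1", c[3])),
        ]
        loads += row_loads
        math += row_math
        mixed += row_loads + row_math
    return mixed if interleave else loads + math


def peak_live(schedule, inputs=("x", "y", "z")):
    """Largest number of simultaneously live values over the schedule."""
    # Record the index of the last instruction that reads each value.
    last_use = {}
    for index, (_, operands) in enumerate(schedule):
        for name in operands:
            last_use[name] = index

    live = set(inputs)
    peak = len(live)
    for index, (dest, operands) in enumerate(schedule):
        # Operands whose final use is this instruction free their slot first,
        # so the result may reuse it.
        for name in operands:
            if last_use[name] == index:
                live.discard(name)
        live.add(dest)  # results that are never read again stay live to the end
        peak = max(peak, len(live))
    return peak
```

Under these assumptions, peak_live(build_schedule(interleave=False)) gives 19
and peak_live(build_schedule(interleave=True)) gives 10, which is the kind of
gap that would show up as higher register use when the loads stay batched.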


-- 
Learn how the world really is,
But never forget how it ought to be.

