[Mesa-dev] V2 radeonsi use STD430 packing of UBOs by default

Thu Aug 24 07:45:44 UTC 2017

On 22/08/17 22:14, Timothy Arceri wrote:
> I'm a little unsure what to do with this now. Below are my shader-db
> results; the majority of negative changes are from Natural Selection
> 2.
> 
> I looked at some dumps of the worst Natural Selection 2 shaders and
> it seems to just be scheduling differences causing the regressions.
> 
> I tested with sisched but that just made things even worse.
> 
> Obviously we should be aiming to improve the scheduler, but since
> this regresses things and I have no evidence of it helping anything
> it makes the case for adding it pretty weak.
> 
> Thoughts??
> 
> PERCENTAGE DELTAS    Shaders     SGPRs     VGPRs SpillSGPR  MaxWaves
> --------------------------------------------------------------------
> All affected            5797    2.92 %    3.05 %    5.04 %   -2.94 %
> --------------------------------------------------------------------
> Total                  72287    0.28 %    0.34 %    0.33 %   -0.21 %
> 

As far as I can tell this is because after this change we end up with
large sections of consecutive loads. Any thoughts on avoiding this?

e.g. after this change:

   %234 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 0)
   %235 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 4)
   %236 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 8)
   %237 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 12)
   %238 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 16)
   %239 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 20)
   %240 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 24)
   %241 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 28)
   %242 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 32)
   %243 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 36)
   %244 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 40)
   %245 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 44)
   %246 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 48)
   %247 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 52)
   %248 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 56)
   %249 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 60)
   %250 = fmul nsz float %227, %234
   %251 = fmul nsz float %229, %235
   %252 = fadd nsz float %250, %251
   %253 = fmul nsz float %231, %236
   %254 = fadd nsz float %252, %253
   %255 = fadd nsz float %254, %237
   %256 = fmul nsz float %227, %238
   %257 = fmul nsz float %229, %239
   %258 = fadd nsz float %256, %257
   %259 = fmul nsz float %231, %240
   %260 = fadd nsz float %258, %259
   %261 = fadd nsz float %260, %241
   %262 = fmul nsz float %227, %242
   %263 = fmul nsz float %229, %243
   %264 = fadd nsz float %262, %263
   %265 = fmul nsz float %231, %244
   %266 = fadd nsz float %264, %265
   %267 = fadd nsz float %266, %245
   %268 = fmul nsz float %227, %246
   %269 = fmul nsz float %229, %247
   %270 = fadd nsz float %268, %269
   %271 = fmul nsz float %231, %248
   %272 = fadd nsz float %270, %271
   %273 = fadd nsz float %272, %249

vs. before this change:

   %234 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 0)
   %235 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 4)
   %236 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 8)
   %237 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 12)
   %238 = fmul nsz float %227, %234
   %239 = fmul nsz float %229, %235
   %240 = fadd nsz float %238, %239
   %241 = fmul nsz float %231, %236
   %242 = fadd nsz float %240, %241
   %243 = fadd nsz float %242, %237
   %244 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 16)
   %245 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 20)
   %246 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 24)
   %247 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 28)
   %248 = fmul nsz float %227, %244
   %249 = fmul nsz float %229, %245
   %250 = fadd nsz float %248, %249
   %251 = fmul nsz float %231, %246
   %252 = fadd nsz float %250, %251
   %253 = fadd nsz float %252, %247
   %254 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 32)
   %255 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 36)
   %256 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 40)
   %257 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 44)
   %258 = fmul nsz float %227, %254
   %259 = fmul nsz float %229, %255
   %260 = fadd nsz float %258, %259
   %261 = fmul nsz float %231, %256
   %262 = fadd nsz float %260, %261
   %263 = fadd nsz float %262, %257
   %264 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 48)
   %265 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 52)
   %266 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 56)
   %267 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 60)
   %268 = fmul nsz float %227, %264
   %269 = fmul nsz float %229, %265
   %270 = fadd nsz float %268, %269
   %271 = fmul nsz float %231, %266
   %272 = fadd nsz float %270, %271
   %273 = fadd nsz float %272, %267
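
To put a rough number on the difference between the two orderings, here is
a small Python sketch that models each dump as a list of (dest, sources)
instructions and counts how many values are live at once. This is only a
toy liveness model, not how the LLVM scheduler or register allocator
actually works. The helper names (build_schedule, peak_live) and the value
names (c0..c15, m*, s*, out*) are made up for illustration, the three
inputs x, y, z stand in for %227, %229, %231 and are not counted, and the
four row results are assumed to stay live to the end.

```python
def build_schedule(grouped):
    """Model the four-row transform as (dest, sources) tuples.

    grouped=True  -> all 16 constant loads first, then the math
    grouped=False -> 4 loads per row, each followed by that row's math
    """
    loads, maths = [], []
    for row in range(4):
        c = [f"c{4 * row + i}" for i in range(4)]
        loads.append([(name, []) for name in c])
        maths.append([
            (f"m{row}a", ["x", c[0]]),                    # fmul
            (f"m{row}b", ["y", c[1]]),                    # fmul
            (f"s{row}a", [f"m{row}a", f"m{row}b"]),       # fadd
            (f"m{row}c", ["z", c[2]]),                    # fmul
            (f"s{row}b", [f"s{row}a", f"m{row}c"]),       # fadd
            (f"out{row}", [f"s{row}b", c[3]]),            # fadd
        ])
    if grouped:
        return sum(loads, []) + sum(maths, [])
    return [ins for row in range(4) for ins in loads[row] + maths[row]]


def peak_live(schedule, live_out):
    """Return the maximum number of simultaneously live values.

    A value is live from its definition until its last use; values in
    live_out are treated as used after the final instruction.
    """
    last_use = {}
    for idx, (_, srcs) in enumerate(schedule):
        for src in srcs:
            last_use[src] = idx
    for name in live_out:
        last_use[name] = len(schedule)

    live, peak = set(), 0
    for idx, (dest, srcs) in enumerate(schedule):
        # Sources whose last use is this instruction die here.
        for src in srcs:
            if last_use[src] == idx:
                live.discard(src)
        # Only track results that are actually used later.
        if dest in last_use:
            live.add(dest)
        peak = max(peak, len(live))
    return peak


OUTPUTS = ["out0", "out1", "out2", "out3"]
grouped_peak = peak_live(build_schedule(True), OUTPUTS)
interleaved_peak = peak_live(build_schedule(False), OUTPUTS)
print("grouped loads:    ", grouped_peak)      # 16
print("interleaved loads:", interleaved_peak)  # 7
```

Under these assumptions the grouped ordering peaks at 16 live values
against 7 for the interleaved one, which points the same way as the
SGPR/VGPR increases in the shader-db numbers, although the real register
counts depend on much more than this.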