[Mesa-dev] V2 radeonsi use STD430 packing of UBOs by default
Timothy Arceri
tarceri at itsqueeze.com
Thu Aug 24 07:45:44 UTC 2017
On 22/08/17 22:14, Timothy Arceri wrote:
> I'm a little unsure what to do with this now. Below are my shader-db
> results; the majority of the negative changes are from Natural
> Selection 2.
>
> I looked at some dumps of the worst Natural Selection 2 shaders and
> it seems to just be scheduling differences causing the regressions.
>
> I tested with sisched but that just made things even worse.
>
> Obviously we should be aiming to improve the scheduler, but since
> this regresses things and I have no evidence of it helping anything
> it makes the case for adding it pretty weak.
>
> Thoughts??
>
> PERCENTAGE DELTAS   Shaders   SGPRs    VGPRs    SpillSGPR   MaxWaves
> ---------------------------------------------------------------------
> All affected           5797   2.92 %   3.05 %   5.04 %      -2.94 %
> ---------------------------------------------------------------------
> Total                 72287   0.28 %   0.34 %   0.33 %      -0.21 %
>
As far as I can tell this is because after this change we end up with
large sections of consecutive loads. Any thoughts on avoiding this?
e.g.
%234 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 0)
%235 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 4)
%236 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 8)
%237 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 12)
%238 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 16)
%239 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 20)
%240 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 24)
%241 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 28)
%242 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 32)
%243 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 36)
%244 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 40)
%245 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 44)
%246 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 48)
%247 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 52)
%248 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 56)
%249 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 60)
%250 = fmul nsz float %227, %234
%251 = fmul nsz float %229, %235
%252 = fadd nsz float %250, %251
%253 = fmul nsz float %231, %236
%254 = fadd nsz float %252, %253
%255 = fadd nsz float %254, %237
%256 = fmul nsz float %227, %238
%257 = fmul nsz float %229, %239
%258 = fadd nsz float %256, %257
%259 = fmul nsz float %231, %240
%260 = fadd nsz float %258, %259
%261 = fadd nsz float %260, %241
%262 = fmul nsz float %227, %242
%263 = fmul nsz float %229, %243
%264 = fadd nsz float %262, %263
%265 = fmul nsz float %231, %244
%266 = fadd nsz float %264, %265
%267 = fadd nsz float %266, %245
%268 = fmul nsz float %227, %246
%269 = fmul nsz float %229, %247
%270 = fadd nsz float %268, %269
%271 = fmul nsz float %231, %248
%272 = fadd nsz float %270, %271
%273 = fadd nsz float %272, %249
vs. before the change:
%234 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 0)
%235 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 4)
%236 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 8)
%237 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 12)
%238 = fmul nsz float %227, %234
%239 = fmul nsz float %229, %235
%240 = fadd nsz float %238, %239
%241 = fmul nsz float %231, %236
%242 = fadd nsz float %240, %241
%243 = fadd nsz float %242, %237
%244 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 16)
%245 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 20)
%246 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 24)
%247 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 28)
%248 = fmul nsz float %227, %244
%249 = fmul nsz float %229, %245
%250 = fadd nsz float %248, %249
%251 = fmul nsz float %231, %246
%252 = fadd nsz float %250, %251
%253 = fadd nsz float %252, %247
%254 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 32)
%255 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 36)
%256 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 40)
%257 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 44)
%258 = fmul nsz float %227, %254
%259 = fmul nsz float %229, %255
%260 = fadd nsz float %258, %259
%261 = fmul nsz float %231, %256
%262 = fadd nsz float %260, %261
%263 = fadd nsz float %262, %257
%264 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 48)
%265 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 52)
%266 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 56)
%267 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 60)
%268 = fmul nsz float %227, %264
%269 = fmul nsz float %229, %265
%270 = fadd nsz float %268, %269
%271 = fmul nsz float %231, %266
%272 = fadd nsz float %270, %271
%273 = fadd nsz float %272, %267
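
To make the difference between the two orderings concrete, here is a rough sketch in Python that counts how many values are live at once in each instruction order. It is only a toy model of the 4x4 transform shown in the dumps above, not the LLVM scheduler or register allocator: the names (row_ops, batched, interleaved, peak_live) are made up for illustration, x/y/z stand in for %227/%229/%231, and it assumes a destination needs its own slot while its operands are still live.

```python
def row_ops(r):
    """Return (loads, arithmetic) for one row of the 4x4 transform.

    Each op is (dest, [sources]). c0..c15 model the 16 constant loads.
    """
    c = [f"c{4 * r + i}" for i in range(4)]
    loads = [(name, []) for name in c]
    arith = [
        (f"m{r}0", ["x", c[0]]),            # fmul x, c[0]
        (f"m{r}1", ["y", c[1]]),            # fmul y, c[1]
        (f"a{r}0", [f"m{r}0", f"m{r}1"]),   # fadd
        (f"m{r}2", ["z", c[2]]),            # fmul z, c[2]
        (f"a{r}1", [f"a{r}0", f"m{r}2"]),   # fadd
        (f"out{r}", [f"a{r}1", c[3]]),      # fadd with the 4th constant
    ]
    return loads, arith


def batched():
    """All 16 loads first, then all arithmetic (first listing)."""
    rows = [row_ops(r) for r in range(4)]
    loads = [op for row_loads, _ in rows for op in row_loads]
    arith = [op for _, row_arith in rows for op in row_arith]
    return loads + arith


def interleaved():
    """Four loads followed by their arithmetic, per row (second listing)."""
    return [op for r in range(4) for part in row_ops(r) for op in part]


def peak_live(ops, live_in=("x", "y", "z"),
              live_out=("out0", "out1", "out2", "out3")):
    """Peak number of simultaneously live values for a given order."""
    last_use = {}
    for i, (_, srcs) in enumerate(ops):
        for s in srcs:
            last_use[s] = i
    for name in live_out:
        last_use[name] = len(ops)  # results stay live past the end

    live = set(live_in)
    peak = len(live)
    for i, (dest, _) in enumerate(ops):
        live.add(dest)  # dest is defined while its sources are still live
        peak = max(peak, len(live))
        # Drop every value whose last use is at or before this op.
        live = {v for v in live if last_use.get(v, -1) > i}
    return peak


if __name__ == "__main__":
    print("batched peak live values:    ", peak_live(batched()))
    print("interleaved peak live values:", peak_live(interleaved()))
```

Under this model the batched order peaks at 20 live values and the interleaved order at 11, which at least matches the direction of the register usage increase in the shader-db numbers. It says nothing about how the values are split between SGPRs and VGPRs on real hardware.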