[Mesa-dev] V2 radeonsi use STD430 packing of UBOs by default

Timothy Arceri tarceri at itsqueeze.com
Thu Aug 24 09:48:06 UTC 2017



On 24/08/17 18:12, Nicolai Hähnle wrote:
> On 24.08.2017 09:45, Timothy Arceri wrote:
>>
>>
>> On 22/08/17 22:14, Timothy Arceri wrote:
>>> I'm a little unsure what to do with this now. Below are my shader-db
>>> results; the majority of negative changes are from Natural Selection
>>> 2.
>>>
>>> I looked at some dumps of the worst Natural Selection 2 shaders and
>>> it seems to just be scheduling differences causing the regressions.
>>>
>>> I tested with sisched but that just made things even worse.
>>>
>>> Obviously we should be aiming to improve the scheduler, but since
>>> this regresses things and I have no evidence of it helping anything,
>>> the case for adding it is pretty weak.
>>>
>>> Thoughts??
>>>
>>> PERCENTAGE DELTAS    Shaders     SGPRs     VGPRs SpillSGPR  MaxWaves
>>> --------------------------------------------------------------------
>>> All affected            5797    2.92 %    3.05 %    5.04 %   -2.94 %
>>> --------------------------------------------------------------------
>>> Total                  72287    0.28 %    0.34 %    0.33 %   -0.21 %
>>>
>>
>>
>> As far as I can tell this is because after this change we end up with
>> large sections of consecutive loads. Any thoughts on avoiding this?
> 
> Odd. Do you see the same change in TGSI?
> 
> This is one of those things that ideally LLVM would be smart about, but 
> unfortunately it isn't really.

Yeah I assume it's very doable since SSA makes this stuff reasonably 
easy to deal with. However I'm not really sure where to begin, or how 
welcome a pass to do this sorting would be. We have a similar pass in 
nir for moving comparisons to where they are first used.
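
As a rough illustration of the kind of sorting I mean, here is a minimal
Python sketch over a made-up instruction tuple. It is not LLVM or NIR code;
`Instr` and `sink_loads` are hypothetical names. It assumes a single
straight-line block in which each LOAD destination is written exactly once
and the loaded memory (a UBO) is read-only, so moving a load later cannot
change its result.

```python
from typing import Dict, List, NamedTuple, Tuple


class Instr(NamedTuple):
    """Hypothetical, simplified instruction: no swizzles or writemasks."""
    op: str                # e.g. "LOAD", "DP4", "MOV"
    dst: str               # register written, e.g. "TEMP[61]"
    srcs: Tuple[str, ...]  # registers/operands read


def sink_loads(block: List[Instr]) -> List[Instr]:
    """Move each LOAD to just before the first instruction that reads it.

    Assumes SSA-like temps and loads from immutable memory (see lead-in).
    """
    pending: Dict[str, Instr] = {}  # LOADs seen but not yet emitted
    out: List[Instr] = []

    def emit(ins: Instr) -> None:
        # Emit any deferred LOAD feeding this instruction first, recursively,
        # so a load whose address depends on another load stays ordered.
        for src in ins.srcs:
            load = pending.pop(src, None)
            if load is not None:
                emit(load)
        out.append(ins)

    for ins in block:
        if ins.op == "LOAD":
            pending[ins.dst] = ins  # defer until first use
        else:
            emit(ins)

    # LOADs never used in this block are kept, in original order, at the end.
    for dst in list(pending):
        load = pending.pop(dst, None)
        if load is not None:
            emit(load)
    return out
```

A real pass would also have to respect control flow, stores, and register
pressure versus latency hiding, none of which this sketch attempts.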

The TGSI introduces an extra temp to store the value of each LOAD;
this is probably what triggers the difference in LLVM.

e.g.

  LOAD TEMP[61], UBO[2], IMM[2].yyyy
  LOAD TEMP[62], UBO[2], IMM[1].zzzz
  LOAD TEMP[63], UBO[2], IMM[1].wwww
  LOAD TEMP[64], UBO[2], IMM[2].xxxx
  DP4 TEMP[65].x, TEMP[60], TEMP[61]
  DP4 TEMP[66].x, TEMP[60], TEMP[62]
  MOV TEMP[65].y, TEMP[66].xxxx
  DP4 TEMP[67].x, TEMP[60], TEMP[63]
  MOV TEMP[65].z, TEMP[67].xxxx
  DP4 TEMP[68].x, TEMP[60], TEMP[64]
  MOV TEMP[69].w, TEMP[68].xxxx
  MOV TEMP[69].xyz, TEMP[65].xyzx
  LOAD TEMP[70], UBO[1], IMM[6].yyyy
  LOAD TEMP[71], UBO[1], IMM[6].zzzz
  DP4 TEMP[72].x, TEMP[69], TEMP[70]
  DP4 TEMP[73].x, TEMP[69], TEMP[71]
  LOAD TEMP[74], UBO[1], IMM[6].wwww
  LOAD TEMP[75], UBO[1], IMM[7].xxxx
  LOAD TEMP[76], UBO[1], IMM[7].yyyy
  LOAD TEMP[77], UBO[1], IMM[7].zzzz
  DP4 TEMP[78].x, TEMP[69], TEMP[74]
  DP4 TEMP[79].x, TEMP[69], TEMP[75]
  MOV TEMP[78].y, TEMP[79].xxxx
  DP4 TEMP[80].x, TEMP[69], TEMP[76]
  MOV TEMP[78].z, TEMP[80].xxxx
  DP4 TEMP[81].x, TEMP[69], TEMP[77]
  MOV TEMP[78].w, TEMP[81].xxxx

vs

  DP4 TEMP[63].x, TEMP[62], CONST[2][0]
  DP4 TEMP[64].x, TEMP[62], CONST[2][1]
  MOV TEMP[63].y, TEMP[64].xxxx
  DP4 TEMP[65].x, TEMP[62], CONST[2][2]
  MOV TEMP[63].z, TEMP[65].xxxx
  DP4 TEMP[66].x, TEMP[62], CONST[2][3]
  MOV TEMP[67].w, TEMP[66].xxxx
  MOV TEMP[67].xyz, TEMP[63].xyzx
  DP4 TEMP[68].x, TEMP[67], CONST[1][14]
  DP4 TEMP[69].x, TEMP[67], CONST[1][15]
  DP4 TEMP[70].x, TEMP[67], CONST[1][8]
  DP4 TEMP[71].x, TEMP[67], CONST[1][9]
  MOV TEMP[70].y, TEMP[71].xxxx
  DP4 TEMP[72].x, TEMP[67], CONST[1][10]
  MOV TEMP[70].z, TEMP[72].xxxx
  DP4 TEMP[73].x, TEMP[67], CONST[1][11]
  MOV TEMP[70].w, TEMP[73].xxxx
  MOV TEMP[74].xyw, TEMP[70].xyxw
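
To compare dumps like the two above, a small script can report the longest
run of back-to-back LOADs. This is only a sketch: `longest_load_run` is a
hypothetical helper, and it assumes one instruction per line with the opcode
as the first token, as in the listings in this mail.

```python
def longest_load_run(tgsi_text: str) -> int:
    """Return the length of the longest run of consecutive LOAD lines."""
    longest = 0
    current = 0
    for line in tgsi_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # blank lines do not break a run
        if tokens[0] == "LOAD":
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest
```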

> 
> Cheers,
> Nicolai
> 

