[Mesa-dev] r600/sb loop issue

Dave Airlie airlied at gmail.com
Mon Dec 8 16:25:26 PST 2014


On 8 December 2014 at 20:41, Vadim Girlin <vadimgirlin at gmail.com> wrote:
> On 12/06/2014 07:13 AM, Vadim Girlin wrote:
>>
>> On 12/04/2014 01:43 AM, Dave Airlie wrote:
>>>
>>> Hi Vadim,
>>>
>>> I've been looking with Glenn's help into a bug in sb for a couple of
>>> weeks now triggered by a change in how GLSL generates switch
>>> statements.
>>>
>>> I understand you probably aren't too interested in r600g but I believe
>>> I'm hitting a design level problem and I would like some advice.
>>>
>>> So it appears that GLSL can create loops that don't repeat for switch
>>> statements, and it appears SB wasn't ready to handle such a thing.
>>
>>
>> Hi, Dave,
>>
>> I suspect we should rather get rid of such loops somehow, i.e. convert
>> to something else, the loop that never repeats is not really a loop
>> anyway. AFAICS "continue" is not supported in switch statements
>> according to GLSL specs, so the loops generated for switch will never be
>> repeated. Am I missing something? Even if repeating is possible somehow,
>> at least we can get rid of the loops that are not repeated.
>>
>> I think loops are less efficient than other control flow instructions on
>> r600g hw (at least because they increase stack usage), and possibly on
>> other hw too.
>>
>> In fact it seems sb basically gets rid of it already in IR, it just
>> doesn't know how to translate resulting control flow to ISA, because so
>> far it only supports specific control flow structure for if-then-else
>> that was previously preserved during optimizations. I think it may be
>> not very hard to implement support for that in finalizer, I'll look into
>> it.
>
>
> In fact handling that control flow in finalizer is not as easy as I hoped,
> probably impossible, at least if we want to make it efficient. I forgot
> about the limitations of R600 ISA.
>
> OTOH it seems I've managed to fix the issues with loops, the patch is
> attached (it's meant to be used instead of 7b0067d2). There are no piglit
> regressions on evergreen, but I didn't test any real apps.
>

This fixes one thing, but the switches are still broken here, on cayman at least:

tests/spec/glsl-1.30/execution/switch/fs-default_last.shader_test

--------------------------------------------------------------
FRAG
PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
DCL OUT[0], COLOR
DCL CONST[0]
DCL TEMP[0..2], LOCAL
IMM[0] FLT32 {    0.0000,     1.0000,     0.0000,     0.0000}
IMM[1] UINT32 {0, 4294967295, 0, 0}
IMM[2] INT32 {1, 0, 0, 0}
  0: MOV TEMP[0], IMM[0].xxxx
  1: MOV TEMP[1].x, IMM[1].xxxx
  2: BGNLOOP :0
  3:   UCMP TEMP[1].x, CONST[0].xxxx, TEMP[1].xxxx, IMM[1].yyyy
  4:   UIF TEMP[1].xxxx :0
  5:     MOV TEMP[0].x, IMM[0].yyyy
  6:     BRK
  7:   ENDIF
  8:   USEQ TEMP[2].x, IMM[2].xxxx, CONST[0].xxxx
  9:   UCMP TEMP[1].x, TEMP[2].xxxx, IMM[1].yyyy, TEMP[1].xxxx
 10:   UIF TEMP[1].xxxx :0
 11:     MOV TEMP[0].y, IMM[0].yyyy
 12:     BRK
 13:   ENDIF
 14:   MOV TEMP[1].x, IMM[1].yyyy
 15:   MOV TEMP[0].z, IMM[0].yyyy
 16:   BRK
 17: ENDLOOP :0
 18: MOV OUT[0], TEMP[0]
 19: END
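
Read as C, the TGSI above comes out roughly as follows. This is only a sketch for following the control flow: switch_as_loop, sel, fallthrough and is_one are made-up names (sel stands in for CONST[0].x, out[] for TEMP[0]), and it assumes UCMP means dst = src0 ? src1 : src2.

```c
/* Hypothetical helper, not from Mesa: mirrors the TGSI dump step by step.
 * sel stands in for CONST[0].x, out[] stands in for TEMP[0].xyzw. */
static void switch_as_loop(unsigned sel, float out[4])
{
    unsigned fallthrough = 0;                  /* TEMP[1].x = IMM[1].x (0) */
    out[0] = out[1] = out[2] = out[3] = 0.0f;  /* MOV TEMP[0], IMM[0].xxxx */

    for (;;) {                                 /* BGNLOOP */
        /* UCMP (assumed: dst = src0 ? src1 : src2): set flag when sel == 0 */
        fallthrough = sel ? fallthrough : 0xffffffffu;
        if (fallthrough) {                     /* UIF */
            out[0] = 1.0f;
            break;                             /* BRK */
        }

        unsigned is_one = (1u == sel) ? 0xffffffffu : 0u;  /* USEQ */
        fallthrough = is_one ? 0xffffffffu : fallthrough;   /* UCMP */
        if (fallthrough) {                     /* UIF */
            out[1] = 1.0f;
            break;                             /* BRK */
        }

        fallthrough = 0xffffffffu;             /* default label */
        out[2] = 1.0f;
        break;                                 /* unconditional BRK */
    }                                          /* ENDLOOP */
}
```

Every path through the body ends in a break, so the back edge at ENDLOOP is never taken: the "loop" runs exactly once.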

===== SHADER #13 ======================================== PS/CAYMAN/CAYMAN =====
===== 72 dw ===== 6 gprs ===== 2 stack =========================================
0000  00000012 a0100000 ALU 5 @36
 0036  000000f8 00200c90     1      x: MOV                R1.x,  0
 0038  000000f8 20200c90            y: MOV                R1.y,  0
 0040  000000f8 40200c90            z: MOV                R1.z,  0
 0042  800000f8 60200c90            w: MOV                R1.w,  0
 0044  800000f8 00400c90     2      x: MOV                R2.x,  0
0002  0000000f 81800000 LOOP_START_DX10 @30
0004  40000017 a4040000 ALU_PUSH_BEFORE 2 @46 KC0[CB0:0-15]
 0046  809f6080 0043c002     3      x: CNDGE_INT          R2.x,  KC0[0].x, -1, R2.x
 0048  801f00fe 00a0229c     4 MP   x: PRED_SETNE_INT     R5.x,  PV.x, 0
0006  00000007 82800001 JUMP @14 POP:1
0008  00000019 a0000000 ALU 1 @50
 0050  800004f9 00200c90     5      x: MOV                R1.x,  1.0
0010  0000000e 82400000 LOOP_BREAK @28
0012  00000007 83800001 POP @14 POP:1
0014  4000001a a4080000 ALU_PUSH_BEFORE 3 @52 KC0[CB0:0-15]
 0052  801000fa 00601d10     6      x: SETE_INT           R3.x,  1, KC0[0].x
 0054  800040fe 0043c4fb     7      x: CNDGE_INT          R2.x,  PV.x, R2.x, -1
 0056  801f00fe 00a0229c     8 MP   x: PRED_SETNE_INT     R5.x,  PV.x, 0
0016  0000000c 82800001 JUMP @24 POP:1
0018  0000001d a0000000 ALU 1 @58
 0058  800004f9 20200c90     9      y: MOV                R1.y,  1.0
0020  0000000e 82400000 LOOP_BREAK @28
0022  0000000c 83800001 POP @24 POP:1
0024  0000001e a0040000 ALU 2 @60
 0060  000004fb 00400c90    10      x: MOV                R2.x,  -1
 0062  800004f9 40200c90            z: MOV                R1.z,  1.0
0026  0000000e 82400000 LOOP_BREAK @28
0028  00000002 81400000 LOOP_END @4
0030  00000020 a00c0000 ALU 4 @64
 0064  00000001 00000c90    11      x: MOV                R0.x,  R1.x
 0066  00000401 20000c90            y: MOV                R0.y,  R1.y
 0068  00000801 40000c90            z: MOV                R0.z,  R1.z
 0070  80000c01 60000c90            w: MOV                R0.w,  R1.w
0032  c0000000 95000688 EXPORT_DONE        PIXEL 0     R0.xyzw
0034  00000000 88000000 CF_END @0
===== SHADER_END ===============================================================


===== SHADER #13 OPT ==================================== PS/CAYMAN/CAYMAN =====
===== 62 dw ===== 1 gprs ===== 2 stack =========================================
0000  40000011 a0080000 ALU 3 @34 KC0[CB0:0-15]
 0034  001000fa 0f801d10     1      x: SETE_INT           T0.x,  1, KC0[0].x
 0036  801f6080 2003c0f8            y: CNDGE_INT          R0.y,  KC0[0].x, -1, 0
 0038  8080007c 4003c0fb     2      z: CNDGE_INT          R0.z,  T0.x, R0.y, -1
0002  0000000f 81800000 LOOP_START_DX10 @30
0004  00000014 a4000000 ALU_PUSH_BEFORE 1 @40
 0040  801f0400 00002284     3 M    x: PRED_SETNE_INT     __.x,  R0.y, 0
0006  00000007 82800001 JUMP @14 POP:1
0008  00000015 a0080000 ALU 3 @42
 0042  000000f9 00000c90     4      x: MOV                R0.x,  1.0
 0044  000000f8 20000c90            y: MOV                R0.y,  0
 0046  800000f8 40000c90            z: MOV                R0.z,  0
0010  0000000e 82400000 LOOP_BREAK @28
0012  00000007 83800001 POP @14 POP:1
0014  00000018 a4000000 ALU_PUSH_BEFORE 1 @48
 0048  801f0800 00002284     5 M    x: PRED_SETNE_INT     __.x,  R0.z, 0
0016  0000000c 82800001 JUMP @24 POP:1
0018  00000019 a0080000 ALU 3 @50
 0050  000000f8 00000c90     6      x: MOV                R0.x,  0
 0052  000000f9 20000c90            y: MOV                R0.y,  1.0
 0054  800000f8 40000c90            z: MOV                R0.z,  0
0020  0000000e 82400000 LOOP_BREAK @28
0022  0000000c 83800001 POP @24 POP:1
0024  0000001c a0080000 ALU 3 @56
 0056  000000f8 00000c90     7      x: MOV                R0.x,  0
 0058  000000f8 20000c90            y: MOV                R0.y,  0
 0060  800000f9 40000c90            z: MOV                R0.z,  1.0
0026  0000000e 82400000 LOOP_BREAK @28
0028  00000002 81400000 LOOP_END @4
0030  c0000000 95000888 EXPORT_DONE        PIXEL 0     R0.xyz0
0032  00000000 88000000 CF_END @0
===== SHADER_END ===============================================================

Now I suspect it fails here because the stack depth is calculated
incorrectly, though there is a chance this is a cayman-specific
issue and the stack depth is just always calculated wrong.
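
To make "stack depth" concrete, here is a rough sketch that only counts structural nesting in the CF program of the optimized shader. It is not the driver's calculation: the enum and function names are invented for this example, JUMP and LOOP_BREAK are assumed not to change the depth, and any extra per-chip reservation the hardware may need is not modelled at all.

```c
#include <stddef.h>

/* Hypothetical tags for this sketch only; these are not Mesa's enums. */
enum cf_kind { CF_OTHER, CF_LOOP_START, CF_LOOP_END, CF_PUSH, CF_POP };

/* Returns the deepest structural nesting (loop frames plus pushes)
 * reached while walking the CF instructions in program order. */
static int max_nesting(const enum cf_kind *cf, size_t n)
{
    int depth = 0, max = 0;
    for (size_t i = 0; i < n; i++) {
        switch (cf[i]) {
        case CF_LOOP_START:
        case CF_PUSH:
            if (++depth > max)
                max = depth;
            break;
        case CF_LOOP_END:
        case CF_POP:
            if (depth > 0)
                depth--;
            break;
        default:   /* ALU, JUMP, LOOP_BREAK, EXPORT: assumed depth-neutral */
            break;
        }
    }
    return max;
}

/* CF sequence of the optimized shader above, in program order. */
static const enum cf_kind opt_shader[] = {
    CF_OTHER,      /* 0000 ALU */
    CF_LOOP_START, /* 0002 LOOP_START_DX10 */
    CF_PUSH,       /* 0004 ALU_PUSH_BEFORE */
    CF_OTHER,      /* 0006 JUMP */
    CF_OTHER,      /* 0008 ALU */
    CF_OTHER,      /* 0010 LOOP_BREAK */
    CF_POP,        /* 0012 POP */
    CF_PUSH,       /* 0014 ALU_PUSH_BEFORE */
    CF_OTHER,      /* 0016 JUMP */
    CF_OTHER,      /* 0018 ALU */
    CF_OTHER,      /* 0020 LOOP_BREAK */
    CF_POP,        /* 0022 POP */
    CF_OTHER,      /* 0024 ALU */
    CF_OTHER,      /* 0026 LOOP_BREAK */
    CF_LOOP_END,   /* 0028 LOOP_END */
    CF_OTHER,      /* 0030 EXPORT_DONE */
    CF_OTHER,      /* 0032 CF_END */
};
```

For this sequence the walker reports a nesting of 2 (one loop frame plus one push). That is the same figure as the "2 stack" in the dump header, but it says nothing about whether that many entries is actually enough on cayman.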

Dave.

