[Mesa-dev] r600/sb loop issue
Dave Airlie
airlied at gmail.com
Mon Dec 8 16:25:26 PST 2014
On 8 December 2014 at 20:41, Vadim Girlin <vadimgirlin at gmail.com> wrote:
> On 12/06/2014 07:13 AM, Vadim Girlin wrote:
>>
>> On 12/04/2014 01:43 AM, Dave Airlie wrote:
>>>
>>> Hi Vadim,
>>>
>>> I've been looking with Glenn's help into a bug in sb for a couple of
>>> weeks now triggered by a change in how GLSL generates switch
>>> statements.
>>>
>>> I understand you probably aren't too interested in r600g but I believe
>>> I'm hitting a design level problem and I would like some advice.
>>>
>>> So it appears that GLSL can create loops that don't repeat for switch
>>> statements, and it appears SB wasn't ready to handle such a thing.
>>
>>
>> Hi, Dave,
>>
>> I suspect we should rather get rid of such loops somehow, i.e. convert
>> to something else, the loop that never repeats is not really a loop
>> anyway. AFAICS "continue" is not supported in switch statements
>> according to GLSL specs, so the loops generated for switch will never be
>> repeated. Am I missing something? Even if repeating is possible somehow,
>> at least we can get rid of the loops that are not repeated.
>>
>> I think loops are less efficient than other control flow instructions on
>> r600g hw (at least because they increase stack usage), and possibly on
>> other hw too.
>>
>> In fact it seems sb basically gets rid of it already in IR, it just
>> doesn't know how to translate resulting control flow to ISA, because so
>> far it only supports specific control flow structure for if-then-else
>> that was previously preserved during optimizations. I think it may be
>> not very hard to implement support for that in finalizer, I'll look into
>> it.
>
>
> In fact handling that control flow in finalizer is not as easy as I hoped,
> probably impossible, at least if we want to make it efficient. I forgot
> about the limitations of R600 ISA.
>
> OTOH it seems I've managed to fix the issues with loops, the patch is
> attached (it's meant to be used instead of 7b0067d2). There are no piglit
> regressions on evergreen, but I didn't test any real apps.
>
This fixes one thing, but the switches are still broken here, on Cayman at least:
tests/spec/glsl-1.30/execution/switch/fs-default_last.shader_test
--------------------------------------------------------------
FRAG
PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
DCL OUT[0], COLOR
DCL CONST[0]
DCL TEMP[0..2], LOCAL
IMM[0] FLT32 { 0.0000, 1.0000, 0.0000, 0.0000}
IMM[1] UINT32 {0, 4294967295, 0, 0}
IMM[2] INT32 {1, 0, 0, 0}
0: MOV TEMP[0], IMM[0].xxxx
1: MOV TEMP[1].x, IMM[1].xxxx
2: BGNLOOP :0
3: UCMP TEMP[1].x, CONST[0].xxxx, TEMP[1].xxxx, IMM[1].yyyy
4: UIF TEMP[1].xxxx :0
5: MOV TEMP[0].x, IMM[0].yyyy
6: BRK
7: ENDIF
8: USEQ TEMP[2].x, IMM[2].xxxx, CONST[0].xxxx
9: UCMP TEMP[1].x, TEMP[2].xxxx, IMM[1].yyyy, TEMP[1].xxxx
10: UIF TEMP[1].xxxx :0
11: MOV TEMP[0].y, IMM[0].yyyy
12: BRK
13: ENDIF
14: MOV TEMP[1].x, IMM[1].yyyy
15: MOV TEMP[0].z, IMM[0].yyyy
16: BRK
17: ENDLOOP :0
18: MOV OUT[0], TEMP[0]
19: END
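
For reference when reading the two ISA dumps, here is a minimal Python sketch of what the TGSI above computes for a given CONST[0].x. It is not Mesa code: the names run_shader, ucmp and useq are made up for illustration, and it assumes the usual TGSI semantics (UCMP selects its second operand when the first is nonzero, USEQ yields all ones on equality, UIF tests for nonzero).

```python
# Hypothetical reference model of the TGSI listing above; not Mesa code.
ALL_ONES = 0xFFFFFFFF  # IMM[1].y, the integer "true" value


def ucmp(cond, if_nonzero, if_zero):
    """TGSI UCMP: select if_nonzero when cond != 0, else if_zero."""
    return if_nonzero if cond != 0 else if_zero


def useq(a, b):
    """TGSI USEQ: all ones when a == b, else 0."""
    return ALL_ONES if a == b else 0


def run_shader(const0_x):
    """Return (OUT[0] as a tuple, number of passes through the loop body)."""
    temp0 = [0.0, 0.0, 0.0, 0.0]                      # 0: MOV TEMP[0], IMM[0].xxxx
    temp1_x = 0                                       # 1: MOV TEMP[1].x, IMM[1].xxxx
    passes = 0
    while True:                                       # 2: BGNLOOP
        passes += 1
        temp1_x = ucmp(const0_x, temp1_x, ALL_ONES)   # 3: UCMP
        if temp1_x != 0:                              # 4: UIF
            temp0[0] = 1.0                            # 5: MOV TEMP[0].x
            break                                     # 6: BRK
        temp2_x = useq(1, const0_x)                   # 8: USEQ
        temp1_x = ucmp(temp2_x, ALL_ONES, temp1_x)    # 9: UCMP
        if temp1_x != 0:                              # 10: UIF
            temp0[1] = 1.0                            # 11: MOV TEMP[0].y
            break                                     # 12: BRK
        temp1_x = ALL_ONES                            # 14: MOV TEMP[1].x
        temp0[2] = 1.0                                # 15: MOV TEMP[0].z
        break                                         # 16: BRK
    return tuple(temp0), passes                       # 18: MOV OUT[0], TEMP[0]


for value in (0, 1, 2):
    print(value, run_shader(value))
```

Every path ends in BRK, so the loop body runs exactly once whatever the constant is. In this model, 0 selects the x write, 1 selects the y write, and anything else falls through to the z write.
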
===== SHADER #13 ======================================== PS/CAYMAN/CAYMAN =====
===== 72 dw ===== 6 gprs ===== 2 stack =========================================
0000 00000012 a0100000 ALU 5 @36
0036 000000f8 00200c90 1 x: MOV R1.x, 0
0038 000000f8 20200c90 y: MOV R1.y, 0
0040 000000f8 40200c90 z: MOV R1.z, 0
0042 800000f8 60200c90 w: MOV R1.w, 0
0044 800000f8 00400c90 2 x: MOV R2.x, 0
0002 0000000f 81800000 LOOP_START_DX10 @30
0004 40000017 a4040000 ALU_PUSH_BEFORE 2 @46 KC0[CB0:0-15]
0046 809f6080 0043c002 3 x: CNDGE_INT R2.x, KC0[0].x, -1, R2.x
0048 801f00fe 00a0229c 4 MP x: PRED_SETNE_INT R5.x, PV.x, 0
0006 00000007 82800001 JUMP @14 POP:1
0008 00000019 a0000000 ALU 1 @50
0050 800004f9 00200c90 5 x: MOV R1.x, 1.0
0010 0000000e 82400000 LOOP_BREAK @28
0012 00000007 83800001 POP @14 POP:1
0014 4000001a a4080000 ALU_PUSH_BEFORE 3 @52 KC0[CB0:0-15]
0052 801000fa 00601d10 6 x: SETE_INT R3.x, 1, KC0[0].x
0054 800040fe 0043c4fb 7 x: CNDGE_INT R2.x, PV.x, R2.x, -1
0056 801f00fe 00a0229c 8 MP x: PRED_SETNE_INT R5.x, PV.x, 0
0016 0000000c 82800001 JUMP @24 POP:1
0018 0000001d a0000000 ALU 1 @58
0058 800004f9 20200c90 9 y: MOV R1.y, 1.0
0020 0000000e 82400000 LOOP_BREAK @28
0022 0000000c 83800001 POP @24 POP:1
0024 0000001e a0040000 ALU 2 @60
0060 000004fb 00400c90 10 x: MOV R2.x, -1
0062 800004f9 40200c90 z: MOV R1.z, 1.0
0026 0000000e 82400000 LOOP_BREAK @28
0028 00000002 81400000 LOOP_END @4
0030 00000020 a00c0000 ALU 4 @64
0064 00000001 00000c90 11 x: MOV R0.x, R1.x
0066 00000401 20000c90 y: MOV R0.y, R1.y
0068 00000801 40000c90 z: MOV R0.z, R1.z
0070 80000c01 60000c90 w: MOV R0.w, R1.w
0032 c0000000 95000688 EXPORT_DONE PIXEL 0 R0.xyzw
0034 00000000 88000000 CF_END @0
===== SHADER_END ===============================================================
===== SHADER #13 OPT ==================================== PS/CAYMAN/CAYMAN =====
===== 62 dw ===== 1 gprs ===== 2 stack =========================================
0000 40000011 a0080000 ALU 3 @34 KC0[CB0:0-15]
0034 001000fa 0f801d10 1 x: SETE_INT T0.x, 1, KC0[0].x
0036 801f6080 2003c0f8 y: CNDGE_INT R0.y, KC0[0].x, -1, 0
0038 8080007c 4003c0fb 2 z: CNDGE_INT R0.z, T0.x, R0.y, -1
0002 0000000f 81800000 LOOP_START_DX10 @30
0004 00000014 a4000000 ALU_PUSH_BEFORE 1 @40
0040 801f0400 00002284 3 M x: PRED_SETNE_INT __.x, R0.y, 0
0006 00000007 82800001 JUMP @14 POP:1
0008 00000015 a0080000 ALU 3 @42
0042 000000f9 00000c90 4 x: MOV R0.x, 1.0
0044 000000f8 20000c90 y: MOV R0.y, 0
0046 800000f8 40000c90 z: MOV R0.z, 0
0010 0000000e 82400000 LOOP_BREAK @28
0012 00000007 83800001 POP @14 POP:1
0014 00000018 a4000000 ALU_PUSH_BEFORE 1 @48
0048 801f0800 00002284 5 M x: PRED_SETNE_INT __.x, R0.z, 0
0016 0000000c 82800001 JUMP @24 POP:1
0018 00000019 a0080000 ALU 3 @50
0050 000000f8 00000c90 6 x: MOV R0.x, 0
0052 000000f9 20000c90 y: MOV R0.y, 1.0
0054 800000f8 40000c90 z: MOV R0.z, 0
0020 0000000e 82400000 LOOP_BREAK @28
0022 0000000c 83800001 POP @24 POP:1
0024 0000001c a0080000 ALU 3 @56
0056 000000f8 00000c90 7 x: MOV R0.x, 0
0058 000000f8 20000c90 y: MOV R0.y, 0
0060 800000f9 40000c90 z: MOV R0.z, 1.0
0026 0000000e 82400000 LOOP_BREAK @28
0028 00000002 81400000 LOOP_END @4
0030 c0000000 95000888 EXPORT_DONE PIXEL 0 R0.xyz0
0032 00000000 88000000 CF_END @0
===== SHADER_END ===============================================================
Now I suspect it fails here because the stack depth is calculated
incorrectly, though there is a chance this is a Cayman-specific
issue and the stack depth is simply always calculated wrong.
Dave.