[Mesa-dev] [PATCH] r600g: Fix UMAD on Cayman
Vadim Girlin
vadimgirlin at gmail.com
Sun Apr 14 09:45:11 PDT 2013
On 04/13/2013 09:54 PM, Martin Andersson wrote:
> On Sat, Apr 13, 2013 at 4:23 AM, Vadim Girlin <vadimgirlin at gmail.com> wrote:
>> On 04/12/2013 11:36 PM, Martin Andersson wrote:
>>>
>>> I have made some progress with this issue.
>>>
>>> Vadim, I did as you suggested and tried to mimic the output from the
>>> shader analyser
>>> tool. I used your patch as a base and then tried various ways to see
>>> what would work.
>>> After many tries (and lockups) I did managed to get the
>>> ext_transform_feedback/order
>>> test to pass.
>>>
>>> It is a very ugly patch but it should illustrate what the problem (and
>>> potential solution) is.
>>>
>>> Your test program fails however because explicit break statements do
>>> not work. It
>>> should be possible to use the same code for the explicit breaks as for
>>> the implicit
>>> loop break.The reason it does not is that I detect the implicit break
>>> with a hack and
>>> it does notwork for explicit breaks.
>>>
>>> The problem is that I need to detect the break statement when creating the
>>> corresponding if statement. So that I can treat it differently than
>>> other "regular" if
>>> statements. Anyone knows how I could do that, or is this the wrong
>>> approach?
>>>
>>
>> It doesn't work with my test app because IF/ENDIF blocks with BRK may
>> contain other code, so you can't simply throw away IF/ENDIF making that code
>> execute unconditionally.
>
> Yeah my hack is not an viable option.
>
>> By the way, shader analyzer in some cases also produces the code with
>> JUMP/POP around PRED_SET-BREAK, though I'm not sure if that code will really
>> work as expected with catalyst. Possibly we're simply missing something in
>> the hardware configuration.
>>
>> Also there is one thing that I didn't take into account in my initial patch
>> - r600g converts ALU followed by POP to ALU_POP_AFTER and this might explain
>> why my initial patch doesn't work. Possibly if we prevent that optimization
>> for ALU containing PRED_SET-BREAK and leave separate POP, it might be enough
>> to make it work. I'm attaching the additional patch that will force POP to
>> be a separate instruction in this case, please test it (on top of the my
>> first patch). This would be at least not very intrusive.
>
> No, that patch did not help either.
>
>> If this won't help, then I think we should understand what exactly we are
>> trying to fix before implementing any big changes, possibly there is a
>> better solution or at least a more clean workaround. In the worst case we
>> can return to your approach and improve it to handle other cases.
>
> I'm starting to think that there is nothing wrong with the shader
> compiler. It seems to me that a push, pop inside a nested loop clears
> the break status on a thread.
>
> shift_reg = 1u;
> count = 0u;
> while (true) {
> if (x == 1u)
> break;
> while (true) {
> if (x != 1u)
> count = 10u;
> if (x == 1u)
> count = 20u;
> break;
> }
> shift_reg = 2u;
> break;
> }
>
> input: x == 0
> actual ouput: shift_reg == 2, count == 10
> expected output: shift_reg == 2, count == 10
>
> input: x == 1
> actual ouput: shift_reg == 2, count == 20
> expected output: shift_reg == 1, count == 0
>
> If I swap the if statements in the inner loop I get different results.
>
> shift_reg = 1u;
> count = 0u;
> while (true) {
> if (x == 1u)
> break;
> while (true) {
> if (x == 1u)
> count = 20u;
> if (x != 1u)
> count = 10u;
> break;
> }
> shift_reg = 2u;
> break;
> }
>
> input: x == 0
> actual ouput: shift_reg == 2, count == 10
> expected output: shift_reg == 2, count == 10
>
> input: x == 1
> actual ouput: shift_reg == 2, count == 0
> expected output: shift_reg == 1, count == 0
>
> I tested both cases on mesa master and mesa master + Vadims two
> patches with the same results.
>
This turned out to be a known issue with cayman: BREAK/CONTINUE followed
by LOOP_STARTxxx for nested loop may put the branch stack into the state
such that ALU_PUSH_BEFORE doesn't work as expected.
It seems the simplest workaround is either to avoid ALU_PUSH_BEFORE in
nested loops completely or to replace it with separate PUSH and ALU.
We can check if we actually have BREAK/CONTINUE in the outer loop before
LOOP_START for the inner loop, but I think it will be true in most
cases, so the simplest fix for r600g is to replace all ALU_PUSH_BEFORE
with PUSH + ALU in the nested loops on cayman.
Vadim
> //Martin
>
>> Vadim
>>
>>
>>> //Martin
>>>
>>> On Thu, Apr 11, 2013 at 5:31 PM, Vadim Girlin <vadimgirlin at gmail.com>
>>> wrote:
>>>>
>>>> On 04/11/2013 02:08 AM, Marek Olšák wrote:
>>>>>
>>>>>
>>>>> Here's the output:
>>>>>
>>>>> creating vs ...
>>>>> shader compilation status: OK
>>>>> creating fs ...
>>>>> shader compilation status: OK
>>>>> thread #0 (0;0) : ref = 16608
>>>>> thread #1 (1;0) : ref = 27873
>>>>> thread #2 (0;1) : ref = 16608
>>>>> thread #3 (1;1) : ref = 27877
>>>>> results:
>>>>> thread 0 (0, 0): expected = 16608, observed = 27876, FAIL
>>>>> thread 1 (1, 0): expected = 27873, observed = 27873, OK
>>>>> thread 2 (0, 1): expected = 16608, observed = 27876, FAIL
>>>>> thread 3 (1, 1): expected = 27877, observed = 27877, OK
>>>>>
>>>>
>>>> Thanks. According to these results, it looks like LOOP_START_DX10 for
>>>> inner
>>>> loop somehow reactivates the threads that were put into inactive-break
>>>> state
>>>> by the LOOP_BREAK in the outer loop. Also it seems LOOP_BREAK in the
>>>> inner
>>>> loop doesn't work as expected in this case. In other words, it looks
>>>> weird.
>>>>
>>>> I can't explain why would this happen. It might be interesting to run
>>>> these
>>>> tests with llvm backend to see if there are any differences.
>>>>
>>>> Probably it might help if we'll implement LOOP_BREAK via EXECUTE_MASK_OP
>>>> in
>>>> the PRED_SET encoding as in my earlier patch, but without any stack
>>>> push/pop
>>>> operations and jumps (where it's possible), closer to what the catalyst
>>>> (shader analyzer) does. I'm not sure if it will help though, and anyway
>>>> we'll need stack operations in some cases, so I'm afraid this won't fix
>>>> the
>>>> issue completely.
>>>>
>>>> So far I have no other ideas.
>>>>
>>>> Vadim
>>>>
>>>>> Marek
>>>>>
>>>>>
>>>>> On Wed, Apr 10, 2013 at 11:42 PM, Vadim Girlin
>>>>> <vadimgirlin at gmail.com>wrote:
>>>>>
>>>>>> On 04/10/2013 01:53 PM, Marek Olšák wrote:
>>>>>>
>>>>>>> glsl-fs-loop-nested passes here.
>>>>>>>
>>>>>>> nstack is 3 and adding 4 to it doesn't help.
>>>>>>>
>>>>>>
>>>>>> Ok, thanks.
>>>>>>
>>>>>> Also I wrote a simple test app that should reproduce the issue if it's
>>>>>> really related to diverging control flow with nested loops and might
>>>>>> more
>>>>>> information about what's going wrong.
>>>>>>
>>>>>> The source is in the attachment and needs to be compiled with -lGL
>>>>>> -lglut
>>>>>> -lGLEW. The app renders four points and computes some value for each
>>>>>> point
>>>>>> in the loops similar to the transform feedback order test, but it
>>>>>> doesn't
>>>>>> use tfb. It should render four green or red squares depending on
>>>>>> correctness of the result.
>>>>>>
>>>>>> Here is the correct output produced for me on evergreen:
>>>>>>
>>>>>> thread 0 (0, 0): expected = 16608, observed = 16608, OK
>>>>>> thread 1 (1, 0): expected = 27873, observed = 27873, OK
>>>>>> thread 2 (0, 1): expected = 16608, observed = 16608, OK
>>>>>> thread 3 (1, 1): expected = 27877, observed = 27877, OK
>>>>>>
>>>>>> Please post the output if it fails on cayman.
>>>>>>
>>>>>> Vadim
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Marek
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Apr 10, 2013 at 8:46 AM, Vadim Girlin <vadimgirlin at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> On 04/10/2013 03:58 AM, Marek Olšák wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Vadim,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> your patch does not fix the test.
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Hmm, I'm out of ideas then. Thanks for testing.
>>>>>>>>
>>>>>>>> I've checked the shader dump few times but I don't see anything
>>>>>>>> obviously
>>>>>>>> wrong there, and the same code (except the minor ALU grouping changes
>>>>>>>> due
>>>>>>>> to the VLIW4/VLIW5 difference) works fine for me on evergreen.
>>>>>>>>
>>>>>>>> According to the Martin's observations it looks like if the threads
>>>>>>>> that
>>>>>>>> shouldn't execute the loop body were incorrectly left in the active
>>>>>>>> state.
>>>>>>>> LOOP_BREAK should put them into the inactive-break state, but
>>>>>>>> something
>>>>>>>> goes wrong. Do the other piglit tests with nested loops (e.g.
>>>>>>>> glsl-fs-loop-nested) work on cayman? Though possibly there are no
>>>>>>>> other
>>>>>>>> tests with the diverging loops as in this case.
>>>>>>>>
>>>>>>>> I'll try to write a simpler test with the diverging loops to see if
>>>>>>>> the
>>>>>>>> issue is really caused by the incorrect control flow handling, and to
>>>>>>>> figure out the exact instruction that results in the incorrect active
>>>>>>>> state.
>>>>>>>>
>>>>>>>> Also probably it worth checking if the stack size is correct for that
>>>>>>>> shader (latest mesa should print nstack value in the shader
>>>>>>>> disassemble
>>>>>>>> header, I think it should be 3 for that shader) and maybe try adding
>>>>>>>> some
>>>>>>>> constant, e.g. 4 to the bc->nstack in the r600_bytecode_build just to
>>>>>>>> be
>>>>>>>> sure that we reserve enough of stack space, though I don't think
>>>>>>>> stack
>>>>>>>> size
>>>>>>>> is the cause of this issue.
>>>>>>>>
>>>>>>>> Vadim
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Marek
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Apr 9, 2013 at 11:30 PM, Vadim Girlin
>>>>>>>>> <vadimgirlin at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> On 04/09/2013 10:58 AM, Martin Andersson wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Apr 9, 2013 at 3:18 AM, Marek Olšák <maraeo at gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Pushed, thanks. The transform feedback test still doesn't
>>>>>>>>>>> pass,
>>>>>>>>>>> but
>>>>>>>>>>> at
>>>>>>>>>>>
>>>>>>>>>>>> least
>>>>>>>>>>>> the hardlocks are gone.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks, I have looked into the other issue as well
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> http://lists.freedesktop.org/******archives/mesa-dev/2013-**March/**<http://lists.freedesktop.org/****archives/mesa-dev/2013-March/**>
>>>>>>>>>>> **036941.html<http://lists.**freedesktop.org/**archives/**
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> mesa-dev/2013-March/**036941.**html<http://lists.freedesktop.org/**archives/mesa-dev/2013-March/**036941.html>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> <http://lists.**freedesktop.**org/archives/mesa-**<http://freedesktop.org/archives/mesa-**>
>>>>>>>>>>> dev/2013-March/036941.html<htt**p://lists.freedesktop.org/**
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> archives/mesa-dev/2013-March/**036941.html<http://lists.freedesktop.org/archives/mesa-dev/2013-March/036941.html>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The problem arises when there are nested loops. If I rework the
>>>>>>>>>>> code
>>>>>>>>>>> so there are
>>>>>>>>>>> no nested loops the issue disappears. At least one pixel also
>>>>>>>>>>> needs
>>>>>>>>>>> to
>>>>>>>>>>> enter the
>>>>>>>>>>> outer loop. The pixels that should enter the outer loop behaves
>>>>>>>>>>> correctly. It is those
>>>>>>>>>>> pixels that should not enter the outer loop that misbehaves. It
>>>>>>>>>>> does
>>>>>>>>>>> not matter if they
>>>>>>>>>>> also fails the test for the inner loop, they will still execute
>>>>>>>>>>> the
>>>>>>>>>>> instruction inside. That
>>>>>>>>>>> leads to the strange results for that test.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Please test the attached patch.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Vadim
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The strangeness is easier to see if the NUM_POINTS in the
>>>>>>>>>>
>>>>>>>>>>> ext_transform_feedback/
>>>>>>>>>>> order.c are run with smaller values,like 3, 6 and 9. Disable the
>>>>>>>>>>> code
>>>>>>>>>>> that fail the test
>>>>>>>>>>> and print starting_x, shift_reg_final and iteration_count.
>>>>>>>>>>>
>>>>>>>>>>> Marek, since you implemented transform feedback for r600, do you
>>>>>>>>>>> think
>>>>>>>>>>> the issue
>>>>>>>>>>> is with the tranform feedback code or the shader compiler or some
>>>>>>>>>>> other
>>>>>>>>>>> thing?
>>>>>>>>>>>
>>>>>>>>>>> //Martin
>>>>>>>>>>> ______________________________******_________________
>>>>>>>>>>> mesa-dev mailing list
>>>>>>>>>>> mesa-dev at lists.freedesktop.org
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> http://lists.freedesktop.org/******mailman/listinfo/mesa-dev<http://lists.freedesktop.org/****mailman/listinfo/mesa-dev>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> <h**ttp://lists.freedesktop.org/****mailman/listinfo/mesa-dev<http://lists.freedesktop.org/**mailman/listinfo/mesa-dev>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> <htt**p://lists.freedesktop.**org/**mailman/listinfo/mesa-**dev<http://lists.freedesktop.org/**mailman/listinfo/mesa-dev>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> <http://lists.freedesktop.**org/mailman/listinfo/mesa-dev<http://lists.freedesktop.org/mailman/listinfo/mesa-dev>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ______________________________****_________________
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> mesa-dev mailing list
>>>>>>>>>> mesa-dev at lists.freedesktop.org
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> http://lists.freedesktop.org/****mailman/listinfo/mesa-dev<http://lists.freedesktop.org/**mailman/listinfo/mesa-dev>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> <htt**p://lists.freedesktop.org/**mailman/listinfo/mesa-dev<http://lists.freedesktop.org/mailman/listinfo/mesa-dev>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> mesa-dev mailing list
>>>> mesa-dev at lists.freedesktop.org
>>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
>>
More information about the mesa-dev
mailing list