[Mesa-dev] [PATCH] r600g: Fix UMAD on Cayman

Martin Andersson g02maran at gmail.com
Sat Apr 13 10:54:53 PDT 2013


On Sat, Apr 13, 2013 at 4:23 AM, Vadim Girlin <vadimgirlin at gmail.com> wrote:
> On 04/12/2013 11:36 PM, Martin Andersson wrote:
>>
>> I have made some progress with this issue.
>>
>> Vadim, I did as you suggested and tried to mimic the output from the
>> shader analyser
>> tool. I used your patch as a base and then tried various ways to see
>> what would work.
>> After many tries (and lockups) I did managed to get the
>> ext_transform_feedback/order
>> test to pass.
>>
>> It is a very ugly patch but it should illustrate what the problem (and
>> potential solution) is.
>>
>> Your test program fails however because explicit break statements do
>> not work. It
>> should be possible to use the same code for the explicit breaks as for
>> the implicit
>> loop break.The reason it does not is that I detect the implicit break
>> with a hack and
>> it does notwork for explicit breaks.
>>
>> The problem is that I need to detect the break statement when creating the
>> corresponding if statement. So that I can treat it differently than
>> other "regular" if
>> statements. Anyone knows how I could do that, or is this the wrong
>> approach?
>>
>
> It doesn't work with my test app because IF/ENDIF blocks with BRK may
> contain other code, so you can't simply throw away IF/ENDIF making that code
> execute unconditionally.

Yeah my hack is not an viable option.

> By the way, shader analyzer in some cases also produces the code with
> JUMP/POP around PRED_SET-BREAK, though I'm not sure if that code will really
> work as expected with catalyst. Possibly we're simply missing something in
> the hardware configuration.
>
> Also there is one thing that I didn't take into account in my initial patch
> - r600g converts ALU followed by POP to ALU_POP_AFTER and this might explain
> why my initial patch doesn't work. Possibly if we prevent that optimization
> for ALU containing PRED_SET-BREAK and leave separate POP, it might be enough
> to make it work. I'm attaching the additional patch that will force POP to
> be a separate instruction in this case, please test it (on top of the my
> first patch). This would be at least not very intrusive.

No, that patch did not help either.

> If this won't help, then I think we should understand what exactly we are
> trying to fix before implementing any big changes, possibly there is a
> better solution or at least a more clean workaround. In the worst case we
> can return to your approach and improve it to handle other cases.

I'm starting to think that there is nothing wrong with the shader
compiler. It seems to me that a push, pop inside a nested loop clears
the break status on a thread.

shift_reg = 1u;
count = 0u;
while (true) {
    if (x == 1u)
        break;
     while (true) {
         if (x != 1u)
              count = 10u;
         if (x == 1u)
              count = 20u;
         break;
     }
     shift_reg = 2u;
     break;
}

input: x == 0
actual ouput: shift_reg == 2, count == 10
expected output: shift_reg == 2, count == 10

input: x == 1
actual ouput: shift_reg == 2, count == 20
expected output: shift_reg == 1, count == 0

If I swap the if statements in the inner loop I get different results.

shift_reg = 1u;
count = 0u;
while (true) {
    if (x == 1u)
        break;
     while (true) {
         if (x == 1u)
              count = 20u;
         if (x != 1u)
              count = 10u;
         break;
     }
     shift_reg = 2u;
     break;
}

input: x == 0
actual ouput: shift_reg == 2, count == 10
expected output: shift_reg == 2, count == 10

input: x == 1
actual ouput: shift_reg == 2, count == 0
expected output: shift_reg == 1, count == 0

I tested both cases on mesa master and mesa master + Vadims two
patches with the same results.

//Martin

> Vadim
>
>
>> //Martin
>>
>> On Thu, Apr 11, 2013 at 5:31 PM, Vadim Girlin <vadimgirlin at gmail.com>
>> wrote:
>>>
>>> On 04/11/2013 02:08 AM, Marek Olšák wrote:
>>>>
>>>>
>>>> Here's the output:
>>>>
>>>> creating vs ...
>>>> shader compilation status: OK
>>>> creating fs ...
>>>> shader compilation status: OK
>>>> thread #0 (0;0) : ref = 16608
>>>> thread #1 (1;0) : ref = 27873
>>>> thread #2 (0;1) : ref = 16608
>>>> thread #3 (1;1) : ref = 27877
>>>> results:
>>>>    thread 0 (0, 0): expected = 16608, observed = 27876, FAIL
>>>>    thread 1 (1, 0): expected = 27873, observed = 27873, OK
>>>>    thread 2 (0, 1): expected = 16608, observed = 27876, FAIL
>>>>    thread 3 (1, 1): expected = 27877, observed = 27877, OK
>>>>
>>>
>>> Thanks. According to these results, it looks like LOOP_START_DX10 for
>>> inner
>>> loop somehow reactivates the threads that were put into inactive-break
>>> state
>>> by the LOOP_BREAK in the outer loop. Also it seems LOOP_BREAK in the
>>> inner
>>> loop doesn't work as expected in this case. In other words, it looks
>>> weird.
>>>
>>> I can't explain why would this happen. It might be interesting to run
>>> these
>>> tests with llvm backend to see if there are any differences.
>>>
>>> Probably it might help if we'll implement LOOP_BREAK via EXECUTE_MASK_OP
>>> in
>>> the PRED_SET encoding as in my earlier patch, but without any stack
>>> push/pop
>>> operations and jumps (where it's possible), closer to what the catalyst
>>> (shader analyzer) does. I'm not sure if it will help though, and anyway
>>> we'll need stack operations in some cases, so I'm afraid this won't fix
>>> the
>>> issue completely.
>>>
>>> So far I have no other ideas.
>>>
>>> Vadim
>>>
>>>> Marek
>>>>
>>>>
>>>> On Wed, Apr 10, 2013 at 11:42 PM, Vadim Girlin
>>>> <vadimgirlin at gmail.com>wrote:
>>>>
>>>>> On 04/10/2013 01:53 PM, Marek Olšák wrote:
>>>>>
>>>>>> glsl-fs-loop-nested passes here.
>>>>>>
>>>>>> nstack is 3 and adding 4 to it doesn't help.
>>>>>>
>>>>>
>>>>> Ok, thanks.
>>>>>
>>>>> Also I wrote a simple test app that should reproduce the issue if it's
>>>>> really related to diverging control flow with nested loops and might
>>>>> more
>>>>> information about what's going wrong.
>>>>>
>>>>> The source is in the attachment and needs to be compiled with -lGL
>>>>> -lglut
>>>>> -lGLEW. The app renders four points and computes some value for each
>>>>> point
>>>>> in the loops similar to the transform feedback order test, but it
>>>>> doesn't
>>>>> use tfb. It should render four green or red squares depending on
>>>>> correctness of the result.
>>>>>
>>>>> Here is the correct output produced for me on evergreen:
>>>>>
>>>>>    thread 0 (0, 0): expected = 16608, observed = 16608, OK
>>>>>    thread 1 (1, 0): expected = 27873, observed = 27873, OK
>>>>>    thread 2 (0, 1): expected = 16608, observed = 16608, OK
>>>>>    thread 3 (1, 1): expected = 27877, observed = 27877, OK
>>>>>
>>>>> Please post the output if it fails on cayman.
>>>>>
>>>>> Vadim
>>>>>
>>>>>
>>>>>
>>>>>> Marek
>>>>>>
>>>>>>
>>>>>> On Wed, Apr 10, 2013 at 8:46 AM, Vadim Girlin <vadimgirlin at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>    On 04/10/2013 03:58 AM, Marek Olšák wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>    Hi Vadim,
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> your patch does not fix the test.
>>>>>>>>
>>>>>>>>
>>>>>>> Hmm, I'm out of ideas then. Thanks for testing.
>>>>>>>
>>>>>>> I've checked the shader dump few times but I don't see anything
>>>>>>> obviously
>>>>>>> wrong there, and the same code (except the minor ALU grouping changes
>>>>>>> due
>>>>>>> to the VLIW4/VLIW5 difference) works fine for me on evergreen.
>>>>>>>
>>>>>>> According to the Martin's observations it looks like if the threads
>>>>>>> that
>>>>>>> shouldn't execute the loop body were incorrectly left in the active
>>>>>>> state.
>>>>>>> LOOP_BREAK should put them into the inactive-break state, but
>>>>>>> something
>>>>>>> goes wrong. Do the other piglit tests with nested loops (e.g.
>>>>>>> glsl-fs-loop-nested) work on cayman? Though possibly there are no
>>>>>>> other
>>>>>>> tests with the diverging loops as in this case.
>>>>>>>
>>>>>>> I'll try to write a simpler test with the diverging loops to see if
>>>>>>> the
>>>>>>> issue is really caused by the incorrect control flow handling, and to
>>>>>>> figure out the exact instruction that results in the incorrect active
>>>>>>> state.
>>>>>>>
>>>>>>> Also probably it worth checking if the stack size is correct for that
>>>>>>> shader (latest mesa should print nstack value in the shader
>>>>>>> disassemble
>>>>>>> header, I think it should be 3 for that shader) and maybe try adding
>>>>>>> some
>>>>>>> constant, e.g. 4 to the bc->nstack in the r600_bytecode_build just to
>>>>>>> be
>>>>>>> sure that we reserve enough of stack space, though I don't think
>>>>>>> stack
>>>>>>> size
>>>>>>> is the cause of this issue.
>>>>>>>
>>>>>>> Vadim
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>    Marek
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Apr 9, 2013 at 11:30 PM, Vadim Girlin
>>>>>>>> <vadimgirlin at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>     On 04/09/2013 10:58 AM, Martin Andersson wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>     On Tue, Apr 9, 2013 at 3:18 AM, Marek Olšák <maraeo at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>     Pushed, thanks. The transform feedback test still doesn't
>>>>>>>>>> pass,
>>>>>>>>>> but
>>>>>>>>>> at
>>>>>>>>>>
>>>>>>>>>>> least
>>>>>>>>>>> the hardlocks are gone.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>    Thanks, I have looked into the other issue as well
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> http://lists.freedesktop.org/******archives/mesa-dev/2013-**March/**<http://lists.freedesktop.org/****archives/mesa-dev/2013-March/**>
>>>>>>>>>> **036941.html<http://lists.**freedesktop.org/**archives/**
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> mesa-dev/2013-March/**036941.**html<http://lists.freedesktop.org/**archives/mesa-dev/2013-March/**036941.html>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> <http://lists.**freedesktop.**org/archives/mesa-**<http://freedesktop.org/archives/mesa-**>
>>>>>>>>>> dev/2013-March/036941.html<htt**p://lists.freedesktop.org/**
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> archives/mesa-dev/2013-March/**036941.html<http://lists.freedesktop.org/archives/mesa-dev/2013-March/036941.html>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The problem arises when there are nested loops. If I rework the
>>>>>>>>>> code
>>>>>>>>>> so there are
>>>>>>>>>> no nested loops the issue disappears. At least one pixel also
>>>>>>>>>> needs
>>>>>>>>>> to
>>>>>>>>>> enter the
>>>>>>>>>> outer loop. The pixels that should enter the outer loop behaves
>>>>>>>>>> correctly. It is those
>>>>>>>>>> pixels that should not enter the outer loop that misbehaves. It
>>>>>>>>>> does
>>>>>>>>>> not matter if they
>>>>>>>>>> also fails the test for the inner loop, they will still execute
>>>>>>>>>> the
>>>>>>>>>> instruction inside. That
>>>>>>>>>> leads to the strange results for that test.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>    Please test the attached patch.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Vadim
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>     The strangeness is easier to see if the NUM_POINTS in the
>>>>>>>>>
>>>>>>>>>> ext_transform_feedback/
>>>>>>>>>> order.c are run with smaller values,like 3, 6 and 9. Disable the
>>>>>>>>>> code
>>>>>>>>>> that fail the test
>>>>>>>>>> and print starting_x, shift_reg_final and iteration_count.
>>>>>>>>>>
>>>>>>>>>> Marek, since you implemented transform feedback for r600, do you
>>>>>>>>>> think
>>>>>>>>>> the issue
>>>>>>>>>> is with the tranform feedback code or the shader compiler or some
>>>>>>>>>> other
>>>>>>>>>> thing?
>>>>>>>>>>
>>>>>>>>>> //Martin
>>>>>>>>>> ______________________________******_________________
>>>>>>>>>> mesa-dev mailing list
>>>>>>>>>> mesa-dev at lists.freedesktop.org
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> http://lists.freedesktop.org/******mailman/listinfo/mesa-dev<http://lists.freedesktop.org/****mailman/listinfo/mesa-dev>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> <h**ttp://lists.freedesktop.org/****mailman/listinfo/mesa-dev<http://lists.freedesktop.org/**mailman/listinfo/mesa-dev>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> <htt**p://lists.freedesktop.**org/**mailman/listinfo/mesa-**dev<http://lists.freedesktop.org/**mailman/listinfo/mesa-dev>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> <http://lists.freedesktop.**org/mailman/listinfo/mesa-dev<http://lists.freedesktop.org/mailman/listinfo/mesa-dev>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>    ______________________________****_________________
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> mesa-dev mailing list
>>>>>>>>> mesa-dev at lists.freedesktop.org
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> http://lists.freedesktop.org/****mailman/listinfo/mesa-dev<http://lists.freedesktop.org/**mailman/listinfo/mesa-dev>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> <htt**p://lists.freedesktop.org/**mailman/listinfo/mesa-dev<http://lists.freedesktop.org/mailman/listinfo/mesa-dev>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>> _______________________________________________
>>> mesa-dev mailing list
>>> mesa-dev at lists.freedesktop.org
>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
>


More information about the mesa-dev mailing list