[Mesa-dev] [PATCH] r600g: Fix UMAD on Cayman
Vadim Girlin
vadimgirlin at gmail.com
Fri Apr 12 19:23:39 PDT 2013
On 04/12/2013 11:36 PM, Martin Andersson wrote:
> I have made some progress with this issue.
>
> Vadim, I did as you suggested and tried to mimic the output from the
> shader analyser
> tool. I used your patch as a base and then tried various ways to see
> what would work.
> After many tries (and lockups) I did managed to get the
> ext_transform_feedback/order
> test to pass.
>
> It is a very ugly patch but it should illustrate what the problem (and
> potential solution) is.
>
> Your test program fails however because explicit break statements do
> not work. It
> should be possible to use the same code for the explicit breaks as for
> the implicit
> loop break.The reason it does not is that I detect the implicit break
> with a hack and
> it does notwork for explicit breaks.
>
> The problem is that I need to detect the break statement when creating the
> corresponding if statement. So that I can treat it differently than
> other "regular" if
> statements. Anyone knows how I could do that, or is this the wrong approach?
>
It doesn't work with my test app because IF/ENDIF blocks with BRK may
contain other code, so you can't simply throw away IF/ENDIF making that
code execute unconditionally.
By the way, shader analyzer in some cases also produces the code with
JUMP/POP around PRED_SET-BREAK, though I'm not sure if that code will
really work as expected with catalyst. Possibly we're simply missing
something in the hardware configuration.
Also there is one thing that I didn't take into account in my initial
patch - r600g converts ALU followed by POP to ALU_POP_AFTER and this
might explain why my initial patch doesn't work. Possibly if we prevent
that optimization for ALU containing PRED_SET-BREAK and leave separate
POP, it might be enough to make it work. I'm attaching the additional
patch that will force POP to be a separate instruction in this case,
please test it (on top of the my first patch). This would be at least
not very intrusive.
If this won't help, then I think we should understand what exactly we
are trying to fix before implementing any big changes, possibly there is
a better solution or at least a more clean workaround. In the worst case
we can return to your approach and improve it to handle other cases.
Vadim
> //Martin
>
> On Thu, Apr 11, 2013 at 5:31 PM, Vadim Girlin <vadimgirlin at gmail.com> wrote:
>> On 04/11/2013 02:08 AM, Marek Olšák wrote:
>>>
>>> Here's the output:
>>>
>>> creating vs ...
>>> shader compilation status: OK
>>> creating fs ...
>>> shader compilation status: OK
>>> thread #0 (0;0) : ref = 16608
>>> thread #1 (1;0) : ref = 27873
>>> thread #2 (0;1) : ref = 16608
>>> thread #3 (1;1) : ref = 27877
>>> results:
>>> thread 0 (0, 0): expected = 16608, observed = 27876, FAIL
>>> thread 1 (1, 0): expected = 27873, observed = 27873, OK
>>> thread 2 (0, 1): expected = 16608, observed = 27876, FAIL
>>> thread 3 (1, 1): expected = 27877, observed = 27877, OK
>>>
>>
>> Thanks. According to these results, it looks like LOOP_START_DX10 for inner
>> loop somehow reactivates the threads that were put into inactive-break state
>> by the LOOP_BREAK in the outer loop. Also it seems LOOP_BREAK in the inner
>> loop doesn't work as expected in this case. In other words, it looks weird.
>>
>> I can't explain why would this happen. It might be interesting to run these
>> tests with llvm backend to see if there are any differences.
>>
>> Probably it might help if we'll implement LOOP_BREAK via EXECUTE_MASK_OP in
>> the PRED_SET encoding as in my earlier patch, but without any stack push/pop
>> operations and jumps (where it's possible), closer to what the catalyst
>> (shader analyzer) does. I'm not sure if it will help though, and anyway
>> we'll need stack operations in some cases, so I'm afraid this won't fix the
>> issue completely.
>>
>> So far I have no other ideas.
>>
>> Vadim
>>
>>> Marek
>>>
>>>
>>> On Wed, Apr 10, 2013 at 11:42 PM, Vadim Girlin
>>> <vadimgirlin at gmail.com>wrote:
>>>
>>>> On 04/10/2013 01:53 PM, Marek Olšák wrote:
>>>>
>>>>> glsl-fs-loop-nested passes here.
>>>>>
>>>>> nstack is 3 and adding 4 to it doesn't help.
>>>>>
>>>>
>>>> Ok, thanks.
>>>>
>>>> Also I wrote a simple test app that should reproduce the issue if it's
>>>> really related to diverging control flow with nested loops and might more
>>>> information about what's going wrong.
>>>>
>>>> The source is in the attachment and needs to be compiled with -lGL -lglut
>>>> -lGLEW. The app renders four points and computes some value for each
>>>> point
>>>> in the loops similar to the transform feedback order test, but it doesn't
>>>> use tfb. It should render four green or red squares depending on
>>>> correctness of the result.
>>>>
>>>> Here is the correct output produced for me on evergreen:
>>>>
>>>> thread 0 (0, 0): expected = 16608, observed = 16608, OK
>>>> thread 1 (1, 0): expected = 27873, observed = 27873, OK
>>>> thread 2 (0, 1): expected = 16608, observed = 16608, OK
>>>> thread 3 (1, 1): expected = 27877, observed = 27877, OK
>>>>
>>>> Please post the output if it fails on cayman.
>>>>
>>>> Vadim
>>>>
>>>>
>>>>
>>>>> Marek
>>>>>
>>>>>
>>>>> On Wed, Apr 10, 2013 at 8:46 AM, Vadim Girlin <vadimgirlin at gmail.com>
>>>>> wrote:
>>>>>
>>>>> On 04/10/2013 03:58 AM, Marek Olšák wrote:
>>>>>>
>>>>>>
>>>>>> Hi Vadim,
>>>>>>>
>>>>>>>
>>>>>>> your patch does not fix the test.
>>>>>>>
>>>>>>>
>>>>>> Hmm, I'm out of ideas then. Thanks for testing.
>>>>>>
>>>>>> I've checked the shader dump few times but I don't see anything
>>>>>> obviously
>>>>>> wrong there, and the same code (except the minor ALU grouping changes
>>>>>> due
>>>>>> to the VLIW4/VLIW5 difference) works fine for me on evergreen.
>>>>>>
>>>>>> According to the Martin's observations it looks like if the threads
>>>>>> that
>>>>>> shouldn't execute the loop body were incorrectly left in the active
>>>>>> state.
>>>>>> LOOP_BREAK should put them into the inactive-break state, but something
>>>>>> goes wrong. Do the other piglit tests with nested loops (e.g.
>>>>>> glsl-fs-loop-nested) work on cayman? Though possibly there are no other
>>>>>> tests with the diverging loops as in this case.
>>>>>>
>>>>>> I'll try to write a simpler test with the diverging loops to see if the
>>>>>> issue is really caused by the incorrect control flow handling, and to
>>>>>> figure out the exact instruction that results in the incorrect active
>>>>>> state.
>>>>>>
>>>>>> Also probably it worth checking if the stack size is correct for that
>>>>>> shader (latest mesa should print nstack value in the shader disassemble
>>>>>> header, I think it should be 3 for that shader) and maybe try adding
>>>>>> some
>>>>>> constant, e.g. 4 to the bc->nstack in the r600_bytecode_build just to
>>>>>> be
>>>>>> sure that we reserve enough of stack space, though I don't think stack
>>>>>> size
>>>>>> is the cause of this issue.
>>>>>>
>>>>>> Vadim
>>>>>>
>>>>>>
>>>>>>
>>>>>> Marek
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Apr 9, 2013 at 11:30 PM, Vadim Girlin <vadimgirlin at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> On 04/09/2013 10:58 AM, Martin Andersson wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Apr 9, 2013 at 3:18 AM, Marek Olšák <maraeo at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Pushed, thanks. The transform feedback test still doesn't pass,
>>>>>>>>> but
>>>>>>>>> at
>>>>>>>>>
>>>>>>>>>> least
>>>>>>>>>> the hardlocks are gone.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks, I have looked into the other issue as well
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> http://lists.freedesktop.org/******archives/mesa-dev/2013-**March/**<http://lists.freedesktop.org/****archives/mesa-dev/2013-March/**>
>>>>>>>>> **036941.html<http://lists.**freedesktop.org/**archives/**
>>>>>>>>>
>>>>>>>>> mesa-dev/2013-March/**036941.**html<http://lists.freedesktop.org/**archives/mesa-dev/2013-March/**036941.html>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> <http://lists.**freedesktop.**org/archives/mesa-**<http://freedesktop.org/archives/mesa-**>
>>>>>>>>> dev/2013-March/036941.html<htt**p://lists.freedesktop.org/**
>>>>>>>>>
>>>>>>>>> archives/mesa-dev/2013-March/**036941.html<http://lists.freedesktop.org/archives/mesa-dev/2013-March/036941.html>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The problem arises when there are nested loops. If I rework the code
>>>>>>>>> so there are
>>>>>>>>> no nested loops the issue disappears. At least one pixel also needs
>>>>>>>>> to
>>>>>>>>> enter the
>>>>>>>>> outer loop. The pixels that should enter the outer loop behaves
>>>>>>>>> correctly. It is those
>>>>>>>>> pixels that should not enter the outer loop that misbehaves. It does
>>>>>>>>> not matter if they
>>>>>>>>> also fails the test for the inner loop, they will still execute the
>>>>>>>>> instruction inside. That
>>>>>>>>> leads to the strange results for that test.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Please test the attached patch.
>>>>>>>>
>>>>>>>>
>>>>>>>> Vadim
>>>>>>>>
>>>>>>>>
>>>>>>>> The strangeness is easier to see if the NUM_POINTS in the
>>>>>>>>
>>>>>>>>> ext_transform_feedback/
>>>>>>>>> order.c are run with smaller values,like 3, 6 and 9. Disable the
>>>>>>>>> code
>>>>>>>>> that fail the test
>>>>>>>>> and print starting_x, shift_reg_final and iteration_count.
>>>>>>>>>
>>>>>>>>> Marek, since you implemented transform feedback for r600, do you
>>>>>>>>> think
>>>>>>>>> the issue
>>>>>>>>> is with the tranform feedback code or the shader compiler or some
>>>>>>>>> other
>>>>>>>>> thing?
>>>>>>>>>
>>>>>>>>> //Martin
>>>>>>>>> ______________________________******_________________
>>>>>>>>> mesa-dev mailing list
>>>>>>>>> mesa-dev at lists.freedesktop.org
>>>>>>>>>
>>>>>>>>> http://lists.freedesktop.org/******mailman/listinfo/mesa-dev<http://lists.freedesktop.org/****mailman/listinfo/mesa-dev>
>>>>>>>>>
>>>>>>>>> <h**ttp://lists.freedesktop.org/****mailman/listinfo/mesa-dev<http://lists.freedesktop.org/**mailman/listinfo/mesa-dev>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> <htt**p://lists.freedesktop.**org/**mailman/listinfo/mesa-**dev<http://lists.freedesktop.org/**mailman/listinfo/mesa-dev>
>>>>>>>>>
>>>>>>>>> <http://lists.freedesktop.**org/mailman/listinfo/mesa-dev<http://lists.freedesktop.org/mailman/listinfo/mesa-dev>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ______________________________****_________________
>>>>>>>>
>>>>>>>> mesa-dev mailing list
>>>>>>>> mesa-dev at lists.freedesktop.org
>>>>>>>>
>>>>>>>> http://lists.freedesktop.org/****mailman/listinfo/mesa-dev<http://lists.freedesktop.org/**mailman/listinfo/mesa-dev>
>>>>>>>>
>>>>>>>> <htt**p://lists.freedesktop.org/**mailman/listinfo/mesa-dev<http://lists.freedesktop.org/mailman/listinfo/mesa-dev>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-prevent-ALU-POP-optimization-for-loop-break-continue.patch
Type: text/x-patch
Size: 805 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20130413/c4ae58b6/attachment.bin>
More information about the mesa-dev
mailing list