[Mesa-dev] [PATCH] r600g: Fix UMAD on Cayman

Vadim Girlin vadimgirlin at gmail.com
Thu Apr 11 08:31:11 PDT 2013


On 04/11/2013 02:08 AM, Marek Olšák wrote:
> Here's the output:
>
> creating vs ...
> shader compilation status: OK
> creating fs ...
> shader compilation status: OK
> thread #0 (0;0) : ref = 16608
> thread #1 (1;0) : ref = 27873
> thread #2 (0;1) : ref = 16608
> thread #3 (1;1) : ref = 27877
> results:
>   thread 0 (0, 0): expected = 16608, observed = 27876, FAIL
>   thread 1 (1, 0): expected = 27873, observed = 27873, OK
>   thread 2 (0, 1): expected = 16608, observed = 27876, FAIL
>   thread 3 (1, 1): expected = 27877, observed = 27877, OK
>

Thanks. According to these results, it looks like LOOP_START_DX10 for the
inner loop somehow reactivates the threads that were put into the
inactive-break state by the LOOP_BREAK in the outer loop. It also seems
that LOOP_BREAK in the inner loop doesn't work as expected in this case.
In other words, it looks weird.
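
To make the suspected behaviour concrete, here is a minimal sketch of the
per-thread state for a single pass through the nested loops. It is only a toy
model: the function name, the "faulty" flag and the state names are
assumptions made for illustration, not hardware logic or r600g code.

```python
ACTIVE = "active"
INACTIVE_BREAK = "inactive-break"


def inner_body_mask(outer_pass, inner_pass, faulty=False):
    """Return, per thread, whether the inner loop body executes once.

    outer_pass[i] / inner_pass[i] say whether thread i passes the outer /
    inner loop condition. faulty=True models the hypothesised misbehaviour;
    it is a guess at the symptom, not a description of real hardware logic.
    """
    state = [ACTIVE] * len(outer_pass)

    # Outer loop: LOOP_BREAK parks threads that fail the outer condition.
    for i, ok in enumerate(outer_pass):
        if not ok:
            state[i] = INACTIVE_BREAK
    parked_by_outer = [s == INACTIVE_BREAK for s in state]

    # Inner LOOP_START: expected to leave parked threads alone.
    if faulty:
        state = [ACTIVE] * len(state)  # hypothesised: everything reactivated

    # Inner loop: LOOP_BREAK parks active threads failing the inner condition.
    for i, ok in enumerate(inner_pass):
        if state[i] == ACTIVE and not ok:
            if faulty and parked_by_outer[i]:
                continue  # hypothesised: break has no effect on these threads
            state[i] = INACTIVE_BREAK

    return [s == ACTIVE for s in state]
```

Assuming threads 0 and 2 are the ones that fail both loop conditions, calling
inner_body_mask([False, True, False, True], [False, True, False, True]) runs
the body only for threads 1 and 3, while the same call with faulty=True runs
it for all four threads.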

I can't explain why this would happen. It might be interesting to run
these tests with the llvm backend to see if there are any differences.

It might help if we implement LOOP_BREAK via EXECUTE_MASK_OP
in the PRED_SET encoding, as in my earlier patch, but without any stack
push/pop operations and jumps (where possible), closer to what
Catalyst (shader analyzer) does. I'm not sure it will help, though,
and we'll need stack operations in some cases anyway, so I'm afraid this
won't fix the issue completely.

So far I have no other ideas.

Vadim

> Marek
>
>
> On Wed, Apr 10, 2013 at 11:42 PM, Vadim Girlin <vadimgirlin at gmail.com> wrote:
>
>> On 04/10/2013 01:53 PM, Marek Olšák wrote:
>>
>>> glsl-fs-loop-nested passes here.
>>>
>>> nstack is 3 and adding 4 to it doesn't help.
>>>
>>
>> Ok, thanks.
>>
>> Also I wrote a simple test app that should reproduce the issue if it's
>> really related to diverging control flow with nested loops, and might give
>> more information about what's going wrong.
>>
>> The source is in the attachment and needs to be compiled with -lGL -lglut
>> -lGLEW. The app renders four points and computes some value for each point
>> in the loops similar to the transform feedback order test, but it doesn't
>> use tfb. It should render four green or red squares depending on
>> correctness of the result.
>>
>> Here is the correct output produced for me on evergreen:
>>
>>   thread 0 (0, 0): expected = 16608, observed = 16608, OK
>>   thread 1 (1, 0): expected = 27873, observed = 27873, OK
>>   thread 2 (0, 1): expected = 16608, observed = 16608, OK
>>   thread 3 (1, 1): expected = 27877, observed = 27877, OK
>>
>> Please post the output if it fails on cayman.
>>
>> Vadim
>>
>>
>>
>>> Marek
>>>
>>>
>>> On Wed, Apr 10, 2013 at 8:46 AM, Vadim Girlin <vadimgirlin at gmail.com>
>>> wrote:
>>>
>>>> On 04/10/2013 03:58 AM, Marek Olšák wrote:
>>>>
>>>>> Hi Vadim,
>>>>>
>>>>> your patch does not fix the test.
>>>>>
>>>>
>>>> Hmm, I'm out of ideas then. Thanks for testing.
>>>>
>>>> I've checked the shader dump a few times but I don't see anything obviously
>>>> wrong there, and the same code (except the minor ALU grouping changes due
>>>> to the VLIW4/VLIW5 difference) works fine for me on evergreen.
>>>>
>>>> According to Martin's observations, it looks as if the threads that
>>>> shouldn't execute the loop body were incorrectly left in the active
>>>> state. LOOP_BREAK should put them into the inactive-break state, but
>>>> something goes wrong. Do the other piglit tests with nested loops (e.g.
>>>> glsl-fs-loop-nested) work on cayman? Though possibly there are no other
>>>> tests with diverging loops as in this case.
>>>>
>>>> I'll try to write a simpler test with the diverging loops to see if the
>>>> issue is really caused by the incorrect control flow handling, and to
>>>> figure out the exact instruction that results in the incorrect active
>>>> state.
>>>>
>>>> It is also probably worth checking whether the stack size is correct for
>>>> that shader (latest mesa should print the nstack value in the shader
>>>> disassembly header; I think it should be 3 for that shader), and maybe
>>>> try adding some constant, e.g. 4, to bc->nstack in r600_bytecode_build
>>>> just to be sure that we reserve enough stack space, though I don't think
>>>> stack size is the cause of this issue.
>>>>
>>>> Vadim
>>>>
>>>>
>>>>
>>>>> Marek
>>>>>
>>>>>
>>>>> On Tue, Apr 9, 2013 at 11:30 PM, Vadim Girlin <vadimgirlin at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> On 04/09/2013 10:58 AM, Martin Andersson wrote:
>>>>>>
>>>>>>> On Tue, Apr 9, 2013 at 3:18 AM, Marek Olšák <maraeo at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Pushed, thanks. The transform feedback test still doesn't pass, but
>>>>>>>> at least the hardlocks are gone.
>>>>>>>>
>>>>>>>
>>>>>>> Thanks, I have looked into the other issue as well:
>>>>>>> http://lists.freedesktop.org/archives/mesa-dev/2013-March/036941.html
>>>>>>>
>>>>>>> The problem arises when there are nested loops. If I rework the code
>>>>>>> so there are no nested loops, the issue disappears. At least one
>>>>>>> pixel also needs to enter the outer loop. The pixels that should
>>>>>>> enter the outer loop behave correctly. It is the pixels that should
>>>>>>> not enter the outer loop that misbehave. It does not matter if they
>>>>>>> also fail the test for the inner loop; they will still execute the
>>>>>>> instruction inside. That leads to the strange results for that test.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Please test the attached patch.
>>>>>>
>>>>>> Vadim
>>>>>>
>>>>>>> The strangeness is easier to see if ext_transform_feedback/order.c
>>>>>>> is run with smaller values of NUM_POINTS, like 3, 6 and 9. Disable the
>>>>>>> code that fails the test and print starting_x, shift_reg_final and
>>>>>>> iteration_count.
>>>>>>>
>>>>>>> Marek, since you implemented transform feedback for r600, do you think
>>>>>>> the issue is with the transform feedback code, the shader compiler, or
>>>>>>> some other thing?
>>>>>>>
>>>>>>> //Martin
>>>>>>> _______________________________________________
>>>>>>> mesa-dev mailing list
>>>>>>> mesa-dev at lists.freedesktop.org
>>>>>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev


