[Mesa-dev] [PATCH] r600g: Add a Cayman specific version of UMAD
Vadim Girlin
vadimgirlin at gmail.com
Sun Mar 31 12:18:06 PDT 2013
On 03/31/2013 01:01 PM, Martin Andersson wrote:
> On Sun, Mar 31, 2013 at 1:08 AM, Vadim Girlin <vadimgirlin at gmail.com> wrote:
>> On 03/30/2013 05:35 AM, Martin Andersson wrote:
>>>
>>> I found an issue with the shader compiler for Cayman when I looked
>>> into why the ext_transform_feedback/order test case caused a GPU stall.
>>> It turned out the stall was an infinite loop that was the result of broken
>>> calculation in the shader function. The issue is that Cayman uses the
>>> tgsi_umad function for UMAD, but that does not work since it does not
>>> populate the y, z and w slots for UMUL that cayman requires.
>>>
>>> This patch implements a cayman_umad. There are some things I'm unsure of
>>> though.
>>>
>>> The UMUL for Cayman is compiled to, as far as I can tell,
>>> ALU_OP2_MULLO_INT and
>>> not ALU_OP2_MULLO_UINT. So I do not know if I should use the int or the
>>> uint
>>> version in cayman_umad. In the patch I used the uint one, because that
>>> seemed
>>> the most logical.
>>
>>
>> Probably the use of MULLO_INT for UMUL on cayman is just a typo, AFAIK
>> MULLO_UINT should be used.
>
> Ok, I will send a patch for that as well then.
>
>>
>>>
>>> The add part of UMAD I copied from tgsi_umad and that had a loop around
>>> the
>>> variable lasti, but the variable lasti is usally not used in cayman
>>> specific code.
>>>
>>
>> The only difference with umad on cayman is in the mul part - each MULLO_UINT
>> should be expanded to 4 slots on cayman. Add part doesn't need any changes.
>>
>>
>>> This is used in tgsi functions.
>>> int lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask);
>>
>>
>> This is used to determine last written vector component from the write mask,
>> so that if tgsi instruction doesn't write e.g. W component, we don't have to
>> emit R600 instruction(s) for that component.
>>
>>
>>>
>>> But in cayman specific code this is used instead.
>>> int last_slot = (inst->Dst[0].Register.WriteMask & 0x8) ? 4 : 3;
>>
>>
>> This is used for instructions like RECIP_xxx (see the comment at
>> r600_shader.c:40) that should be expanded to 3 slots with optional 4th slot
>> if the write to the W component is required, but MULLO_UINT is different -
>> it should be expanded to 4 instruction slots always. By the way, it seems
>> cayman_mul_int_instr is incorrect in this regard.
>>
>>
>>>
>>> It does not work to switch lasti with last_slot, since that makes the loop
>>> run too
>>> many times (in my test case lasti is 0 and last_slot is 3). So I just
>>> removed the
>>> loop, was that correct or should i resolve that in some other way?
>>
>>
>> No, it's not correct, there should be a loop over the vector components for
>> addition as well - it should be performed in the same way as on the
>> pre-cayman chips. In your patch you are only performing the addition for one
>> component.
>>
>> Basically, the only required change for UMAD on cayman is that you need to
>> expand each one-slot MULLO_xx on pre-cayman into 4 instruction slots on
>> cayman.
>
> Should I keep the cayman_umad function or should I modify tgsi_umad and
> add the cayman specific part there?
I think it's better to modify tgsi_umad (to avoid unnecessary code
duplication).
Vadim
>
>> Vadim
>>
>>
>>>
>>> Martin Andersson (1):
>>> r600g: Add a Cayman specific version of UMAD
>>>
>>> src/gallium/drivers/r600/r600_shader.c | 47
>>> +++++++++++++++++++++++++++++++++-
>>> 1 file changed, 46 insertions(+), 1 deletion(-)
>>>
>>
>
> //Martin
>
More information about the mesa-dev
mailing list