[Mesa-dev] [PATCH] r600g: Add a Cayman specific version of UMAD

Sun Mar 31 02:01:31 PDT 2013

On Sun, Mar 31, 2013 at 1:08 AM, Vadim Girlin <vadimgirlin at gmail.com> wrote:
> On 03/30/2013 05:35 AM, Martin Andersson wrote:
>>
>> I found an issue with the shader compiler for Cayman when I looked
>> into why the ext_transform_feedback/order test case caused a GPU stall.
>> It turned out the stall was an infinite loop that was the result of broken
>> calculation in the shader function. The issue is that Cayman uses the
>> tgsi_umad function for UMAD, but that does not work since it does not
>> populate the y, z and w slots for UMUL that cayman requires.
>>
>> This patch implements a cayman_umad. There are some things I'm unsure of
>> though.
>>
>> The UMUL for Cayman is compiled to, as far as I can tell,
>> ALU_OP2_MULLO_INT and
>> not ALU_OP2_MULLO_UINT. So I do not know if I should use the int or the
>> uint
>> version in cayman_umad. In the patch I used the uint one, because that
>> seemed
>> the most logical.
>
>
> Probably the use of MULLO_INT for UMUL on cayman is just a typo, AFAIK
> MULLO_UINT should be used.

Ok, I will send a patch for that as well then.

>
>>
>> The add part of UMAD I copied from tgsi_umad and that had a loop around
>> the
>> variable lasti, but the variable lasti is usally not used in cayman
>> specific code.
>>
>
> The only difference with umad on cayman is in the mul part - each MULLO_UINT
> should be expanded to 4 slots on cayman. Add part doesn't need any changes.
>
>
>> This is used in tgsi functions.
>> int lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask);
>
>
> This is used to determine last written vector component from the write mask,
> so that if tgsi instruction doesn't write e.g. W component, we don't have to
> emit R600 instruction(s) for that component.
>
>
>>
>> But in cayman specific code this is used instead.
>> int last_slot = (inst->Dst[0].Register.WriteMask & 0x8) ? 4 : 3;
>
>
> This is used for instructions like RECIP_xxx (see the comment at
> r600_shader.c:40) that should be expanded to 3 slots with optional 4th slot
> if the write to the W component is required, but MULLO_UINT is different -
> it should be expanded to 4 instruction slots always. By the way, it seems
> cayman_mul_int_instr is incorrect in this regard.
>
>
>>
>> It does not work to switch lasti with last_slot, since that makes the loop
>> run too
>> many times (in my test case lasti is 0 and last_slot is 3). So I just
>> removed the
>> loop, was that correct or should i resolve that in some other way?
>
>
> No, it's not correct, there should be a loop over the vector components for
> addition as well - it should be performed in the same way as on the
> pre-cayman chips. In your patch you are only performing the addition for one
> component.
>
> Basically, the only required change for UMAD on cayman is that you need to
> expand each one-slot MULLO_xx on pre-cayman into 4 instruction slots on
> cayman.

Should I keep the cayman_umad function or should I modify tgsi_umad and
add the cayman specific part there?

> Vadim
>
>
>>
>> Martin Andersson (1):
>>    r600g: Add a Cayman specific version of UMAD
>>
>>   src/gallium/drivers/r600/r600_shader.c | 47
>> +++++++++++++++++++++++++++++++++-
>>   1 file changed, 46 insertions(+), 1 deletion(-)
>>
>

//Martin