[Mesa-dev] [PATCH] r600g: Add a Cayman specific version of UMAD

Vadim Girlin vadimgirlin at gmail.com
Sat Mar 30 17:08:13 PDT 2013

On 03/30/2013 05:35 AM, Martin Andersson wrote:
> I found an issue with the shader compiler for Cayman when I looked
> into why the ext_transform_feedback/order test case caused a GPU stall.
> It turned out the stall was an infinite loop that was the result of broken
> calculation in the shader function. The issue is that Cayman uses the
> tgsi_umad function for UMAD, but that does not work since it does not
> populate the y, z and w slots for UMUL that cayman requires.
> This patch implements a cayman_umad. There are some things I'm unsure of though.
> The UMUL for Cayman is compiled to, as far as I can tell, ALU_OP2_MULLO_INT and
> not ALU_OP2_MULLO_UINT. So I do not know if I should use the int or the uint
> version in cayman_umad. In the patch I used the uint one, because that seemed
> the most logical.

Probably the use of MULLO_INT for UMUL on cayman is just a typo, AFAIK 
MULLO_UINT should be used.

> The add part of UMAD I copied from tgsi_umad and that had a loop around the
> variable lasti, but the variable lasti is usally not used in cayman specific code.

The only difference with umad on cayman is in the mul part - each 
MULLO_UINT should be expanded to 4 slots on cayman. Add part doesn't 
need any changes.

> This is used in tgsi functions.
> int lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask);

This is used to determine last written vector component from the write 
mask, so that if tgsi instruction doesn't write e.g. W component, we 
don't have to emit R600 instruction(s) for that component.

> But in cayman specific code this is used instead.
> int last_slot = (inst->Dst[0].Register.WriteMask & 0x8) ? 4 : 3;

This is used for instructions like RECIP_xxx (see the comment at 
r600_shader.c:40) that should be expanded to 3 slots with optional 4th 
slot if the write to the W component is required, but MULLO_UINT is 
different - it should be expanded to 4 instruction slots always. By the 
way, it seems cayman_mul_int_instr is incorrect in this regard.

> It does not work to switch lasti with last_slot, since that makes the loop run too
> many times (in my test case lasti is 0 and last_slot is 3). So I just removed the
> loop, was that correct or should i resolve that in some other way?

No, it's not correct, there should be a loop over the vector components 
for addition as well - it should be performed in the same way as on the 
pre-cayman chips. In your patch you are only performing the addition for 
one component.

Basically, the only required change for UMAD on cayman is that you need 
to expand each one-slot MULLO_xx on pre-cayman into 4 instruction slots 
on cayman.


> Martin Andersson (1):
>    r600g: Add a Cayman specific version of UMAD
>   src/gallium/drivers/r600/r600_shader.c | 47 +++++++++++++++++++++++++++++++++-
>   1 file changed, 46 insertions(+), 1 deletion(-)

More information about the mesa-dev mailing list