[Mesa-dev] [PATCH] r600g: Add a Cayman specific version of UMAD
vadimgirlin at gmail.com
Sat Mar 30 17:08:13 PDT 2013
On 03/30/2013 05:35 AM, Martin Andersson wrote:
> I found an issue with the shader compiler for Cayman when I looked
> into why the ext_transform_feedback/order test case caused a GPU stall.
> It turned out the stall was an infinite loop that was the result of broken
> calculation in the shader function. The issue is that Cayman uses the
> tgsi_umad function for UMAD, but that does not work since it does not
> populate the y, z and w slots for UMUL that cayman requires.
> This patch implements a cayman_umad. There are some things I'm unsure of though.
> The UMUL for Cayman is compiled to, as far as I can tell, ALU_OP2_MULLO_INT and
> not ALU_OP2_MULLO_UINT. So I do not know if I should use the int or the uint
> version in cayman_umad. In the patch I used the uint one, because that seemed
> the most logical.
Probably the use of MULLO_INT for UMUL on cayman is just a typo, AFAIK
MULLO_UINT should be used.
> The add part of UMAD I copied from tgsi_umad and that had a loop around the
> variable lasti, but the variable lasti is usally not used in cayman specific code.
The only difference with umad on cayman is in the mul part - each
MULLO_UINT should be expanded to 4 slots on cayman. Add part doesn't
need any changes.
> This is used in tgsi functions.
> int lasti = tgsi_last_instruction(inst->Dst.Register.WriteMask);
This is used to determine last written vector component from the write
mask, so that if tgsi instruction doesn't write e.g. W component, we
don't have to emit R600 instruction(s) for that component.
> But in cayman specific code this is used instead.
> int last_slot = (inst->Dst.Register.WriteMask & 0x8) ? 4 : 3;
This is used for instructions like RECIP_xxx (see the comment at
r600_shader.c:40) that should be expanded to 3 slots with optional 4th
slot if the write to the W component is required, but MULLO_UINT is
different - it should be expanded to 4 instruction slots always. By the
way, it seems cayman_mul_int_instr is incorrect in this regard.
> It does not work to switch lasti with last_slot, since that makes the loop run too
> many times (in my test case lasti is 0 and last_slot is 3). So I just removed the
> loop, was that correct or should i resolve that in some other way?
No, it's not correct, there should be a loop over the vector components
for addition as well - it should be performed in the same way as on the
pre-cayman chips. In your patch you are only performing the addition for
Basically, the only required change for UMAD on cayman is that you need
to expand each one-slot MULLO_xx on pre-cayman into 4 instruction slots
> Martin Andersson (1):
> r600g: Add a Cayman specific version of UMAD
> src/gallium/drivers/r600/r600_shader.c | 47 +++++++++++++++++++++++++++++++++-
> 1 file changed, 46 insertions(+), 1 deletion(-)
More information about the mesa-dev