[Mesa-dev] [PATCH] r600g: Why all this fiddling with tgsi_helper_copy?

Christian König deathsimple at vodafone.de
Mon Dec 13 15:02:25 PST 2010


On Monday, 13.12.2010 at 14:19 -0500, Jerome Glisse wrote:
> 2010/12/13 Christian König <deathsimple at vodafone.de>:
> > tgsi_helper_copy is used on several occasions to copy a temporary result
> > into the real destination register to emulate writemasks for OP3 and
> > reduction operations. According to R600 ISA that's unnecessary.
> >
> 
> What in R600 ISA makes you think so (page) ? write mask is bit 4
> SQ_ALU_WORD1_OP2 but bit 4 of SQ_ALU_WORD1_OP3 is in SRC2_SEL field.
As long as we don't rely on PV or PS, the instruction slot holding the
SQ_ALU_WORD1_OP3 for a masked-out component can just be omitted:

---
4.3 ALU Instruction Slots and Instruction Groups
....
The instructions in an instruction group must be in instruction slots 0 through 4,
in the order shown in Table 4.1. Up to four of the five instruction slots can be
omitted.
....

4.4 Assignment to ALU.[X,Y,Z,W] and ALU.Trans Units
....
After all instructions in the instruction group are processed, any ALU.[X,Y,Z,W] or
ALU.Trans operation that is unspecified implicitly executes a NOP instruction,
thus invalidating the values in the corresponding elements of the PV and PS
registers.
---

I tested it and at least on my RV710 this works as documented for MAD
and CMP instructions.
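
To illustrate what I mean, here is a minimal sketch of emitting an OP3
instruction only for the components enabled in the writemask. The struct
and function names are hypothetical and heavily simplified; this is not
the actual r600g code, just the idea of skipping slots instead of
copying through a temporary:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one ALU slot (not an r600g struct). */
struct alu_slot {
    int chan;   /* 0..3 = X, Y, Z, W */
    int last;   /* 1 on the final slot of the instruction group */
};

/*
 * Emit one OP3 slot per channel enabled in writemask (bit i = channel i).
 * Masked-out channels get no slot at all, so per the ISA text quoted above
 * they implicitly execute a NOP. Returns the number of slots written.
 */
static int emit_op3_masked(unsigned writemask, struct alu_slot out[4])
{
    int n = 0;

    assert(out != NULL);
    for (int chan = 0; chan < 4; chan++) {
        if (!(writemask & (1u << chan)))
            continue;           /* omitted slot, nothing is written */
        out[n].chan = chan;
        out[n].last = 0;
        n++;
    }
    if (n > 0)
        out[n - 1].last = 1;    /* close the instruction group */
    return n;
}
```

With a writemask of .xz (0x5) this would emit two slots, X and Z, with
the "last" flag on Z. The catch is the one mentioned above: PV/PS for
the omitted components are invalidated, so nothing afterwards may read
them.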

For DP4 (and probably all other reduction operations) it's not allowed
to writemask a specific component, but we can write the masked
components directly into the temp register, instead of writing everything
into the temp register and then copying only the unmasked components to
the real destination:
---
4.9.2 Destination Registers
....
Instructions with two source operands have a write mask, WRITE_MASK, that
determines if the result is written to a GPR. The PV or PS registers result is
updated even if WRITE_MASK is 0. Instructions with three source operands have
no write mask; however, you can specify an out-of-bounds GPR destination to
inhibit their write. For example, if the thread is using four clause temporaries and
less than 124 GPRs, it is safe to use DST_GPR = 123 to ignore the result.
Otherwise, you must sacrifice one of the temporary GPRs for instructions with
three source operands. The PV or PS registers result is updated for instructions
with three source operands even if the destination GPR address is invalid.
---
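
A rough sketch of that routing for a reduction operation follows. Again
the names are hypothetical and this is not the r600g implementation; the
scratch register is simply whatever GPR we can afford to clobber (the
temp register, or an out-of-bounds index such as the 123 from the ISA
example when its conditions are met):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of where one reduction slot writes its result. */
struct red_dst {
    unsigned gpr;   /* destination GPR index */
    unsigned chan;  /* 0..3 = X, Y, Z, W */
};

/*
 * A reduction op needs all four slots, so no slot can be dropped.
 * Instead, pick the destination per channel: enabled channels go
 * straight to dst_gpr, masked channels go to scratch_gpr.
 * No copy from the temporary is needed afterwards.
 */
static void route_reduction_dst(unsigned writemask, unsigned dst_gpr,
                                unsigned scratch_gpr, struct red_dst out[4])
{
    assert(out != NULL);
    for (unsigned chan = 0; chan < 4; chan++) {
        int enabled = (writemask >> chan) & 1u;

        out[chan].gpr = enabled ? dst_gpr : scratch_gpr;
        out[chan].chan = chan;
    }
}
```

For example, a DP4 with writemask .y and destination GPR 7 would write
slot Y to GPR 7 and slots X, Z and W to the scratch register.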

To make my statement in the commit message clearer: I have ONLY read
the R600 ISA, nothing about R700 or Evergreen. I'm not 100% sure that
those statements also hold for these chipsets. If you think that
this will work reliably in all circumstances, I will try to optimize it
further, maybe getting rid of the temp register altogether.

Christian.


