[Mesa-dev] [PATCH] r600g: Why all this fiddling with tgsi_helper_copy?
alexdeucher at gmail.com
Mon Dec 13 15:28:50 PST 2010
2010/12/13 Christian König <deathsimple at vodafone.de>:
> Am Montag, den 13.12.2010, 14:19 -0500 schrieb Jerome Glisse:
>> 2010/12/13 Christian König <deathsimple at vodafone.de>:
>> > tgsi_helper_copy is used on several occasions to copy a temporary result
>> > into the real destination register to emulate writemasks for OP3 and
>> > reduction operations. According to R600 ISA that's unnecessary.
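As a rough illustration of the pattern described above, the following C sketch models a GPR as four float channels. It emulates a write mask for an OP3 MAD the way the commit message says tgsi_helper_copy is used: compute every channel into a temporary, then copy only the enabled channels. The type `vec4_reg`, the function `mad_via_temp_copy`, and the instruction count it returns are hypothetical illustrations, not actual r600g code.

```c
#include <assert.h>

/* Hypothetical model of a GPR: four float channels, x/y/z/w = 0..3. */
typedef struct {
    float c[4];
} vec4_reg;

/* Emulate a write-masked MAD (dst = a * b + c) with a temp-and-copy step.
 * Returns the number of ALU instructions this sequence would issue,
 * assuming (for illustration) one instruction per channel. */
static int mad_via_temp_copy(vec4_reg *dst, const vec4_reg *a,
                             const vec4_reg *b, const vec4_reg *c,
                             unsigned writemask)
{
    vec4_reg tmp;
    int issued = 0;

    assert(writemask <= 0xF);

    /* OP3 has no WRITE_MASK, so all four channels land in the temp. */
    for (int i = 0; i < 4; i++) {
        tmp.c[i] = a->c[i] * b->c[i] + c->c[i];
        issued++;
    }

    /* Copy (MOV-like) only the enabled channels to the real destination. */
    for (int i = 0; i < 4; i++) {
        if (writemask & (1u << i)) {
            dst->c[i] = tmp.c[i];
            issued++;
        }
    }
    return issued;
}
```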
>> What in the R600 ISA makes you think so (page)? The write mask is bit 4 of
>> SQ_ALU_WORD1_OP2, but bit 4 of SQ_ALU_WORD1_OP3 is in the SRC2_SEL field.
> As long as we don't rely on PV or PS, the instruction slot of an OP3
> (SQ_ALU_WORD1_OP3) instruction can simply be omitted for masked channels:
> 4.3 ALU Instruction Slots and Instruction Groups
> The instructions in an instruction group must be in instruction slots 0 through 4,
> in the order shown in Table 4.1. Up to four of the five instruction slots can be omitted.
> 4.4 Assignment to ALU.[X,Y,Z,W] and ALU.Trans Units
> After all instructions in the instruction group are processed, any ALU.[X,Y,Z,W] or
> ALU.Trans operation that is unspecified implicitly executes a NOP instruction,
> thus invalidating the values in the corresponding elements of the PV and PS registers.
> I tested it and at least on my RV710 this works as documented for MAD
> and CMP instructions.
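Under the slot-omission reading of sections 4.3 and 4.4 quoted above, the same masked MAD can instead occupy only the slots of the enabled channels and write the destination directly, while unspecified slots become implicit NOPs. This is a minimal sketch using a hypothetical `vec4_reg` model and a hypothetical `mad_direct_masked` function; it assumes, as the message notes, that no later instruction reads PV for the omitted channels.

```c
#include <assert.h>

/* Hypothetical model of a GPR: four float channels, x/y/z/w = 0..3. */
typedef struct {
    float c[4];
} vec4_reg;

/* Write-masked MAD without a temp: emit an instruction only in the
 * slots of enabled channels. Omitted slots execute an implicit NOP,
 * which also leaves PV undefined for those channels.
 * Returns the number of ALU slots actually used. */
static int mad_direct_masked(vec4_reg *dst, const vec4_reg *a,
                             const vec4_reg *b, const vec4_reg *c,
                             unsigned writemask)
{
    int used = 0;

    assert(writemask <= 0xF);

    for (int i = 0; i < 4; i++) {
        if (!(writemask & (1u << i)))
            continue;           /* slot omitted: destination channel untouched */
        dst->c[i] = a->c[i] * b->c[i] + c->c[i];
        used++;
    }
    return used;
}
```

With writemask 0x5 (x and z), only two slots are used and the y and w channels of the destination keep their previous contents.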
> For DP4 (and probably all other reduction operations) it's not allowed
> to writemask a specific component, but we can write the masked-out
> components directly into the temp register instead of writing everything
> into the temp register and then copying only the unmasked components to
> the real destination:
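One reading of the proposal above, sketched in C: DP4 still occupies all four slots, but each slot whose channel is masked out writes a scratch register instead of the real destination, so no copy is needed afterwards. `vec4_reg`, `dp4_redirect_masked`, and the caller-provided scratch register are hypothetical; the sketch also assumes every slot produces the full dot product.

```c
#include <assert.h>

/* Hypothetical model of a GPR: four float channels, x/y/z/w = 0..3. */
typedef struct {
    float c[4];
} vec4_reg;

/* DP4 with a write mask emulated by redirecting disabled channels.
 * Every slot computes the same four-component dot product; enabled
 * channels write dst, disabled ones write the scratch register. */
static void dp4_redirect_masked(vec4_reg *dst, vec4_reg *scratch,
                                const vec4_reg *a, const vec4_reg *b,
                                unsigned writemask)
{
    float dot = 0.0f;

    assert(writemask <= 0xF);

    for (int i = 0; i < 4; i++)
        dot += a->c[i] * b->c[i];

    for (int i = 0; i < 4; i++) {
        vec4_reg *target = (writemask & (1u << i)) ? dst : scratch;
        target->c[i] = dot;
    }
}
```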
FWIW, write masks seem to work just fine with the DOT4 instructions.
The only stipulation is that you have to use every instruction slot
when you use DOT4. The result is replicated to all channels of the
dst reg and only written to the masked ones.
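The behaviour described here can be modelled the same way: all four slots issue DOT4, every slot yields the full dot product, and each slot's own write-mask bit decides whether its channel is stored, so no temp register is involved. `vec4_reg` and `dot4_masked` are hypothetical names, and the model ignores PV and timing.

```c
#include <assert.h>

/* Hypothetical model of a GPR: four float channels, x/y/z/w = 0..3. */
typedef struct {
    float c[4];
} vec4_reg;

/* DOT4 occupying all four vector slots. Each slot carries its own
 * WRITE_MASK bit; the replicated result is stored only where set.
 * Returns the number of slots issued, which is always four. */
static int dot4_masked(vec4_reg *dst, const vec4_reg *a, const vec4_reg *b,
                       unsigned writemask)
{
    float dot = 0.0f;
    int issued = 0;

    assert(writemask <= 0xF);

    for (int i = 0; i < 4; i++)
        dot += a->c[i] * b->c[i];

    for (int i = 0; i < 4; i++) {
        issued++;                       /* every slot must be used */
        if (writemask & (1u << i))
            dst->c[i] = dot;            /* masked-off slots store nothing */
    }
    return issued;
}
```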
> 4.9.2 Destination Registers
> Instructions with two source operands have a write mask, WRITE_MASK, that
> determines if the result is written to a GPR. The PV or PS registers result is
> updated even if WRITE_MASK is 0. Instructions with three source operands have
> no write mask; however, you can specify an out-of-bounds GPR destination to
> inhibit their write. For example, if the thread is using four clause temporaries and
> less than 124 GPRs, it is safe to use DST_GPR = 123 to ignore the result.
> Otherwise, you must sacrifice one of the temporary GPRs for instructions with
> three source operands. The PV or PS registers result is updated for instructions
> with three source operands even if the destination GPR address is invalid.
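The destination rule quoted above can be expressed as a small selection helper for one channel of an OP3 instruction. The function name `op3_dst_gpr`, the `NO_DISCARD_GPR` sentinel, and the parameters are hypothetical. The only sink it ever returns is the GPR 123 example from the quoted text; every other configuration conservatively reports that a temporary GPR has to be sacrificed.

```c
#include <assert.h>

/* Hypothetical sentinel: no safe out-of-bounds GPR is known, so the
 * caller must sacrifice one of its temporary GPRs instead. */
#define NO_DISCARD_GPR (-1)

/* Choose DST_GPR for one channel of an OP3 instruction (no WRITE_MASK).
 * Enabled channels write the real destination; disabled channels are
 * sent to GPR 123 only in the exact case the quoted ISA text gives. */
static int op3_dst_gpr(int real_gpr, unsigned chan, unsigned writemask,
                       int num_clause_temps, int num_gprs)
{
    assert(chan < 4);

    if (writemask & (1u << chan))
        return real_gpr;            /* channel enabled: write normally */
    if (num_clause_temps == 4 && num_gprs < 124)
        return 123;                 /* the ISA's example of a safe sink */
    return NO_DISCARD_GPR;          /* outside the quoted example: unsafe */
}
```

Treating anything outside the quoted example as unsafe keeps the helper from guessing at register limits the excerpt does not spell out.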
> To make the statement in my commit message clearer: I have ONLY read
> the R600 ISA, nothing about R700 or Evergreen. I'm not 100% sure that
> those statements also hold for these chipsets. If you think that
> this will work reliably in all circumstances, I will try to optimize it
> further, maybe getting rid of the temp register altogether.
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org