[Mesa-dev] [PATCH 5/6] radeonsi: fix TEX writemask

Thu Aug 2 06:24:11 PDT 2012

On Thu, Aug 02, 2012 at 11:33:43AM +0200, Christian König wrote:
> On 02.08.2012 11:21, Michel Dänzer wrote:
> >On Thu, 2012-08-02 at 11:05 +0200, Christian König wrote:
> >>On 02.08.2012 07:51, Michel Dänzer wrote:
> >>>On Wed, 2012-08-01 at 23:28 +0200, Christian König wrote:
> >>>>Using the writemask in the sampler results in packet
> >>>>VGPRS.
> >>>What does that mean?
> >>The instructions with a destination mask are packing their results, e.g.
> >>when you sample RGBA you get:
> >>R in VGPR0
> >>G in VGPR1
> >>B in VGPR2
> >>A in VGPR3
> >>
> >>But when you mask, for example, G and B, you get:
> >>R in VGPR0
> >>G masked
> >>B masked
> >>A in VGPR1
> >>
> >>So your image sample instruction is only writing 2 VGPRS then.
> >Ah, so that should be spelled 'packed' then.
> Oh, going to fix that.
> >
> >
> >>>[SNIP]
> >>>Couldn't this incorrectly clobber components of the destination which
> >>>were supposed to be masked?
> >>No, because it is just an optimization that avoids fetching unwanted
> >>components; it does not mask anything.
> >Hmm, but can't it happen that LLVM assigns destination GPRs containing
> >previous values that need to be preserved according to the TGSI
> >writemask?
> Not currently, as far as I can see. In contrast to the R600 target,
> it always seems to allocate a new set of 4 registers and then picks
> the elements we wanted for the writemask separately.
> 

I haven't tested this much, but LLVM 3.2 is supposed to be much better
at optimizing the 'create vector' + 'select elements' case.  In theory,
it should be able to detect when elements of a vector are unused, and
then reallocate the unoccupied registers to other instructions.

Since SI is packing the unmasked components, we will probably need to
add special handling for these instructions, because the assumption
in the TGSI->LLVM code is that R,G,B,A values will be in vector slots
0,1,2,3 respectively.
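
To make the mismatch concrete, here is a rough sketch (plain Python, not
Mesa or LLVM code) of how packed results could be spread back to the
R,G,B,A slots. The names packed_slots and unpack_result are made up for
illustration, and I'm assuming bit 0 of the writemask is R, bit 1 is G,
and so on:

```python
# Illustrative sketch only; these helpers are hypothetical, not Mesa/LLVM API.
# Assumption: writemask bit 0 = R, bit 1 = G, bit 2 = B, bit 3 = A.

def packed_slots(writemask):
    """Map each enabled channel (0..3) to the packed register offset it lands in."""
    slots = {}
    next_reg = 0
    for chan in range(4):
        if writemask & (1 << chan):
            slots[chan] = next_reg  # enabled channels are written back to back
            next_reg += 1
    return slots


def unpack_result(writemask, packed, fill=None):
    """Spread packed sample results back to vector slots 0..3 (R, G, B, A)."""
    slots = packed_slots(writemask)
    if len(packed) != len(slots):
        raise ValueError("number of packed values must match the enabled channels")
    # Masked channels get `fill`; enabled ones are read from their packed offset.
    return [packed[slots[c]] if c in slots else fill for c in range(4)]
```

With G and B masked (writemask 0b1001), packed_slots gives R -> 0 and
A -> 1, matching Christian's example above, and unpack_result puts A
back in slot 3 where the TGSI->LLVM code expects it.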

> By the way, I don't think vectors of VGPR registers need to be
> aligned to their size, e.g. you can also do something like VGPR1_128
> in LLVM and it should work fine. But I'm not 100% sure about that.
>

From what I've read, there are only alignment restrictions on SGPRs, so
I think VGPR1_128 would work.

-Tom