[Nouveau] [PATCH] nouveau: codegen: Take src swizzle into account on loads

Ilia Mirkin imirkin at alum.mit.edu
Fri Apr 8 15:45:23 UTC 2016


On Fri, Apr 8, 2016 at 11:28 AM, Hans de Goede <hdegoede at redhat.com> wrote:
> When dealing with non-vector variables, the LLVM register allocator
> will use TEMP[0].x then TEMP[0].y, etc.
>
> When loading something from a global buffer it will calculate the
> address to use, and store that in say TEMP[0].x, so it ends up
> generating:
>
> LOAD TEMP[0].y, MEMORY[0], TEMP[0]
>
> Expecting the contents of TEMP[0].y to become the 32 bits of data
> to which TEMP[0].x is pointing. But instead it will get the 32 bits of
> data at address (TEMP[0].x + 4).
>
> With the old RES[32767] code one could generate the following TGSI:
>
> LOAD TEMP[0].y, RES[32767].xxxx, TEMP[0]
>
> And things would work fine since the .xxxx swizzling postfix would
> be honored and when storing to y (the only component set in the dest-mask)
> the x component at address (TEMP[0].x) would be loaded, rather than the
> y component at (TEMP[0].y)
>
> Note that another approach would be to not increment the address by
> a 32 bit word for skipped (not set in destmask) components.
>
> The way I see it either:
>
> 1) We see that LOAD does not deal with vectors, but with flat memory,
> in which case skipping 4 bytes because x is not set in the destmask
> does not make sense, as that is a vector thing to do.
>
> 2) LOAD is vector layout aware in which case supporting swizzling
> makes sense.
>
> Currently we have a weird hybrid which is rather cumbersome to
> work with from a compiler pov.

And I guess LLVM never ends up generating any of the other "funny"
instructions like LIT and such. Well, I have no problem adding the
swizzling logic, i.e. the way that LOAD will now work (logically) is
that it will

(a) fetch 4 values from the coordinates provided (4 sequential dwords
from src1.x in the case of buffer/memory, RGBA colors from src1.xyz in
the case of images)
(b) swizzle them according to the swizzle on the MEMORY/BUFFER/IMAGE argument
(c) store that swizzled result into the destination based on the writemask

That would sound reasonable to me, and if I understand correctly, is
option 2 of your proposal. We'd need some docs updates and buy-in from
the other gallium driver developers.
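
For illustration, steps (a) through (c) above can be modelled for the
buffer/memory case roughly as follows. This is only a sketch in Python,
not Mesa or nouveau codegen code: the function name load, its parameters
and the list-of-dwords memory model are all assumptions made up for this
example, and images are not covered.

```python
# Hypothetical model of the proposed LOAD semantics for buffer/memory.
# Nothing here is Mesa API; names and the memory model are illustrative.
COMPONENTS = "xyzw"


def load(memory, address, swizzle="xyzw", writemask="xyzw", dest=None):
    """Return the destination register after a modelled LOAD.

    memory    -- list of 32-bit dwords (assumed flat memory model)
    address   -- byte address taken from src1.x, assumed dword-aligned
    swizzle   -- 4-letter swizzle on the MEMORY/BUFFER argument
    writemask -- destination components that get written
    dest      -- previous contents of the destination register
    """
    dest = list(dest) if dest is not None else [0, 0, 0, 0]
    base = address // 4

    # (a) fetch 4 sequential dwords starting at the address in src1.x
    fetched = [memory[base + i] for i in range(4)]

    # (b) swizzle them according to the swizzle on the resource argument
    swizzled = [fetched[COMPONENTS.index(c)] for c in swizzle]

    # (c) store the swizzled result, honoring the destination writemask
    for i, c in enumerate(COMPONENTS):
        if c in writemask:
            dest[i] = swizzled[i]
    return dest


memory = [10, 11, 12, 13, 14, 15, 16, 17]

# LOAD TEMP[0].y, MEMORY[0].xxxx, TEMP[0] with TEMP[0].x holding address 8:
# .y receives the dword at the address itself.
with_swizzle = load(memory, 8, swizzle="xxxx", writemask="y", dest=[8, 0, 0, 0])

# Same LOAD with the default .xyzw swizzle: .y receives the dword at
# address + 4, which is the behaviour described in the quoted patch text.
without_swizzle = load(memory, 8, writemask="y", dest=[8, 0, 0, 0])
```

In this model the identity swizzle reproduces the current behaviour, so
only shaders that put an explicit swizzle on the resource operand would
see a difference.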

STORE remains unchanged, as the MEMORY/etc is in the destination,
where there is a writemask, which is presently used and will remain
effective.

  -ilia

