[Nouveau] [PATCH] nouveau: codegen: Take src swizzle into account on loads

Fri Apr 8 16:26:17 UTC 2016

Hi,

On 08-04-16 18:06, Hans de Goede wrote:
> Hi,
>
> On 08-04-16 17:45, Ilia Mirkin wrote:
>> On Fri, Apr 8, 2016 at 11:28 AM, Hans de Goede <hdegoede at redhat.com> wrote:
>>> When dealing with non vector variables the llvm register allocator
>>> will use TEMP[0].x then TEMP[0].y, etc.
>>>
>>> When loading something from a global buffer it will calculate the
>>> address to use, and store that in say TEMP[0].x, so it ends up
>>> generating:
>>>
>>> LOAD TEMP[0].y, MEMORY[0], TEMP[0]
>>>
>>> Expecting the contents of TEMP[0].y to become the 32 bits of data
>>> to which TEMP[0].x is pointing. But instead it will get the 32 bits of
>>> data at address (TEMP[0].x + 4).
>>>
>>> With the old RES[32767] code one could generate the following TGSI:
>>>
>>> LOAD TEMP[0].y, RES[32767].xxxx, TEMP[0]
>>>
>>> And things would work fine since the .xxxx swizzling postfix would
>>> be honored and when storing to y (the only component set in the dest-mask)
>>> the x component at address (TEMP[0].x) would be loaded, rather then the
>>> y component at (TEMP[0].y)
>>>
>>> Note that another approach would be to not increment the address by
>>> a 32 bit word for skipped (not set in destmask) components.
>>>
>>> The way I see it either:
>>>
>>> 1) We see that LOAD does not deal with vectors, but with flat memory,
>>> in which case skipping 4 bytes because x is not set in the destmask
>>> does not make sense, as that is a vector thing todo.
>>>
>>> 2) LOAD is vector layout aware in which case supporting swizzling
>>> makes sense.
>>>
>>> Currently we have a weird hybrid which is rather cumbersome to
>>> work with from a compiler pov.
>>
>> And I guess LLVM never ends up generating any of the other "funny"
>> instructions like LIT and the such. Well, I have no problem adding the
>> swizzling logic, i.e. the way that LOAD will now work (logically) is
>> that it will
>>
>> (a) fetch 4 values from the coordinates provided (4 sequential dwords
>> from src1.x in the case of buffer/memory, RGBA colors from src1.xyz in
>> the case of images)
>> (b) swizzle them according to the swizzle on the MEMORY/BUFFER/IMAGE argument
>> (c) store that swizzled result into the destination based on the writemask
>>
>> That would sound reasonable to me, and if I understand correctly, is
>> option 2 of your proposal.
>
> Yes that is option 2, and is basically what the patch which started this
> thread does. So that would work for me :)
>
>> We'd need some docs updates and buy-in from the other gallium driver developers.
>
> What docs would need updating ? The TGSI docs I'm aware of are at:
>
> http://gallium.readthedocs.org/en/latest/tgsi.html
>
> I assume those have a source in the mesa src somewhere (I've not looked),
> but those mostly just look quite incomplete in general when it comes to TGSI
> (I've had to revert to figuring what things do from the mesa srcs quite often)
>
> Have I been looking at the wrong docs perhaps ?
>
> Note that them being incomplete is not intended as an excuse to not document
> this, I'm all for better documentation.
>
>> STORE remains unchanged, as the MEMORY/etc is in the destination,
>> where there is a writemask, which is presently used and will remain
>> effective.
>
> Right and note that the first src operand of STORE already takes swizzling
> into account, so the proposed option 2 will actually make the 2 more inline.

Erm, I mean the 2nd src operand of the store of-course, the actual src.

On a related note, comparing handleLOAD and handleSTORE, this bit in
handleLOAD seems wrong:

          Value *off = fetchSrc(1, c);

I believe that should be:

	 Value *off = fetchSrc(1, 0);

Just like handleSTORE does:

	 off = fetchSrc(0, 0);

And always using a 'c' of 0 seems correct here since we are dealing
with an address.

Once I know which docs to update for this, I'll do a v2 of this patch
and add a preparation patch fixing the above to the v2 set.

Regards,

Hans