[Nouveau] [PATCH] nouveau: codegen: Take src swizzle into account on loads

Ilia Mirkin imirkin at alum.mit.edu
Fri Apr 8 16:34:16 UTC 2016


On Fri, Apr 8, 2016 at 12:26 PM, Hans de Goede <hdegoede at redhat.com> wrote:
> Hi,
>
>
> On 08-04-16 18:06, Hans de Goede wrote:
>>
>> Hi,
>>
>> On 08-04-16 17:45, Ilia Mirkin wrote:
>>>
>>> On Fri, Apr 8, 2016 at 11:28 AM, Hans de Goede <hdegoede at redhat.com>
>>> wrote:
>>>>
>>>> When dealing with non vector variables the llvm register allocator
>>>> will use TEMP[0].x then TEMP[0].y, etc.
>>>>
>>>> When loading something from a global buffer it will calculate the
>>>> address to use, and store that in say TEMP[0].x, so it ends up
>>>> generating:
>>>>
>>>> LOAD TEMP[0].y, MEMORY[0], TEMP[0]
>>>>
>>>> Expecting the contents of TEMP[0].y to become the 32 bits of data
>>>> to which TEMP[0].x is pointing. But instead it will get the 32 bits of
>>>> data at address (TEMP[0].x + 4).
>>>>
>>>> With the old RES[32767] code one could generate the following TGSI:
>>>>
>>>> LOAD TEMP[0].y, RES[32767].xxxx, TEMP[0]
>>>>
>>>> And things would work fine since the .xxxx swizzling postfix would
>>>> be honored and when storing to y (the only component set in the
>>>> dest-mask)
>>>> the x component at address (TEMP[0].x) would be loaded, rather then the
>>>> y component at (TEMP[0].y)
>>>>
>>>> Note that another approach would be to not increment the address by
>>>> a 32 bit word for skipped (not set in destmask) components.
>>>>
>>>> The way I see it either:
>>>>
>>>> 1) We see that LOAD does not deal with vectors, but with flat memory,
>>>> in which case skipping 4 bytes because x is not set in the destmask
>>>> does not make sense, as that is a vector thing todo.
>>>>
>>>> 2) LOAD is vector layout aware in which case supporting swizzling
>>>> makes sense.
>>>>
>>>> Currently we have a weird hybrid which is rather cumbersome to
>>>> work with from a compiler pov.
>>>
>>>
>>> And I guess LLVM never ends up generating any of the other "funny"
>>> instructions like LIT and the such. Well, I have no problem adding the
>>> swizzling logic, i.e. the way that LOAD will now work (logically) is
>>> that it will
>>>
>>> (a) fetch 4 values from the coordinates provided (4 sequential dwords
>>> from src1.x in the case of buffer/memory, RGBA colors from src1.xyz in
>>> the case of images)
>>> (b) swizzle them according to the swizzle on the MEMORY/BUFFER/IMAGE
>>> argument
>>> (c) store that swizzled result into the destination based on the
>>> writemask
>>>
>>> That would sound reasonable to me, and if I understand correctly, is
>>> option 2 of your proposal.
>>
>>
>> Yes that is option 2, and is basically what the patch which started this
>> thread does. So that would work for me :)
>>
>>> We'd need some docs updates and buy-in from the other gallium driver
>>> developers.
>>
>>
>> What docs would need updating ? The TGSI docs I'm aware of are at:
>>
>> http://gallium.readthedocs.org/en/latest/tgsi.html
>>
>> I assume those have a source in the mesa src somewhere (I've not looked),
>> but those mostly just look quite incomplete in general when it comes to
>> TGSI
>> (I've had to revert to figuring what things do from the mesa srcs quite
>> often)
>>
>> Have I been looking at the wrong docs perhaps ?
>>
>> Note that them being incomplete is not intended as an excuse to not
>> document
>> this, I'm all for better documentation.
>>
>>> STORE remains unchanged, as the MEMORY/etc is in the destination,
>>> where there is a writemask, which is presently used and will remain
>>> effective.
>>
>>
>> Right and note that the first src operand of STORE already takes swizzling
>> into account, so the proposed option 2 will actually make the 2 more
>> inline.
>
>
> Erm, I mean the 2nd src operand of the store of-course, the actual src.
>
> On a related note, comparing handleLOAD and handleSTORE, this bit in
> handleLOAD seems wrong:
>
>          Value *off = fetchSrc(1, c);
>
> I believe that should be:
>
>          Value *off = fetchSrc(1, 0);

Yep, that's wrong. I think I was waffling back and forth early on in
the lifetime of the patchset about how it would work, and one of the
options was to read each dword from a separate offset. (I think I
started implementing atomic buffers well over a year ago, only to be
stymied by the memory window issue and give up for a long time.) I
eventually came to realize that was insanity.

>
> Just like handleSTORE does:
>
>          off = fetchSrc(0, 0);
>
> And always using a 'c' of 0 seems correct here since we are dealing
> with an address.
>
> Once I know which docs to update for this, I'll do a v2 of this patch
> and add a preparation patch fixing the above to the v2 set.

src/gallium/docs/source/tgsi.rst

There are push hooks which make readthedocs.org re-pull from the mesa
repo on every push so that things are up to date (well, it takes a few
minutes to regenerate the html).

  -ilia


More information about the Nouveau mailing list