[Mesa-dev] [RFC] Solving the TGSI indirect addressing optimization problem

Tue Mar 12 08:04:24 PDT 2013

On 12.03.2013 12:10, Christoph Bumiller wrote:
> On 12.03.2013 10:31, Christian König wrote:
>> Am 12.03.2013 02:48, schrieb Marek Olšák:
>>> On Mon, Mar 11, 2013 at 1:44 PM, Christian König
>>> <deathsimple at vodafone.de> wrote:
>>>> Hi everybody,
>>>>
>>>> this problem has been open for quite some time now, with a bunch of
>>>> different
>>>> opinions and sometimes even patches floating on the list.
>>>>
>>>> The solutions proposed or implemented so far all more or less
>>>> incomplete, so
>>>> this approach was designed in mind with both completeness and
>>>> compatibility
>>>> with existing code.
>>>>
>>>> Over all it's just an implementation of what Tom Stellard named
>>>> solution #4 in
>>>> this eMail thread:
>>>> http://lists.freedesktop.org/archives/mesa-dev/2013-January/033264.html
>>> Hi Christian,
>>>
>>> this is definitely not the solution #4. According to the TGSI dump
>>> Christoph posted, it looks more like #3.
>> Well, for me the main difference between proposal #3 and #4 is that #3
>> tries to identify the declaration to use with the supplied "offset",
>> while #4 uses a completely distinct identifier for that.
>>
>>> The solution #4 completely changes the temporary file such that it
>>> becomes two-dimensional with the first index being a literal and the
>>> second index being either a literal or ADDR[literal], and it would
>>> always be like that regardless of whether drivers support that or not.
>>> One-dimensional indexing of TEMP is not allowed. For backward
>>> compatibility, the drivers that do not support it would only get a
>>> single array declaration TEMP[0][0..n] and TEMP[0][...] would be
>>> everywhere in the code.
>> Ok, then I misunderstood you a bit, but I don't think the difference
>> is so much.
>>
>> What I'm proposing is that we have an optional "ArrayID" attached to
>> each declaration and refer to this "ArrayID" in the indirect
>> addressing operand. To sum it up declarations should look something
>> like this:
>>
>> DCL TEMP[0..3]        // normal registers
>> DCL TEMP[1][4..11]    // indirectly accessed array
>> DCL TEMP[2][12..15]    // another indirectly accessed array
>> DCL TEMP[16..17] LOCAL    // local registers
>>
>> While an indirect operand might look like this:
>>
>> MOV TEMP[16], TEMP[1][ADDR[0].x-13]
>>
>> On the pro side for this approach is that it is compatible with all
>> the existing state trackers and driver, and we don't need to generate
>> different code depending on weather or not the driver supports this.
>>
>>> I don't know much about TGSI internals, so I can't review this. I'd
>>> just like to say that TGSI dumps should make sense (2D indexing should
>>> be only allowed with 2D declarations) and tgsi_text_translate should
>>> be able to do the reverse - convert the dumps back to TGSI tokens.
>> Completely agree with that, and beside writing documentation testing
>> this is still one of the todos with this patchset.
>>
>> I have to admit that your approach looks a bit cleaner from the high
>> above view. The problem with it is that it requires this additional 2D
>> index on every operand, and we just don't have enough bits left for
>> this. Even with my approach I need to make room for this ArrayID in
>> the indirect addressing operand token, and this additional token is
>> only there if the operand uses indirect adressing.
>>
>> Do you think we can live with my approach or is there any major
>> downside I currently don't see?
>>

One more thing. While you're at it (i.e. are familiar with the code),
could you set the UsageMask in the TGSI declaration so we can pack
scalar or vec2 arrays ?
Also, you could then declare gl_ClipDistance outputs as

DCL OUT[0..7].x, CLIPDIST

so we can actually index clip distances properly ?

With
DCL OUT[0..1].xyzw, CLIPDIST we can't really index the individual
components which leads to
if ((index & 3) == 0)
   MOV OUT[index / 4].x = value
else if ((index & 3) == 1)
   MOV OUT[index / 4].y = value

which is unnecessary on some hardware.

> I can live with it. I think ... (I hope I don't regret this later; seems
> like this doesn't contain less information, then it's ok.)
> If the placement of the hint index offends someone, just write it as
> "MOV TEMP[16], TEMP(1)[ADDR[0].x-13]" or ...
> TEMP[ADDR[0].x-13 : 1] or
> TEMP[ADDR[0].x-13 supposedToBeIn [4,11]] or
> something ... nicer.
>
> Actually ...
> if TEMP[0] is placed at mem[0]
> and TEMP[4..1] is placed at, say, mem[0x1000 in bytes]
>
> do I have to
> load $register mem[$addr - 0xd0] (no this can't work) or
> load $regsiter mem[$addr - 0xd0 + 0x1000] (if you didn't adjust the
> offset) or
> load $register mem[$addr - 0xd0 + 0x1000 - 0x40] (if you already added
> the base TEMP to the immediate offset)
>
> This needs to be documented as well.
>
>> Thanks for the clarification,
>> Christian.
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>