[Mesa-dev] [PATCH 0/2 v2] Add support for clip distances in Gallium

Thu Dec 15 11:30:37 PST 2011

On 15.12.2011 20:09, Jose Fonseca wrote:
> ----- Original Message -----
>> On 12/14/2011 12:58 AM, Ian Romanick wrote:
>>> On 12/13/2011 01:25 PM, Jose Fonseca wrote:
>>>>
>>>> ----- Original Message -----
>>>>> On 12/13/2011 03:09 PM, Jose Fonseca wrote:
>>>>>> ----- Original Message -----
>>>>>>> On 12/13/2011 12:26 PM, Bryan Cain wrote:
>>>>>>>> On 12/13/2011 02:11 PM, Jose Fonseca wrote:
>>>>>>>>> ----- Original Message -----
>>>>>>>>>> This is an updated version of the patch set I sent to the
>>>>>>>>>> list
>>>>>>>>>> a
>>>>>>>>>> few
>>>>>>>>>> hours
>>>>>>>>>> ago.
>>>>>>>>>> There is now a TGSI property called
>>>>>>>>>> TGSI_PROPERTY_NUM_CLIP_DISTANCES
>>>>>>>>>> that drivers can use to determine how many of the 8
>>>>>>>>>> available
>>>>>>>>>> clip
>>>>>>>>>> distances
>>>>>>>>>> are actually used by a shader.
>>>>>>>>> Can't the info in TGSI_PROPERTY_NUM_CLIP_DISTANCES be easily
>>>>>>>>> derived from the shader, and queried through
>>>>>>>>> src/gallium/auxiliary/tgsi/tgsi_scan.h ?
>>>>>>>> No.  The clip distances can be indirectly addressed (there are
>>>>>>>> up
>>>>>>>> to 2
>>>>>>>> of them in vec4 form for a total of 8 floats), which makes it
>>>>>>>> impossible
>>>>>>>> to determine which ones are used by analyzing the shader.
>>>>>>> The description is almost complete. :)  The issue is that the
>>>>>>> shader
>>>>>>> may
>>>>>>> declare
>>>>>>>
>>>>>>> out float gl_ClipDistance[4];
>>>>>>>
>>>>>>> the use non-constant addressing of the array.  The compiler
>>>>>>> knows
>>>>>>> that
>>>>>>> gl_ClipDistance has at most 4 elements, but post-hoc analysis
>>>>>>> would
>>>>>>> not
>>>>>>> be able to determine that.  Often the fixed-function hardware
>>>>>>> (see
>>>>>>> below) needs to know which clip distance values are actually
>>>>>>> written.
>>>>>> But don't all the clip distances written by the shader need to
>>>>>> be
>>>>>> declared?
>>>>>>
>>>>>> E.g.:
>>>>>>
>>>>>> DCL OUT[0], CLIPDIST[0]
>>>>>> DCL OUT[1], CLIPDIST[1]
>>>>>> DCL OUT[2], CLIPDIST[2]
>>>>>> DCL OUT[3], CLIPDIST[3]
>>>>>>
>>>>>> therefore a trivial analysis of the declarations convey that?
>>>>> No.  Clip distance is an array of up to 8 floats in GLSL, but
>>>>> it's
>>>>> represented in the hardware as 2 vec4s.  You can tell by
>>>>> analyzing
>>>>> the
>>>>> declarations whether there are more than 4 clip distances in use,
>>>>> but
>>>>> not which components the shader writes to.
>>>>> TGSI_PROPERTY_NUM_CLIP_DISTANCES is the number of components in
>>>>> use,
>>>>> not
>>>>> the number of full vectors.
>>>> Lets imagine
>>>>
>>>>    out float gl_ClipDistance[6];
>>>>
>>>> Each a clip distance is a scalar float.
>>>>
>>>> Either all hardware represents the 8 clip distances as two 4
>>>> vectors,
>>>> and we do:
>>>>
>>>>    DCL OUT[0].xywz, CLIPDIST[0]
>>>>    DCL OUT[1].xy, CLIPDIST[1]
>>>>
>>>> using the full range of struct tgsi_declaration::UsageMask [1] or
>>>> we
>>>> represent them as as scalars:
>>>>
>>>>    DCL OUT[0].x, CLIPDIST[0]
>>>>    DCL OUT[1].x, CLIPDIST[1]
>>>>    DCL OUT[2].x, CLIPDIST[2]
>>>>    DCL OUT[3].x, CLIPDIST[3]
>>>>    DCL OUT[4].x, CLIPDIST[4]
>>>>    DCL OUT[5].x, CLIPDIST[5]
>>>>
>>>> If indirect addressing is allowed as I read bore, then maybe the
>>>> later
>>>> is better.
>>> As far as I'm aware, all hardware represents it as the former, and
>>> we
>>> have a lowering pass to fix-up the float[] accesses to be vec4[]
>>> accesses.
>> GeForce8+ = scalar architecture, no vectors, addresses are byte
>> based,
>> can access individual components just fine.
> Ok. So we should avoid baking this vec4 assumption in TGSI semantics.
>
>> Something like:
>>
>> gl_ClipDistance[i - 12] = some_value;
>>
>> DCL OUT[0].xyzw, POSITION
>> DCL OUT[1-8].x, CLIPDIST[0-7]
>>
>> MOV OUT<1>[ADDR[0].x - 12].x, TEMP[0].xxxx
>>         *              **
>>
>> *   - tgsi_dimension.Index specifying the base address by referencing
>> a
>> declaration
>> **  - tgsi_src_register.Index
>>
>> is the only way I see to make this work nicely on all hardware. 
>> (This is also needed if OUT[i] and OUT[i + 1] cannot be assigned to
>> contiguous hardware resources because of semantic.)
> I think that having indexable temps, like D3D10, would be a cleaner:

The problem is that we need an indexable version of every file then (at
least INPUT, OUTPUT), and then all the nice 32 bit structs break down
when we get more than 16 files.

D3D doesn't have these because indirect IN/OUT isn't allowed there, but
it is in GL and the hardware can do it.

Also, having an indexable version of every file seems odd, especially
since we need a way to distinguish individual arrays inside that file
anyway (just SM4 uses 2 indices to access INDEXABLE_TEMP; for INPUT
we'll need 3 indices).

>   DCL OUT[0].xyzw, POSITION
>   DCL OUT[1][0-7].x, CLIPDIST[0-7]
>
>   MOV OUT[1][ADDR[0].x - 12].x, TEMP[0].xxxx
>
> I propose we first add this new kind of temp at a first stage, then prohibit indirect addressing of all but this kind of temps.

There's already TEMPORARY_ARRAY, but no one wants to use it because it's
not clear how to distinguish individual arrays ...

>> For constrained hardware the driver can build the clunky
>>
>> c := ADDR[0].x % 4
>> i := ADDR[0].x / 4
>> IF [c == 0]
>>   MOV OUT[i].x, TEMP[0].xxxx
>> ELSE
>> IF [c == 1]
>>   MOV OUT[i].y, TEMP[0].xxxx
>> ELSE
>> IF [c == 2]
>>   MOV OUT[i].z, TEMP[0].xxxx
>> ELSE
>>   MOV OUT[i].w, TEMP[0].xxxx
>> ENDIF
>>
>> itself.
> Sounds good plan to me.
>
> BTW, I took a look at inputs/outputs UsageMasks and although we don't use them, I really think we really should, as having that info readily accessible would allow to avoid wasting time/bandwidth copying attributes which are not needed.

Yes I'm already computing these masks in my TGSI parser, getting them
from higher up would be nicer and potentially more reliable.

> Jose