[Mesa-dev] [PATCH 0/2 v2] Add support for clip distances in Gallium

Christoph Bumiller e0425955 at student.tuwien.ac.at
Thu Dec 15 12:29:49 PST 2011


On 15.12.2011 21:10, Jose Fonseca wrote:
>
> ----- Original Message -----
>>
>> ----- Original Message -----
>>> On 15.12.2011 20:09, Jose Fonseca wrote:
>>>> ----- Original Message -----
>>>>> On 12/14/2011 12:58 AM, Ian Romanick wrote:
>>>>>> On 12/13/2011 01:25 PM, Jose Fonseca wrote:
>>>>>>> ----- Original Message -----
>>>>>>>> On 12/13/2011 03:09 PM, Jose Fonseca wrote:
>>>>>>>>> ----- Original Message -----
>>>>>>>>>> On 12/13/2011 12:26 PM, Bryan Cain wrote:
>>>>>>>>>>> On 12/13/2011 02:11 PM, Jose Fonseca wrote:
>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>>> This is an updated version of the patch set I sent to
>>>>>>>>>>>>> the
>>>>>>>>>>>>> list
>>>>>>>>>>>>> a
>>>>>>>>>>>>> few
>>>>>>>>>>>>> hours
>>>>>>>>>>>>> ago.
>>>>>>>>>>>>> There is now a TGSI property called
>>>>>>>>>>>>> TGSI_PROPERTY_NUM_CLIP_DISTANCES
>>>>>>>>>>>>> that drivers can use to determine how many of the 8
>>>>>>>>>>>>> available
>>>>>>>>>>>>> clip
>>>>>>>>>>>>> distances
>>>>>>>>>>>>> are actually used by a shader.
>>>>>>>>>>>> Can't the info in TGSI_PROPERTY_NUM_CLIP_DISTANCES be
>>>>>>>>>>>> easily
>>>>>>>>>>>> derived from the shader, and queried through
>>>>>>>>>>>> src/gallium/auxiliary/tgsi/tgsi_scan.h ?
>>>>>>>>>>> No.  The clip distances can be indirectly addressed (there
>>>>>>>>>>> are
>>>>>>>>>>> up
>>>>>>>>>>> to 2
>>>>>>>>>>> of them in vec4 form for a total of 8 floats), which makes
>>>>>>>>>>> it
>>>>>>>>>>> impossible
>>>>>>>>>>> to determine which ones are used by analyzing the shader.
>>>>>>>>>> The description is almost complete. :)  The issue is that
>>>>>>>>>> the
>>>>>>>>>> shader
>>>>>>>>>> may
>>>>>>>>>> declare
>>>>>>>>>>
>>>>>>>>>> out float gl_ClipDistance[4];
>>>>>>>>>>
>>>>>>>>>> the use non-constant addressing of the array.  The compiler
>>>>>>>>>> knows
>>>>>>>>>> that
>>>>>>>>>> gl_ClipDistance has at most 4 elements, but post-hoc
>>>>>>>>>> analysis
>>>>>>>>>> would
>>>>>>>>>> not
>>>>>>>>>> be able to determine that.  Often the fixed-function
>>>>>>>>>> hardware
>>>>>>>>>> (see
>>>>>>>>>> below) needs to know which clip distance values are
>>>>>>>>>> actually
>>>>>>>>>> written.
>>>>>>>>> But don't all the clip distances written by the shader need
>>>>>>>>> to
>>>>>>>>> be
>>>>>>>>> declared?
>>>>>>>>>
>>>>>>>>> E.g.:
>>>>>>>>>
>>>>>>>>> DCL OUT[0], CLIPDIST[0]
>>>>>>>>> DCL OUT[1], CLIPDIST[1]
>>>>>>>>> DCL OUT[2], CLIPDIST[2]
>>>>>>>>> DCL OUT[3], CLIPDIST[3]
>>>>>>>>>
>>>>>>>>> therefore a trivial analysis of the declarations convey
>>>>>>>>> that?
>>>>>>>> No.  Clip distance is an array of up to 8 floats in GLSL, but
>>>>>>>> it's
>>>>>>>> represented in the hardware as 2 vec4s.  You can tell by
>>>>>>>> analyzing
>>>>>>>> the
>>>>>>>> declarations whether there are more than 4 clip distances in
>>>>>>>> use,
>>>>>>>> but
>>>>>>>> not which components the shader writes to.
>>>>>>>> TGSI_PROPERTY_NUM_CLIP_DISTANCES is the number of components
>>>>>>>> in
>>>>>>>> use,
>>>>>>>> not
>>>>>>>> the number of full vectors.
>>>>>>> Lets imagine
>>>>>>>
>>>>>>>    out float gl_ClipDistance[6];
>>>>>>>
>>>>>>> Each a clip distance is a scalar float.
>>>>>>>
>>>>>>> Either all hardware represents the 8 clip distances as two 4
>>>>>>> vectors,
>>>>>>> and we do:
>>>>>>>
>>>>>>>    DCL OUT[0].xywz, CLIPDIST[0]
>>>>>>>    DCL OUT[1].xy, CLIPDIST[1]
>>>>>>>
>>>>>>> using the full range of struct tgsi_declaration::UsageMask [1]
>>>>>>> or
>>>>>>> we
>>>>>>> represent them as as scalars:
>>>>>>>
>>>>>>>    DCL OUT[0].x, CLIPDIST[0]
>>>>>>>    DCL OUT[1].x, CLIPDIST[1]
>>>>>>>    DCL OUT[2].x, CLIPDIST[2]
>>>>>>>    DCL OUT[3].x, CLIPDIST[3]
>>>>>>>    DCL OUT[4].x, CLIPDIST[4]
>>>>>>>    DCL OUT[5].x, CLIPDIST[5]
>>>>>>>
>>>>>>> If indirect addressing is allowed as I read bore, then maybe
>>>>>>> the
>>>>>>> later
>>>>>>> is better.
>>>>>> As far as I'm aware, all hardware represents it as the former,
>>>>>> and
>>>>>> we
>>>>>> have a lowering pass to fix-up the float[] accesses to be
>>>>>> vec4[]
>>>>>> accesses.
>>>>> GeForce8+ = scalar architecture, no vectors, addresses are byte
>>>>> based,
>>>>> can access individual components just fine.
>>>> Ok. So we should avoid baking this vec4 assumption in TGSI
>>>> semantics.
>>>>
>>>>> Something like:
>>>>>
>>>>> gl_ClipDistance[i - 12] = some_value;
>>>>>
>>>>> DCL OUT[0].xyzw, POSITION
>>>>> DCL OUT[1-8].x, CLIPDIST[0-7]
>>>>>
>>>>> MOV OUT<1>[ADDR[0].x - 12].x, TEMP[0].xxxx
>>>>>         *              **
>>>>>
>>>>> *   - tgsi_dimension.Index specifying the base address by
>>>>> referencing
>>>>> a
>>>>> declaration
>>>>> **  - tgsi_src_register.Index
>>>>>
>>>>> is the only way I see to make this work nicely on all hardware.
>>>>> (This is also needed if OUT[i] and OUT[i + 1] cannot be assigned
>>>>> to
>>>>> contiguous hardware resources because of semantic.)
>>>> I think that having indexable temps, like D3D10, would be a
>>>> cleaner:
>>> The problem is that we need an indexable version of every file then
>>> (at
>>> least INPUT, OUTPUT), and then all the nice 32 bit structs break
>>> down
>>> when we get more than 16 files.
>>>
>>> D3D doesn't have these because indirect IN/OUT isn't allowed there,
>>> but
>>> it is in GL and the hardware can do it.
>> Indirect IN/OUT is allowed on D3D9 ,
>> http://msdn.microsoft.com/en-us/library/windows/desktop/bb172963%28v=vs.85%29.aspx
>> , but it looks like SM4 indeed doens't allow,
>>  http://msdn.microsoft.com/en-us/library/windows/desktop/ff471378%28v=VS.85%29.aspx
>> , which means that indirect input needs spilling the inputs into a
>> indexable temporary.
>>
>>> Also, having an indexable version of every file seems odd,
>>> especially
>>> since we need a way to distinguish individual arrays inside that
>>> file
>>> anyway (just SM4 uses 2 indices to access INDEXABLE_TEMP; for INPUT
>>> we'll need 3 indices).
>> Fair enough.
>>
>>>>   DCL OUT[0].xyzw, POSITION
>>>>   DCL OUT[1][0-7].x, CLIPDIST[0-7]
>>>>
>>>>   MOV OUT[1][ADDR[0].x - 12].x, TEMP[0].xxxx
>>>>
>>>> I propose we first add this new kind of temp at a first stage,
>>>> then
>>>> prohibit indirect addressing of all but this kind of temps.
>>> There's already TEMPORARY_ARRAY, but no one wants to use it because
>>> it's
>>> not clear how to distinguish individual arrays ...
>> There's a reference implementation in
>> src/gallium/auxiliary/tgsi/tgsi_exec.c but I can't find any use of
>> it outside.
> The commit that introduced TEMPORARY_ARRAY has bit more detail:
>
> commit 101f792a2af9c9a19a050afba8b60caa689466a5
> Author: Zack Rusin <zackr at vmware.com>
> Date:   Fri Jun 18 13:41:20 2010 -0400
>
>     gallium: add a temporary array register file
>     
>     like normal temporaries, but allows to define a number of distinct
>     arrays, all of which make it explicit that they contain /indexable/
>     registers.
>     as a side-effect we're adding support for multi-dimensional destination
>     registers.
>     The whole thing looks like this:
>     DCL TEMPX[0][0..128]  # 0 array with 128 registers
>     
>     ADD TEMPX[0][0], IN[0], IMM[0]
>     ADD TEMPX[0][1], IN[0], IMM[0]
>     ABS OUT[0], TEMPX[0][TEMP[0]]
>
> But if we'd follow this route, it's still not clear what to do for INPUT/OUTPUT/CNSTS, besides adding INPUTX,OUTPUTX,CONSTX?
>
> Another approach would be to make all registers bi-dimendional (as you were hinting), but make the second index optional for single element arrays.

For compatibility, we could keep incrementing the primary (offset) index:

DCL TEMP[0][0..7].xyzw
DCL TEMP[1][8..11].x
DCL TEMP[2][12..15].xy
DCL IN[0][VERTEXCOUNT][0].xyzw, POSITION <- what I meant with 3 indices,
GS or HS
DCL IN[1][VERTEXCOUNT][1].xyzw, COLOR0

So that drivers unaware of the changes could keep allocating vec4s
(right now an array of n indirectly accessed floats still has to be n
vec4 TEMPs)
for all the declarations and ignore the array index altogether.

> Jose



More information about the mesa-dev mailing list