[Mesa-dev] [PATCH 0/2 v2] Add support for clip distances in Gallium

Ian Romanick idr at freedesktop.org
Fri Dec 16 10:27:30 PST 2011


On 12/13/2011 05:08 PM, Christoph Bumiller wrote:
> On 12/14/2011 12:58 AM, Ian Romanick wrote:
>> On 12/13/2011 01:25 PM, Jose Fonseca wrote:
>>>
>>>
>>> ----- Original Message -----
>>>> On 12/13/2011 03:09 PM, Jose Fonseca wrote:
>>>>>
>>>>> ----- Original Message -----
>>>>>> On 12/13/2011 12:26 PM, Bryan Cain wrote:
>>>>>>> On 12/13/2011 02:11 PM, Jose Fonseca wrote:
>>>>>>>> ----- Original Message -----
>>>>>>>>> This is an updated version of the patch set I sent to the list a
>>>>>>>>> few hours ago. There is now a TGSI property called
>>>>>>>>> TGSI_PROPERTY_NUM_CLIP_DISTANCES that drivers can use to determine
>>>>>>>>> how many of the 8 available clip distances are actually used by a
>>>>>>>>> shader.
>>>>>>>> Can't the info in TGSI_PROPERTY_NUM_CLIP_DISTANCES be easily
>>>>>>>> derived from the shader, and queried through
>>>>>>>> src/gallium/auxiliary/tgsi/tgsi_scan.h ?
>>>>>>> No.  The clip distances can be indirectly addressed (there are up
>>>>>>> to 2 of them in vec4 form for a total of 8 floats), which makes it
>>>>>>> impossible to determine which ones are used by analyzing the shader.
>>>>>> The description is almost complete. :)  The issue is that the
>>>>>> shader may declare
>>>>>>
>>>>>> out float gl_ClipDistance[4];
>>>>>>
>>>>>> then use non-constant addressing of the array.  The compiler knows
>>>>>> that gl_ClipDistance has at most 4 elements, but post-hoc analysis
>>>>>> would not be able to determine that.  Often the fixed-function
>>>>>> hardware (see below) needs to know which clip distance values are
>>>>>> actually written.
>>>>> But don't all the clip distances written by the shader need to be
>>>>> declared?
>>>>>
>>>>> E.g.:
>>>>>
>>>>> DCL OUT[0], CLIPDIST[0]
>>>>> DCL OUT[1], CLIPDIST[1]
>>>>> DCL OUT[2], CLIPDIST[2]
>>>>> DCL OUT[3], CLIPDIST[3]
>>>>>
>>>>> therefore a trivial analysis of the declarations conveys that?
>>>>
>>>> No.  Clip distance is an array of up to 8 floats in GLSL, but it's
>>>> represented in the hardware as 2 vec4s.  You can tell by analyzing
>>>> the declarations whether there are more than 4 clip distances in use,
>>>> but not which components the shader writes to.
>>>> TGSI_PROPERTY_NUM_CLIP_DISTANCES is the number of components in use,
>>>> not the number of full vectors.
>>>
>>> Let's imagine
>>>
>>>     out float gl_ClipDistance[6];
>>>
>>> Each clip distance is a scalar float.
>>>
>>> Either all hardware represents the 8 clip distances as two 4-vectors,
>>> and we do:
>>>
>>>     DCL OUT[0].xyzw, CLIPDIST[0]
>>>     DCL OUT[1].xy, CLIPDIST[1]
>>>
>>> using the full range of struct tgsi_declaration::UsageMask [1] or we
>>> represent them as scalars:
>>>
>>>     DCL OUT[0].x, CLIPDIST[0]
>>>     DCL OUT[1].x, CLIPDIST[1]
>>>     DCL OUT[2].x, CLIPDIST[2]
>>>     DCL OUT[3].x, CLIPDIST[3]
>>>     DCL OUT[4].x, CLIPDIST[4]
>>>     DCL OUT[5].x, CLIPDIST[5]
>>>
>>> If indirect addressing is allowed, as I read before, then maybe the
>>> latter is better.
>>
>> As far as I'm aware, all hardware represents it as the former, and we
>> have a lowering pass to fix-up the float[] accesses to be vec4[] accesses.
>
> GeForce8+ = scalar architecture, no vectors, addresses are byte based,
> can access individual components just fine.
>
> Something like:
>
> gl_ClipDistance[i - 12] = some_value;
>
> DCL OUT[0].xyzw, POSITION
> DCL OUT[1-8].x, CLIPDIST[0-7]
>
> MOV OUT<1>[ADDR[0].x - 12].x, TEMP[0].xxxx
>          *              **
>
> *   - tgsi_dimension.Index specifying the base address by referencing a
> declaration
> **  - tgsi_src_register.Index
>
> is the only way I see to make this work nicely on all hardware.
>
> (This is also needed if OUT[i] and OUT[i + 1] cannot be assigned to
> contiguous hardware resources because of their semantics.)
>
> For constrained hardware the driver can build the clunky
>
> c := ADDR[0].x % 4
> i := ADDR[0].x / 4
> IF [c == 0]
>    MOV OUT[i].x, TEMP[0].xxxx
> ELSE
>    IF [c == 1]
>       MOV OUT[i].y, TEMP[0].xxxx
>    ELSE
>       IF [c == 2]
>          MOV OUT[i].z, TEMP[0].xxxx
>       ELSE
>          MOV OUT[i].w, TEMP[0].xxxx
>       ENDIF
>    ENDIF
> ENDIF
>
> itself.

Doing it at that low a level has a number of significant drawbacks.  The
worst is that it happens long after any high-level optimizations can be
done on the code.  It also means that it has to be reimplemented in every
driver that needs it.  This really belongs at a higher level in the code.

Note that the lowering pass that already exists changes accesses to
'float gl_ClipDistance[8]' into accesses to 'vec4 gl_ClipDistanceMESA[2]'.
Is there a compelling reason not to do the same at the lower level?
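
To make the float[8] to vec4[2] mapping concrete, here is a minimal sketch of
the index arithmetic such a lowering relies on, plus the per-vec4 write mask
that a component count like TGSI_PROPERTY_NUM_CLIP_DISTANCES would imply. All
names below (clip_dist_slot, lower_clip_dist_index, clip_dist_writemask) are
hypothetical and chosen for illustration; this is not the code of the existing
lowering pass or of any driver.

```c
#include <assert.h>

/* Hypothetical illustration only; not Mesa's actual lowering pass. */
enum { CLIP_DIST_MAX = 8 };

struct clip_dist_slot {
    unsigned vec;   /* which vec4 holds the value: 0 or 1 */
    unsigned comp;  /* component within it: 0 = x, 1 = y, 2 = z, 3 = w */
};

/* Map an index into 'float gl_ClipDistance[8]' to a vec4 slot and component. */
static struct clip_dist_slot lower_clip_dist_index(unsigned i)
{
    assert(i < CLIP_DIST_MAX);
    struct clip_dist_slot s = { i / 4, i % 4 };
    return s;
}

/*
 * Given the number of clip distance floats in use, return the write mask
 * (bit 0 = x ... bit 3 = w) for vec4 number 'vec'.
 */
static unsigned clip_dist_writemask(unsigned num_clip_distances, unsigned vec)
{
    assert(num_clip_distances <= CLIP_DIST_MAX && vec < 2);
    unsigned first = vec * 4;          /* first float index stored in this vec4 */
    if (num_clip_distances <= first)
        return 0;                      /* this vec4 is not used at all */
    unsigned n = num_clip_distances - first;
    if (n > 4)
        n = 4;
    return (1u << n) - 1u;             /* low n bits set */
}
```

With six clip distances in use, for example, gl_ClipDistance[5] lands in the y
component of the second vec4, and the two masks come out as xyzw and xy.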

