[Mesa-dev] [PATCH 0/2 v2] Add support for clip distances in Gallium

Fri Dec 16 11:42:06 PST 2011

On 16.12.2011 19:27, Ian Romanick wrote:
> On 12/13/2011 05:08 PM, Christoph Bumiller wrote:
>> On 12/14/2011 12:58 AM, Ian Romanick wrote:
>>> On 12/13/2011 01:25 PM, Jose Fonseca wrote:
>>>>
>>>>
>>>> ----- Original Message -----
>>>>> On 12/13/2011 03:09 PM, Jose Fonseca wrote:
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> On 12/13/2011 12:26 PM, Bryan Cain wrote:
>>>>>>>> On 12/13/2011 02:11 PM, Jose Fonseca wrote:
>>>>>>>>> ----- Original Message -----
>>>>>>>>>> This is an updated version of the patch set I sent to the list a
>>>>>>>>>> few hours ago.
>>>>>>>>>> There is now a TGSI property called
>>>>>>>>>> TGSI_PROPERTY_NUM_CLIP_DISTANCES that drivers can use to determine
>>>>>>>>>> how many of the 8 available clip distances are actually used by a
>>>>>>>>>> shader.
>>>>>>>>> Can't the info in TGSI_PROPERTY_NUM_CLIP_DISTANCES be easily
>>>>>>>>> derived from the shader, and queried through
>>>>>>>>> src/gallium/auxiliary/tgsi/tgsi_scan.h ?
>>>>>>>> No.  The clip distances can be indirectly addressed (there are up to
>>>>>>>> 2 of them in vec4 form for a total of 8 floats), which makes it
>>>>>>>> impossible to determine which ones are used by analyzing the shader.
>>>>>>> The description is almost complete. :)  The issue is that the shader
>>>>>>> may declare
>>>>>>>
>>>>>>> out float gl_ClipDistance[4];
>>>>>>>
>>>>>>> then use non-constant addressing of the array.  The compiler knows
>>>>>>> that gl_ClipDistance has at most 4 elements, but post-hoc analysis
>>>>>>> would not be able to determine that.  Often the fixed-function
>>>>>>> hardware (see below) needs to know which clip distance values are
>>>>>>> actually written.
>>>>>> But don't all the clip distances written by the shader need to be
>>>>>> declared?
>>>>>>
>>>>>> E.g.:
>>>>>>
>>>>>> DCL OUT[0], CLIPDIST[0]
>>>>>> DCL OUT[1], CLIPDIST[1]
>>>>>> DCL OUT[2], CLIPDIST[2]
>>>>>> DCL OUT[3], CLIPDIST[3]
>>>>>>
>>>>>> therefore a trivial analysis of the declarations conveys that?
>>>>>
>>>>> No.  Clip distance is an array of up to 8 floats in GLSL, but it's
>>>>> represented in the hardware as 2 vec4s.  You can tell by analyzing the
>>>>> declarations whether there are more than 4 clip distances in use, but
>>>>> not which components the shader writes to.
>>>>> TGSI_PROPERTY_NUM_CLIP_DISTANCES is the number of components in use,
>>>>> not the number of full vectors.
>>>>
>>>> Let's imagine
>>>>
>>>>     out float gl_ClipDistance[6];
>>>>
>>>> Each clip distance is a scalar float.
>>>>
>>>> Either all hardware represents the 8 clip distances as two 4 vectors,
>>>> and we do:
>>>>
>>>>     DCL OUT[0].xyzw, CLIPDIST[0]
>>>>     DCL OUT[1].xy, CLIPDIST[1]
>>>>
>>>> using the full range of struct tgsi_declaration::UsageMask [1] or we
>>>> represent them as scalars:
>>>>
>>>>     DCL OUT[0].x, CLIPDIST[0]
>>>>     DCL OUT[1].x, CLIPDIST[1]
>>>>     DCL OUT[2].x, CLIPDIST[2]
>>>>     DCL OUT[3].x, CLIPDIST[3]
>>>>     DCL OUT[4].x, CLIPDIST[4]
>>>>     DCL OUT[5].x, CLIPDIST[5]
>>>>
>>>> If indirect addressing is allowed, as I read before, then maybe the
>>>> latter is better.
>>>
>>> As far as I'm aware, all hardware represents it as the former, and we
>>> have a lowering pass to fix-up the float[] accesses to be vec4[]
>>> accesses.
>>
>> GeForce8+ = scalar architecture, no vectors, addresses are byte based,
>> can access individual components just fine.
>>
>> Something like:
>>
>> gl_ClipDistance[i - 12] = some_value;
>>
>> DCL OUT[0].xyzw, POSITION
>> DCL OUT[1-8].x, CLIPDIST[0-7]
>>
>> MOV OUT<1>[ADDR[0].x - 12].x, TEMP[0].xxxx
>>          *              **
>>
>> *   - tgsi_dimension.Index specifying the base address by referencing a
>> declaration
>> **  - tgsi_src_register.Index
>>
>> is the only way I see to make this work nicely on all hardware.
>>
>> (This is also needed if OUT[i] and OUT[i + 1] cannot be assigned to
>> contiguous hardware resources because of semantic.)
>>
>> For constrained hardware the driver can build the clunky
>>
>> c := ADDR[0].x % 4
>> i := ADDR[0].x / 4
>> IF [c == 0]
>>    MOV OUT[i].x, TEMP[0].xxxx
>> ELSE
>> IF [c == 1]
>>    MOV OUT[i].y, TEMP[0].xxxx
>> ELSE
>> IF [c == 2]
>>    MOV OUT[i].z, TEMP[0].xxxx
>> ELSE
>>    MOV OUT[i].w, TEMP[0].xxxx
>> ENDIF
>> ENDIF
>> ENDIF
>>
>> itself.
>
> Doing it at that low-level has a number of significant drawbacks.  The
> worst is that it's long after any high-level optimizations can be done
> on the code.  It also means that it has to be reimplemented in every
> driver that needs it.  This really belongs at a higher level in the code.
>
> Note that the lowering pass that already exists changes the accesses to
> 'float gl_ClipDistance[8]' to 'vec4 gl_ClipDistanceMESA[2]'.  Is there
> a compelling reason to not do the same at the lower level?

Of course, we can add a CAP/option to let the driver choose whether it
wants a TGSI array or some pass at a higher level to lower the assignment.
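
For reference, lowering a float[8] access to a vec4[2] access amounts to
splitting the scalar index into a vector index and a component.  A minimal
sketch in C, where struct clip_slot and clip_distance_slot() are
hypothetical names used only for illustration (they are not Mesa API):

```c
#include <assert.h>

/* Hypothetical helper type: where one scalar clip distance lands in a
 * vec4[2] layout such as 'vec4 gl_ClipDistanceMESA[2]'. */
struct clip_slot {
    unsigned vec;   /* which vec4: 0 or 1 */
    unsigned comp;  /* component within the vec4: 0=x, 1=y, 2=z, 3=w */
};

/* Map a scalar index i (0..7) into 'float gl_ClipDistance[8]' to its
 * vec4 index and component. */
static struct clip_slot clip_distance_slot(unsigned i)
{
    struct clip_slot s;

    assert(i < 8);   /* at most 8 clip distances */
    s.vec = i / 4;
    s.comp = i % 4;
    return s;
}
```

With a non-constant i, a vector-only target still has to select the
component at run time, which is where the IF/ELSE chain quoted above comes
from.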

I'd just like TGSI to be extended to be able to express what's
*actually* going on so I can produce my simple:

shl $r0, constbuf0[0], 2
store out[$r0+0x270], $r1 (0x270-0x28c are fixed locations for clip
distances)

for gl_ClipDistance[uniform int i] = some_value;
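
The address arithmetic behind those two instructions can be sketched in C.
This is only an illustration under the assumptions stated in this message
(4-byte floats, clip distances at fixed byte offsets 0x270-0x28c);
CLIP_DIST_BASE and clip_distance_offset() are made-up names, not driver
code:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed from the message above: the 8 clip distances occupy fixed
 * output locations 0x270..0x28c, one 4-byte float each. */
#define CLIP_DIST_BASE 0x270u

/* Byte offset of gl_ClipDistance[i] on a scalar, byte-addressed target.
 * Mirrors 'shl $r0, constbuf0[0], 2' followed by a store to
 * out[$r0 + 0x270]. */
static uint32_t clip_distance_offset(uint32_t i)
{
    assert(i < 8);                    /* at most 8 clip distances */
    return CLIP_DIST_BASE + (i << 2); /* i * sizeof(float) */
}
```

Index 0 maps to 0x270 and index 7 to 0x28c, which matches the fixed range
given above.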