[Mesa-dev] [PATCH 0/2 v2] Add support for clip distances in Gallium

Thu Dec 15 11:09:53 PST 2011

----- Original Message -----
> On 12/14/2011 12:58 AM, Ian Romanick wrote:
> > On 12/13/2011 01:25 PM, Jose Fonseca wrote:
> >>
> >>
> >> ----- Original Message -----
> >>> On 12/13/2011 03:09 PM, Jose Fonseca wrote:
> >>>>
> >>>> ----- Original Message -----
> >>>>> On 12/13/2011 12:26 PM, Bryan Cain wrote:
> >>>>>> On 12/13/2011 02:11 PM, Jose Fonseca wrote:
> >>>>>>> ----- Original Message -----
> >>>>>>>> This is an updated version of the patch set I sent to the
> >>>>>>>> list
> >>>>>>>> a
> >>>>>>>> few
> >>>>>>>> hours
> >>>>>>>> ago.
> >>>>>>>> There is now a TGSI property called
> >>>>>>>> TGSI_PROPERTY_NUM_CLIP_DISTANCES
> >>>>>>>> that drivers can use to determine how many of the 8
> >>>>>>>> available
> >>>>>>>> clip
> >>>>>>>> distances
> >>>>>>>> are actually used by a shader.
> >>>>>>> Can't the info in TGSI_PROPERTY_NUM_CLIP_DISTANCES be easily
> >>>>>>> derived from the shader, and queried through
> >>>>>>> src/gallium/auxiliary/tgsi/tgsi_scan.h ?
> >>>>>> No.  The clip distances can be indirectly addressed (there are
> >>>>>> up
> >>>>>> to 2
> >>>>>> of them in vec4 form for a total of 8 floats), which makes it
> >>>>>> impossible
> >>>>>> to determine which ones are used by analyzing the shader.
> >>>>> The description is almost complete. :)  The issue is that the
> >>>>> shader
> >>>>> may
> >>>>> declare
> >>>>>
> >>>>> out float gl_ClipDistance[4];
> >>>>>
> >>>>> the use non-constant addressing of the array.  The compiler
> >>>>> knows
> >>>>> that
> >>>>> gl_ClipDistance has at most 4 elements, but post-hoc analysis
> >>>>> would
> >>>>> not
> >>>>> be able to determine that.  Often the fixed-function hardware
> >>>>> (see
> >>>>> below) needs to know which clip distance values are actually
> >>>>> written.
> >>>> But don't all the clip distances written by the shader need to
> >>>> be
> >>>> declared?
> >>>>
> >>>> E.g.:
> >>>>
> >>>> DCL OUT[0], CLIPDIST[0]
> >>>> DCL OUT[1], CLIPDIST[1]
> >>>> DCL OUT[2], CLIPDIST[2]
> >>>> DCL OUT[3], CLIPDIST[3]
> >>>>
> >>>> therefore a trivial analysis of the declarations convey that?
> >>>
> >>> No.  Clip distance is an array of up to 8 floats in GLSL, but
> >>> it's
> >>> represented in the hardware as 2 vec4s.  You can tell by
> >>> analyzing
> >>> the
> >>> declarations whether there are more than 4 clip distances in use,
> >>> but
> >>> not which components the shader writes to.
> >>> TGSI_PROPERTY_NUM_CLIP_DISTANCES is the number of components in
> >>> use,
> >>> not
> >>> the number of full vectors.
> >>
> >> Lets imagine
> >>
> >>    out float gl_ClipDistance[6];
> >>
> >> Each a clip distance is a scalar float.
> >>
> >> Either all hardware represents the 8 clip distances as two 4
> >> vectors,
> >> and we do:
> >>
> >>    DCL OUT[0].xywz, CLIPDIST[0]
> >>    DCL OUT[1].xy, CLIPDIST[1]
> >>
> >> using the full range of struct tgsi_declaration::UsageMask [1] or
> >> we
> >> represent them as as scalars:
> >>
> >>    DCL OUT[0].x, CLIPDIST[0]
> >>    DCL OUT[1].x, CLIPDIST[1]
> >>    DCL OUT[2].x, CLIPDIST[2]
> >>    DCL OUT[3].x, CLIPDIST[3]
> >>    DCL OUT[4].x, CLIPDIST[4]
> >>    DCL OUT[5].x, CLIPDIST[5]
> >>
> >> If indirect addressing is allowed as I read bore, then maybe the
> >> later
> >> is better.
> > 
> > As far as I'm aware, all hardware represents it as the former, and
> > we
> > have a lowering pass to fix-up the float[] accesses to be vec4[]
> > accesses.
> 
> GeForce8+ = scalar architecture, no vectors, addresses are byte
> based,
> can access individual components just fine.

Ok. So we should avoid baking this vec4 assumption in TGSI semantics.

> Something like:
> 
> gl_ClipDistance[i - 12] = some_value;
> 
> DCL OUT[0].xyzw, POSITION
> DCL OUT[1-8].x, CLIPDIST[0-7]
> 
> MOV OUT<1>[ADDR[0].x - 12].x, TEMP[0].xxxx
>         *              **
> 
> *   - tgsi_dimension.Index specifying the base address by referencing
> a
> declaration
> **  - tgsi_src_register.Index
> 
> is the only way I see to make this work nicely on all hardware. 
> (This is also needed if OUT[i] and OUT[i + 1] cannot be assigned to
> contiguous hardware resources because of semantic.)

I think that having indexable temps, like D3D10, would be a cleaner:

  DCL OUT[0].xyzw, POSITION
  DCL OUT[1][0-7].x, CLIPDIST[0-7]

  MOV OUT[1][ADDR[0].x - 12].x, TEMP[0].xxxx

I propose we first add this new kind of temp at a first stage, then prohibit indirect addressing of all but this kind of temps.

> For constrained hardware the driver can build the clunky
> 
> c := ADDR[0].x % 4
> i := ADDR[0].x / 4
> IF [c == 0]
>   MOV OUT[i].x, TEMP[0].xxxx
> ELSE
> IF [c == 1]
>   MOV OUT[i].y, TEMP[0].xxxx
> ELSE
> IF [c == 2]
>   MOV OUT[i].z, TEMP[0].xxxx
> ELSE
>   MOV OUT[i].w, TEMP[0].xxxx
> ENDIF
> 
> itself.

Sounds good plan to me.

BTW, I took a look at inputs/outputs UsageMasks and although we don't use them, I really think we really should, as having that info readily accessible would allow to avoid wasting time/bandwidth copying attributes which are not needed.

Jose