[Mesa-dev] [PATCH 0/2 v2] Add support for clip distances in Gallium

Thu Dec 15 11:25:14 PST 2011

----- Original Message -----
> On 12/13/2011 02:12 PM, Jose Fonseca wrote:
> >
> >
> > ----- Original Message -----
> >> On 12/13/2011 03:48 PM, Jose Fonseca wrote:
> >>> ----- Original Message -----
> >>>> On 12/13/2011 03:25 PM, Jose Fonseca wrote:
> >>>>> ----- Original Message -----
> >>>>>> On 12/13/2011 03:09 PM, Jose Fonseca wrote:
> >>>>>>> ----- Original Message -----
> >>>>>>>> On 12/13/2011 12:26 PM, Bryan Cain wrote:
> >>>>>>>>> On 12/13/2011 02:11 PM, Jose Fonseca wrote:
> >>>>>>>>>> ----- Original Message -----
> >>>>>>>>>>> This is an updated version of the patch set I sent to the
> >>>>>>>>>>> list
> >>>>>>>>>>> a
> >>>>>>>>>>> few
> >>>>>>>>>>> hours
> >>>>>>>>>>> ago.
> >>>>>>>>>>> There is now a TGSI property called
> >>>>>>>>>>> TGSI_PROPERTY_NUM_CLIP_DISTANCES
> >>>>>>>>>>> that drivers can use to determine how many of the 8
> >>>>>>>>>>> available
> >>>>>>>>>>> clip
> >>>>>>>>>>> distances
> >>>>>>>>>>> are actually used by a shader.
> >>>>>>>>>> Can't the info in TGSI_PROPERTY_NUM_CLIP_DISTANCES be
> >>>>>>>>>> easily
> >>>>>>>>>> derived from the shader, and queried through
> >>>>>>>>>> src/gallium/auxiliary/tgsi/tgsi_scan.h ?
> >>>>>>>>> No.  The clip distances can be indirectly addressed (there
> >>>>>>>>> are
> >>>>>>>>> up
> >>>>>>>>> to 2
> >>>>>>>>> of them in vec4 form for a total of 8 floats), which makes
> >>>>>>>>> it
> >>>>>>>>> impossible
> >>>>>>>>> to determine which ones are used by analyzing the shader.
> >>>>>>>> The description is almost complete. :)  The issue is that
> >>>>>>>> the
> >>>>>>>> shader
> >>>>>>>> may
> >>>>>>>> declare
> >>>>>>>>
> >>>>>>>> out float gl_ClipDistance[4];
> >>>>>>>>
> >>>>>>>> the use non-constant addressing of the array.  The compiler
> >>>>>>>> knows
> >>>>>>>> that
> >>>>>>>> gl_ClipDistance has at most 4 elements, but post-hoc
> >>>>>>>> analysis
> >>>>>>>> would
> >>>>>>>> not
> >>>>>>>> be able to determine that.  Often the fixed-function
> >>>>>>>> hardware
> >>>>>>>> (see
> >>>>>>>> below) needs to know which clip distance values are actually
> >>>>>>>> written.
> >>>>>>> But don't all the clip distances written by the shader need
> >>>>>>> to
> >>>>>>> be
> >>>>>>> declared?
> >>>>>>>
> >>>>>>> E.g.:
> >>>>>>>
> >>>>>>> DCL OUT[0], CLIPDIST[0]
> >>>>>>> DCL OUT[1], CLIPDIST[1]
> >>>>>>> DCL OUT[2], CLIPDIST[2]
> >>>>>>> DCL OUT[3], CLIPDIST[3]
> >>>>>>>
> >>>>>>> therefore a trivial analysis of the declarations convey that?
> >>>>>> No.  Clip distance is an array of up to 8 floats in GLSL, but
> >>>>>> it's
> >>>>>> represented in the hardware as 2 vec4s.  You can tell by
> >>>>>> analyzing
> >>>>>> the
> >>>>>> declarations whether there are more than 4 clip distances in
> >>>>>> use,
> >>>>>> but
> >>>>>> not which components the shader writes to.
> >>>>>> TGSI_PROPERTY_NUM_CLIP_DISTANCES is the number of components
> >>>>>> in
> >>>>>> use,
> >>>>>> not
> >>>>>> the number of full vectors.
> >>>>> Lets imagine
> >>>>>
> >>>>>    out float gl_ClipDistance[6];
> >>>>>
> >>>>> Each a clip distance is a scalar float.
> >>>>>
> >>>>> Either all hardware represents the 8 clip distances as two 4
> >>>>> vectors, and we do:
> >>>>>
> >>>>>    DCL OUT[0].xywz, CLIPDIST[0]
> >>>>>    DCL OUT[1].xy, CLIPDIST[1]
> >>>>>
> >>>>> using the full range of struct tgsi_declaration::UsageMask [1]
> >>>>> or
> >>>>> we represent them as as scalars:
> >>>>>
> >>>>>    DCL OUT[0].x, CLIPDIST[0]
> >>>>>    DCL OUT[1].x, CLIPDIST[1]
> >>>>>    DCL OUT[2].x, CLIPDIST[2]
> >>>>>    DCL OUT[3].x, CLIPDIST[3]
> >>>>>    DCL OUT[4].x, CLIPDIST[4]
> >>>>>    DCL OUT[5].x, CLIPDIST[5]
> >>>>>
> >>>>> If indirect addressing is allowed as I read bore, then maybe
> >>>>> the
> >>>>> later is better.
> >>>>>
> >>>>> I confess my ignorance about clipping and maybe I'm being
> >>>>> dense,
> >>>>> but I still don't see the need for this
> >>>>> TGSI_PROPERTY_NUM_CLIP_DISTANCES.  Could you please draft an
> >>>>> example TGSI shader showing this property (or just paste one
> >>>>> generated with your change)?  I think that would help a lot.
> >>>>>
> >>>>>
> >>>>> Jose
> >>>>>
> >>>>>
> >>>>> [1] I don't know if tgsi_dump pays much attention to
> >>>>>   tgsi_declaration::UsageMask, but it does exist.
> >>>> UsageMask might work, but before that can be considered a viable
> >>>> solution, someone will need to make it possible to actually
> >>>> declare
> >>>> it
> >>>> from ureg.  As it is, ureg is hardcoded to set UsageMask to xyzw
> >>>> no
> >>>> matter what on all declared inputs and outputs.
> >>> ureg automatically fills the UsageMask from the destionation
> >>> register masks, since it easy to determine from the opcodes.
> >>>
> >>> Which leads me to my second point, if indirect addressing of
> >>> CLIPDIST is allowed, then we can't really pack the clip distance
> >>> as 4-elem vectors in TGSI: not only the syntax would be very
> >>> weird, but it would create havoc on all tgsi-translating code
> >>> that
> >>> makes decisions based on indirect addressing of registers.
> >>>
> >>> That is,
> >>>
> >>>    float gl_ClipDistance[6];
> >>>
> >>>    gl_ClipDistance[i] = foo;
> >>>
> >>> would become
> >>>
> >>>     DCL OUT[0].x, CLIPDIST[0]
> >>>     DCL OUT[1].x, CLIPDIST[1]
> >>>     DCL OUT[2].x, CLIPDIST[2]
> >>>     DCL OUT[3].x, CLIPDIST[3]
> >>>     DCL OUT[4].x, CLIPDIST[4]
> >>>     DCL OUT[5].x, CLIPDIST[5]
> >>>     MOV OUT[ADDR[0].x].x, foo
> >>>
> >>> and the info from TGSI_PROPERTY_NUM_CLIP_DISTANCES can be
> >>> obtained
> >>> by walking the declaration (which can/should be done only once in
> >>> tgsi_scan).
> >>>
> >>> But this just doesn't look like it would ever work:
> >>>
> >>>     DCL OUT[0].xyzw, CLIPDIST[0]
> >>>     DCL OUT[1].xy  , CLIPDIST[1]
> >>>     MOV OUT[ADDR[0].x].????, foo
> >>>
> >>> Jose
> >>
> >> If ureg automatically fills the UsageMask from the accessed
> >> components,
> >> it's probably a better solution than the property.
> >>
> >> About the indirect addressing of components: the GLSL compiler
> >> lowers
> >> indirect addressing of the gl_ClipDistance array to indirect
> >> addressing
> >> of the 2 vec4s, combined with conditional moves to the different
> >> components.  Which is okay, because although indirect addressing
> >> of
> >> gl_ClipDistance is allowed by the GLSL specification, meaning have
> >> to
> >> support it, it's not something that's actually useful in practical
> >> situations.
> >
> > It sounds a bit complicated -- the lowered indirect addressing code
> > will certainly result in inefficient code for software rendering
> > (e.g, draw module + llvm), but that alone is probably not reason
> > to be different.  Could everybody confirm that all hardware uses 2
> > vec4s for clip distances and is not able to natively do indirect
> > addressing?
> 
> Those are (or should be) orthogonal.  For example, i965 uses vec4[2],
> but it can indirect access components of a vector.  We don't do that
> yet, but we could.  I don't know if there's a way to represent that
> in
> TGSI, but it can be represented in the GLSL IR.  We just have two
> separate lowering passes.  One does the float[]->vec4[] conversion,
> and
> the other does the vec4 indexing conversion.

Makes a lot of sense.

In order to leverage the flexibility of cherry-picking GLSL lowering passes we would have to expose the hardware's desired layout to glsl_to_tgsi.cpp in a cap. I think it makes a lot of sense to do that for complicated optimizations, but I'm not sure if this particular case justifies it though, as the indirect->direct seems easy enough to do on the hardware that needs. Either way, I don't have a strong opinion about it.

Jose