[Mesa-dev] [PATCH 0/2 v2] Add support for clip distances in Gallium

Thu Dec 15 11:37:16 PST 2011

----- Original Message -----
> On 12/13/2011 04:22 PM, Jose Fonseca wrote:
> > ----- Original Message -----
> >>
> >> ----- Original Message -----
> >>> On 12/13/2011 03:48 PM, Jose Fonseca wrote:
> >>>> ----- Original Message -----
> >>>>> On 12/13/2011 03:25 PM, Jose Fonseca wrote:
> >>>>>> ----- Original Message -----
> >>>>>>> On 12/13/2011 03:09 PM, Jose Fonseca wrote:
> >>>>>>>> ----- Original Message -----
> >>>>>>>>> On 12/13/2011 12:26 PM, Bryan Cain wrote:
> >>>>>>>>>> On 12/13/2011 02:11 PM, Jose Fonseca wrote:
> >>>>>>>>>>> ----- Original Message -----
> >>>>>>>>>>>> This is an updated version of the patch set I sent to
> >>>>>>>>>>>> the
> >>>>>>>>>>>> list
> >>>>>>>>>>>> a
> >>>>>>>>>>>> few
> >>>>>>>>>>>> hours
> >>>>>>>>>>>> ago.
> >>>>>>>>>>>> There is now a TGSI property called
> >>>>>>>>>>>> TGSI_PROPERTY_NUM_CLIP_DISTANCES
> >>>>>>>>>>>> that drivers can use to determine how many of the 8
> >>>>>>>>>>>> available
> >>>>>>>>>>>> clip
> >>>>>>>>>>>> distances
> >>>>>>>>>>>> are actually used by a shader.
> >>>>>>>>>>> Can't the info in TGSI_PROPERTY_NUM_CLIP_DISTANCES be
> >>>>>>>>>>> easily
> >>>>>>>>>>> derived from the shader, and queried through
> >>>>>>>>>>> src/gallium/auxiliary/tgsi/tgsi_scan.h ?
> >>>>>>>>>> No.  The clip distances can be indirectly addressed (there
> >>>>>>>>>> are
> >>>>>>>>>> up
> >>>>>>>>>> to 2
> >>>>>>>>>> of them in vec4 form for a total of 8 floats), which makes
> >>>>>>>>>> it
> >>>>>>>>>> impossible
> >>>>>>>>>> to determine which ones are used by analyzing the shader.
> >>>>>>>>> The description is almost complete. :)  The issue is that
> >>>>>>>>> the
> >>>>>>>>> shader
> >>>>>>>>> may
> >>>>>>>>> declare
> >>>>>>>>>
> >>>>>>>>> out float gl_ClipDistance[4];
> >>>>>>>>>
> >>>>>>>>> the use non-constant addressing of the array.  The compiler
> >>>>>>>>> knows
> >>>>>>>>> that
> >>>>>>>>> gl_ClipDistance has at most 4 elements, but post-hoc
> >>>>>>>>> analysis
> >>>>>>>>> would
> >>>>>>>>> not
> >>>>>>>>> be able to determine that.  Often the fixed-function
> >>>>>>>>> hardware
> >>>>>>>>> (see
> >>>>>>>>> below) needs to know which clip distance values are
> >>>>>>>>> actually
> >>>>>>>>> written.
> >>>>>>>> But don't all the clip distances written by the shader need
> >>>>>>>> to
> >>>>>>>> be
> >>>>>>>> declared?
> >>>>>>>>
> >>>>>>>> E.g.:
> >>>>>>>>
> >>>>>>>> DCL OUT[0], CLIPDIST[0]
> >>>>>>>> DCL OUT[1], CLIPDIST[1]
> >>>>>>>> DCL OUT[2], CLIPDIST[2]
> >>>>>>>> DCL OUT[3], CLIPDIST[3]
> >>>>>>>>
> >>>>>>>> therefore a trivial analysis of the declarations convey
> >>>>>>>> that?
> >>>>>>> No.  Clip distance is an array of up to 8 floats in GLSL, but
> >>>>>>> it's
> >>>>>>> represented in the hardware as 2 vec4s.  You can tell by
> >>>>>>> analyzing
> >>>>>>> the
> >>>>>>> declarations whether there are more than 4 clip distances in
> >>>>>>> use,
> >>>>>>> but
> >>>>>>> not which components the shader writes to.
> >>>>>>> TGSI_PROPERTY_NUM_CLIP_DISTANCES is the number of components
> >>>>>>> in
> >>>>>>> use,
> >>>>>>> not
> >>>>>>> the number of full vectors.
> >>>>>> Lets imagine
> >>>>>>
> >>>>>>   out float gl_ClipDistance[6];
> >>>>>>
> >>>>>> Each a clip distance is a scalar float.
> >>>>>>
> >>>>>> Either all hardware represents the 8 clip distances as two 4
> >>>>>> vectors, and we do:
> >>>>>>
> >>>>>>   DCL OUT[0].xywz, CLIPDIST[0]
> >>>>>>   DCL OUT[1].xy, CLIPDIST[1]
> >>>>>>
> >>>>>> using the full range of struct tgsi_declaration::UsageMask [1]
> >>>>>> or
> >>>>>> we represent them as as scalars:
> >>>>>>
> >>>>>>   DCL OUT[0].x, CLIPDIST[0]
> >>>>>>   DCL OUT[1].x, CLIPDIST[1]
> >>>>>>   DCL OUT[2].x, CLIPDIST[2]
> >>>>>>   DCL OUT[3].x, CLIPDIST[3]
> >>>>>>   DCL OUT[4].x, CLIPDIST[4]
> >>>>>>   DCL OUT[5].x, CLIPDIST[5]
> >>>>>>
> >>>>>> If indirect addressing is allowed as I read bore, then maybe
> >>>>>> the
> >>>>>> later is better.
> >>>>>>
> >>>>>> I confess my ignorance about clipping and maybe I'm being
> >>>>>> dense,
> >>>>>> but I still don't see the need for this
> >>>>>> TGSI_PROPERTY_NUM_CLIP_DISTANCES.  Could you please draft an
> >>>>>> example TGSI shader showing this property (or just paste one
> >>>>>> generated with your change)?  I think that would help a lot.
> >>>>>>
> >>>>>>
> >>>>>> Jose
> >>>>>>
> >>>>>>
> >>>>>> [1] I don't know if tgsi_dump pays much attention to
> >>>>>>  tgsi_declaration::UsageMask, but it does exist.
> >>>>> UsageMask might work, but before that can be considered a
> >>>>> viable
> >>>>> solution, someone will need to make it possible to actually
> >>>>> declare
> >>>>> it
> >>>>> from ureg.  As it is, ureg is hardcoded to set UsageMask to
> >>>>> xyzw
> >>>>> no
> >>>>> matter what on all declared inputs and outputs.
> >>>> ureg automatically fills the UsageMask from the destionation
> >>>> register masks, since it easy to determine from the opcodes.
> >>> Wait, where does it do that?  When I search through tgsi_ureg.c
> >>> for
> >>> "UsageMask", all it shows are assignments of TGSI_WRITEMASK_XYZW
> >>> to
> >>> the
> >>> UsageMask property.
> >> ah. I may be lying. But I'm pretty sure I wrote such code
> >> somewhere,
> >> sometime. Let me dig it.
> > I was lying.  I wrote tgsi_util_get_inst_usage_mask() in
> > src/gallium/auxiliary/tgsi/tgsi_util.c , but it only analyses
> > which registers are _read_, and never got hooked into ureg anyway.
> >
> > I don't want you to go over hoops just to pass a scalar quantity.
> > So may be just add ability to ureg to specify declaration's output
> > mask?
> 
> One problem with this is that the output mask would have to be a
> parameter to a new declaration function in ureg, like
> ureg_DECL_output_with_mask() instead of just ureg_DECL_output().  In
> this case, ureg_DECL_output_with_mask() would be the only DECL
> function
> with a usage mask specified.  If that asymmetry is okay with you, I
> think I could go the UsageMask route.

It looks from Christoph Bullimer's reply that it might be better to describe  CLIPDIST as an array after all.

But FWIW ureg_DECL_output_with_mask() makes sense on its own right, as I really think we should start filling these masks, instead of assuming that everything takes 4 components.

Ureg to fill them automatically from the destination register writemask in many cases, but it would have to be over conservative for indirect registers.

Jose