[Mesa-dev] [PATCH 0/2 v2] Add support for clip distances in Gallium

Thu Dec 15 12:10:52 PST 2011

----- Original Message -----
> 
> 
> ----- Original Message -----
> > On 15.12.2011 20:09, Jose Fonseca wrote:
> > > ----- Original Message -----
> > >> On 12/14/2011 12:58 AM, Ian Romanick wrote:
> > >>> On 12/13/2011 01:25 PM, Jose Fonseca wrote:
> > >>>>
> > >>>> ----- Original Message -----
> > >>>>> On 12/13/2011 03:09 PM, Jose Fonseca wrote:
> > >>>>>> ----- Original Message -----
> > >>>>>>> On 12/13/2011 12:26 PM, Bryan Cain wrote:
> > >>>>>>>> On 12/13/2011 02:11 PM, Jose Fonseca wrote:
> > >>>>>>>>> ----- Original Message -----
> > >>>>>>>>>> This is an updated version of the patch set I sent to
> > >>>>>>>>>> the
> > >>>>>>>>>> list
> > >>>>>>>>>> a
> > >>>>>>>>>> few
> > >>>>>>>>>> hours
> > >>>>>>>>>> ago.
> > >>>>>>>>>> There is now a TGSI property called
> > >>>>>>>>>> TGSI_PROPERTY_NUM_CLIP_DISTANCES
> > >>>>>>>>>> that drivers can use to determine how many of the 8
> > >>>>>>>>>> available
> > >>>>>>>>>> clip
> > >>>>>>>>>> distances
> > >>>>>>>>>> are actually used by a shader.
> > >>>>>>>>> Can't the info in TGSI_PROPERTY_NUM_CLIP_DISTANCES be
> > >>>>>>>>> easily
> > >>>>>>>>> derived from the shader, and queried through
> > >>>>>>>>> src/gallium/auxiliary/tgsi/tgsi_scan.h ?
> > >>>>>>>> No.  The clip distances can be indirectly addressed (there
> > >>>>>>>> are
> > >>>>>>>> up
> > >>>>>>>> to 2
> > >>>>>>>> of them in vec4 form for a total of 8 floats), which makes
> > >>>>>>>> it
> > >>>>>>>> impossible
> > >>>>>>>> to determine which ones are used by analyzing the shader.
> > >>>>>>> The description is almost complete. :)  The issue is that
> > >>>>>>> the
> > >>>>>>> shader
> > >>>>>>> may
> > >>>>>>> declare
> > >>>>>>>
> > >>>>>>> out float gl_ClipDistance[4];
> > >>>>>>>
> > >>>>>>> the use non-constant addressing of the array.  The compiler
> > >>>>>>> knows
> > >>>>>>> that
> > >>>>>>> gl_ClipDistance has at most 4 elements, but post-hoc
> > >>>>>>> analysis
> > >>>>>>> would
> > >>>>>>> not
> > >>>>>>> be able to determine that.  Often the fixed-function
> > >>>>>>> hardware
> > >>>>>>> (see
> > >>>>>>> below) needs to know which clip distance values are
> > >>>>>>> actually
> > >>>>>>> written.
> > >>>>>> But don't all the clip distances written by the shader need
> > >>>>>> to
> > >>>>>> be
> > >>>>>> declared?
> > >>>>>>
> > >>>>>> E.g.:
> > >>>>>>
> > >>>>>> DCL OUT[0], CLIPDIST[0]
> > >>>>>> DCL OUT[1], CLIPDIST[1]
> > >>>>>> DCL OUT[2], CLIPDIST[2]
> > >>>>>> DCL OUT[3], CLIPDIST[3]
> > >>>>>>
> > >>>>>> therefore a trivial analysis of the declarations convey
> > >>>>>> that?
> > >>>>> No.  Clip distance is an array of up to 8 floats in GLSL, but
> > >>>>> it's
> > >>>>> represented in the hardware as 2 vec4s.  You can tell by
> > >>>>> analyzing
> > >>>>> the
> > >>>>> declarations whether there are more than 4 clip distances in
> > >>>>> use,
> > >>>>> but
> > >>>>> not which components the shader writes to.
> > >>>>> TGSI_PROPERTY_NUM_CLIP_DISTANCES is the number of components
> > >>>>> in
> > >>>>> use,
> > >>>>> not
> > >>>>> the number of full vectors.
> > >>>> Lets imagine
> > >>>>
> > >>>>    out float gl_ClipDistance[6];
> > >>>>
> > >>>> Each a clip distance is a scalar float.
> > >>>>
> > >>>> Either all hardware represents the 8 clip distances as two 4
> > >>>> vectors,
> > >>>> and we do:
> > >>>>
> > >>>>    DCL OUT[0].xywz, CLIPDIST[0]
> > >>>>    DCL OUT[1].xy, CLIPDIST[1]
> > >>>>
> > >>>> using the full range of struct tgsi_declaration::UsageMask [1]
> > >>>> or
> > >>>> we
> > >>>> represent them as as scalars:
> > >>>>
> > >>>>    DCL OUT[0].x, CLIPDIST[0]
> > >>>>    DCL OUT[1].x, CLIPDIST[1]
> > >>>>    DCL OUT[2].x, CLIPDIST[2]
> > >>>>    DCL OUT[3].x, CLIPDIST[3]
> > >>>>    DCL OUT[4].x, CLIPDIST[4]
> > >>>>    DCL OUT[5].x, CLIPDIST[5]
> > >>>>
> > >>>> If indirect addressing is allowed as I read bore, then maybe
> > >>>> the
> > >>>> later
> > >>>> is better.
> > >>> As far as I'm aware, all hardware represents it as the former,
> > >>> and
> > >>> we
> > >>> have a lowering pass to fix-up the float[] accesses to be
> > >>> vec4[]
> > >>> accesses.
> > >> GeForce8+ = scalar architecture, no vectors, addresses are byte
> > >> based,
> > >> can access individual components just fine.
> > > Ok. So we should avoid baking this vec4 assumption in TGSI
> > > semantics.
> > >
> > >> Something like:
> > >>
> > >> gl_ClipDistance[i - 12] = some_value;
> > >>
> > >> DCL OUT[0].xyzw, POSITION
> > >> DCL OUT[1-8].x, CLIPDIST[0-7]
> > >>
> > >> MOV OUT<1>[ADDR[0].x - 12].x, TEMP[0].xxxx
> > >>         *              **
> > >>
> > >> *   - tgsi_dimension.Index specifying the base address by
> > >> referencing
> > >> a
> > >> declaration
> > >> **  - tgsi_src_register.Index
> > >>
> > >> is the only way I see to make this work nicely on all hardware.
> > >> (This is also needed if OUT[i] and OUT[i + 1] cannot be assigned
> > >> to
> > >> contiguous hardware resources because of semantic.)
> > > I think that having indexable temps, like D3D10, would be a
> > > cleaner:
> > 
> > The problem is that we need an indexable version of every file then
> > (at
> > least INPUT, OUTPUT), and then all the nice 32 bit structs break
> > down
> > when we get more than 16 files.
> > 
> > D3D doesn't have these because indirect IN/OUT isn't allowed there,
> > but
> > it is in GL and the hardware can do it.
> 
> Indirect IN/OUT is allowed on D3D9 ,
> http://msdn.microsoft.com/en-us/library/windows/desktop/bb172963%28v=vs.85%29.aspx
> , but it looks like SM4 indeed doens't allow,
>  http://msdn.microsoft.com/en-us/library/windows/desktop/ff471378%28v=VS.85%29.aspx
> , which means that indirect input needs spilling the inputs into a
> indexable temporary.
> 
> > Also, having an indexable version of every file seems odd,
> > especially
> > since we need a way to distinguish individual arrays inside that
> > file
> > anyway (just SM4 uses 2 indices to access INDEXABLE_TEMP; for INPUT
> > we'll need 3 indices).
> 
> Fair enough.
> 
> > >   DCL OUT[0].xyzw, POSITION
> > >   DCL OUT[1][0-7].x, CLIPDIST[0-7]
> > >
> > >   MOV OUT[1][ADDR[0].x - 12].x, TEMP[0].xxxx
> > >
> > > I propose we first add this new kind of temp at a first stage,
> > > then
> > > prohibit indirect addressing of all but this kind of temps.
> > 
> > There's already TEMPORARY_ARRAY, but no one wants to use it because
> > it's
> > not clear how to distinguish individual arrays ...
> 
> There's a reference implementation in
> src/gallium/auxiliary/tgsi/tgsi_exec.c but I can't find any use of
> it outside.

The commit that introduced TEMPORARY_ARRAY has bit more detail:

commit 101f792a2af9c9a19a050afba8b60caa689466a5
Author: Zack Rusin <zackr at vmware.com>
Date:   Fri Jun 18 13:41:20 2010 -0400

    gallium: add a temporary array register file

    like normal temporaries, but allows to define a number of distinct
    arrays, all of which make it explicit that they contain /indexable/
    registers.
    as a side-effect we're adding support for multi-dimensional destination
    registers.
    The whole thing looks like this:
    DCL TEMPX[0][0..128]  # 0 array with 128 registers

    ADD TEMPX[0][0], IN[0], IMM[0]
    ADD TEMPX[0][1], IN[0], IMM[0]
    ABS OUT[0], TEMPX[0][TEMP[0]]

But if we'd follow this route, it's still not clear what to do for INPUT/OUTPUT/CNSTS, besides adding INPUTX,OUTPUTX,CONSTX?

Another approach would be to make all registers bi-dimendional (as you were hinting), but make the second index optional for single element arrays.

Jose