[Mesa-dev] draw: Replace varray and vcache by vsplit

Fri Aug 13 08:09:22 PDT 2010

On Fri, Aug 13, 2010 at 10:51 PM, Keith Whitwell <keithw at vmware.com> wrote:
> On Fri, 2010-08-13 at 07:46 -0700, Chia-I Wu wrote:
>> On Fri, Aug 13, 2010 at 10:14 PM, Keith Whitwell <keithw at vmware.com> wrote:
>> > On Fri, 2010-08-13 at 07:04 -0700, Chia-I Wu wrote:
>> >> Hi,
>> >>
>> >> There are two primitive transformations in the gallium draw module.  In
>> >> varray, primitives are split.  When a primitive has more vertices
>> >> than the middle end can handle, varray splits the primitive and calls
>> >> the middle end multiple times.
>> >>
>> >> In vcache, primitives are decomposed.  More advanced primitives are
>> >> decomposed into one of point, line(_adj), or triangle(_adj).
>> >> Similarly, vcache may call the middle end multiple times to flush its
>> >> internal buffer.  In some cases, vcache passes the primitives through
>> >> without decomposing or splitting, as can be seen in vcache_check_run.
>> >>
>> >> The issue with vcache is that it has to decompose a primitive
>> >> differently depending on the provoking convention, as explained in
>> >>
>> >>   http://lists.freedesktop.org/archives/mesa-dev/2010-August/001797.html
>> >>
>> >> It becomes a problem when GS is active.
>> >>
>> >> My proposal is to make vcache split instead of decompose.  Because
>> >> varray only splits and vcache has a pass-through path, the rest of the
>> >> workflow already has to support all primitive types.  Switching from
>> >> decompose to split does not require a big change to the rest of the
>> >> workflow.
>> >>
>> >> But then vcache will look a lot like varray, only with indexed
>> >> primitive support.  It leads me to a new frontend that replaces both
>> >> varray and vcache: vsplit
>> >>
>> >>  http://cgit.freedesktop.org/~olv/mesa/log/?h=draw-vsplit
>> >>
>> >> vsplit is based on varray.  It uses some code from vcache to support
>> >> indexed primitives.  When vcache decomposes, flags are set to
>> >> indicate whether the stipple counter should be reset or whether an
>> >> edge of a triangle should be omitted in unfilled mode.  The segments
>> >> of a split primitive have flags for similar purposes too:
>> >>
>> >>   DRAW_SPLIT_AFTER   More segments to come after this one
>> >>   DRAW_SPLIT_BEFORE  There are preceding segments
>> >>
>> >> These flags are set by vsplit and the middle ends pass them to the
>> >> other stages.  Therefore, the run methods of middle ends are augmented
>> >> to take the flags.
>> >>
>> >> To summarize, vsplit
>> >>
>> >>  - fixes GS when (flatshade && flatshade_first) is on
>> >>  - never sends more vertices than the middle end claims to handle
>> >>  - is faster than vcache: split instead of decompose, no get_elt
>> >>    calls
>> >>  - no longer uses the higher bits of draw_elts for stipple/edge flags
>> >>
>> >> Suggestions?
>> >
>> >
>> > Hi - I haven't looked at the patches yet, but a couple of questions:
>> >
>> > How does this interact with the draw_pipe_* code - which requires
>> > decomposed primitives?
>> draw_pipe.c decomposes the primitives.  That code was already there
>> because it has to support varray and vcache_check_run, which do not
>> decompose.
>
> OK.
>
>> > How does this cope with indexed rendering where the vertex buffers
>> > themselves are too large (for hardware or some other entity)?  Eg.
>> > imagine the hardware could cope with up to 64k vertices, and you have a
>> > drawelements call randomly referencing vertices in range 0..128k ?
>> Vertex fetching happens in the middle end so the range of the indices
>> is not a problem.  vsplit does guarantee, though, that it never calls
>> the middle end with more vertices than the middle end claims to support
>> (as returned by draw_pt_middle_end::prepare).  The limit is usually
>> decided by the size of the buffer for vertex emitting.
>
> I guess I'm wondering how it does this.  If the middle end says it
> supports 64k vertices, and the vertex element looks like
>
>  [0, 128k, 64k, 32k, 96k, 16k, 1, ... ]
>
> what gets sent?  (Sorry, I still haven't looked at the code, you could
> well have addressed this).
I see.  The frontend would set

   fetch_elts = [0, 128k, 64k, 32k, 96k, 16k, 1, ... ]
   draw_elts = [0, 1, 2, 3, 4, 5, 6, ...]

fetch_elts is processed by the middle end and it will fetch the given
vertices.  draw_elts will be passed to draw_emit or the pipeline.  It
is the new index buffer, which indexes into the fetched vertices.

It is actually the same as vcache.  So when fetch_elts is

   [0, 128k, 64k, 64k, 128k, 16k, ...],

draw_elts would be set to

   [0, 1, 2, 2, 1, 3, ...]

The number of elements to fetch (and shade) is minimized.
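
As a rough illustration of the remapping described above, here is a
minimal sketch.  It is not the vsplit code.  The function name
remap_elts, its signature, the linear search, and the 16-bit draw_elts
type are all assumptions made for the example.  It also assumes the
second list above is the incoming index list, and that each distinct
index is fetched only once.

```c
#include <stddef.h>

/* Hypothetical helper for illustration only; not taken from vsplit.
 *
 * Builds fetch_elts (unique vertex indices, in first-use order) and
 * draw_elts (indices into fetch_elts) from an incoming index list.
 *
 * Returns the number of unique vertices, or 0 if more than max_fetch
 * unique vertices would be needed.  A real frontend would flush and
 * start a new segment at that point instead of giving up.
 */
static size_t
remap_elts(const unsigned *elts, size_t count,
           unsigned *fetch_elts, size_t max_fetch,
           unsigned short *draw_elts)
{
   size_t num_fetch = 0;

   for (size_t i = 0; i < count; i++) {
      size_t slot;

      /* Linear search keeps the sketch short (an assumption of this
       * example, not something stated in the thread). */
      for (slot = 0; slot < num_fetch; slot++) {
         if (fetch_elts[slot] == elts[i])
            break;
      }

      if (slot == num_fetch) {
         /* First use of this vertex: it needs a new fetch slot. */
         if (num_fetch == max_fetch)
            return 0;
         fetch_elts[num_fetch++] = elts[i];
      }

      /* The new index refers to the fetched vertex, not the original. */
      draw_elts[i] = (unsigned short) slot;
   }

   return num_fetch;
}
```

With the incoming indices [0, 128k, 64k, 64k, 128k, 16k], this sketch
fetches four vertices and fills draw_elts with [0, 1, 2, 2, 1, 3].  The
size of the original index range does not matter; only the number of
distinct vertices counts against the middle end's limit.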

-- 
olv at LunarG.com