[Mesa-dev] draw: Replace varray and vcache by vsplit
keith whitwell
keith.whitwell at gmail.com
Sat Aug 14 15:46:19 PDT 2010
On Fri, Aug 13, 2010 at 5:25 PM, Chia-I Wu <olvaffe at gmail.com> wrote:
> On Fri, Aug 13, 2010 at 11:35 PM, Keith Whitwell <keithw at vmware.com> wrote:
>> On Fri, 2010-08-13 at 08:09 -0700, Chia-I Wu wrote:
>>> On Fri, Aug 13, 2010 at 10:51 PM, Keith Whitwell <keithw at vmware.com> wrote:
>>> > On Fri, 2010-08-13 at 07:46 -0700, Chia-I Wu wrote:
>>> >> On Fri, Aug 13, 2010 at 10:14 PM, Keith Whitwell <keithw at vmware.com> wrote:
>>> >> > On Fri, 2010-08-13 at 07:04 -0700, Chia-I Wu wrote:
>>> >> >> Hi,
>>> >> >>
>>> >> >> There are two primitive transformations in gallium draw module. In
>>> >> >> varray, primitives are "split". When a primitive has more vertices
>>> >> >> than the middle end can handle, varray splits the primitive and calls
>>> >> >> the middle end multiple times.
>>> >> >>
>>> >> >> In vcache, primitives are "decompose"d. More advanced primitives are
>>> >> >> decomposed into one of point, line(_adj), or triangle(_adj).
>>> >> >> Similarly, vcache may call the middle end multiple times to flush its
>>> >> >> internal buffer. In some cases, vcache passes the primitives through
>>> >> >> without decomposing or splitting, as can be seen in vcache_check_run.
>>> >> >>
>>> >> >> The issue with vcache is that it has to decompose a primitive
>>> >> >> differently depending on the provoking convention, as explained in
>>> >> >>
>>> >> >> http://lists.freedesktop.org/archives/mesa-dev/2010-August/001797.html
>>> >> >>
>>> >> >> It becomes a problem when GS is active.
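
[Editor's sketch] To illustrate why decomposition depends on the provoking
convention, here is a minimal example that decomposes a triangle strip into
independent triangles. It is not the vcache code; the names `struct tri` and
`decompose_tri_strip` are hypothetical. Odd-numbered triangles need two
vertices swapped to keep a consistent winding, and which two are swapped
depends on whether the first or the last vertex must stay in the provoking
slot.

```c
#include <stddef.h>

/* Hypothetical type for illustration; not a Mesa structure. */
struct tri { unsigned v[3]; };

/*
 * Decompose a triangle strip of `count` vertices into triangles.
 * `out` must have room for count - 2 entries.
 * first_provoking != 0: the first vertex of each triangle is provoking.
 * first_provoking == 0: the last vertex of each triangle is provoking.
 * Returns the number of triangles written.
 */
static size_t decompose_tri_strip(unsigned count, int first_provoking,
                                  struct tri *out)
{
   size_t n = 0;
   unsigned i;

   for (i = 0; i + 2 < count; i++) {
      struct tri t;

      if ((i & 1) == 0) {
         /* even triangle: natural order already has the right winding */
         t.v[0] = i;     t.v[1] = i + 1; t.v[2] = i + 2;
      } else if (first_provoking) {
         /* odd triangle: keep vertex i first, swap the other two */
         t.v[0] = i;     t.v[1] = i + 2; t.v[2] = i + 1;
      } else {
         /* odd triangle: keep vertex i + 2 last, swap the other two */
         t.v[0] = i + 1; t.v[1] = i;     t.v[2] = i + 2;
      }
      out[n++] = t;
   }
   return n;
}
```

The same strip therefore yields two different triangle lists. A frontend that
decomposes has to know the convention up front, whereas one that only splits
can pass the strip through unchanged.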
>>> >> >>
>>> >> >> My proposal is to make vcache split instead of decompose. Because
>>> >> >> varray only splits and vcache has a pass-through path, the rest of the
>>> >> >> workflow already has to support all primitive types. Switching from
>>> >> >> decompose to split does not require a big change to the rest of the
>>> >> >> workflow.
>>> >> >>
>>> >> >> But then vcache will look a lot like varray, only with indexed
>>> >> >> primitive support. It leads me to a new frontend that replaces both
>>> >> >> varray and vcache: vsplit
>>> >> >>
>>> >> >> http://cgit.freedesktop.org/~olv/mesa/log/?h=draw-vsplit
>>> >> >>
>>> >> >> vsplit is based on varray. It uses some code from vcache to support
>>> >> >> indexed primitives. When vcache decomposes, flags are set to
>>> >> >> indicate whether the stipple counter should be reset or whether
>>> >> >> some edge of a triangle should be omitted in unfilled mode. The
>>> >> >> segments of a split primitive have flags for similar purposes too:
>>> >> >>
>>> >> >>   DRAW_SPLIT_AFTER   More segments to come after this one
>>> >> >>   DRAW_SPLIT_BEFORE  There are preceding segments
>>> >> >>
>>> >> >> These flags are set by vsplit and the middle ends pass them to the
>>> >> >> other stages. Therefore, the run methods of middle ends are augmented
>>> >> >> to take the flags.
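
[Editor's sketch] A minimal illustration of how a split could tag its
segments, using a line strip as the example. This is not the vsplit code:
`struct segment`, `split_line_strip`, and the numeric flag values are
assumptions made for the example. Consecutive segments share one vertex so
that no line is lost at a segment boundary.

```c
#include <stddef.h>

/* Flag values are assumed for illustration only. */
#define SPLIT_BEFORE 0x1   /* there are preceding segments */
#define SPLIT_AFTER  0x2   /* more segments follow this one */

/* Hypothetical type for illustration; not a Mesa structure. */
struct segment { unsigned start, count, flags; };

/*
 * Split the line strip [start, start + count) into segments of at most
 * max_verts vertices each. Returns the number of segments written, or 0
 * if the input is degenerate or `segs` is too small.
 */
static size_t split_line_strip(unsigned start, unsigned count,
                               unsigned max_verts,
                               struct segment *segs, size_t max_segs)
{
   size_t n = 0;
   unsigned done = 0;   /* offset of the next segment's first vertex */

   if (count < 2 || max_verts < 2)
      return 0;

   for (;;) {
      unsigned remaining = count - done;
      unsigned seg_count = remaining < max_verts ? remaining : max_verts;
      unsigned flags = 0;

      if (n == max_segs)
         return 0;
      if (done > 0)
         flags |= SPLIT_BEFORE;
      if (seg_count < remaining)
         flags |= SPLIT_AFTER;

      segs[n].start = start + done;
      segs[n].count = seg_count;
      segs[n].flags = flags;
      n++;

      if (seg_count == remaining)
         break;
      /* the last vertex of this segment is the first of the next */
      done += seg_count - 1;
   }
   return n;
}
```

With 10 vertices and a limit of 4, this produces three segments starting at
vertices 0, 3 and 6. A later stage can read the flags to decide, for example,
that the stipple counter should not be reset at the start of the second and
third segments.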
>>> >> >>
>>> >> >> To summarize, vsplit
>>> >> >>
>>> >> >> - fixes GS when (flatshade && flatshade_first) is on
>>> >> >> - never sends more vertices than the middle end claims to handle
>>> >> >> - is faster than vcache: split instead of decompose, no get_elt
>>> >> >> calls
>>> >> >> - no longer uses the higher bits of draw_elts for stipple/edge flags
>>> >> >>
>>> >> >> Suggestions?
>>> >> >
>>> >> >
>>> >> > Hi - I haven't looked at the patches yet, but a couple of questions:
>>> >> >
>>> >> > How does this interact with the draw_pipe_* code - which requires
>>> >> > decomposed primitives?
>>> >> draw_pipe.c decomposes the primitives. That code was already there
>>> >> because it has to support varray and vcache_check_run, which do not
>>> >> decompose.
>>> >
>>> > OK.
>>> >
>>> >> > How does this cope with indexed rendering where the vertex buffers
>>> >> > themselves are too large (for hardware or some other entity)? Eg.
>>> >> > imagine the hardware could cope with up to 64k vertices, and you have a
>>> >> > drawelements call randomly referencing vertices in range 0..128k ?
>>> >> Vertex fetching happens in the middle end so the range of the indices
>>> >> is not a problem. vsplit does guarantee, though, that it never calls
>>> >> the middle end with more vertices than the middle end claims to support
>>> >> (as returned by draw_pt_middle_end::prepare). The limit is usually
>>> >> decided by the size of the buffer for vertex emitting.
>>> >
>>> > I guess I'm wondering how it does this. If the middle end says it
>>> > supports 64k vertices, and the vertex element looks like
>>> >
>>> > [0, 128k, 64k, 32k, 96k, 16k, 1, ... ]
>>> >
>>> > what gets sent? (Sorry, I still haven't looked at the code, you could
>>> > well have addressed this).
>>> I see. The frontend would set
>>>
>>> fetch_elts = [0, 128k, 64k, 32k, 96k, 16k, 1, ... ]
>>> draw_elts = [0, 1, 2, 3, 4, 5, 6, ...]
>>>
>>> fetch_elts is processed by the middle end and it will fetch the given
>>> vertices. draw_elts will be passed to draw_emit or the pipeline. It
>>> is the new index buffer, which indexes into the fetched vertices.
>>>
>>> It is actually the same as vcache. So when fetch_elts is
>>>
>>> [0, 128k, 64k, 64k, 128k, 16k, ...],
>>>
>>> draw_elts would be set to
>>>
>>> [0, 1, 2, 2, 1, 3, ...]
>>>
>>> The number of elements to fetch (and shade) is minimized.
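
[Editor's sketch] The remapping described above can be illustrated with a
small standalone function. It is not the vsplit or vcache implementation:
`struct remap`, `remap_indices`, and `MAX_FETCH` are hypothetical, and a
linear search stands in for whatever cache lookup the real code uses. Each
distinct input index is fetched once, and draw_elts refers to positions in
the fetched list.

```c
#include <stddef.h>

#define MAX_FETCH 1024   /* hypothetical capacity for this example */

/* Hypothetical type for illustration; not a Mesa structure. */
struct remap {
   unsigned fetch_elts[MAX_FETCH];       /* unique source indices */
   unsigned short draw_elts[MAX_FETCH];  /* indices into fetch_elts */
   size_t fetch_count;
   size_t draw_count;
};

/* Returns 0 on success, -1 if the fixed-size buffers would overflow. */
static int remap_indices(const unsigned *elts, size_t count,
                         struct remap *out)
{
   size_t i;

   out->fetch_count = 0;
   out->draw_count = 0;

   for (i = 0; i < count; i++) {
      size_t slot;

      /* has this source index been scheduled for fetching already? */
      for (slot = 0; slot < out->fetch_count; slot++) {
         if (out->fetch_elts[slot] == elts[i])
            break;
      }
      if (slot == out->fetch_count) {
         if (out->fetch_count == MAX_FETCH)
            return -1;
         out->fetch_elts[out->fetch_count++] = elts[i];
      }
      if (out->draw_count == MAX_FETCH)
         return -1;
      out->draw_elts[out->draw_count++] = (unsigned short)slot;
   }
   return 0;
}
```

Taking 128k as 131072, the input [0, 128k, 64k, 64k, 128k, 16k] gives
draw_elts [0, 1, 2, 2, 1, 3] and only four vertices to fetch. Because
draw_elts indexes the fetched list rather than the original vertex buffer,
its values stay small even when the source indices are large.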
>>
>> Thanks Chia-I, I've taken a look at the code & this makes sense - the
>> fetch/draw cache is still there, but specialized into 4 versions for
>> each element type. And it seems like you take some steps not to hit it
>> unnecessarily.
>>
>> I'm coming up to speed on it though, so a couple more questions - for
>> fan primitives, it seems like you always end up in the segment_cache
>> code -- is that true, or is there a fastpath I missed? In particular,
>> if the whole fan fits within the limits of the middle end, will it still
>> end up going through the cache?
> Yes, if it exceeds vsplit's limit (SEGMENT_SIZE).
>> Actually it looks like this happens in an early-out at the bottom of the
>> patch:
>>
>>
>> +   /* no splitting required */
>> +   if (count <= max_count_simple) {
>> +      SEGMENT_SIMPLE(0x0, start, count);
>> +   }
>>
>>
>> where max_count_simple is either
>>
>> vsplit->max_vertices
>> or
>> vsplit->segment_size (for indexed primitives)
>>
>> These in turn are generated as:
>>
>> + middle->prepare(middle, vsplit->prim, opt, &vsplit->max_vertices);
>> +
>> + vsplit->segment_size = MIN2(SEGMENT_SIZE, vsplit->max_vertices);
>>
>> and SEGMENT_SIZE is 1024.
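
[Editor's sketch] The limits quoted above can be restated as two small
helpers. The function names `segment_size` and `needs_split` are
hypothetical; only SEGMENT_SIZE, max_vertices, and max_count_simple come
from the quoted patch, and the real code may differ in detail.

```c
#define SEGMENT_SIZE 1024   /* value quoted in the thread */

/* Mirrors MIN2(SEGMENT_SIZE, max_vertices) from the quoted patch. */
static unsigned segment_size(unsigned max_vertices)
{
   return max_vertices < SEGMENT_SIZE ? max_vertices : SEGMENT_SIZE;
}

/*
 * Returns nonzero when a draw of `count` vertices (or indices, for an
 * indexed draw) is too large for the simple path and must be split.
 */
static int needs_split(unsigned count, unsigned max_vertices, int indexed)
{
   unsigned max_count_simple =
      indexed ? segment_size(max_vertices) : max_vertices;

   return count > max_count_simple;
}
```

With a middle end that reports 4096 vertices, a non-indexed draw of 2000
vertices stays on the simple path, while an indexed draw of the same size
does not, because the indexed limit is capped at 1024.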
>>
>>
>> So any indexed primitive where the number of vertices (or is it number
>> of indices) exceeds 1024, will end up on the cache path?
>> I know this used to be true as well -- just wondering if there is a way
>> to improve on this...
> max_count_simple is set to the segment size (<= 1024) because the
> middle end expects draw_elts to be of type ushort. vsplit needs to
> use its internal fixed-size buffer when index_size != 2.
>
> The limit may be lifted for index_size == 2. The attached patch should
> relax the limit (untested as it is getting late here :-). Another way
> that comes to my mind now is to make the internal buffer dynamically
> sized, and make SEGMENT_SIZE a large limit on the dynamic size.
>
I think this all makes a great follow-on change, but as a first step
vsplit looks very nice - a welcome cleanup of the existing code.
Keith