[Mesa-dev] size of LP_MAX_VBUF_SIZE
jfonseca at vmware.com
Fri Feb 21 10:37:49 UTC 2020
I tried to track down that define in git history, but there weren't many clues. It seems to go all the way back to softpipe (see `#define SP_MAX_VBUF_SIZE 4096` in src/gallium/drivers/softpipe/sp_prim_vbuf.c, added in https://gitlab.freedesktop.org/mesa/mesa/commit/2d37e78e636e5e1e7d5d00230e50a00f7a71e868). Before that, softpipe had 64K and i915 had 16K; softpipe then got 4K, which llvmpipe eventually inherited.
I believe it's as Roland said -- probably smaller sizes were empirically shown to be better. I recall that a lot of effort went into making SW TNL fast around that time, IIRC for Intel Poulsbo (the GPU was so under-powered that using the CPU for the VS allowed everything to run faster!). The issue was not just the final vertex buffer size, but also forcing a small batch of vertices through the draw pipeline, which was quite deep, at least back then: every stage of the pipeline would sweep over all the vertices, and most of the stages were hand-written C.
I think that nowadays, at least with llvmpipe, the draw pipeline is less deep, as a lot of what used to be discrete pipeline stages (e.g., clipping and final emission) are now compiled into the JITted VS. So I'd imagine llvmpipe (or anything that uses draw w/ LLVM) should now be able to sustain much larger batches without thrashing the caches.
IIRC, mesademos' ipers/engine/fire (I don't recall exactly which) were the kind of benchmarks people used to fine-tune this. There are probably better benchmarks nowadays, though.
From: Roland Scheidegger <sroland at vmware.com>
Sent: Thursday, February 20, 2020 17:27
To: Dave Airlie <airlied at gmail.com>; mesa-dev <mesa-dev at lists.freedesktop.org>
Cc: Jose Fonseca <jfonseca at vmware.com>
Subject: Re: size of LP_MAX_VBUF_SIZE
Am 20.02.20 um 02:45 schrieb Dave Airlie:
> Anyone know why LP_MAX_VBUF_SIZE is only 4K?
> Tess submits > 100K verts to the draw pipeline, which starts to split
> them. Due to the 4K limit above, it splits off 50 vertices per vbuf;
> however, it then calls draw_reset_vertex_ids, which iterates over all
> 100K vertices each time.
> I might try fixing the reset, but I wondered why this was only sending
> 50 vertices at a time to the rasterizer.
I don't recall, I think this even predates me working on llvmpipe...
That said, I think in general splitting into smaller chunks is done so
things are more cache friendly (though the limit is so low it would have
fit into L1 cache even back then...). And probably the overhead of
invoking things multiple times just wasn't all that large compared to
the execution time of the VS (and the setup code in llvmpipe).
I don't know if that was actually measured at some point, though, and it
is quite possible the average vertex size has grown quite a bit since
then (hence fewer vertices per split), as everything was geared towards
quite simple apps back then.
So I think increasing the limit is probably quite fine, but splitting
still needs to work correctly.