[Mesa-dev] [PATCH] panfrost: Use tiler fast path (performance boost)
Rob Clark
robdclark at gmail.com
Thu Feb 21 14:01:36 UTC 2019
On Thu, Feb 21, 2019 at 1:19 AM Alyssa Rosenzweig <alyssa at rosenzweig.io> wrote:
>
> For reasons that are still unclear (speculation included in the comment
> added in this patch), the tiler? metadata has a fast path that we were
> not enabling; there looks to be a possible time/memory tradeoff, but the
> details remain unclear.
>
> Regardless, this patch improves performance dramatically. Particular
> wins are for geometry-heavy scenes. For instance, glmark2-es2's
> Phong-shaded bunny, rendering at fullscreen (2400x1600) via GBM, jumped
> from ~20fps to hitting vsync cap at 60fps. Gains are even more obvious
> when vsync is disabled, as in glmark2-es2-wayland.
>
> With this patch, on GLES 2.0 samples not involving FBOs, it appears
> performance is converging with (and sometimes surpassing) the blob.
>
> Signed-off-by: Alyssa Rosenzweig <alyssa at rosenzweig.io>
> ---
> src/gallium/drivers/panfrost/pan_context.c | 42 +++++++++++++++++++---
> 1 file changed, 38 insertions(+), 4 deletions(-)
>
> diff --git a/src/gallium/drivers/panfrost/pan_context.c b/src/gallium/drivers/panfrost/pan_context.c
> index 822b5a0dfef..2996a9c1e09 100644
> --- a/src/gallium/drivers/panfrost/pan_context.c
> +++ b/src/gallium/drivers/panfrost/pan_context.c
> @@ -256,7 +256,28 @@ static struct bifrost_framebuffer
> panfrost_emit_mfbd(struct panfrost_context *ctx)
> {
> struct bifrost_framebuffer framebuffer = {
> - .tiler_meta = 0xf00000c600,
> + /* It is not yet clear what tiler_meta means or how it's
> + * calculated, but we can tell the lower 32-bits are a
> + * (monotonically increasing?) function of tile count and
> + * geometry complexity; I suspect it defines a memory size of
> + * some kind? for the tiler. It's really unclear at the
> + * moment... but to add to the confusion, the hardware is happy
> + * enough to accept a zero in this field, so we don't even have
> + * to worry about it right now.
*somewhere* the result of VS (or binning shader if VS is split in two)
needs to be saved for use during the per-tile rendering. Presumably
you have to give the hw a buffer of appropriate/matching size
somewhere, or with enough geometry (and too small a buffer) things
will overflow and go badly.
I guess if you exceed the size given in .tiler_meta, then hw falls
back to running VS per tile for all geometry (not just geom visible in
the tile), hence big diff in perf for large tile counts and lotsa
geometry.
It does sound a bit strange that it would depend on tile count. I'd
expect it would be a function strictly of amount of geometry (and
possibly effected by whether or not gl_PointSize is written?).. and
possibly amount of varyings if VS isn't split into two parts.
BR,
-R
> + * The byte (just after the 32-bit mark) is much more
> + * interesting. The higher nibble I've only ever seen as 0xF,
> + * but the lower one I've seen as 0x0 or 0xF, and it's not
> + * obvious what the difference is. But what -is- obvious is
> + * that when the lower nibble is zero, performance is severely
> + * degraded compared to when the lower nibble is set.
> + * Evidently, that nibble enables some sort of fast path,
> + * perhaps relating to caching or tile flush? Regardless, at
> + * this point there's no clear reason not to set it, aside from
> + * substantially increased memory requirements (of the misc_0
> + * buffer) */
> +
> + .tiler_meta = ((uint64_t) 0xff << 32) | 0x0,
>
> .width1 = MALI_POSITIVE(ctx->pipe_framebuffer.width),
> .height1 = MALI_POSITIVE(ctx->pipe_framebuffer.height),
> @@ -271,10 +292,23 @@ panfrost_emit_mfbd(struct panfrost_context *ctx)
>
> .unknown2 = 0x1f,
>
> - /* Presumably corresponds to unknown_address_X of SFBD */
> + /* Corresponds to unknown_address_X of SFBD */
> .scratchpad = ctx->scratchpad.gpu,
> .tiler_scratch_start = ctx->misc_0.gpu,
> - .tiler_scratch_middle = ctx->misc_0.gpu + /*ctx->misc_0.size*/40960, /* Size depends on the size of the framebuffer and the number of vertices */
> +
> + /* The constant added here is, like the lower word of
> + * tiler_meta, (loosely) another product of framebuffer size
> + * and geometry complexity. It must be sufficiently large for
> + * the tiler_meta fast path to work; if it's too small, there
> + * will be DATA_INVALID_FAULTs. Conversely, it must be less
> + * than the total size of misc_0, or else there's no room. It's
> + * possible this constant configures a partition between two
> + * parts of misc_0? We haven't investigated the functionality,
> + * as these buffers are internally used by the hardware
> + * (presumably by the tiler) but not seemingly touched by the driver
> + */
> +
> + .tiler_scratch_middle = ctx->misc_0.gpu + 0xf0000,
>
> .tiler_heap_start = ctx->tiler_heap.gpu,
> .tiler_heap_end = ctx->tiler_heap.gpu + ctx->tiler_heap.size,
> @@ -2696,7 +2730,7 @@ panfrost_setup_hardware(struct panfrost_context *ctx)
> screen->driver->allocate_slab(screen, &ctx->varying_mem, 16384, false, 0, 0, 0);
> screen->driver->allocate_slab(screen, &ctx->shaders, 4096, true, PAN_ALLOCATE_EXECUTE, 0, 0);
> screen->driver->allocate_slab(screen, &ctx->tiler_heap, 32768, false, PAN_ALLOCATE_GROWABLE, 1, 128);
> - screen->driver->allocate_slab(screen, &ctx->misc_0, 128, false, PAN_ALLOCATE_GROWABLE, 1, 128);
> + screen->driver->allocate_slab(screen, &ctx->misc_0, 128*128, false, PAN_ALLOCATE_GROWABLE, 1, 128);
>
> }
>
> --
> 2.20.1
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
More information about the mesa-dev
mailing list