[Mesa-dev] [PATCH] panfrost: Use tiler fast path (performance boost)
Connor Abbott
cwabbott0 at gmail.com
Thu Feb 21 15:34:51 UTC 2019
On Thu, Feb 21, 2019 at 3:01 PM Rob Clark <robdclark at gmail.com> wrote:
> On Thu, Feb 21, 2019 at 1:19 AM Alyssa Rosenzweig <alyssa at rosenzweig.io>
> wrote:
> >
> > For reasons that are still unclear (speculation included in the comment
> > added in this patch), the tiler? metadata has a fast path that we were
> > not enabling; there looks to be a possible time/memory tradeoff, but the
> > details remain unclear.
> >
> > Regardless, this patch improves performance dramatically. Particular
> > wins are for geometry-heavy scenes. For instance, glmark2-es2's
> > Phong-shaded bunny, rendering at fullscreen (2400x1600) via GBM, jumped
> > from ~20fps to hitting vsync cap at 60fps. Gains are even more obvious
> > when vsync is disabled, as in glmark2-es2-wayland.
> >
> > With this patch, on GLES 2.0 samples not involving FBOs, it appears
> > performance is converging with (and sometimes surpassing) the blob.
> >
> > Signed-off-by: Alyssa Rosenzweig <alyssa at rosenzweig.io>
> > ---
> > src/gallium/drivers/panfrost/pan_context.c | 42 +++++++++++++++++++---
> > 1 file changed, 38 insertions(+), 4 deletions(-)
> >
> > diff --git a/src/gallium/drivers/panfrost/pan_context.c
> b/src/gallium/drivers/panfrost/pan_context.c
> > index 822b5a0dfef..2996a9c1e09 100644
> > --- a/src/gallium/drivers/panfrost/pan_context.c
> > +++ b/src/gallium/drivers/panfrost/pan_context.c
> > @@ -256,7 +256,28 @@ static struct bifrost_framebuffer
> > panfrost_emit_mfbd(struct panfrost_context *ctx)
> > {
> > struct bifrost_framebuffer framebuffer = {
> > - .tiler_meta = 0xf00000c600,
> > + /* It is not yet clear what tiler_meta means or how it's
> > + * calculated, but we can tell the lower 32-bits are a
> > + * (monotonically increasing?) function of tile count
> and
> > + * geometry complexity; I suspect it defines a memory
> size of
> > + * some kind? for the tiler. It's really unclear at the
> > + * moment... but to add to the confusion, the hardware
> is happy
> > + * enough to accept a zero in this field, so we don't
> even have
> > + * to worry about it right now.
>
> *somewhere* the result of VS (or binning shader if VS is split in two)
> needs to be saved for use during the per-tile rendering. Presumably
> you have to give the hw a buffer of appropriate/matching size
> somewhere, or with enough geometry (and too small a buffer) things
> will overflow and go badly.
>
>
Yes, there is a buffer for holding the results of the tiler. The way it
works is that the userspace driver allocates a very large buffer with a
"grow on page fault" bit, so the kernel will allocate more memory as the
tiler asks for it.
> I guess if you exceed the size given in .tiler_meta, then hw falls
> back to running VS per tile for all geometry (not just geom visible in
> the tile), hence big diff in perf for large tile counts and lotsa
> geometry.
>
No, this isn't it. The vertex shaders aren't split in half, they all run
entirely before the tiler even starts. The only thing I could imagine would
be some optional part of the tiler buffer where the aforementioned grow on
page fault thing doesn't quite work out.
>
> It does sound a bit strange that it would depend on tile count. I'd
> expect it would be a function strictly of amount of geometry (and
> possibly effected by whether or not gl_PointSize is written?).. and
> possibly amount of varyings if VS isn't split into two parts.
>
> BR,
> -R
>
> > + * The byte (just after the 32-bit mark) is much more
> > + * interesting. The higher nibble I've only ever seen
> as 0xF,
> > + * but the lower one I've seen as 0x0 or 0xF, and it's
> not
> > + * obvious what the difference is. But what -is-
> obvious is
> > + * that when the lower nibble is zero, performance is
> severely
> > + * degraded compared to when the lower nibble is set.
> > + * Evidently, that nibble enables some sort of fast
> path,
> > + * perhaps relating to caching or tile flush?
> Regardless, at
> > + * this point there's no clear reason not to set it,
> aside from
> > + * substantially increased memory requirements (of the
> misc_0
> > + * buffer) */
> > +
> > + .tiler_meta = ((uint64_t) 0xff << 32) | 0x0,
> >
> > .width1 = MALI_POSITIVE(ctx->pipe_framebuffer.width),
> > .height1 = MALI_POSITIVE(ctx->pipe_framebuffer.height),
> > @@ -271,10 +292,23 @@ panfrost_emit_mfbd(struct panfrost_context *ctx)
> >
> > .unknown2 = 0x1f,
> >
> > - /* Presumably corresponds to unknown_address_X of SFBD
> */
> > + /* Corresponds to unknown_address_X of SFBD */
> > .scratchpad = ctx->scratchpad.gpu,
> > .tiler_scratch_start = ctx->misc_0.gpu,
> > - .tiler_scratch_middle = ctx->misc_0.gpu +
> /*ctx->misc_0.size*/40960, /* Size depends on the size of the framebuffer
> and the number of vertices */
> > +
> > + /* The constant added here is, like the lower word of
> > + * tiler_meta, (loosely) another product of framebuffer
> size
> > + * and geometry complexity. It must be sufficiently
> large for
> > + * the tiler_meta fast path to work; if it's too small,
> there
> > + * will be DATA_INVALID_FAULTs. Conversely, it must be
> less
> > + * than the total size of misc_0, or else there's no
> room. It's
> > + * possible this constant configures a partition
> between two
> > + * parts of misc_0? We haven't investigated the
> functionality,
> > + * as these buffers are internally used by the hardware
> > + * (presumably by the tiler) but not seemingly touched
> by the driver
> > + */
> > +
> > + .tiler_scratch_middle = ctx->misc_0.gpu + 0xf0000,
> >
> > .tiler_heap_start = ctx->tiler_heap.gpu,
> > .tiler_heap_end = ctx->tiler_heap.gpu +
> ctx->tiler_heap.size,
> > @@ -2696,7 +2730,7 @@ panfrost_setup_hardware(struct panfrost_context
> *ctx)
> > screen->driver->allocate_slab(screen, &ctx->varying_mem, 16384,
> false, 0, 0, 0);
> > screen->driver->allocate_slab(screen, &ctx->shaders, 4096,
> true, PAN_ALLOCATE_EXECUTE, 0, 0);
> > screen->driver->allocate_slab(screen, &ctx->tiler_heap, 32768,
> false, PAN_ALLOCATE_GROWABLE, 1, 128);
> > - screen->driver->allocate_slab(screen, &ctx->misc_0, 128, false,
> PAN_ALLOCATE_GROWABLE, 1, 128);
> > + screen->driver->allocate_slab(screen, &ctx->misc_0, 128*128,
> false, PAN_ALLOCATE_GROWABLE, 1, 128);
> >
> > }
> >
> > --
> > 2.20.1
> >
> > _______________________________________________
> > mesa-dev mailing list
> > mesa-dev at lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20190221/e3acbbcf/attachment.html>
More information about the mesa-dev
mailing list