[Mesa-dev] [PATCH 2/5] panfrost: Allocate shaders in their own BOs

Tue Aug 6 10:39:54 UTC 2019

On Mon, 5 Aug 2019 at 19:01, Alyssa Rosenzweig <alyssa at rosenzweig.io> wrote:
>
> > +        for (unsigned c = 0; c < 4; ++c) {
> > +                struct panfrost_blend_rt *rt = &blend->rt[c];
> > +                _mesa_hash_table_u64_clear(rt->shaders, panfrost_delete_blend_shader);
> > +        }
>
> What's the implication  of the clear call? Does that recursively free
> the blend CSOs due to ralloc infrastructure?

struct panfrost_blend_shader is allocated here with memdup, so I was
indeed missing a free(). Thanks!

> > +                memcpy(final.shader.bo->cpu, shader->bo->cpu, shader->size);
>
> I'm not totally comfortable with reading off CPU-writeable GPU-mapped
> memory. We do it for scoreboarding too (which is probably a bug). But as
> discussed, this may have performance and/or security implications that
> we don't really understand. Maybe we might be better off having a purely
> CPU copy of shaders we'll need to patch?

Hmm, guess would be better to play safe in this case.

> > +                    entry->flags == flags) {
>
> This is definitely necessary, +1
>
> I'm wondering if we should be optimizing this somehow but we can sail
> that boat when we get there, I guess.

Yep.

> > @@ -2035,6 +2041,7 @@ panfrost_bind_shader_state(
> >          enum pipe_shader_type type)
> >  {
> >          struct panfrost_context *ctx = pan_context(pctx);
> > +        struct panfrost_job *job = panfrost_get_job_for_fbo(ctx);
> >
> >          ctx->shader[type] = hwcso;
> >
> > @@ -2098,6 +2105,8 @@ panfrost_bind_shader_state(
> >
> >                  shader_state->compiled = true;
> >          }
> > +
> > +        panfrost_job_add_bo(job, shader_state->bo);
>
> This doesn't strike me as quite right. Conceptually, a shader state
> could be bound on one frame and then just reused for 2000 frames in a
> row, no? I guess that's not something you'll hit in CTS / normal apps,
> but it still strikes me as something we need to pay attention to.
>
> I think we want the add_bo() call to be part of emit_for_draw().

Makes sense.

> > +        /* To maximize BO cache usage, don't allocate tiny BOs */
> > +        size = MAX2(size, 4096);
>
> FWIW I think we round up to 4k anyway elsewhere?

I think that happens only when going though panfrost_drm_allocate_slab.

> Also, how does this
> interact with shaders -- if you have a bunch of tiny shaders of 128
> bytes each, you'll be wasting 3968 bytes per shader, no? At least the
> old strategy  (with the 16MB slab) allowed subdivision with byte
> granularity.

Yeah, there's the wastage you describe, but we are currently wasting
several megs so it's a vast improvement in this regard.

For more fine-grained performance and memory usage improvements, I
think we should track these kind of regressions in CI, or we'll
struggle to consistently improve.

Thanks,

Tomeu