[Mesa-dev] [PATCH 2/5] panfrost: Allocate shaders in their own BOs
Tomeu Vizoso
tomeu.vizoso at collabora.com
Tue Aug 6 10:39:54 UTC 2019
On Mon, 5 Aug 2019 at 19:01, Alyssa Rosenzweig <alyssa at rosenzweig.io> wrote:
>
> > + for (unsigned c = 0; c < 4; ++c) {
> > + struct panfrost_blend_rt *rt = &blend->rt[c];
> > + _mesa_hash_table_u64_clear(rt->shaders, panfrost_delete_blend_shader);
> > + }
>
> What's the implication of the clear call? Does that recursively free
> the blend CSOs due to ralloc infrastructure?
struct panfrost_blend_shader is allocated here with memdup, so I was
indeed missing a free(). Thanks!
> > + memcpy(final.shader.bo->cpu, shader->bo->cpu, shader->size);
>
> I'm not totally comfortable with reading off CPU-writeable GPU-mapped
> memory. We do it for scoreboarding too (which is probably a bug). But as
> discussed, this may have performance and/or security implications that
> we don't really understand. Maybe we might be better off having a purely
> CPU copy of shaders we'll need to patch?
Hmm, guess would be better to play safe in this case.
> > + entry->flags == flags) {
>
> This is definitely necessary, +1
>
> I'm wondering if we should be optimizing this somehow but we can sail
> that boat when we get there, I guess.
Yep.
> > @@ -2035,6 +2041,7 @@ panfrost_bind_shader_state(
> > enum pipe_shader_type type)
> > {
> > struct panfrost_context *ctx = pan_context(pctx);
> > + struct panfrost_job *job = panfrost_get_job_for_fbo(ctx);
> >
> > ctx->shader[type] = hwcso;
> >
> > @@ -2098,6 +2105,8 @@ panfrost_bind_shader_state(
> >
> > shader_state->compiled = true;
> > }
> > +
> > + panfrost_job_add_bo(job, shader_state->bo);
>
> This doesn't strike me as quite right. Conceptually, a shader state
> could be bound on one frame and then just reused for 2000 frames in a
> row, no? I guess that's not something you'll hit in CTS / normal apps,
> but it still strikes me as something we need to pay attention to.
>
> I think we want the add_bo() call to be part of emit_for_draw().
Makes sense.
> > + /* To maximize BO cache usage, don't allocate tiny BOs */
> > + size = MAX2(size, 4096);
>
> FWIW I think we round up to 4k anyway elsewhere?
I think that happens only when going though panfrost_drm_allocate_slab.
> Also, how does this
> interact with shaders -- if you have a bunch of tiny shaders of 128
> bytes each, you'll be wasting 3968 bytes per shader, no? At least the
> old strategy (with the 16MB slab) allowed subdivision with byte
> granularity.
Yeah, there's the wastage you describe, but we are currently wasting
several megs so it's a vast improvement in this regard.
For more fine-grained performance and memory usage improvements, I
think we should track these kind of regressions in CI, or we'll
struggle to consistently improve.
Thanks,
Tomeu
More information about the mesa-dev
mailing list