[Mesa-dev] [PATCH] i965: Unbind deleted shaders from brw_context, fixing malloc heisenbug.

Jason Ekstrand jason at jlekstrand.net
Thu Jan 26 07:47:46 UTC 2017


Ouch...

Reviewed-by: Jason Ekstrand <jason at jlekstrand.net>

On Wed, Jan 25, 2017 at 11:14 PM, Kenneth Graunke <kenneth at whitecape.org>
wrote:

> Applications may delete a shader program, create a new one, and bind it
> before the next draw.  With terrible luck, malloc may randomly return a
> chunk of memory for the new gl_program that happened to be the exact
> same pointer as our previously bound gl_program.  In this case, our
> logic to detect new programs in brw_upload_pipeline_state() would break:
>
>       if (brw->vertex_program != ctx->VertexProgram._Current) {
>          brw->vertex_program = ctx->VertexProgram._Current;
>          brw->ctx.NewDriverState |= BRW_NEW_VERTEX_PROGRAM;
>       }
>
> Because the pointer is the same, we'd think it was the same program.
> But it could be wildly different - a different stage altogether,
> different sets of resources, and so on.  This causes utter chaos.
>
> As unlikely as this seems, I believe I hit this when running a subset
> of the CTS in a loop, in a group of tests that churns through simple
> programs, deleting and rebuilding them.  Presumably malloc uses a
> bucketing cache of sorts, and so freeing up a gl_program and allocating
> a new one fairly quickly causes it to reuse that memory.
>
> The result was that brw->vertex_program->info.num_ssbos claimed the
> program had SSBOs, while brw->vs.base.prog_data.binding_table claimed
> that there were none.  This was crazy, because the binding table is
> calculated from info.num_ssbos - the shader info appeared to change
> between shader compile time and draw time.  Careful use of watchpoints
> revealed that it was being clobbered by rzalloc's memset when building
> an entirely different program...
>
> Fortunately, our 0xd0d0d0d0 canary for unused binding table entries
> caused us to crash out of bounds when trying to upload SSBOs, or we
> may have never discovered this heisenbug.
>
> Fixes crashes in GL45-CTS.compute_shader.sso-case2 when using a hacked
> cts-runner that only runs GL45-CTS.compute_shader.s* in EGL config ID 5
> at 64x64 in a loop with 100 iterations.
>
> Cc: "17.0 13.0 12.0" <mesa-stable at lists.freedesktop.org>
> Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
> ---
>  src/mesa/drivers/dri/i965/brw_program.c | 43
> +++++++++++++++++++++++++++++++++
>  1 file changed, 43 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_program.c
> b/src/mesa/drivers/dri/i965/brw_program.c
> index e81f6b15c0a..673dc502ad4 100644
> --- a/src/mesa/drivers/dri/i965/brw_program.c
> +++ b/src/mesa/drivers/dri/i965/brw_program.c
> @@ -177,6 +177,49 @@ static struct gl_program *brwNewProgram(struct
> gl_context *ctx, GLenum target,
>  static void brwDeleteProgram( struct gl_context *ctx,
>                               struct gl_program *prog )
>  {
> +   struct brw_context *brw = brw_context(ctx);
> +
> +   /* Beware!  prog's refcount has reached zero, and it's about to be
> freed.
> +    *
> +    * In brw_upload_pipeline_state(), we compare brw->foo_program to
> +    * ctx->FooProgram._Current, and flag BRW_NEW_FOO_PROGRAM if the
> +    * pointer has changed.
> +    *
> +    * We cannot leave brw->foo_program as a dangling pointer to the dead
> +    * program.  malloc() may allocate the same memory for a new
> gl_program,
> +    * causing us to see matching pointers...but totally different
> programs.
> +    *
> +    * We cannot set brw->foo_program to NULL, either.  If we've deleted
> the
> +    * active program, Mesa may set ctx->FooProgram._Current to NULL.  That
> +    * would cause us to see matching pointers (NULL == NULL), and fail to
> +    * detect that a program has changed since our last draw.
> +    *
> +    * So, set it to a bogus gl_program pointer that will never match,
> +    * causing us to properly reevaluate the state on our next draw.
> +    *
> +    * Getting this wrong causes heisenbugs which are very hard to catch,
> +    * as you need a very specific allocation pattern to hit the problem.
> +    */
> +   static const struct gl_program deleted_program;
> +
> +   if (brw->vertex_program == prog)
> +      brw->vertex_program = &deleted_program;
> +
> +   if (brw->tess_ctrl_program == prog)
> +      brw->tess_ctrl_program = &deleted_program;
> +
> +   if (brw->tess_eval_program == prog)
> +      brw->tess_eval_program = &deleted_program;
> +
> +   if (brw->geometry_program == prog)
> +      brw->geometry_program = &deleted_program;
> +
> +   if (brw->fragment_program == prog)
> +      brw->fragment_program = &deleted_program;
> +
> +   if (brw->compute_program == prog)
> +      brw->compute_program = &deleted_program;
> +
>     _mesa_delete_program( ctx, prog );
>  }
>
> --
> 2.11.0
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170125/0d2c821f/attachment.html>


More information about the mesa-dev mailing list