<div dir="ltr"><div>Ouch...<br><br></div>Reviewed-by: Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jan 25, 2017 at 11:14 PM, Kenneth Graunke <span dir="ltr"><<a href="mailto:kenneth@whitecape.org" target="_blank">kenneth@whitecape.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Applications may delete a shader program, create a new one, and bind it<br>
before the next draw. With terrible luck, malloc may randomly return a<br>
chunk of memory for the new gl_program that happened to be the exact<br>
same pointer as our previously bound gl_program. In this case, our<br>
logic to detect new programs in brw_upload_pipeline_state() would break:<br>
<br>
if (brw->vertex_program != ctx->VertexProgram._Current) {<br>
brw->vertex_program = ctx->VertexProgram._Current;<br>
brw->ctx.NewDriverState |= BRW_NEW_VERTEX_PROGRAM;<br>
}<br>
<br>
Because the pointer is the same, we'd think it was the same program.<br>
But it could be wildly different - a different stage altogether,<br>
different sets of resources, and so on. This causes utter chaos.<br>
<br>
As unlikely as this seems, I believe I hit this when running a subset<br>
of the CTS in a loop, in a group of tests that churns through simple<br>
programs, deleting and rebuilding them. Presumably malloc uses a<br>
bucketing cache of sorts, and so freeing up a gl_program and allocating<br>
a new one fairly quickly causes it to reuse that memory.<br>
<br>
The result was that brw->vertex_program->info.num_<wbr>ssbos claimed the<br>
program had SSBOs, while brw->vs.base.prog_data.<wbr>binding_table claimed<br>
that there were none. This was crazy, because the binding table is<br>
calculated from info.num_ssbos - the shader info appeared to change<br>
between shader compile time and draw time. Careful use of watchpoints<br>
revealed that it was being clobbered by rzalloc's memset when building<br>
an entirely different program...<br>
<br>
Fortunately, our 0xd0d0d0d0 canary for unused binding table entries<br>
caused us to crash out of bounds when trying to upload SSBOs, or we<br>
may have never discovered this heisenbug.<br>
<br>
Fixes crashes in GL45-CTS.compute_shader.sso-<wbr>case2 when using a hacked<br>
cts-runner that only runs GL45-CTS.compute_shader.s* in EGL config ID 5<br>
at 64x64 in a loop with 100 iterations.<br>
<br>
Cc: "17.0 13.0 12.0" <<a href="mailto:mesa-stable@lists.freedesktop.org">mesa-stable@lists.<wbr>freedesktop.org</a>><br>
Signed-off-by: Kenneth Graunke <<a href="mailto:kenneth@whitecape.org">kenneth@whitecape.org</a>><br>
---<br>
src/mesa/drivers/dri/i965/brw_<wbr>program.c | 43 ++++++++++++++++++++++++++++++<wbr>+++<br>
1 file changed, 43 insertions(+)<br>
<br>
diff --git a/src/mesa/drivers/dri/i965/<wbr>brw_program.c b/src/mesa/drivers/dri/i965/<wbr>brw_program.c<br>
index e81f6b15c0a..673dc502ad4 100644<br>
--- a/src/mesa/drivers/dri/i965/<wbr>brw_program.c<br>
+++ b/src/mesa/drivers/dri/i965/<wbr>brw_program.c<br>
@@ -177,6 +177,49 @@ static struct gl_program *brwNewProgram(struct gl_context *ctx, GLenum target,<br>
static void brwDeleteProgram( struct gl_context *ctx,<br>
struct gl_program *prog )<br>
{<br>
+ struct brw_context *brw = brw_context(ctx);<br>
+<br>
+ /* Beware! prog's refcount has reached zero, and it's about to be freed.<br>
+ *<br>
+ * In brw_upload_pipeline_state(), we compare brw->foo_program to<br>
+ * ctx->FooProgram._Current, and flag BRW_NEW_FOO_PROGRAM if the<br>
+ * pointer has changed.<br>
+ *<br>
+ * We cannot leave brw->foo_program as a dangling pointer to the dead<br>
+ * program. malloc() may allocate the same memory for a new gl_program,<br>
+ * causing us to see matching pointers...but totally different programs.<br>
+ *<br>
+ * We cannot set brw->foo_program to NULL, either. If we've deleted the<br>
+ * active program, Mesa may set ctx->FooProgram._Current to NULL. That<br>
+ * would cause us to see matching pointers (NULL == NULL), and fail to<br>
+ * detect that a program has changed since our last draw.<br>
+ *<br>
+ * So, set it to a bogus gl_program pointer that will never match,<br>
+ * causing us to properly reevaluate the state on our next draw.<br>
+ *<br>
+ * Getting this wrong causes heisenbugs which are very hard to catch,<br>
+ * as you need a very specific allocation pattern to hit the problem.<br>
+ */<br>
+ static const struct gl_program deleted_program;<br>
+<br>
+ if (brw->vertex_program == prog)<br>
+ brw->vertex_program = &deleted_program;<br>
+<br>
+ if (brw->tess_ctrl_program == prog)<br>
+ brw->tess_ctrl_program = &deleted_program;<br>
+<br>
+ if (brw->tess_eval_program == prog)<br>
+ brw->tess_eval_program = &deleted_program;<br>
+<br>
+ if (brw->geometry_program == prog)<br>
+ brw->geometry_program = &deleted_program;<br>
+<br>
+ if (brw->fragment_program == prog)<br>
+ brw->fragment_program = &deleted_program;<br>
+<br>
+ if (brw->compute_program == prog)<br>
+ brw->compute_program = &deleted_program;<br>
+<br>
_mesa_delete_program( ctx, prog );<br>
}<br>
<span class="HOEnZb"><font color="#888888"><br>
--<br>
2.11.0<br>
<br>
______________________________<wbr>_________________<br>
mesa-dev mailing list<br>
<a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.freedesktop.org</a><br>
<a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/<wbr>mailman/listinfo/mesa-dev</a><br>
</font></span></blockquote></div><br></div>