[Mesa-dev] Could use your help with a bug involving piglit, waffle, and egl.

Paul Berry stereotype441 at gmail.com
Mon Sep 16 12:58:13 PDT 2013


(CC'ing Mesa-dev since this is a Mesa bug).

On 16 September 2013 11:00, Chad Versace <chad.versace at linux.intel.com> wrote:

> CC'ing Piglit.
>
>
> On 09/14/2013 08:54 AM, Paul Berry wrote:
>
>> I'm investigating a failure in spec/OES_fixed_point/attribute-arrays,
>> specifically the command line
>> "/home/pberry/.platform/piglit-mesa/piglit/build/bin/oes_fixed_point-attribute-arrays
>> -auto -fbo".  It's segfaulting during piglit/waffle initialization due to
>> Mesa accessing freed memory.  This only started happening for me recently,
>> however I suspect that's because the access to freed memory makes it a
>> heisenbug, and the root cause has probably been around for a long time.
>>
>> What's interesting about this test is that it's a GLES1 test being run in
>> -fbo mode, which means that piglit first starts initializing things
>> assuming it's going to run the test with fbo's, then at some point it
>> figures out that it can't (because fbo's are unsupported), so it tears down
>> its configuration and starts a new configuration to test using a window.
>>
>> While establishing the new configuration, waffle calls eglMakeCurrent().
>> Deep in the bowels of Mesa's implementation of this function, it decides
>> that it needs to flush the context that was previously current.  But that
>> context refers to data structures that were freed when piglit tore down its
>> old configuration (specifically, it refers to brw->bufmgr, which was freed
>> in response to a call to eglTerminate()).
>>
>> I've been studying the EGL calls made by piglit/waffle during this test, and
>> I believe they look like this (I may be missing a few but I think I found
>> most of them):
>>
>> Initial setup:
>> - eglGetDisplay()
>> - eglInitialize() (causes intel_init_bufmgr() to be called, which creates
>> bufmgr 1)
>> - eglQueryString()
>> - eglChooseConfig()
>> - eglBindAPI()
>> - eglCreateContext() (causes brwCreateContext() to be called, which
>> creates context 1)
>> - eglGetConfigAttrib()
>> - eglCreateWindowSurface()
>> - eglMakeCurrent()
>> Initial teardown:
>> - eglDestroySurface()
>> - eglDestroyContext() (interestingly, does not cause intelDestroyContext
>> to be called, perhaps because the context is still current?)
>> - eglTerminate() (causes intelDestroyScreen() to be called, which frees
>> bufmgr 1)
>> Second setup:
>> - eglGetDisplay()
>> - eglInitialize() (causes intel_init_bufmgr() to be called, which creates
>> bufmgr 2)
>> - eglQueryString()
>> - eglChooseConfig()
>> - eglBindAPI()
>> - eglCreateContext() (causes brwCreateContext() to be called, which
>> creates context 2)
>> - eglGetConfigAttrib()
>> - eglCreateWindowSurface()
>> - eglMakeCurrent() (at this point Mesa tries to flush context 1, which
>> segfaults because the flush tries to access bufmgr 1, which has already
>> been freed)
>>
>> So, my questions are:
>> - Does it look like piglit/waffle is making an allowed sequence of EGL
>> calls?  (In other words, is the bug in Mesa or piglit/waffle?)
>> - If the bug is in Mesa, what should be happening instead?  I assume that
>> at some point Mesa should have made the current context non-current (and
>> destroyed it, perhaps), but I'm not sure when that should have happened,
>> nor what code should have been responsible for doing it.
>>
>> Thanks in advance, Chad.  I hope you're enjoying your business travel!
>>
>
> The sequence of EGL calls is legal. The bug is in Mesa. After discovering
> the bug many months ago, I posted a test to the Piglit list, but it was
> ignored and then forgotten. (Gerrit please!) I'll repost the test in the
> next day or so after rebasing it.
>
> See the comments [1] in my test to see why the sequence of EGL calls is
> legal.
>
> [1] http://cgit.freedesktop.org/~chadversary/piglit/tree/tests/egl/spec/egl-1.4/egl-terminate-then-unbind-context.c?h=egl-terminate-then-unbind#n26
>
> If I correctly understand the EGL spec quote above, the call to
> eglMakeCurrent that currently segfaults should instead flush the
> queued-to-be-destroyed context and then promptly destroy it.
>

Yeah, after consultation with ajax on IRC I agree.  In point of fact, this
is what eglMakeCurrent is trying to do--it's just not succeeding because
the act of flushing the context causes it to refer to the already-freed
bufmgr.

It sounds like what we need to do in Mesa is add some sort of mechanism to
make sure eglTerminate() doesn't destroy the bufmgr until after the context
using it is no longer current (e.g. reference counting).
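
To make the reference counting idea concrete, here is a rough sketch of the
ownership model I have in mind.  None of this is actual Mesa code: struct
bufmgr, struct context, and all of the function names below are hypothetical
stand-ins, and the comments mapping steps to EGL calls are only an analogy.
The point is just that the screen and each context hold their own reference,
so the bufmgr outlives eglTerminate() for as long as a context still needs it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration only; not Mesa's real types. */
struct bufmgr {
    int refcount;
};

struct context {
    struct bufmgr *bufmgr;   /* the context holds its own reference */
    int flushes;
};

static int live_bufmgrs = 0; /* number of bufmgrs not yet freed */

static struct bufmgr *bufmgr_create(void)
{
    struct bufmgr *b = calloc(1, sizeof(*b));
    assert(b != NULL);
    b->refcount = 1;         /* reference owned by the "screen" */
    live_bufmgrs++;
    return b;
}

static struct bufmgr *bufmgr_ref(struct bufmgr *b)
{
    b->refcount++;
    return b;
}

static void bufmgr_unref(struct bufmgr *b)
{
    if (--b->refcount == 0) {
        live_bufmgrs--;
        free(b);
    }
}

static struct context *context_create(struct bufmgr *b)
{
    struct context *c = calloc(1, sizeof(*c));
    assert(c != NULL);
    c->bufmgr = bufmgr_ref(b);
    return c;
}

static void context_flush(struct context *c)
{
    /* Flushing touches the bufmgr, so it must still be alive here. */
    assert(c->bufmgr->refcount > 0);
    c->flushes++;
}

static void context_destroy(struct context *c)
{
    context_flush(c);
    bufmgr_unref(c->bufmgr); /* may be the last reference */
    free(c);
}

/* Mimics the failing sequence.  Returns the number of bufmgrs still alive
 * at the end; *alive_after_terminate reports the count right after the
 * first "terminate". */
int demo_terminate_then_rebind(int *alive_after_terminate)
{
    struct bufmgr *b1 = bufmgr_create();
    struct context *c1 = context_create(b1); /* context 1, still current */

    bufmgr_unref(b1);        /* like eglTerminate(): screen drops its ref */
    *alive_after_terminate = live_bufmgrs;   /* c1 keeps bufmgr 1 alive */

    struct bufmgr *b2 = bufmgr_create();
    struct context *c2 = context_create(b2);

    context_destroy(c1);     /* like eglMakeCurrent(): flush old, then free */

    context_destroy(c2);     /* final cleanup of the second configuration */
    bufmgr_unref(b2);
    return live_bufmgrs;
}
```

With this arrangement the flush of context 1 during the second
eglMakeCurrent() would find bufmgr 1 still valid, and bufmgr 1 would be freed
only when context 1 finally goes away.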