[Bug 110796] [REGRESSION] [BISECTED] [OpenGL CTS] race between destruction of types and shader compilation (?)

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Wed May 29 23:15:21 UTC 2019


https://bugs.freedesktop.org/show_bug.cgi?id=110796

            Bug ID: 110796
           Summary: [REGRESSION] [BISECTED] [OpenGL CTS] race between
                    destruction of types and shader compilation (?)
           Product: Mesa
           Version: git
          Hardware: Other
                OS: All
            Status: NEW
          Severity: normal
          Priority: medium
         Component: Drivers/DRI/i965
          Assignee: intel-3d-bugs at lists.freedesktop.org
          Reporter: agomez at igalia.com
        QA Contact: intel-3d-bugs at lists.freedesktop.org
                CC: lemody at gmail.com, t_arceri at yahoo.com.au

Created attachment 144376
  --> https://bugs.freedesktop.org/attachment.cgi?id=144376&action=edit
BT for the core running
KHR-GLES31.core.gpu_shader5.texture_gather_offsets_color

This is a hard one to explain. Let's try ...

After:

--

commit 624789e3708c87ea2a4c8d2266266b489b421cba (gitlabcom/master)
Author: Tapani Pälli <tapani.palli at intel.com>
Date:   Fri Mar 15 09:47:49 2019 +0200

    compiler/glsl: handle case where we have multiple users for types

    Both Vulkan and OpenGL might be using glsl_types simultaneously or we
    can also have multiple concurrent Vulkan instances using glsl_types.
    Patch adds a one time init to track number of users and will release
    types only when last user calls _glsl_type_singleton_decref().

    This change fixes glsl_type memory leaks we have with anv driver.

    v2: reuse hash_mutex, cleanup, apply fix also to radv driver and
        rename helper functions (Jason)

    v3: move init, destroy to happen on GL context init and destroy

    Signed-off-by: Tapani Pälli <tapani.palli at intel.com>
    Reviewed-by: Timothy Arceri <tarceri at itsqueeze.com>
    Reviewed-by: Jason Ekstrand <jason at jlekstrand.net>

--

We can hit a situation in which we try to get a glsl_type that has already
been freed, leading to a SIGSEGV.
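
To make the failure mode concrete, here is a minimal sketch of a
reference-counted type cache, assuming a much simplified model of what the
commit above describes. None of this is Mesa code: type_cache_incref(),
type_cache_decref(), type_cache_get() and struct type_cache are hypothetical
names. To stay well defined, the sketch sets the pointer to NULL on the last
decref and makes lookups return NULL, where the real crash dereferences freed
memory instead.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the shared glsl_type tables; not Mesa code. */
struct type_cache {
   char names[8][32];
   int count;
};

static pthread_mutex_t cache_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct type_cache *cache = NULL;
static unsigned cache_users = 0;

/* The first user creates the cache; later users only bump the count. */
void type_cache_incref(void)
{
   pthread_mutex_lock(&cache_mutex);
   if (cache_users++ == 0)
      cache = calloc(1, sizeof(*cache));
   pthread_mutex_unlock(&cache_mutex);
}

/* The last user frees the cache. Anything still running after this point
 * must not touch it. */
void type_cache_decref(void)
{
   pthread_mutex_lock(&cache_mutex);
   assert(cache_users > 0);
   if (--cache_users == 0) {
      free(cache);
      cache = NULL;
   }
   pthread_mutex_unlock(&cache_mutex);
}

/* Look up (or intern) a type name. Returns NULL when the cache is gone or
 * full, which models "cannot find the type in the hash table". */
const char *type_cache_get(const char *name)
{
   const char *result = NULL;

   pthread_mutex_lock(&cache_mutex);
   if (cache != NULL) {
      for (int i = 0; i < cache->count && result == NULL; i++) {
         if (strcmp(cache->names[i], name) == 0)
            result = cache->names[i];
      }
      if (result == NULL && cache->count < 8) {
         /* calloc() zeroed the slot, so 31 bytes keeps it terminated. */
         strncpy(cache->names[cache->count], name, 31);
         result = cache->names[cache->count++];
      }
   }
   pthread_mutex_unlock(&cache_mutex);
   return result;
}
```

In this model, a lookup made between incref and decref succeeds, while the
same lookup made after the last decref finds nothing.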

Timothy seems to have solved this very same problem for radeonsi at:

--

commit a6b7068ff5fbf4694a45a6e07adac5047e574514
Author: Timothy Arceri <tarceri at itsqueeze.com>
Date:   Tue Apr 23 12:54:38 2019 +1000

    st/mesa/radeonsi: fix race between destruction of types and shader compilation

    Commit 624789e3708c moved the destruction of types out of atexit() and
    made use of a ref count instead. This is useful for avoiding a crash
    where drivers such as radeonsi are still compiling in a thread when the app
    exits and has not called MakeCurrent to change from the current context.

    While the above scenario is technically an app bug we shouldn't crash.
    However that change caused another race condition between the shader
    compilation tread in radeonsi and context teardown functions.

    This patch makes two changes to fix this new problem:

    First we explicitly call _mesa_destroy_shader_compiler_types() when
    destroying the st context rather than calling it indirectly via
    _mesa_free_context_data(). We do this as we must call it after
    st_destroy_context_priv() so that we don't destory the glsl types
    before the compilation threads finish.

    Next wait for the shader threads to finish in si_destroy_context() this
    also means we need to call context destroy before destroying the queues
    in si_destroy_screen().

    Fixes: 624789e3708c ("compiler/glsl: handle case where we have multiple users for types")

    Reviewed-by: Marek Olšák <marek.olsak at amd.com>

--

Potentially, this problem is present not only in i965 but in other drivers too (?)

--

The conditions in which I've detected this problem are a bit tricky.

While playing with bug 110357, I realized that, after the inclusion of the
offending commit above, reverting dacb11a585 was causing systematic SIGSEGVs on
KBL, SKL, BDW and HSW while running cts-runner for es32 in the working branch.

**Notice that this cannot be reproduced by running just the problematic test or
the CTS configuration that contains it. I've only been able to reproduce it by
running cts-runner for es32**

The "problematic" tests in which the execution could SIGSEGV are:

dEQP-GLES31.functional.texture.gather.offsets.min_required_offset.2d.rgba8.size_pot.clamp_to_edge_repeat
KHR-GLES31.core.gpu_shader5.texture_gather_offsets_color

The SIGSEGV happens while trying to invoke:

textureGatherOffsets(isampler2D, vec2, ivec2[4], int)

Checking the obtained cores, my guess is that the invocation fails because,
while checking the function signature to identify the proper function pointer,
it cannot find the "ivec2[4]" array type in the array types hash table.

It cannot find it because the hash table doesn't exist any more (?), as it
has been freed (?).

The signature is correct; nevertheless, Mesa tries to print the possible
function signature candidates. While doing so, it goes once again to the
array types hash table for the name of the "ivec2[4]" type of the variable
defining the signature. As the hash table is already bogus, when trying to
print the name of the type, strlen leads us to a SIGSEGV.

So, it really seems like a similar race condition situation as the one Timothy
fixed for radeonsi.
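
The ordering that the radeonsi commit message describes (wait for the
compilation threads, then release the types) can be sketched as follows.
This is a simplified model and not Mesa code: types_incref(), types_decref(),
compile_job() and context_destroy_safely() are hypothetical names, and
whether i965 has an equivalent thread to wait for is not established in this
report.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one entry of the array types hash table. */
static pthread_mutex_t types_mutex = PTHREAD_MUTEX_INITIALIZER;
static char *array_type_name = NULL;
static unsigned types_users = 0;

static void types_incref(void)
{
   pthread_mutex_lock(&types_mutex);
   if (types_users++ == 0) {
      array_type_name = malloc(sizeof("ivec2[4]"));
      if (array_type_name != NULL)
         strcpy(array_type_name, "ivec2[4]");
   }
   pthread_mutex_unlock(&types_mutex);
}

static void types_decref(void)
{
   pthread_mutex_lock(&types_mutex);
   assert(types_users > 0);
   if (--types_users == 0) {
      free(array_type_name);
      array_type_name = NULL;
   }
   pthread_mutex_unlock(&types_mutex);
}

/* Models a compile job that needs the type name (the strlen in the
 * backtrace). Stores -1 if the types are already gone. */
static void *compile_job(void *arg)
{
   long *result = arg;

   pthread_mutex_lock(&types_mutex);
   *result = array_type_name ? (long)strlen(array_type_name) : -1;
   pthread_mutex_unlock(&types_mutex);
   return NULL;
}

/* Safe teardown order: drain the compile thread first, and only then drop
 * the reference that keeps the types alive. */
long context_destroy_safely(void)
{
   pthread_t compiler;
   long result = -1;

   types_incref();
   if (pthread_create(&compiler, NULL, compile_job, &result) != 0) {
      types_decref();
      return -1;
   }
   pthread_join(compiler, NULL);   /* 1. wait for compilation to finish */
   types_decref();                 /* 2. now it is safe to free the types */
   return result;
}
```

Swapping steps 1 and 2 in this model lets the job observe a missing table,
which is the analogue of the stale lookup described above.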

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.