[Mesa-stable] [PATCH] egl/dri2: Add reference count for dri2_egl_display
Nicolas Boichat
drinkcat at chromium.org
Fri Jul 22 03:29:14 UTC 2016
On Thu, Jul 21, 2016 at 10:51 PM, Emil Velikov <emil.l.velikov at gmail.com> wrote:
> On 21 July 2016 at 01:44, Nicolas Boichat <drinkcat at chromium.org> wrote:
>> On Wed, Jul 20, 2016 at 11:52 PM, Emil Velikov <emil.l.velikov at gmail.com> wrote:
>>> On 20 July 2016 at 15:42, Emil Velikov <emil.l.velikov at gmail.com> wrote:
>>>> On 20 July 2016 at 09:26, Nicolas Boichat <drinkcat at chromium.org> wrote:
>>>>> android.opengl.cts.WrapperTest#testGetIntegerv1 CTS test calls
>>>>> eglTerminate, followed by eglReleaseThread. A similar case is
>>>>> observed in this bug: https://bugs.freedesktop.org/show_bug.cgi?id=69622,
>>>>> where the test calls eglTerminate, then eglMakeCurrent(dpy, NULL, NULL, NULL).
>>>>>
>>>>> With the current code, dri2_dpy structure is freed on eglTerminate
>>>>> call, so the display is not initialized when eglReleaseThread calls
>>>>> MakeCurrent with NULL parameters, to unbind the context, which
>>>>> causes a a segfault in drv->API.MakeCurrent (dri2_make_current),
>>>>> either in glFlush or in a latter call.
>>>>>
>>>>> eglTerminate specifies that "If contexts or surfaces associated
>>>>> with display is current to any thread, they are not released until
>>>>> they are no longer current as a result of eglMakeCurrent."
>>>>>
>>>>> However, to properly free the current context/surface (i.e., call
>>>>> glFlush, unbindContext, driDestroyContext), we still need the
>>>>> display vtbl (and possibly an active dri dpy connection). Therefore,
>>>>> we add some reference counter to dri2_egl_display, to make sure
>>>>> the structure is kept allocated as long as it is required.
>>>>>
>>>> Looks very, just a couple of suggestions below.
>>>>
>>>>> Signed-off-by: Nicolas Boichat <drinkcat at chromium.org>
>>>>> ---
>>>>>
>>>>> Replaces https://patchwork.freedesktop.org/patch/98874/.
>>>>>
>>>>> src/egl/drivers/dri2/egl_dri2.c | 96 ++++++++++++++++++++++++++++++++---------
>>>>> src/egl/drivers/dri2/egl_dri2.h | 4 ++
>>>>> 2 files changed, 80 insertions(+), 20 deletions(-)
>>>>>
>>>>> diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
>>>>> index ac2be86..00269d3 100644
>>>>> --- a/src/egl/drivers/dri2/egl_dri2.c
>>>>> +++ b/src/egl/drivers/dri2/egl_dri2.c
>>>>> @@ -761,6 +761,14 @@ dri2_create_screen(_EGLDisplay *disp)
>>>>> static EGLBoolean
>>>>> dri2_initialize(_EGLDriver *drv, _EGLDisplay *disp)
>>>>> {
>>>>> + EGLBoolean ret = EGL_FALSE;
>>>>> + struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
>>>>> +
>>>>> + if (dri2_dpy) {
>>>>> + dri2_dpy->ref_count++;
>>>>> + return EGL_TRUE;
>>>>> + }
>>>>> +
>>>> I'm not sure that reusing the dpy is what we want here. IMHO we should
>>>> either call dri2_display_release (to release existing resources) or
>>>> simply error out.
>>>>
>>> A bit more meat to it:
>>> Upper layer(s) will ensure that upon second call to eglInitialize
>>> (without a eglTerminate in between) we won't get here. Thus only case
>>> we get this is on user misuse/leak - missing explicit/implicit
>>> eglMakeCurrent(...NULL, NULL) call while having called eglTerminate.
>>
>> Yes, that's how it's intended to work, "Initialized" boolean in
>> _EGLDisplay structure protects against repeated calls (I added a
>> comment to dri2_terminate, I should add the same in dri2_initialize).
>>
>>> If we ref count we exacerbate the leak. At the same time, returning
>>> error in case of a user leak sounds silly, so dri2_display_release
>>> might be like the better option ?
>>
>> That's right, we "leak" the display connection in this case:
>> - eglMakeCurrent(context1)
>> - eglTerminate
>> - never call any EGL function
>>
>> However that's a arguably an application bug, as we _must_ keep a
>> reference to context1. Also, we still hold a reference to the display,
>> so calling eglReleaseThread, eglMakeCurrent(NULL), or eglInitialize
>> would free/reuse the display.
>>
>> To go along your lines, I first tried doing something like:
>> while (dri2_egl_display(disp))
>> dri2_display_release(disp);
>>
>> But then in this test case:
>> - eglMakeCurrent(context1)
>> - eglTerminate
>> - eglInitialize
>> - eglMakeCurrent(context2)
>>
>> context1 would permanently leak (similar to the issue we had with the
>> previous patch). And eglMakeCurrent(context2) would crash trying to
>> unbind context1 (didn't trace the exact nature of the crash, but I
>> suppose the new display is not aware of context1). I wrote a small
>> test case for this scenario:
>> https://android-review.googlesource.com/#/c/249320/1 .
>>
> That said I fully agree with all the above - we cannot do much in case
> of application bugs/user leaks.
>
>> A more sensible option might be to call dri2_make_current(NULL). But
>> IIUC, we'd need to do that on all threads, and it violates the spec
>> "If contexts or surfaces associated with display is current to any
>> thread, they are not released until they are no longer current as a
>> result of eglMakeCurrent."
>>
>> In any case, I don't think the spec clearly says that eglTerminate
>> should terminate all connections immediately: it just says
>> "eglTerminate releases resources associated with an EGL display
>> connection.", so I think that it's ok that doing
>> eglMakeCurrent(context1), eglTerminate, eglInitialize does not tear
>> down the dpy connection, and that context1 stays active.
>>
> Seems like I was off - proposed patch won't "exacerbate" the leak.
>
> On one hand, context1 is not unbound so we ought to keep it and at the
> same time leaking in the driver isn't ideal. Plus there's a chance
> that upon consecutive call to eglInitialize the platform backend will
> be setup differently, thus if the dpy is "leaked" things will go
> funny. To 'solve' the latter, we should promote from "leak dpy on user
> error" to "always leak dpy", which gets us even further than expected.
>
> That said, your patch seems like the best we can do atm. so let's go
> with it. Can you please add a small comment above the hunk in question
> - something that mentions that it causes a leak and/or why we cannot
> plug it [with dri2_display_release and/or others].
>
> Thanks a lot for the enlightening discussion :-)
Sent a v2 with added comments. Thanks to you!
Best,
More information about the mesa-stable
mailing list