[Mesa-dev] [PATCH] egl/dri2: Add reference count for dri2_egl_display

Thu Jul 21 14:51:41 UTC 2016

On 21 July 2016 at 01:44, Nicolas Boichat <drinkcat at chromium.org> wrote:
> On Wed, Jul 20, 2016 at 11:52 PM, Emil Velikov <emil.l.velikov at gmail.com> wrote:
>> On 20 July 2016 at 15:42, Emil Velikov <emil.l.velikov at gmail.com> wrote:
>>> On 20 July 2016 at 09:26, Nicolas Boichat <drinkcat at chromium.org> wrote:
>>>> android.opengl.cts.WrapperTest#testGetIntegerv1 CTS test calls
>>>> eglTerminate, followed by eglReleaseThread. A similar case is
>>>> observed in this bug: https://bugs.freedesktop.org/show_bug.cgi?id=69622,
>>>> where the test calls eglTerminate, then eglMakeCurrent(dpy, NULL, NULL, NULL).
>>>>
>>>> With the current code, dri2_dpy structure is freed on eglTerminate
>>>> call, so the display is not initialized when eglReleaseThread calls
>>>> MakeCurrent with NULL parameters, to unbind the context, which
>>>> causes a a segfault in drv->API.MakeCurrent (dri2_make_current),
>>>> either in glFlush or in a latter call.
>>>>
>>>> eglTerminate specifies that "If contexts or surfaces associated
>>>> with display is current to any thread, they are not released until
>>>> they are no longer current as a result of eglMakeCurrent."
>>>>
>>>> However, to properly free the current context/surface (i.e., call
>>>> glFlush, unbindContext, driDestroyContext), we still need the
>>>> display vtbl (and possibly an active dri dpy connection). Therefore,
>>>> we add some reference counter to dri2_egl_display, to make sure
>>>> the structure is kept allocated as long as it is required.
>>>>
>>> Looks very, just a couple of suggestions below.
>>>
>>>> Signed-off-by: Nicolas Boichat <drinkcat at chromium.org>
>>>> ---
>>>>
>>>> Replaces https://patchwork.freedesktop.org/patch/98874/.
>>>>
>>>>  src/egl/drivers/dri2/egl_dri2.c | 96 ++++++++++++++++++++++++++++++++---------
>>>>  src/egl/drivers/dri2/egl_dri2.h |  4 ++
>>>>  2 files changed, 80 insertions(+), 20 deletions(-)
>>>>
>>>> diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
>>>> index ac2be86..00269d3 100644
>>>> --- a/src/egl/drivers/dri2/egl_dri2.c
>>>> +++ b/src/egl/drivers/dri2/egl_dri2.c
>>>> @@ -761,6 +761,14 @@ dri2_create_screen(_EGLDisplay *disp)
>>>>  static EGLBoolean
>>>>  dri2_initialize(_EGLDriver *drv, _EGLDisplay *disp)
>>>>  {
>>>> +   EGLBoolean ret = EGL_FALSE;
>>>> +   struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
>>>> +
>>>> +   if (dri2_dpy) {
>>>> +      dri2_dpy->ref_count++;
>>>> +      return EGL_TRUE;
>>>> +   }
>>>> +
>>> I'm not sure that reusing the dpy is what we want here. IMHO we should
>>> either call dri2_display_release (to release existing resources) or
>>> simply error out.
>>>
>> A bit more meat to it:
>> Upper layer(s) will ensure that upon second call to eglInitialize
>> (without a eglTerminate in between) we won't get here. Thus only case
>> we get this is on user misuse/leak  - missing explicit/implicit
>> eglMakeCurrent(...NULL, NULL) call while having called eglTerminate.
>
> Yes, that's how it's intended to work, "Initialized" boolean in
> _EGLDisplay structure protects against repeated calls (I added a
> comment to dri2_terminate, I should add the same in dri2_initialize).
>
>> If we ref count we exacerbate the leak. At the same time, returning
>> error in case of a user leak sounds silly, so dri2_display_release
>> might be like the better option ?
>
> That's right, we "leak" the display connection in this case:
>  - eglMakeCurrent(context1)
>  - eglTerminate
>  - never call any EGL function
>
> However that's a arguably an application bug, as we _must_ keep a
> reference to context1. Also, we still hold a reference to the display,
> so calling eglReleaseThread, eglMakeCurrent(NULL), or eglInitialize
> would free/reuse the display.
>
> To go along your lines, I first tried doing something like:
> while (dri2_egl_display(disp))
>    dri2_display_release(disp);
>
> But then in this test case:
>  - eglMakeCurrent(context1)
>  - eglTerminate
>  - eglInitialize
>  - eglMakeCurrent(context2)
>
> context1 would permanently leak (similar to the issue we had with the
> previous patch). And eglMakeCurrent(context2) would crash trying to
> unbind context1 (didn't trace the exact nature of the crash, but I
> suppose the new display is not aware of context1). I wrote a small
> test case for this scenario:
> https://android-review.googlesource.com/#/c/249320/1 .
>
That said I fully agree with all the above - we cannot do much in case
of application bugs/user leaks.

> A more sensible option might be to call dri2_make_current(NULL). But
> IIUC, we'd need to do that on all threads, and it violates the spec
> "If contexts or surfaces associated with display is current to any
> thread, they are not released until they are no longer current as a
> result of eglMakeCurrent."
>
> In any case, I don't think the spec clearly says that eglTerminate
> should terminate all connections immediately: it just says
> "eglTerminate releases resources associated with an EGL display
> connection.", so I think that it's ok that doing
> eglMakeCurrent(context1), eglTerminate, eglInitialize does not tear
> down the dpy connection, and that context1 stays active.
>
Seems like I was off - proposed patch won't "exacerbate" the leak.

On one hand, context1 is not unbound so we ought to keep it and at the
same time leaking in the driver isn't ideal. Plus there's a chance
that upon consecutive call to eglInitialize the platform backend will
be setup differently, thus if the dpy is "leaked" things will go
funny. To 'solve' the latter, we should promote from "leak dpy on user
error" to "always leak dpy", which gets us even further than expected.

That said, your patch seems like the best we can do atm. so let's go
with it. Can you please add a small comment above the hunk in question
- something that mentions that it causes a leak and/or why we cannot
plug it [with dri2_display_release and/or others].

Thanks a lot for the enlightening discussion :-)
-Emil