[Mesa-dev] [PATCH] egl/dri2: dri2_initialize: Do not reference-count TestOnly display

Emil Velikov emil.l.velikov at gmail.com
Tue Aug 23 13:26:48 UTC 2016


On 22 August 2016 at 07:10, Nicolas Boichat <drinkcat at chromium.org> wrote:
> Hi Emil,
>
> On Tue, Aug 16, 2016 at 1:17 AM, Emil Velikov <emil.l.velikov at gmail.com> wrote:
>> Hi Nicolas,
>>
>> On 4 August 2016 at 02:51, Nicolas Boichat <drinkcat at chromium.org> wrote:
>>> On Thu, Aug 4, 2016 at 9:38 AM, Michel Dänzer <michel at daenzer.net> wrote:
>>>> On 04.08.2016 09:53, Nicolas Boichat wrote:
>>>>> On Thu, Aug 4, 2016 at 12:22 AM, Martin Peres
>>>>> <martin.peres at linux.intel.com> wrote:
>>>>>> On 03/08/16 16:54, Nicolas Boichat wrote:
>>>>>>>
>>>>>>> In the case where dri2_initialize is called with a TestOnly display,
>>>>>>> the display is not actually initialized, so dri2_egl_display always
>>>>>>> fails, and we cannot do any reference counting.
>>>>>>>
>>>>>>> Fixes piglit spec at egl_khr_create_context@verify gl flavor (reproducible
>>>>>>> with LIBGL_ALWAYS_SOFTWARE=1) and spec at egl_khr_fence_sync@conformance.
>>>>>>>
>>>>>>> Fixes: 9ee683f877 (egl/dri2: Add reference count for dri2_egl_display)
>>>>>>> Cc: "12.0" <mesa-stable at lists.freedesktop.org>
>>>>>>> Reported-by: Michel Dänzer <michel at daenzer.net>
>>>>>>> Signed-off-by: Nicolas Boichat <drinkcat at chromium.org>
>>>>>>> ---
>>>>>>>
>>>>>>> Compile-tested only, please give it a spin, thanks!
>>>>>>
>>>>>> Still crashes, same backtrace before and after the patch:
>>>>>
>>>>> Actually, I was thinking about this bug:
>>>>> https://bugs.freedesktop.org/show_bug.cgi?id=97136, which should be
>>>>> spec at egl_khr_create_context@verify gl flavor? Did you try that test?
>>>>
>>>> Your patch fixes this test for me.
>>>>
>>>> Tested-by: Michel Dänzer <michel.daenzer at amd.com>
>>>>
>>>> Please remove the reference to the egl_khr_fence_sync test from the
>>>> commit log.
>>>
>>> Thanks.
>>>
>>> Emil: Can you fixup the commit message before applying?
>>>
>> Sure I can do that. Yet this change fixes another
>> unexpected/undocumented bug
>
> I don't think there is anything difference _before_ 9ee683f877
> (egl/dri2: Add reference count for dri2_egl_display), and after this
> patch.
>
> After 9ee683f877, of course, there is a bug, as eglGetProcAddress just
> returns NULL when called before initializing any display, which is
> clearly wrong.
>
Yes, that's correct. Yet I was thinking about the case explained below.

>> - currently calling eglGetProcAddress will
>> fail, when libEGL is built without the said platform.
>> Yet the API/documentation is clear - entry points are independent of
>> display (which is platform specific) or context.
>>
>> Did I miss something, does the above make sense ?
>
> Well, spec says "A return value of NULL indicates that the specific
> function does not exist for the EGL implementation.", so I guess it's
> fine to return NULL is EGL is built without a given platform.
>
> One possible issue is if the "default" display (TestOnly), is
> different from the display being used later... But I don't see any
> problem here, if I trace the code correctly:
> eglGetProcAddress:
>   => For many functions (egl*), returns a function pointer that is
> always valid => No problem
>   => For others (I guess mostly gl*): calls _eglGetDriverProc
>
> _eglGetDriverProc:
> - Creates a default display (which initializes _eglModules)
> - Iterates over the modules, and calls
> mod->Driver->API.GetProcAddress. I think this can only be
> dri2_get_proc_address.
>
> dri2_get_proc_address:
>  - Calls dri2_drv->get_proc_address (that's dlsym(handle,
> "_glapi_get_proc_address");, see dri2_load)
>
> _glapi_get_proc_address:
> /**
>  * Return pointer to the named function.  If the function name isn't found
>  * in the name of static functions, try generating a new API entrypoint on
>  * the fly with assembly language.
>  */
> _glapi_proc
> _glapi_get_proc_address(const char *funcName)
> {
>    const struct mapi_stub *stub = _glapi_get_stub(funcName, 1);
>    return (stub) ? (_glapi_proc) stub_get_addr(stub) : NULL;
> }
>
> Which looks platform/display independent? Or am I missing something?
>
The gist is that _eglGetNativePlatform() can return a platform which
isn't present/build in the libEGL implementation. See below for
detailed before/after:

With your patch:
 - we dive into _eglMatchDriver calling dri2_initialize (via
_eglMatchAndInitialize) with TestOnly == TRUE regardless of the
platform requested
 - thus we reach API.GetProcAddress/dri2_get_proc_address and provide
a non NULL entry point function pointer.

Before this patch (or any of your patches):
 - in _eglGetDriverProc we use eglGetDisplay(EGL_DEFAULT_DISPLAY)
 - the latter can return a dpy for a platform that is _not_ built into
the libEGL implementation (thanks to _eglGetNativePlatform)
 - thus by reusing the detected platform in dri2_initialize() we get
false and we'll never get to the API.GetProcAddress call in
_eglGetDriverProc.
 - from a user POV they'll get NULL for eglGetProcAddress because we
(sort of) miss-detect the EGL_DEFAULT_DISPLAY even if they are going
to pick/use _any_ working platform via eglGetPlatformDisplay.

As the above indicates it's quite a corner case, which ... should not
be possible to hit with the patch.
All the said, I'll double-check (build and run wise) that things don't
explode and push this.

>>>>> Not easy for me to reproduce, but... Looking that the test source code:
>>>>> https://cgit.freedesktop.org/piglit/tree/tests/egl/spec/egl_khr_fence_sync/egl_khr_fence_sync.c
>>>>>
>>>>> Do you know why we end up in the error path in init_display?
>>>>>
>>>>> My guess is that
>>>>> eglInitialize->dri2_initialize->dri2_initialize_wayland fails after
>>>>> setting disp->DriverData, so the display refcount is == 0, but the
>>>>> display is not null, leading to the crash in egl_terminate.
>>>>>
>>>>> I just spotted this patch for x11:
>>>>> https://patchwork.freedesktop.org/patch/101934/
>>>>>
>>>>> platform_wayland needs to be modified in a similar way.
>>>>
>>>> Indeed, that fixes the egl_khr_fence_sync test for me FWIW.
>>>
>>> I'll send a series to fix that very soon, on all platforms.
>>>
>>>>> For the record, Emil spotted this issue when I submitted the offending
>>>>> patch, and I haven't followed up ,-(
>>>>
>>>> For future patches, please make sure there are no piglit regressions, at
>>>> least for the tests which run with swrast via LIBGL_ALWAYS_SOFTWARE=1.
>>>
>>> Is there some simple instructions to do that?
>> Running piglit should be straight forward - please check the README
>> (and tell is it somethings is odd).
>
> Well, I didn't find ready-to-use instructions for lazy people like me
> ,-) In any case, I pasted my commands in another thread, they look
> like this:
>
> export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$MESA_DIR/lib
> export LIBGL_DRIVERS_PATH=$MESA_DIR/lib/gallium
> export EGL_DRIVERS_PATH=$MESA_DIR/lib
> export EGL_LOG_LEVEL=debug
> export LIBGL_ALWAYS_SOFTWARE=1
> ./piglit run -p x11_egl -t "spec at egl.*" all results/master

Yes that's perfect. Most of us know them by heart since they're but
mesa (not piglit) specific and you'll not need them if you make
install.

Yet checking with out docs... there isn't an newcomer friendly place
that describes how to do your setup. If you can spare a few minutes
and write something up that'll be amazing - I'm short of ideas where
did you (and other newcomers) will expect to find such instructions.

Huge thanks again for the great work !

-Emil


More information about the mesa-dev mailing list