[Mesa-dev] [PATCH v4 0/3] asynchronous pbo transfer with glthread

Tue Apr 18 08:43:14 UTC 2017

Hello Michel,

Ah yes, I completely forgot about XInitThreads; that must be it. I
don't know how Nvidia manages to solve/force it. Anyway, I will fix my
application.

Thank you for the info.

On 4/18/17, Michel Dänzer <michel at daenzer.net> wrote:
> On 18/04/17 05:04 PM, gregory hainaut wrote:
>> On Tue, 18 Apr 2017 08:51:24 +0200
>> gregory hainaut <gregory.hainaut at gmail.com> wrote:
>>
>>> On Mon, 17 Apr 2017 11:17:42 +0900
>>> Michel Dänzer <michel at daenzer.net> wrote:
>>>
>>>> On 15/04/17 05:08 PM, gregory hainaut wrote:
>>>>> On Sat, 15 Apr 2017 00:50:15 +0200
>>>>> Dieter Nützel <Dieter at nuetzel-hh.de> wrote:
>>>>>
>>>>>> Am 14.04.2017 07:53, schrieb gregory hainaut:
>>>>>>> On Fri, 14 Apr 2017 05:20:38 +0200
>>>>>>> Dieter Nützel <Dieter at nuetzel-hh.de> wrote:
>>>>>>>
>>>>>>>> Am 14.04.2017 02:06, schrieb Dieter Nützel:
>>>>>>>>> Hello Gregory,
>>>>>>>>>
>>>>>>>>> have you tested this with Mesa-demos/tests/pbo 'b' (benchmark)?
>>>>>>>>> It results in crazy numbers and does not 'return' (one core stays @
>>>>>>>>> 100%).
>>>>>>>>
>>>>>>>> This is related to 'mesa_glthread=true'.
>>>>>>>> If I disable (unset) it, all is fine after the 'b' benchmark and
>>>>>>>> 'pbo' exits with ESC as expected.
>>>>>>>> Crazy numbers stay, maybe counter overrun due to BIG numbers? ;-)
>>>>>>>>
>>>>>>>> Hope that helps.
>>>>>>>>
>>>>>>>> Dieter
>>>>>>>
>>>>>>> Hello Dieter,
>>>>>>>
>>>>>>> I tested the demo. There is a mostly unrelated bug when the
>>>>>>> application exits.
>>>>>>>
>>>>>>> Mesa 17.1.0-devel implementation error: In _mesa_DeleteHashTable,
>>>>>>> found non-freed data
>>>>>>>
>>>>>>> I will add a call to _mesa_HashDeleteAll to fix it.
>>>>>>> i.e. _mesa_HashDeleteAll(shared->ShadowBufferObjects, dummy_cb, ctx);
>>>>>>>
>>>>>>> Now let's go back to the test behavior. The benchmark sends 4
>>>>>>> seconds of asynchronous PBO transfer commands and then syncs
>>>>>>> glthread, which means the application thread is blocked until all
>>>>>>> PBO transfers are done. With glthread the commands are dispatched
>>>>>>> faster, so more transfers pile up and you need to wait longer
>>>>>>> before the application thread comes back to life.
>>>>>>>
>>>>>>> On my side, I need to wait around 45 seconds for 6 million
>>>>>>> commands.
>>>>>>> Result:  6,440,627 reads (gl thread on + PBO patches)
>>>>>>> Result:    274,960 reads (gl thread off)
>>>>>>>
>>>>>>> In your case, "Result:  77,444,412 reads", I hope you're patient.
>>>>>>> I think you must wait at least 10 minutes.
>>>>>>
>>>>>> Now, I was patient...
>>>>>> I tried twice: I killed the first run after ~20 minutes and
>>>>>> attached gdb to the second one.
>>>>>>
>>>>>> 0x00007fbda686e9a6 in pthread_cond_wait@@GLIBC_2.3.2 () from
>>>>>> /lib64/libpthread.so.0
>>>>>> (gdb) bt
>>>>>> #0  0x00007fbda686e9a6 in pthread_cond_wait@@GLIBC_2.3.2 () from
>>>>>> /lib64/libpthread.so.0
>>>>>> #1  0x00007fbda5359453 in ?? () from /usr/local/lib/dri/r600_dri.so
>>>>>> #2  0x00007fbda53661f4 in ?? () from /usr/local/lib/dri/r600_dri.so
>>>>>> #3  0x0000000000401e18 in ?? ()
>>>>>> #4  0x00000000004028c7 in ?? ()
>>>>>> #5  0x00007fbda9925781 in fghRedrawWindow () from
>>>>>> /usr/lib64/libglut.so.3
>>>>>> #6  0x00007fbda9925c08 in ?? () from /usr/lib64/libglut.so.3
>>>>>> #7  0x00007fbda9926cf9 in fgEnumWindows () from
>>>>>> /usr/lib64/libglut.so.3
>>>>>> #8  0x00007fbda9925ce4 in glutMainLoopEvent () from
>>>>>> /usr/lib64/libglut.so.3
>>>>>> #9  0x00007fbda9925d85 in glutMainLoop () from
>>>>>> /usr/lib64/libglut.so.3
>>>>>> #10 0x00000000004019fc in ?? ()
>>>>>> #11 0x00007fbda957e541 in __libc_start_main () from /lib64/libc.so.6
>>>>>> #12 0x0000000000401afa in ?? ()
>>>>>>
>>>>>> Should I do more, or is it not worth it?
>>>>>>
>>>>>> Dieter
>>>>>
>>>>> Hello Dieter,
>>>>>
>>>>> To be honest, I don't know how long you need to wait. 77 million
>>>>> PBO transfers is quite huge. It depends on CPU/Memory/PCIe/VRAM/GPU
>>>>> speed.
>>>>>
>>>>> Hum, based on the image size (194*188*4 bytes), you need to
>>>>> transfer approximately 10522 GiB of data from your GPU... which
>>>>> would take around 20 minutes if PCIe runs at full speed. Honestly,
>>>>> I would leave the application running in the background for a
>>>>> couple of hours.
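
The estimate quoted above can be reproduced with a few lines of C. This
is only a sketch: the function names are made up for illustration, and
the bandwidth passed to pbo_transfer_seconds is an assumption, not a
measured value.

```c
#include <stdint.h>

/* Total bytes read back: one width*height*bytes_per_pixel image per
 * PBO transfer. */
uint64_t pbo_total_bytes(uint64_t width, uint64_t height,
                         uint64_t bytes_per_pixel, uint64_t transfers)
{
    return width * height * bytes_per_pixel * transfers;
}

/* Time in seconds to move total_bytes at a sustained bandwidth given
 * in bytes per second (assumed, not measured). */
double pbo_transfer_seconds(uint64_t total_bytes, double bytes_per_second)
{
    return (double)total_bytes / bytes_per_second;
}
```

With 194, 188, 4 and 77444412 transfers, pbo_total_bytes gives
11298210377856 bytes, which is about 10522 when divided by 2^30. At an
assumed sustained 8e9 bytes per second, pbo_transfer_seconds gives
roughly 1412 seconds, i.e. a bit over 23 minutes, in line with the
"around 20 minutes" guess.
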
>>>>
>>>> Basically, the application needs to be fixed not to emit an unlimited
>>>> number of PBO transfers without doing anything which requires
>>>> synchronizing to the transfers.
>>>>
>>>>
>>>
>>> Hello Michel, Timothy, Marek
>>>
>>> Yes, I think the benchmark should limit the number of transfers to a
>>> million, and also use a fence to measure the PBO transfers.
>>>
>>>
>>> However, I have found other crashes in PCSX2 with those patches. They
>>> seem related to a synchronization issue with GLX/DRI/X11. This series
>>> removes most of the GL syncs for PCSX2, so any missing sync will
>>> trigger a crash. Or there is a non-obvious bug in my patches.
>>>
>>>
>>> Please find below a backtrace of a crash during a draw. I managed to
>>> get a similar backtrace (i.e. the same exception in
>>> _XReply/dequeue_pending_request) when I call XGetGeometry.
>>>
>>>
>>> #4  0xf61ec777 in __GI___assert_fail (assertion=0xf6122099
>>> "!xcb_xlib_unknown_req_in_deq", file=0xf6122067 "../../src/xcb_io.c",
>>> line=179, function=0xf612248d <__PRETTY_FUNCTION__.14063>
>>> "dequeue_pending_request")
>>>     at assert.c:101
>>> #5  0xf60abbcd in dequeue_pending_request (dpy=<optimized out>,
>>> req=<optimized out>) at ../../src/xcb_io.c:185
>>> #6  0xf60aca17 in _XReply (dpy=0xe8fdde80, rep=0xcd46b910, extra=6,
>>> discard=0) at ../../src/xcb_io.c:639
>>> #7  0xf3bba8df in DRI2GetBuffersWithFormat (dpy=0xe8fdde80,
>>> drawable=83886261, width=0xd8ba11e8, height=0xd8ba11ec,
>>> attachments=0xcd46ba38, count=1, outCount=0xcd46ba24) at dri2.c:485
>>> #8  0xf3bbac45 in dri2GetBuffersWithFormat (driDrawable=0xd8ba11d0,
>>> width=0xd8ba11e8, height=0xd8ba11ec, attachments=0xcd46ba38, count=1,
>>> out_count=0xcd46ba24, loaderPrivate=0xf225df10) at dri2_glx.c:894
>>> #9  0xd555e121 in dri2_drawable_get_buffers (count=<synthetic pointer>,
>>> atts=0xa15f8b20, drawable=0xa2e50a00) at dri2.c:285
>>> #10 dri2_allocate_textures (ctx=0xd8b98810, drawable=0xa2e50a00,
>>> statts=0xa15f8b20, statts_count=2) at dri2.c:480
>>> #11 0xd5557bc0 in dri_st_framebuffer_validate (stctx=0x9df20900,
>>> stfbi=0xa2e50a00, statts=0xa15f8b20, count=2, out=0xcd46bb80) at
>>> dri_drawable.c:83
>>> #12 0xd533ae8a in st_framebuffer_validate (stfb=stfb@entry=0xa15f8780,
>>> st=st@entry=0x9df20900) at state_tracker/st_manager.c:189
>>>
>>>
>>> I don't have any clue about the GLX/DRI/X11 interaction with OpenGL.
>>> If anyone has an idea, feel free to share :)
>>
>> If it helps, here is the backtrace from XGetGeometry, which I can
>> "easily" trigger. I only hit the above trace once. Note that the above
>> trace was inside glthread, whereas XGetGeometry is called from the
>> application thread.
>>
>> #4  0xf61ec777 in __GI___assert_fail (assertion=0xf6122099
>> "!xcb_xlib_unknown_req_in_deq", file=0xf6122067 "../../src/xcb_io.c",
>> line=179, function=0xf612248d <__PRETTY_FUNCTION__.14063>
>> "dequeue_pending_request")
>>     at assert.c:101
>> #5  0xf60abbcd in dequeue_pending_request (dpy=<optimized out>,
>> req=<optimized out>) at ../../src/xcb_io.c:185
>> #6  0xf60aca17 in _XReply (dpy=0x8ad3ca80, rep=0xd637b89c, extra=6,
>> discard=1) at ../../src/xcb_io.c:639
>> #7  0xf6090a9e in XGetGeometry (dpy=0x8ad3ca80, d=83886309,
>> root=0xd637ba40, x=0xd637ba80, y=0xd637bac0, width=0xd637b980,
>> height=0xd637b940, borderWidth=0xd637b9c0, depth=0xd637ba00) at
>> ../../src/GetGeom.c:47
>> #8  0xe5d868b8 in GSWndOGL::GetClientRect (this=0xd8a6509c) at
>> ../plugins/GSdx/GSWndOGL.cpp:219
>
> So, unless the application made sure that XInitThreads was called before
> any other libX11 APIs, and all libX11 API calls in the application and
> in Mesa are guarded by XLockDisplay/XUnlockDisplay, this is invalid
> libX11 API usage, and a crash is expected.
>
> BTW, in addition to what I wrote in my other post, I think this boils
> down to: Mesa can only call any libX11 APIs from the main thread, not
> from the glthread.
>
> In some cases, an alternative might be using XCB APIs directly instead
> of libX11 APIs.
>
>
> --
> Earthling Michel Dänzer               |               http://www.amd.com
> Libre software enthusiast             |             Mesa and X developer
>