[Mesa-dev] [PATCH v4 0/3] asynchronous pbo transfer with glthread

Wed Apr 19 16:43:43 UTC 2017

Hello All,

I ported PCSX2 to xcb (at least the non-GLX part). The crash is gone :)
So I can send the v5 with the hash delete fix.

However, Mesa might receive crash bug reports when glthread is enabled
on a random app that doesn't use xcb or call XInitThreads properly.

Maybe it would be better for the X11 library to always enable XInitThreads mode by default.
If X11 performance is critical, it would be better to switch to xcb anyway.

Cheers,
Gregory

On Tue, 18 Apr 2017 15:35:59 +0200
Marek Olšák <maraeo at gmail.com> wrote:

> All GL calls that might use libX11 must not be asynchronous within glthread.
> 
> Marek
> 
> On Apr 18, 2017 10:43 AM, "Gregory Hainaut" <gregory.hainaut at gmail.com>
> wrote:
> 
> Hello Michel,
> 
> Ah yes, I completely forgot about XInitThreads; that must be it. I
> don't know how Nvidia manages to solve/force it. Anyway, I will fix my
> application.
> 
> Thank you for the info.
> 
> On 4/18/17, Michel Dänzer <michel at daenzer.net> wrote:
> > On 18/04/17 05:04 PM, gregory hainaut wrote:
> >> On Tue, 18 Apr 2017 08:51:24 +0200
> >> gregory hainaut <gregory.hainaut at gmail.com> wrote:
> >>
> >>> On Mon, 17 Apr 2017 11:17:42 +0900
> >>> Michel Dänzer <michel at daenzer.net> wrote:
> >>>
> >>>> On 15/04/17 05:08 PM, gregory hainaut wrote:
> >>>>> On Sat, 15 Apr 2017 00:50:15 +0200
> >>>>> Dieter Nützel <Dieter at nuetzel-hh.de> wrote:
> >>>>>
> >>>>>> Am 14.04.2017 07:53, schrieb gregory hainaut:
> >>>>>>> On Fri, 14 Apr 2017 05:20:38 +0200
> >>>>>>> Dieter Nützel <Dieter at nuetzel-hh.de> wrote:
> >>>>>>>
> >>>>>>>> Am 14.04.2017 02:06, schrieb Dieter Nützel:
> >>>>>>>>> Hello Gregory,
> >>>>>>>>>
> >>>>>>>>> have you tested this with Mesa-demos/tests/pbo 'b' (benchmark)?
> >>>>>>>>> It results in crazy numbers and does not 'return' (one core stays @
> >>>>>>>>> 100%).
> >>>>>>>>
> >>>>>>>> This is related to 'mesa_glthread=true'.
> >>>>>>>> If I disable (unset) it, all is fine after the 'b' benchmark and 'pbo'
> >>>>>>>> exits with ESC as expected.
> >>>>>>>> Crazy numbers stay, maybe counter overrun due to BIG numbers? ;-)
> >>>>>>>>
> >>>>>>>> Hope that helps.
> >>>>>>>>
> >>>>>>>> Dieter
> >>>>>>>
> >>>>>>> Hello Dieter,
> >>>>>>>
> >>>>>>> I tested the demo. There is a more or less unrelated bug when the
> >>>>>>> application exits.
> >>>>>>>
> >>>>>>> Mesa 17.1.0-devel implementation error: In _mesa_DeleteHashTable,
> >>>>>>> found non-freed data
> >>>>>>>
> >>>>>>> I will add a call to _mesa_HashDeleteAll to fix it,
> >>>>>>> i.e. _mesa_HashDeleteAll(shared->ShadowBufferObjects, dummy_cb, ctx);
> >>>>>>>
> >>>>>>> Now let's go back to the test behavior. The benchmark sends
> >>>>>>> asynchronous PBO transfer commands for 4 seconds and then syncs
> >>>>>>> glthread, which means the application thread is blocked until all
> >>>>>>> PBO transfers are done. glthread dispatches commands faster, so more
> >>>>>>> of them queue up and you will need to wait longer before the thread
> >>>>>>> goes back to real life.
> >>>>>>>
> >>>>>>> On my side, I need to wait around 45 seconds for 6 million
> >>>>>>> commands.
> >>>>>>> Result:  6,440,627 reads (gl thread on + PBO patches)
> >>>>>>> Result:    274,960 reads (gl thread off)
> >>>>>>>
> >>>>>>> In your case, "Result:  77,444,412 reads", I hope you're patient.
> >>>>>>> I think you must wait at least 10 minutes.
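
The "at least 10 minutes" guess above can be reproduced with a small C sketch. It scales the 45 seconds measured for 6,440,627 reads up to 77,444,412 reads. The function name is hypothetical, and the key assumption (labeled in the code) is that both machines drain transfers at the same rate, which is unlikely to hold exactly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: estimate how long it takes to drain `queued`
 * transfers, given a reference run that drained `ref_count` transfers
 * in `ref_seconds`.
 * ASSUMPTION: the per-transfer rate is the same as in the reference run. */
double estimate_drain_seconds(uint64_t queued, uint64_t ref_count,
                              double ref_seconds)
{
    assert(ref_count > 0);
    assert(ref_seconds > 0.0);
    return (double)queued * ref_seconds / (double)ref_count;
}
```

Calling estimate_drain_seconds(77444412, 6440627, 45.0) gives roughly 541 seconds, i.e. about 9 minutes, so "at least 10 minutes" is the right order of magnitude only if the hardware is comparable.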
> >>>>>>
> >>>>>> Now, I was patient...
> >>>>>> Tried 2 times: after ~20 minutes I killed it the first time, and
> >>>>>> attached gdb to it during the second run.
> >>>>>>
> >>>>>> 0x00007fbda686e9a6 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> >>>>>> (gdb) bt
> >>>>>> #0  0x00007fbda686e9a6 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> >>>>>> #1  0x00007fbda5359453 in ?? () from /usr/local/lib/dri/r600_dri.so
> >>>>>> #2  0x00007fbda53661f4 in ?? () from /usr/local/lib/dri/r600_dri.so
> >>>>>> #3  0x0000000000401e18 in ?? ()
> >>>>>> #4  0x00000000004028c7 in ?? ()
> >>>>>> #5  0x00007fbda9925781 in fghRedrawWindow () from /usr/lib64/libglut.so.3
> >>>>>> #6  0x00007fbda9925c08 in ?? () from /usr/lib64/libglut.so.3
> >>>>>> #7  0x00007fbda9926cf9 in fgEnumWindows () from /usr/lib64/libglut.so.3
> >>>>>> #8  0x00007fbda9925ce4 in glutMainLoopEvent () from /usr/lib64/libglut.so.3
> >>>>>> #9  0x00007fbda9925d85 in glutMainLoop () from /usr/lib64/libglut.so.3
> >>>>>> #10 0x00000000004019fc in ?? ()
> >>>>>> #11 0x00007fbda957e541 in __libc_start_main () from /lib64/libc.so.6
> >>>>>> #12 0x0000000000401afa in ?? ()
> >>>>>>
> >>>>>> Should I do more, or is it not worth it?
> >>>>>>
> >>>>>> Dieter
> >>>>>
> >>>>> Hello Dieter,
> >>>>>
> >>>>> To be honest, I don't know how much time you need to wait. 77 million
> >>>>> PBO transfers is quite huge. It depends on CPU/Memory/PCIe/VRAM/GPU
> >>>>> speed.
> >>>>>
> >>>>> Hmm, based on the image size (194*188*4), you need to transfer
> >>>>> approximately 10522 GB of data from your GPU... which is likely around
> >>>>> 20 minutes if PCIe runs at full speed. Honestly, I would leave the
> >>>>> application running in the background for a couple of hours.
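
The data-volume figure above can be checked with a short C sketch. It multiplies the image size (194*188*4 bytes) by the reported 77,444,412 reads; the result matches 10522 when a "GB" is counted as 2^30 bytes. The function names are hypothetical, and the bandwidth passed to transfer_seconds is an assumed value, since the thread does not state one.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes moved by one readback of a width x height image. */
uint64_t bytes_per_readback(uint64_t width, uint64_t height,
                            uint64_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* Total volume in whole GiB (2^30 bytes), rounded down. */
uint64_t total_gib(uint64_t reads, uint64_t bytes_each)
{
    return reads * bytes_each / (UINT64_C(1) << 30);
}

/* Transfer time in seconds.
 * ASSUMPTION: gib_per_s is a sustained bandwidth chosen by the caller;
 * it is not a figure taken from the thread. */
double transfer_seconds(uint64_t reads, uint64_t bytes_each, double gib_per_s)
{
    assert(gib_per_s > 0.0);
    double gib = (double)(reads * bytes_each) / (double)(UINT64_C(1) << 30);
    return gib / gib_per_s;
}
```

With these helpers, total_gib(77444412, bytes_per_readback(194, 188, 4)) yields 10522, and at an assumed 8 GiB/s transfer_seconds comes to about 1315 seconds, roughly 22 minutes, which is in line with the "around 20 minutes" estimate.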
> >>>>
> >>>> Basically, the application needs to be fixed not to emit an unlimited
> >>>> number of PBO transfers without doing anything which requires
> >>>> synchronizing to the transfers.
> >>>>
> >>>>
> >>>
> >>> Hello Michel, Timothy, Marek
> >>>
> >>> Yes, I think it should limit the number of transfers to a million, and
> >>> also use a fence to measure the PBO transfers.
> >>>
> >>>
> >>> However, I have found other crashes in PCSX2 with those patches. They
> >>> seem related to a synchronization issue with GLX/DRI/X11. This series
> >>> removes most of the GL syncs for PCSX2, so any missing sync will trigger
> >>> a crash. Or I have a non-obvious bug in my patches.
> >>>
> >>>
> >>> Please find below a backtrace of a crash during a draw. I managed to get a
> >>> similar backtrace (i.e. the same exception in
> >>> _XReply/dequeue_pending_request) when I call XGetGeometry.
> >>>
> >>>
> >>> #4  0xf61ec777 in __GI___assert_fail (assertion=0xf6122099 "!xcb_xlib_unknown_req_in_deq", file=0xf6122067 "../../src/xcb_io.c", line=179, function=0xf612248d <__PRETTY_FUNCTION__.14063> "dequeue_pending_request") at assert.c:101
> >>> #5  0xf60abbcd in dequeue_pending_request (dpy=<optimized out>, req=<optimized out>) at ../../src/xcb_io.c:185
> >>> #6  0xf60aca17 in _XReply (dpy=0xe8fdde80, rep=0xcd46b910, extra=6, discard=0) at ../../src/xcb_io.c:639
> >>> #7  0xf3bba8df in DRI2GetBuffersWithFormat (dpy=0xe8fdde80, drawable=83886261, width=0xd8ba11e8, height=0xd8ba11ec, attachments=0xcd46ba38, count=1, outCount=0xcd46ba24) at dri2.c:485
> >>> #8  0xf3bbac45 in dri2GetBuffersWithFormat (driDrawable=0xd8ba11d0, width=0xd8ba11e8, height=0xd8ba11ec, attachments=0xcd46ba38, count=1, out_count=0xcd46ba24, loaderPrivate=0xf225df10) at dri2_glx.c:894
> >>> #9  0xd555e121 in dri2_drawable_get_buffers (count=<synthetic pointer>, atts=0xa15f8b20, drawable=0xa2e50a00) at dri2.c:285
> >>> #10 dri2_allocate_textures (ctx=0xd8b98810, drawable=0xa2e50a00, statts=0xa15f8b20, statts_count=2) at dri2.c:480
> >>> #11 0xd5557bc0 in dri_st_framebuffer_validate (stctx=0x9df20900, stfbi=0xa2e50a00, statts=0xa15f8b20, count=2, out=0xcd46bb80) at dri_drawable.c:83
> >>> #12 0xd533ae8a in st_framebuffer_validate (stfb=stfb@entry=0xa15f8780, st=st@entry=0x9df20900) at state_tracker/st_manager.c:189
> >>>
> >>>
> >>> I don't have any clue about the GLX/DRI/X11 interaction with OpenGL. If
> >>> anyone has an idea, feel free to share :)
> >>
> >> If it helps, here is the backtrace from XGetGeometry, which I can "easily"
> >> trigger. I hit the above trace only once. Note that the above trace was
> >> inside glthread, whereas XGetGeometry is called from the application thread.
> >>
> >> #4  0xf61ec777 in __GI___assert_fail (assertion=0xf6122099 "!xcb_xlib_unknown_req_in_deq", file=0xf6122067 "../../src/xcb_io.c", line=179, function=0xf612248d <__PRETTY_FUNCTION__.14063> "dequeue_pending_request") at assert.c:101
> >> #5  0xf60abbcd in dequeue_pending_request (dpy=<optimized out>, req=<optimized out>) at ../../src/xcb_io.c:185
> >> #6  0xf60aca17 in _XReply (dpy=0x8ad3ca80, rep=0xd637b89c, extra=6, discard=1) at ../../src/xcb_io.c:639
> >> #7  0xf6090a9e in XGetGeometry (dpy=0x8ad3ca80, d=83886309, root=0xd637ba40, x=0xd637ba80, y=0xd637bac0, width=0xd637b980, height=0xd637b940, borderWidth=0xd637b9c0, depth=0xd637ba00) at ../../src/GetGeom.c:47
> >> #8  0xe5d868b8 in GSWndOGL::GetClientRect (this=0xd8a6509c) at ../plugins/GSdx/GSWndOGL.cpp:219
> >
> > So, unless the application made sure that XInitThreads was called before
> > any other libX11 APIs, and all libX11 API calls in the application and
> > in Mesa are guarded by XLockDisplay/XUnlockDisplay, this is invalid
> > libX11 API usage, and a crash is expected.
> >
> > BTW, in addition to what I wrote in my other post, I think this boils
> > down to: Mesa can only call any libX11 APIs from the main thread, not
> > from the glthread.
> >
> > In some cases, an alternative might be using XCB APIs directly instead
> > of libX11 APIs.
> >
> >
> > --
> > Earthling Michel Dänzer               |               http://www.amd.com
> > Libre software enthusiast             |             Mesa and X developer
> >