[Mesa-dev] [PATCH v4 0/3] asynchronous pbo transfer with glthread

Sat Apr 15 08:08:37 UTC 2017

On Sat, 15 Apr 2017 00:50:15 +0200
Dieter Nützel <Dieter at nuetzel-hh.de> wrote:

> On 14.04.2017 07:53, gregory hainaut wrote:
> > On Fri, 14 Apr 2017 05:20:38 +0200
> > Dieter Nützel <Dieter at nuetzel-hh.de> wrote:
> > 
> >> On 14.04.2017 02:06, Dieter Nützel wrote:
> >> > Hello Gregory,
> >> >
> >> > Have you tested this with Mesa-demos/tests/pbo 'b' (benchmark)?
> >> > It results in crazy numbers and does not 'return' (one core stays at 100%).
> >> 
> >> This is related to 'mesa_glthread=true'.
> >> If I disable (unset) it, all is fine after the 'b' benchmark, and 'pbo'
> >> exits with ESC as expected.
> >> The crazy numbers remain, though; maybe a counter overflow due to BIG numbers? ;-)
> >> 
> >> Hope that helps.
> >> 
> >> Dieter
> > 
> > Hello Dieter,
> > 
> > I tested the demo. There is a somewhat unrelated bug at application
> > exit:
> > 
> > Mesa 17.1.0-devel implementation error: In _mesa_DeleteHashTable, found non-freed data
> > 
> > I will add a call to _mesa_HashDeleteAll to fix it,
> > i.e. _mesa_HashDeleteAll(shared->ShadowBufferObjects, dummy_cb, ctx);
> > 
> > Now let's go back to the test behavior. The benchmark sends 4 seconds of
> > asynchronous PBO transfer commands and then syncs gl_thread, which means
> > the application thread is blocked until all PBO transfers are done.
> > Gl_thread dispatches commands faster, so you need to wait longer before
> > the thread gets back to real life.
> > 
> > On my side, I need to wait around 45 seconds for 6 million commands.
> > Result:  6,440,627 reads (gl thread on + PBO patches)
> > Result:    274,960 reads (gl thread off)
> > 
> > In your case, "Result:  77,444,412 reads", I hope you're patient.
> > I think you must wait at least 10 minutes.
> 
> Now I was patient...
> I tried twice: after ~20 minutes I killed it the first time, and I attached
> gdb to it during the second run.
> 
> 0x00007fbda686e9a6 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> (gdb) bt
> #0  0x00007fbda686e9a6 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1  0x00007fbda5359453 in ?? () from /usr/local/lib/dri/r600_dri.so
> #2  0x00007fbda53661f4 in ?? () from /usr/local/lib/dri/r600_dri.so
> #3  0x0000000000401e18 in ?? ()
> #4  0x00000000004028c7 in ?? ()
> #5  0x00007fbda9925781 in fghRedrawWindow () from /usr/lib64/libglut.so.3
> #6  0x00007fbda9925c08 in ?? () from /usr/lib64/libglut.so.3
> #7  0x00007fbda9926cf9 in fgEnumWindows () from /usr/lib64/libglut.so.3
> #8  0x00007fbda9925ce4 in glutMainLoopEvent () from /usr/lib64/libglut.so.3
> #9  0x00007fbda9925d85 in glutMainLoop () from /usr/lib64/libglut.so.3
> #10 0x00000000004019fc in ?? ()
> #11 0x00007fbda957e541 in __libc_start_main () from /lib64/libc.so.6
> #12 0x0000000000401afa in ?? ()
> 
> Should I dig further, or is it not worth it?
> 
> Dieter

Hello Dieter,

To be honest, I don't know how long you need to wait. 77 million PBO
transfers is huge. It depends on CPU/memory/PCIe/VRAM/GPU speed.

Hmm, based on the image size (194*188*4 bytes per read), you need to
transfer approximately 10522 GiB (about 11.3 TB) of data from your GPU...
which is likely around 20 minutes if PCIe runs at full speed. Honestly, I
would leave the application running in the background for a couple of hours.

A backtrace without symbols is hard to read, but I'm pretty sure it is
waiting on the glGetError call.

Cheers,
Gregory

> >> > mesa-demos/tests> ./pbo
> >> > ATTENTION: default value of option mesa_glthread overridden by environment.
> >> > GL_VERSION = 4.1 Mesa 17.1.0-devel (git-7c8fe31e1c)
> >> > GL_RENDERER = Gallium 0.4 on AMD TURKS (DRM 2.49.0 / 4.11.0-rc6-1.g5a51416-default, LLVM 5.0.0)
> >> > Loaded 194 by 188 image
> >> > Converting RGB image to RGBA
> >> > Benchmarking...
> >> > Result:  77444412 reads in 4.000000 seconds = -383971576.000000 pixels/sec
> >> >
> >> > top - 02:04:42 up 10:05,  4 users,  load average: 1,03, 0,77, 0,71
> >> > Tasks: 265 total,   1 running, 264 sleeping,   0 stopped,   0 zombie
> >> > %Cpu0  :  1,3 us,  0,3 sy,  0,0 ni, 98,3 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
> >> > %Cpu1  :  1,3 us,  0,3 sy,  0,0 ni, 98,3 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
> >> > %Cpu2  :  1,7 us,  0,0 sy,  0,0 ni, 98,3 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
> >> > %Cpu3  :  2,3 us,  0,3 sy,  0,0 ni, 97,3 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
> >> > %Cpu4  :  1,7 us,  0,3 sy,  0,0 ni, 98,0 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
> >> > %Cpu5  : 98,3 us,  1,7 sy,  0,0 ni,  0,0 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
> >> > %Cpu6  :  2,0 us,  0,3 sy,  0,0 ni, 97,7 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
> >> > %Cpu7  :  1,7 us,  0,0 sy,  0,0 ni, 98,3 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
> >> > KiB Mem : 24680300 total,  8155356 free,  5751864 used, 10773080 buff/cache
> >> > KiB Swap:        0 total,        0 free,        0 used. 18437888 avail Mem
> >> >
> >> >   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
> >> > 19380 dieter    20   0 3259764 2,911g  22472 S 100,3 12,37   2:28.48 pbo
> >> > 27937 dieter    20   0 4029572 570236 166116 S 5,980 2,310   9:45.53 konqueror
> >> > 13432 dieter    20   0 1922820 269892 129152 S 5,648 1,094   4:33.80 Web Content
> >> >
> >> > Other than that:
> >> >
> >> > For the series:
> >> >
> >> > Tested-by: Dieter Nützel <Dieter at nuetzel-hh.de>
> >> > r600g, Turks XT (6670)
> >> >
> >> > Dieter
> >> >
> >> > On 13.04.2017 19:32, Gregory Hainaut wrote:
> >> >> Hello,
> >> >>
> >> >> Please find a new version that handles invalid buffer handles.
> >> >>
> >> >> It handles cases like this one:
> >> >>    glGenBuffers(1, &pbo);
> >> >>    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
> >> >>    glDeleteBuffers(1, &pbo);
> >> >>    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, rand_pbo);
> >> >>    glTexSubImage2D(..., user_memory_pointer); // data transfer will be synchronous
> >> >>
> >> >> There are various subtleties in handling multi-threaded shared
> >> >> contexts. In order to keep the code sane, I consider a buffer invalid
> >> >> once it is deleted by any context, even if it is still bound in other
> >> >> contexts. This forces a synchronous transfer, which is always safe.
> >> >>
> >> >> An example could be:
> >> >>    Ctx A: glGenBuffers(1, &pbo);
> >> >>    Ctx A: glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
> >> >>    Ctx B: glDeleteBuffers(1, &pbo);
> >> >>    Ctx A: glTexSubImage2D(...); // will be synchronous, even though it
> >> >>           // _could_ be asynchronous (the PBO generated first is still
> >> >>           // bound in ctx A!)
> >> >>
> >> >> V3: I mixed up the version numbers, so I jumped straight to v4...
> >> >> V4: improve comments based on Nicolai's feedback
> >> >>
> >> >> Best regards,
> >> >>
> >> >> Gregory Hainaut (3):
> >> >>   mesa/glthread: track buffer creation/destruction
> >> >>   mesa/glthread: add tracking of PBO binding
> >> >>   mapi/glthread: generate asynchronous code for PBO transfer
> >> >>
> >> >>  src/mapi/glapi/gen/ARB_direct_state_access.xml |  18 +--
> >> >>  src/mapi/glapi/gen/ARB_robustness.xml          |   2 +-
> >> >>  src/mapi/glapi/gen/gl_API.dtd                  |  10 +-
> >> >>  src/mapi/glapi/gen/gl_API.xml                  |  32 +++---
> >> >>  src/mapi/glapi/gen/gl_marshal.py               |  23 +++-
> >> >>  src/mapi/glapi/gen/marshal_XML.py              |  21 +++-
> >> >>  src/mesa/main/glthread.h                       |  10 ++
> >> >>  src/mesa/main/marshal.c                        | 149 ++++++++++++++++++++++++-
> >> >>  src/mesa/main/marshal.h                        |  24 ++++
> >> >>  src/mesa/main/mtypes.h                         |   5 +
> >> >>  src/mesa/main/shared.c                         |   4 +
> >> >>  11 files changed, 259 insertions(+), 39 deletions(-)
> >> > _______________________________________________
> >> > mesa-dev mailing list
> >> > mesa-dev at lists.freedesktop.org
> >> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev