[gst-devel] Problems with ringbuffer management on systems with large audio buffers

Brian Cameron Brian.Cameron at Sun.COM
Thu Apr 13 16:10:01 CEST 2006


I'm forwarding the following comment from the 
desktop-discuss at opensolaris.org alias regarding the fact that GStreamer
audio has a lag.  Because the sunaudiosink needs to set the ringbuffer
size to 512K, this means that when you change the volume in an 
application like rhythmbox there is a several second lag.

One of our engineers is looking into the problem, and he noticed the
following about GStreamer.  He seems to raise some performance issues
and other ideas of improving the code that I wanted to share.  I'd
appreciate any comments.

Brian


-------- Original Message --------
Date: Thu, 13 Apr 2006 15:07:02 -0700
From: Bart Smaalders <bart.smaalders at sun.com>
To: Brian Cameron <Brian.Cameron at Sun.COM>
CC: desktop-discuss at opensolaris.org

Brian Cameron wrote:
> Bart:
> 
>> Solaris has a mixer in the kernel; it combines each audio stream
>> together and does the volume adjustment for each stream when
>> mixed (a very small fraction of a second before play). If the
>> volume slider on rhythmbox adjusts the level prior to writing
>> the data to the ringbuffer, we're going to get significant
>> "lag" (4 seconds on my machine).
>>
>> Adequate buffering is important in an audio application,
>> although dealing with real-time action games can be tricky
>> (getting the sound right for Quake on Solaris took a little
>> while).
>>
>> In this case, can GStreamer use a hardware volume control rather
>> than a software one?
> 
> If we wanted to invest the time to hack all GNOME audio programs
> to have a system volume control instead of an application one,
> then we could do this.  However, wouldn't it be better just to
> fix SunAudio to not require such an enormous buffer?
> 
> Brian

I guess I'm just surprised the GStreamer framework doesn't
support hardware (ok, kernel software) mixers/gain controls, though.

My guess is that what's happening is that the timeshare scheduler is
dropping the priority of the gst_task_func thread that's doing all the
work a lot; anything else that needs to run will preempt it.

Here's a typical stack trace of the thread getting all the CPU time:

pstack 1985/160
1985:   rhythmbox
-----------------  lwp# 160 / thread# 160  --------------------
  bff57e6b lwp_park (0, 0, 0)
  bff52196 cond_wait_queue (882bc40, 879bc28, 0, 0) + 3e
  bff52697 _cond_wait (882bc40, 879bc28) + 69
  bff526d7 cond_wait (882bc40, 879bc28) + 24
  bff52713 pthread_cond_wait (882bc40, 879bc28) + 1e
  bb8e3406 gst_queue_chain (87dc6c0, 86ed408) + 6ba
  bd593f78 gst_pad_chain (87dc6c0, 86ed408) + 290
  bd5945db gst_pad_push (86a1568, 86ed408) + 157
  bb92c737 gst_stream_selector_chain (87edb40, 86ed408) + 1ef
  bd593f78 gst_pad_chain (87edb40, 86ed408) + 290
  bd5844b3 gst_proxy_pad_do_chain (8359190, 86ed408) + 2f
  bd593f78 gst_pad_chain (8359190, 86ed408) + 290
  bd5945db gst_pad_push (86a10e8, 86ed408) + 157
  bb6a2b6d flump3dec_drain_avail (86a11a8, 1) + 5cd
  bb6a1e2e flump3dec_sink_chain (86a11a8, 867a408) + c2
  bd593f78 gst_pad_chain (86a11a8, 867a408) + 290
  bd5945db gst_pad_push (83c1e30, 867a408) + 157
  bb753b82 gst_id3demux_chain (86e76f8, 867a408) + 45a
  bd593f78 gst_pad_chain (86e76f8, 867a408) + 290
  bd5945db gst_pad_push (87dc0c0, 867a408) + 157
  bb8e7de5 gst_type_find_element_chain (87dc240, 867a408) + 55
  bd593f78 gst_pad_chain (87dc240, 867a408) + 290
  bd5844b3 gst_proxy_pad_do_chain (889aaa0, 867a408) + 2f
  bd593f78 gst_pad_chain (889aaa0, 867a408) + 290
  bd5945db gst_pad_push (87dc600, 867a408) + 157
  bb8ac905 gst_base_src_loop (87dc600) + 255
  bd5a9024 gst_task_func (8654b40, 85b2768) + 1f0
  beaa5560 g_thread_pool_thread_proxy (83433a8) + c0
  beaa4267 g_thread_create_proxy (87ed2d0) + 11b
  bff57bdd _thr_setup (bc5a1000) + 51
  bff57e30 _lwp_start (bc5a1000, 0, 0, 0, 0, 0)

This thread uses 4.3% of my 2Ghz Opteron.


Since ts_tick processing in the kernel is slaved to the
100 Hz clock, you might try cheating and have the above
thread do a poll(0,0, 1) after writing its payload to the
audio device.  This will cause it to be asleep when
tick processing is done, and should help keep it's priority
up, which will allow you to reduce the need for buffering.

You can see stuff like this w/ prstat -L, which provides
separate lines of output per thread (LWP).  Truss also
groks the -p pid/lwpid syntax, so when you see the following
in prstat -L:

    PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/LWPID
   1985 barts      70M   27M cpu1    35    0   0:00:05 3.9% rhythmbox/193
   1831 barts     159M   94M sleep   59    0   0:14:52 1.9% firefox-bin/1
    982 barts     124M   97M sleep   59    0   0:14:20 0.8% Xorg/1
   1196 barts      84M   34M sleep   59    0   0:00:19 0.5% gnome-panel/1

You can do

truss -p 1985/193 and see what just that thread is doing.

Some other perf comments:

The gstreamer code does a lot of debugging via gst_debug_log;
is that needed?

The implementation of g_atomic_* is very poor.

 From a few second run w/ plockstat -A -p <pid>:

Mutex block

Count     nsec Lock                         Caller
-------------------------------------------------------------------------------
  3081    58528 0x810b898
libglib-2.0.so.0.1000.1`g_atomic_int_get+0x3e
   191    67791 0x810b898
libglib-2.0.so.0.1000.1`g_atomic_int_exchange_and_add+0x33
   116    90832 0x810b898
libglib-2.0.so.0.1000.1`g_atomic_int_compare_and_exchange+0x3f
  1761     3888 0x810b898
libglib-2.0.so.0.1000.1`g_atomic_int_get+0x3e
    75    62521 0x810b898
libglib-2.0.so.0.1000.1`g_atomic_pointer_get+0x3e
    63    64554 0x810b898
libglib-2.0.so.0.1000.1`g_atomic_int_add+0x30
   152     2933 0x810b898
libglib-2.0.so.0.1000.1`g_atomic_int_exchange_and_add+0x33
    96     3560 0x810b898
libglib-2.0.so.0.1000.1`g_atomic_int_compare_and_exchange+0x3f
    90     3609 0x810b898
libglib-2.0.so.0.1000.1`g_atomic_pointer_get+0x3e
     9    31446 0x85bb150
libgstcoreelements.so`gst_queue_loop+0xae
     6    24564 0x85bb150
libgstcoreelements.so`gst_queue_chain+0xaa
     6    16431
libgstreamer-0.10.so.0.4.0`.XAByp4pT9jOEWpS.__dbg_functions_mutex+0x4
libgstreamer-0.10.so.0.4.0`_gst_debug_nameof_funcptr+0x83
    26     3085 0x810b898
libglib-2.0.so.0.1000.1`g_atomic_int_add+0x30
     3    23952 0x85bb150
libgstcoreelements.so`gst_queue_chain+0x9b4
     1    20851 0x85bb150                    libc.so.1`cond_wait+0x24
     1     3043
libgstreamer-0.10.so.0.4.0`.XAByp4pT9jOEWpS.__dbg_functions_mutex+0x4
libgstreamer-0.10.so.0.4.0`_gst_debug_nameof_funcptr+0x83
#

g_atomic_int_get is really evil.


For a Solaris architecture neutral fix, G_ATOMIC_MEMORY_BARRIER
should be set to membar_consumer() as far as I can tell from the
gnome sources & comments.  In any case, all threads don't need
to share a single mutex; a portable implementation might as well be:

gpointer
g_atomic_pointer_get (gpointer *atomic)
{
   gpointer result;
   GMutex my_atomic_mutex;

   bzero(&my_atomic_mutex, sizeof (GMutex));

   g_mutex_lock (&my_atomic_mutex);
   result = *atomic;
   g_mutex_unlock (&my_atomic_mutex);

   return result;
}


- Bart

-- 
Bart Smaalders			Solaris Kernel Performance
barts at cyber.eng.sun.com		http://blogs.sun.com/barts
_______________________________________________
desktop-discuss mailing list
desktop-discuss at opensolaris.org




More information about the gstreamer-devel mailing list