[gstreamer-bugs] [Bug 554771] New: totem spins CPU 100% while allocating (infinitely) because pulseaudio process is not available

GStreamer (bugzilla.gnome.org) bugzilla-daemon at bugzilla.gnome.org
Thu Oct 2 14:41:06 PDT 2008


If you have any questions why you received this email, please see the text at
the end of this email. Replies to this email are NOT read, please see the text
at the end of this email. You can add comments to this bug at:
  http://bugzilla.gnome.org/show_bug.cgi?id=554771

  GStreamer | gst-plugins-good | Ver: 0.10.x
           Summary: totem spins CPU 100% while allocating (infinitely)
                    because pulseaudio process is not available
           Product: GStreamer
           Version: 0.10.x
          Platform: Other
        OS/Version: Linux
            Status: UNCONFIRMED
          Severity: normal
          Priority: Normal
         Component: gst-plugins-good
        AssignedTo: gstreamer-bugs at lists.sourceforge.net
        ReportedBy: mnemo at minimum.se
         QAContact: gstreamer-bugs at lists.sourceforge.net
     GNOME version: Unspecified
   GNOME milestone: Unspecified


(I wrote this bug report with a lot of "newbie" info regarding all the stuff I
did to analyze it, I'm including this info in here so that if other people find
this bug maybe someone can learn something from it)

---

Today my entire X session hung when I was playing some songs in totem. Since
the keyboard wasn't responding and the screen wasn't being updated I logged in
through ssh from another machine. Once logged in I launched "top" and there I
saw "totem" and "pulseaudio" both taking abount 1 percent CPU each constantly.
I then killed pulse audio and this for some reason caused totem to start
spinning with 100% CPU consumption while slowly also growing in VIRT size (it
was both being stuck in a loop and it was also allocating more and more
memory). At this point I launched gdb on totem to see why it was taking 100%
CPU. 

Anyway, what I did was that I ran "info threads" inside gdb to see which
threads where standing still in blocking functions like "select", "poll" and
"pthread_cond_wait" etc. In this case it turned out that only one thread was
really alive and I used "thread 3" to move to this thread in gdb. The stack I
saw in there looked like this:

#0  0x00007fb1251ddf92 in free () from /lib/libc.so.6
#1  0x00007fb12d24f189 in gst_object_get_path_string (object=<value optimized
out>) at gstobject.c:1108
#2  0x00007fb12d264de8 in gst_element_message_full (element=0x7fb10c01b000,
type=GST_MESSAGE_ERROR, domain=3398, code=1, 
    text=0x163c880 "Disconnected: Connection terminated", debug=0x0,
file=0x7fb11ca4a2a5 "pulsesink.c", 
    function=0x7fb11ca4a950 "gst_pulsesink_write", line=536) at
gstelement.c:1665
#3  0x00007fb11ca460bd in gst_pulsesink_write (asink=<value optimized out>,
data=0x7fb114148020, length=<value optimized out>) at pulsesink.c:556
#4  0x00007fb129ade348 in ?? () from /usr/lib/libgstaudio-0.10.so.0
#5  0x00007fb1255372a4 in ?? () from /usr/lib/libglib-2.0.so.0
#6  0x00007fb12933b3ea in start_thread () from /lib/libpthread.so.0
#7  0x00007fb125248c6d in clone () from /lib/libc.so.6
#8  0x0000000000000000 in ?? ()

It seems that clone calls start_thread which then calls some unknown function
"??" inside libglib-2.0.so.0 and that this unknown function "??" calls another
unknown function inside the library libgstaudio-0.10.so.0 Clearly this was
going to be hard with a lot of unknown functions in the stacktrace so I quit
gdb and I installed debug symbols for these libraries.

To install debug libraries I needed to figure out what package installed the
file libgstaudio-0.10.so.0 onto my system. For this I used the following
command:

dpkg -S libgstaudio-0.10.so.0

It prints that this particular file was installed through the
libgstreamer-plugins-base0.10-0 package and thus I need to find debug symbols
for this package.

While trying to debug something else a long time ago I had previous completed
step 1) from this guide (see below). And this meant that I had all the
blah-dbgsym packages available already and I could install debug symbols using
normal "apt-get install something-dbgsym". Guide how to make it possible to
install yadayada-dbgsym packages in ubuntu:
https://wiki.ubuntu.com/DebuggingProgramCrash

So anyway I just typed "sudo apt-get install libgstreamer-plugins-base" and
then I pressed TAB too see what the TAB completion options was. It printed
stuff like:
libgstreamer-plugins-base0.10-0         libgstreamer-plugins-base0.10-0-dbgsym 
libgstreamer-plugins-base0.10-dev

Clearly, the package that I was looking for was
"libgstreamer-plugins-base0.10-0-dbgsym" so I installed this package. Then I
did the same thing for libglib-2.0.so.0 which led to to install the
libglib2.0-0-dbgsym package.

By this time the VIRT size was already like 9GB and I only got 8GB so the
system started to swap to the harddrive and there was just no way I could save
a core and attach to the bugzilla report etc :-)

Anyway, I quickly launched gdb again. I looked at "info threads" and I tried to
find the thread that was running crazy in a loop. I used "thread N" to switch
to it. Now, because I had debug symbols I got this really nice stacktrace with
no unknown function names:

#0  0x00007fb1251db102 in _IO_str_overflow_internal () from /lib/libc.so.6
#1  0x00007fb1251da116 in _IO_default_xsputn_internal () from /lib/libc.so.6
#2  0x00007fb1251acb1d in vfprintf () from /lib/libc.so.6
#3  0x00007fb125261340 in __vasprintf_chk () from /lib/libc.so.6
#4  0x00007fb125542deb in IA__g_vasprintf (string=0x43dacce0, format=<value
optimized out>, args=<value optimized out>)
    at /usr/include/bits/stdio2.h:199
#5  0x00007fb12552f6e1 in IA__g_strdup_printf (format=0xfbad8000 <Address
0xfbad8000 out of bounds>)
    at /build/buildd/glib2.0-2.18.1/glib/gstrfuncs.c:218
#6  0x00007fb12d264e08 in gst_element_message_full (element=0x7fb10c01b000,
type=GST_MESSAGE_ERROR, domain=3398, code=1, text=<value optimized out>, 
    debug=0x0, file=0x7fb11ca4a2a5 "pulsesink.c", function=0x7fb11ca4a950
"gst_pulsesink_write", line=536) at gstelement.c:1670
#7  0x00007fb11ca460bd in gst_pulsesink_write (asink=<value optimized out>,
data=0x7fb114148020, length=<value optimized out>) at pulsesink.c:556
#8  0x00007fb129ade348 in audioringbuffer_thread_func (buf=0x1bc9440) at
gstaudiosink.c:226
#9  0x00007fb1255372a4 in g_thread_create_proxy (data=0x7fb1141d3780) at
/build/buildd/glib2.0-2.18.1/glib/gthread.c:635
#10 0x00007fb12933b3ea in start_thread () from /lib/libpthread.so.0
#11 0x00007fb125248c6d in clone () from /lib/libc.so.6
#12 0x0000000000000000 in ?? ()

I also did a more in depth "bt full" and I found three stack frames that was
more interesting than the others. Look here:

#6  0x00007fb12d264e08 in gst_element_message_full (element=0x7fb10c01b000,
type=GST_MESSAGE_ERROR, domain=3398, code=1, text=<value optimized out>, 
    debug=0x0, file=0x7fb11ca4a2a5 "pulsesink.c", function=0x7fb11ca4a950
"gst_pulsesink_write", line=536) at gstelement.c:1670
        gerror = <value optimized out>
        name = <value optimized out>
        sent_text = (gchar *) 0x163c880 "Disconnected: Connection terminated"
        sent_debug = <value optimized out>
        message = <value optimized out>
        __PRETTY_FUNCTION__ = "gst_element_message_full"
#7  0x00007fb11ca460bd in gst_pulsesink_write (asink=<value optimized out>,
data=0x7fb114148020, length=<value optimized out>) at pulsesink.c:556
        __txt = (gchar *) 0x163c880 "Disconnected: Connection terminated"
        __dbg = (gchar *) 0x0
        l = <value optimized out>
        pulsesink = <value optimized out>
        sum = <value optimized out>
        __FUNCTION__ = "gst_pulsesink_write"
#8  0x00007fb129ade348 in audioringbuffer_thread_func (buf=0x1bc9440) at
gstaudiosink.c:226
        written = 105
        left = 3528
        len = 3528
        readptr = (guint8 *) 0x7fb114148020 ""
        readseg = 4
        sink = (GstAudioSink *) 0x7fb10c01b000
        writefunc = (WriteFunc) 0x7fb11ca45ee0 <gst_pulsesink_write>
        __FUNCTION__ = "audioringbuffer_thread_func"

It looks like the program was calling gst_pulsesink_write to write some data
into pulse audio, however there was some error condition which caused a call to
gst_element_message_full. Since I just abruptly _killed_ pulseaudio while totem
was still running (and using it) so that might have something to do with it
right? :P Anyway, this is theory is confirmed by the local variables in the
stackframe for gst_pulsesink_write. Specifically it says:

sent_text = (gchar *) 0x163c880 "Disconnected: Connection terminated" 

How the question becomes: if there is a reasonable error message coming back
saying that the connection to pulseaudio has been terminated... then why would
totem just keep trying to write data to pulseaudio? Seems like pretty insane
behavior, instead totem should stop trying to send audio data to pulse and
instead it should show some error message.

Anyway, to continue debugging I put breakpoints on some of these functions to
figure out how deep in the stack the loop was. For example I used "break
function_name" put breakpoints on:

* gst_element_message_full
* gst_pulsesink_write
* audioringbuffer_thread_func

It just seemed unlikely that there should be an infinite loop bug inside
libc.so so I didn't bother with those functions. Anyway, then I resumed gdb by
typing "c" and I immediately hit a breakpoint. I continued it again and again
and basically the pattern was that both the functions gst_element_message_full
and gst_pulsesink_write were being continuously hit. However, I never hit the
breakpoint on "audioringbuffer_thread_func" so evidently this function didn't
exit.

At this point it was clear that a loop inside audioringbuffer_thread_func was
responsible for continously calling gst_pulsesink_write which in turn
continously called gst_element_message_full. Time to look at some code.

The first thing in the code I wanted to investigate was the function called
audioringbuffer_thread_func which according to my theory should have an
infinite loop. I googled for gstreamer and found it's homepage. Went to the
developer section and used cvs to fetch the code for "gstreamer". I went into
the directory created by cvs and I did "grep -r audioringbuffer_thread_func *"
which actually found nothing :-( My guess was that this code probably was
located in the plugin and from the two first stacktraces I remembered that this
function was actually in /usr/lib/libgstaudio-0.10.so.0 which meant that it was
in the libgstreamer-plugins-base0.10-0 package. Luckily gstreamer lists several
modules inside it's CVS and one is the for the base plugins. So I ended up
re-running CVS pointing it to the gst-plugins-base module and then I got a new
directory full of source code. When I re-ran the command "grep -r
audioringbuffer_thread_func *" on this code it found a match inside
gstaudiosink.c (which is of course were gdb said we'd find it even though gdb
was not smart enough to say in what folder and not in what source repository I
should look for it). Anyway, I fired up my trusty old gedit and boom there was
the code for audioringbuffer_thread_func().

After look at the code for audioringbuffer_thread_func() for a while I noted
that there wasn't any call to gst_pulsesink_write() in there. Still, gdb
clearly said that the function audioringbuffer_thread_func() had called into
the function gst_pulsesink_write(). At this point the problem could have been
that I had the wrong version of the source file because the source file might
have been changed between now and whenever ubuntu took their copy of the source
and created the package for it. If I had used the "apt-get source blah" command
I would have gotten the exact right source for the relevant package but why
make life too easy.

However, there is actually a call to writefunc() inside the function
audioringbuffer_thread_func() and a few lines above that it's seen that
writefunc is just a function pointer so it's pretty likely. That the function
pointer writefunc pointed to the function gst_pulsesink_write() when my code
was running. From looking at audioringbuffer_thread_func() it's also clear that
there is some kind of error handling in here because if the number of written
bytes is less than zero then execution will at least break out of the inner
while loop.

Okay, fair enough. What about the code inside gst_pulsesink_write()? I ran
"grep -r gst_pulsesink_write *" which returned nothing both in the
gst-plugins-base and also in gstreamer itself. Probably the pulse audio plugin
for gstreamer was not included in the main source repository nor in the base
plugins. So I went off to google for the source repository that held the
gst_pulsesink_write code and eventually managed to figure out that it's in the
gst-plugins-good package. There is probably some smarter way to figure that out
though. For this code I found that "grep -r gst_pulsesink_write *" returns a
match inside the file pulsesink.c which is exactly where gdb said I would find
it.

So, I looked at gst_pulsesink_write(). It has a while loop nested inside a for
loop. There is also some macros used, called CHECK_DEAD_GOTO and
GST_ELEMENT_ERROR.

At this point I needed to know what the definition of CHECK_DEAD_GOTO was but
the editor that I was using (gedit) does not have a right-click function for
"Go to definition" (maybe there is some gedit plugin for it, I dunno). So
instead, what I did was that I dropped into a terminal, navigated to the cvs
directory for gst-plugins-good and then I ran "cscope -R". This is a very old
but useful text mode tool for doing "Go to definition". By pressing TAB it's
possible to toggle between the upper and the lower panels. I just pressed down
arrow once so that I ended up in the "Find global definition" filed and then I
entered CHECK_DEAD_GOTO.  This shows a list of all the files where
CHECK_DEAD_GOTO is defined and I wanted the definition from pulsesink.c of
course. Now it seems that basically CHECK_DEAD_GOTO just checks if there was an
error and if there was then ut jumps to the designated label which in this case
was "unlock_and_fail". Just from the name it actually sounds like "check dead"
could be the error that you end up with when pulseaudio it dead.

So what happens if "check dead" is raised and the code jumps to
"unlock_and_fail" ?   Well, the code just unlocks the pulseaudio mainloop and
then it returns zero.

ehm, wait what?! It returns zero? Hmm, but won't that make the parent function
_continue_ calling gst_pulsesink_write again and again like there is no
tomorrow? .... smells like some kind of misunderstanding?

Okay so now we sort of understand what causes the bug. Now we need to talk to
some gstreamer dev and ask the the return code policy here actually is. What
should the gstreamer plugin return to make gstreamer understand that there is
no point in sending more data to the plugin? Looking at what the other plugins
do would also be a good starting point.

Another thing is that we didn't actually understand why X.org sort of hung in
the first place but at least we found out why totem was frozen taking 100% CPU.


-- 
See http://bugzilla.gnome.org/page.cgi?id=email.html for more info about why you received
this email, why you can't respond via email, how to stop receiving
emails (or reduce the number you receive), and how to contact someone
if you are having problems with the system.

You can add comments to this bug at http://bugzilla.gnome.org/show_bug.cgi?id=554771.




More information about the Gstreamer-bugs mailing list