[pulseaudio-discuss] And now, something is wrong with the flist implementation...

David Henningsson david.henningsson at canonical.com
Fri Mar 2 04:49:42 PST 2012


Hi,

I've had my eyes on a couple of bugs filed in Ubuntu, where PulseAudio 
crashes with SIGABRT somewhere inside pa_memblock_* or pa_memblockq_*, 
often after several hours of playback. I now got a core dump from the 
bug reporter (thanks, Eric Casteleijn!) and I started analysing it.

It turns out that the same memblockq list_items is used as part of two 
memblockq chains, in essence for one of the blocks q->next->prev != q. 
How could this be? Well, they seem to be allocated from a static flist 
pool. I've never looked at flist before, but it claims to be, "A 
multiple-reader multipler-write lock-free free list implementation".

I had a look at tests/flist-test.c and decided to rewrite the test case 
to make it a bit tougher and to see if I could verify that it never 
handed out the same block twice. The test seemed to pass at first, but 
the ~10th time I ran it, the code hung. All 20 threads accessing the 
flist seemed to have stalled. I attached to the process with gdb, and 
found that l->stored, l->empty, and l->stored->next [2] were all equal, 
so clearly something is wrong.

Instead of trying to verify the algorithm, I went to Google to look for 
a reference implementation to compare against, and quickly found [1]. 
And indeed our flist looks like the one under the section "Naive 
lock-free stack which suffers from ABA problem." on that page. :-/ 
What's worse, there does not seem to be an easy fix.

At this point, it seems likely this is the root cause for the problem. 
But I could use some input on how to proceed.

One trivial idea just to add a mutex around the flist as an immediate 
workaround. It won't make the implementation lock-free, but futexes 
should be pretty fast, and the lock won't be held for long anyway. So 
this might be good enough?

I'm also noticing that in most cases where we use the flist, is as an 
optimisation to avoid using malloc, essentially pointer recycling. If 
this is the case, and the objects are not usually allocated by one 
thread and freed by another, one could have per-thread flists instead of 
one flist shared by all threads.

Any other ideas?

-- 
David Henningsson, Canonical Ltd.
http://launchpad.net/~diwic

[1] https://en.wikipedia.org/wiki/ABA_problem

[2] and l->empty->next, obviously


More information about the pulseaudio-discuss mailing list