[Bug 750397] CRITICAL: Race condition in GstBus

GStreamer (GNOME Bugzilla) bugzilla at gnome.org
Fri Jun 24 07:44:16 UTC 2016


https://bugzilla.gnome.org/show_bug.cgi?id=750397

--- Comment #30 from Sebastian Dröge (slomo) <slomo at coaxion.net> ---
(In reply to Matt Gruenke from comment #29)
> (In reply to Sebastian Dröge (slomo) from comment #28)
> > I think except for GstBufferPool (bug #767979), this is all fixed now.
> 
> I disagree.  I think your diff solves nothing, as you're acquiring the mutex
> too late.  The entire problem comes down to an emergent discrepancy between
> set->control_pending and the pipe occupancy.
> 
> If a mutex were to solve this problem, you'd need to lock it before
> testing/modifying set->control_pending.  And then, what would be the point
> of set->control_pending_locked?

The point of the control_pending_locked is to actually detect this discrepancy
properly. control_pending is always correct at the time the atomic operation
happens, after that things go out of sync. So we now have additionally a locked
counter that basically is the "fill level" of the socket.

By this we make sure that always the socket contains exactly one byte when it
should, and none otherwise. Which is needed especially for the Windows part, as
there you only have an on/off switch basically.

> But, I'm not sure even that would fix the problem, so long as gstpoll.c
> contains the following:
> 
>     fcntl (control_sock[0], F_SETFL, O_NONBLOCK);
>     fcntl (control_sock[1], F_SETFL, O_NONBLOCK);
> 
> Because, even if a mutex is locked before set->control_pending is
> tested/modified, is the pipe guaranteed to be atomic?  In other words, if
> one thread reads the set->control_read_fd.fd immediately after another
> thread's write has returned, is the reader guaranteed to get the byte?  On
> all platforms that support socketpair()?  If not, no mutex would completely
> avoid the need to make it blocking.

GstPoll API requires that a read() only happens after a write() has returned,
just that GstBus ignored that condition. Now we always first wait on the socket
with poll() before doing anything else (in GstBus).

Making the socket blocking is not going to solve anything here additional to
that. And it won't work on Windows because there the signalling is inherently
non-blocking.

> However, if you simply make the control pipe block, then it solves the
> inconsistency between the pipe and the atomic counter, and you can even
> remove the mutex locking you just added.
> 
> Consider the following sequence:
> 
> 0. set->control_pending is currently 0
> 1. Writer increments set->control_pending
> 2. Reader decrements set->control_pending
> 3. Reader performs a read on set->control_read_fd.fd and blocks
> 4. Write writes set->control_write_fd.fd
> 5. Reader unblocks and returns
> 
> Consistency between set->control_pending and the pipe are maintained.  And
> without any mutex locking.

Can you give me an example sequence that fails with my patch? This one works in
both variants that are now possible due to the mutex.

As explained above, making things blocking does not solve anything that isn't
solved now already... and it won't work for Windows.

> The reason I haven't posted a patch is that I didn't modify gstpoll.c.  For
> my purposes, I just reached inside the GstBus, before we ever read it, and
> ensure that O_NONBLOCK has been cleared from its poll instance.
> 
> As I said, I've got probably thousands of pipeline years of runtime on this
> fix, and probably hundreds of thousands of pipeline setups & teardowns. 
> Given how quickly I encountered the original problem (within days and maybe
> a few hundred pipeline setups/teardows), any inadequacies of my fix
> should've become apparent, by now.

Nobody noticed that GstBus was broken like this for more than 7 years (if my
GIT archaeology was correct), and even now that we know the problem it only
happens for a few people. I don't think "it works for me" is a good argument in
this case :)



Nonetheless I think it would make sense to refactor GstPoll after this is fixed
in a minimally invasive way to have a more useful API (so that e.g. the pattern
in GstBufferPool can work reliable). That's going to be a longer term project
though as we need to ensure it actually works reliable on all platforms.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.


More information about the gstreamer-bugs mailing list