[libnice] Confusion about running threads of signal callbacks

Juan Navarro juan.navarro at gmx.es
Mon Jun 14 16:04:02 UTC 2021

Sorry, a mix of holidays and priority stuff got in the way...

On 13/05/2021 16.49, Olivier Crête wrote:
> Out of curiosity, what kind of thread races are you seeing?

The problem seems to be that the "new-candidate-full" signal gets
emitted from 2 threads at the same time: once from the main context
thread, and another from the GStreamer's nicesrc element thread. And
these collide when trying to access the gclosure that is created by the
glib signal handler (see below for location of the thread race)

This is the ThreadSanitizer stack trace that I can see whenever I set up
an N:N WebRTC session, where N is at least 4 or so. It doesn't happen
when N is lower, I guess because of the nature of the thread race:

   WARNING: ThreadSanitizer: data race (pid=11739)
     Read of size 1 at 0x7b14000af162 by thread T15 (mutexes: write M68454):
       #0 closure_invoke_notifiers ../gobject/gclosure.c:294
       #1 g_closure_invoke ../gobject/gclosure.c:816
       #2 signal_emit_unlocked_R ../gobject/gsignal.c:3741
       #3 g_signal_emitv ../gobject/gsignal.c:3227
       #4 agent_unlock_and_emit ../../agent/agent.c:217
       #5 kms_ice_base_agent_start_gathering_candidates

Here, #5 is the call to `nice_agent_gather_candidates()`, and thread T15
is the main thread.

That "read of size 1" is conflicting with the following "write of size
4", which happens from the GStreamer nicesrc element thread:

     Previous atomic write of size 4 at 0x7b14000af160 by thread T90
(mutexes: write M892129492666755424):
       #0 __tsan_atomic32_compare_exchange_strong <null>
       #1 g_closure_unref ../gobject/gclosure.c:611
       #2 g_closure_invoke ../gobject/gclosure.c:819
       #3 signal_emit_unlocked_R ../gobject/gsignal.c:3741
       #4 g_signal_emitv ../gobject/gsignal.c:3227
       #5 agent_unlock_and_emit ../../agent/agent.c:217
       #6 g_main_dispatch ../glib/gmain.c:3337 (libglib-2.0.so.0+0x6bbf5)
       #7 g_main_context_dispatch ../glib/gmain.c:4055
       #8 g_main_context_iterate ../glib/gmain.c:4131
       #9 g_main_loop_run ../glib/gmain.c:4329 (libglib-2.0.so.0+0x6c59f)
       #10 <null> <null> (libgstnice.so+0x2eb3)
       #11 g_thread_proxy ../glib/gthread.c:826 (libglib-2.0.so.0+0xaa2ef)
       #12 linux_pthread_proxy ../glib/gthread-posix.c:1269
       #13 <null> <null> (libtsan.so+0x2c34f)

I'm still trying to get the debug symbols of libgstnice.so to work
properly. Meanwhile, confirmation that T90 was the nicesrc GStreamer

     Thread T90 'nicesrc16:src' (tid=11902, running) created by thread
T12 at:
       #0 pthread_create <null> (libtsan.so+0x5d445)
       #1 g_system_thread_new ../glib/gthread-posix.c:1323
       #2 g_thread_new_internal ../glib/gthread.c:931
       #3 g_thread_pool_start_thread ../glib/gthreadpool.c:477
       #4 g_thread_pool_push ../glib/gthreadpool.c:691
       #5 default_push gst/gsttaskpool.c:122 (libgstreamer-1.5.so.0+0xa1d79)
       #6 kms_webrtc_bundle_connection_src_sync_state_with_parent

where #6 here is a call to

This is where most of my confusion comes from. Seems like if libnice was
colliding with itself from the pipeline's element?

Location of the thread race seems to be the closure that is created to
handle emission of signals

     Location is heap block of size 72 at 0x7b14000af140 allocated by
thread T9:
       #0 calloc <null> (libtsan.so+0x2f6b6)
       #1 g_malloc0 ../glib/gmem.c:136 (libglib-2.0.so.0+0x74bac)
       #2 g_closure_new_simple ../gobject/gclosure.c:218
       #3 g_cclosure_new ../gobject/gclosure.c:952
       #4 g_signal_connect_data ../gobject/gsignal.c:2601
       #5 kms_ice_nice_agent_new

where #5 is this:

   g_signal_connect (self->priv->agent, "new-candidate-full",
       G_CALLBACK (kms_ice_nice_agent_new_candidate_full), self);

so, nothing more than a signal handler.

This signal handler `kms_ice_nice_agent_new_candidate_full()` was
originally emitting a signal itself, with `g_signal_emit_by_name()`. The
thread race was then happening inside my own code, after
`agent_unlock_and_emit()`, I assume caused by `g_signal_emit_by_name()`
*NOT* being thread-safe.

So what I did (current scenario as of now) was to wrap all my signal
emissions within `g_main_context_invoke_full()`, to make sure that all
my calls to `g_signal_emit_by_name()` happen from within the same
context thread (the one that libnice constructor was passed).

This change did improve things (apparently?): my code wasn't involved
any more in the stack traces from the point where libnice wants to emit.
So, the present scenario is that my application code is not touched
between the `agent_unlock_and_emit()` and the point of the thread race.

I really wonder why the thread data race is always coming from
*nicesrc*. This element is always present in the ThreadSanitizer stack

Any idea or suggestion?

Kind regards,

More information about the nice mailing list