[libnice] Confusion about running threads of signal callbacks

Juan Navarro juan.navarro at gmx.es
Wed May 5 18:02:32 UTC 2021


Hi,

I'm setting up a helper thread that runs a GMainContext and GMainLoop,
and this same context is provided to the NiceAgent. A /_very_/
summarized version looks like this:

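     #include <gst/gst.h>
     #include <nice/agent.h>

     static GMainContext *context;

     static gpointer thread_init (gpointer data);
     static void on_new_candidate (NiceAgent *agent, NiceCandidate *candidate,
         gpointer data);

     int main (void)
     {
         // NOTE: `context` must already be set by the helper thread at this
         // point; the synchronization needed for that is not shown here.
         GThread *thread = g_thread_new ("MyLoop", thread_init, NULL);
         NiceAgent *agent = nice_agent_new (context,
             NICE_COMPATIBILITY_RFC5245);
         g_signal_connect (agent, "new-candidate-full",
             G_CALLBACK (on_new_candidate), NULL);

         // ...  Later
         nice_agent_gather_candidates (agent, 1);
         return 0;
     }

     // Helper thread: owns `context` and iterates it until the loop quits.
     static gpointer thread_init (gpointer data)
     {
         context = g_main_context_new ();
         GMainLoop *loop = g_main_loop_new (context, FALSE);

         g_main_context_acquire (context);
         g_main_context_push_thread_default (context);

         g_main_loop_run (loop);
         return NULL;
     }

     static void on_new_candidate (NiceAgent *agent, NiceCandidate *candidate,
         gpointer data)
     {
         // This callback should be called only from our GMainContext's thread.
         //g_assert (g_main_context_is_owner (context));

         GST_INFO ("'new-candidate-full' signal, agent: %p, context: %p, "
             "g_thread_self: %p", agent, context, g_thread_self ());
     }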

Some signals such as "new-candidate-full", "candidate-gathering-done",
or "component-state-changed", are connected to different callbacks. A
`nicesrc` element is also created in a GStreamer pipeline, passing to
its "agent" property the same NiceAgent object:

     src = gst_element_factory_make ("nicesrc", NULL);
     g_object_set (src, "agent", agent, NULL);


With all this set up, I'd expect that the signal callbacks are
exclusively called from the helper thread, where the NiceAgent's context
lives. However, this is only true for some of the Agent's signals. For
example, "candidate-gathering-done" and "component-state-changed" will
pass the `g_main_context_is_owner` assert, but others such as
"new-candidate-full" or "new-selected-pair-full" will not.

I don't rule out that my understanding is totally wrong, but I really
thought that by providing a GMainContext to the NiceAgent's constructor,
all signals would be dispatched from that context's thread.

In my particular case, I'm seeing that /_most_/ of the
"new-candidate-full" signal instances get dispatched directly from the
same thread that called `nice_agent_gather_candidates`, and /_some_/
other instances come from the thread of the `nicesrc` element in the
GStreamer pipeline.

I caught this while running with ThreadSanitizer. Here is a simplified
log output with some more details...

Note the following:

* At startup (0:00:02), the helper thread is created.
* Later (0:00:39), `nice_agent_gather_candidates()` is called.
* Right afterwards, both the main and nicesrc threads are running the
signal callback function.
* main thread IDs: 0x*5bfe0, T15
* nicesrc thread IDs: 0x*1e0c0, T25

0:00:02  0x7b0c0005bfe0  New WebSocketTransport worker thread:
0x7b0c0005bfe0
0:00:39  0x7b0c0005bfe0  'new-candidate-full' signal, agent:
0x7b4800040200, context: 0x7b2c000000b0, g_thread_self: 0x7b0c0005bfe0
0:00:39  0x7b180001e0c0  'new-candidate-full' signal, agent:
0x7b4800040200, context: 0x7b2c000000b0, g_thread_self: 0x7b180001e0c0

==================
WARNING: ThreadSanitizer: data race (pid=4301)
   Read of size 1 at 0x7b1400011552 by thread T15 (mutexes: write
M375341435425461520):
     #0 closure_invoke_notifiers ../gobject/gclosure.c:294
(libgobject-2.0.so.0+0x16f67)
     #1 g_closure_invoke ../gobject/gclosure.c:816
(libgobject-2.0.so.0+0x16f67)
     #2 signal_emit_unlocked_R ../gobject/gsignal.c:3741
(libgobject-2.0.so.0+0x332aa)
     #3 g_signal_emit_valist ../gobject/gsignal.c:3497
(libgobject-2.0.so.0+0x3bb03)
     #4 g_signal_emit_by_name ../gobject/gsignal.c:3593
(libgobject-2.0.so.0+0x3c75c)
     #5 kms_ice_nice_agent_new_candidate_full
/home/ubuntu/kms-omni-build/kms-elements/src/gst-plugins/webrtcendpoint/kmsiceniceagent.c:107
(libkmswebrtcendpointlib.so.6+0x1c46f)
     #6 g_cclosure_marshal_VOID__BOXED ../gobject/gmarshal.c:1628
(libgobject-2.0.so.0+0x1bae0)
     #7 g_closure_invoke ../gobject/gclosure.c:810
(libgobject-2.0.so.0+0x16f5d)
     #8 signal_emit_unlocked_R ../gobject/gsignal.c:3741
(libgobject-2.0.so.0+0x332aa)
     #9 g_signal_emitv ../gobject/gsignal.c:3227
(libgobject-2.0.so.0+0x3a3ec)
     #10 agent_unlock_and_emit ../../agent/agent.c:217
(libnice.so.10+0xba86)
     #11 kms_ice_base_agent_start_gathering_candidates
/home/ubuntu/kms-omni-build/kms-elements/src/gst-plugins/webrtcendpoint/kmsicebaseagent.c:383
(libkmswebrtcendpointlib.so.6+0x1b860)

   Previous atomic write of size 4 at 0x7b1400011550 by thread T25
(mutexes: write M801776025142216480):
     #0 __tsan_atomic32_compare_exchange_strong <null> (libtsan.so+0x81d2c)
     #1 g_closure_invoke ../gobject/gclosure.c:797
(libgobject-2.0.so.0+0x16da9)
     #2 signal_emit_unlocked_R ../gobject/gsignal.c:3741
(libgobject-2.0.so.0+0x332aa)
     #3 g_signal_emit_valist ../gobject/gsignal.c:3497
(libgobject-2.0.so.0+0x3bb03)
     #4 g_signal_emit_by_name ../gobject/gsignal.c:3593
(libgobject-2.0.so.0+0x3c75c)
     #5 kms_ice_nice_agent_new_candidate_full
/home/ubuntu/kms-omni-build/kms-elements/src/gst-plugins/webrtcendpoint/kmsiceniceagent.c:107
(libkmswebrtcendpointlib.so.6+0x1c46f)
     #6 g_cclosure_marshal_VOID__BOXED ../gobject/gmarshal.c:1628
(libgobject-2.0.so.0+0x1bae0)
     #7 g_closure_invoke ../gobject/gclosure.c:810
(libgobject-2.0.so.0+0x16e00)
     #8 signal_emit_unlocked_R ../gobject/gsignal.c:3741
(libgobject-2.0.so.0+0x332aa)
     #9 g_signal_emitv ../gobject/gsignal.c:3227
(libgobject-2.0.so.0+0x3a3ec)
     #10 agent_unlock_and_emit ../../agent/agent.c:217
(libnice.so.10+0xba86)
     #11 g_main_dispatch ../glib/gmain.c:3337 (libglib-2.0.so.0+0x6bbf5)
     #12 g_main_context_dispatch ../glib/gmain.c:4055
(libglib-2.0.so.0+0x6bbf5)
     #13 g_main_context_iterate ../glib/gmain.c:4131
(libglib-2.0.so.0+0x6c13f)
     #14 g_main_loop_run ../glib/gmain.c:4329 (libglib-2.0.so.0+0x6c59f)
     #15 <null> <null> (libgstnice.so+0x2eb3)

   Thread T15 (tid=4322, running) created by main thread at:
     #0 pthread_create <null> (libtsan.so+0x5d445)
     #1
std::thread::_M_start_thread(std::unique_ptr<std::thread::_State,
std::default_delete<std::thread::_State> >, void (*)()) <null>
(libstdc++.so.6+0xd1124)
     #2 void std::vector<std::thread, std::allocator<std::thread>
 >::emplace_back<std::_Bind<void
(kurento::WebSocketTransport::*(kurento::WebSocketTransport*))()>
 >(std::_Bind<void
(kurento::WebSocketTransport::*(kurento::WebSocketTransport*))()>&&)
/usr/include/c++/10/bits/vector.tcc:121 (kurento-media-server+0x15a3a7)
     #3 kurento::WebSocketTransport::start()
/home/ubuntu/kms-omni-build/kurento-media-server/server/transport/websocket/WebSocketTransport.cpp:397
(kurento-media-server+0x15a3a7)
     #4 main
/home/ubuntu/kms-omni-build/kurento-media-server/server/main.cpp:257
(kurento-media-server+0xc8820)

   Thread T25 'nicesrc0:src' (tid=4334, running) created by thread T11 at:
     #0 pthread_create <null> (libtsan.so+0x5d445)
     #1 g_system_thread_new ../glib/gthread-posix.c:1323
(libglib-2.0.so.0+0xe345b)
     #2 g_thread_new_internal ../glib/gthread.c:931
(libglib-2.0.so.0+0xaaa03)
     #3 g_thread_pool_start_thread ../glib/gthreadpool.c:477
(libglib-2.0.so.0+0xab537)
     #4 g_thread_pool_push ../glib/gthreadpool.c:691
(libglib-2.0.so.0+0xabbfd)
     #5 default_push gst/gsttaskpool.c:122 (libgstreamer-1.5.so.0+0xa1d79)
     #6 kms_webrtc_bundle_connection_src_sync_state_with_parent
/home/ubuntu/kms-omni-build/kms-elements/src/gst-plugins/webrtcendpoint/kmswebrtcbundleconnection.c:109
(libkmswebrtcendpointlib.so.6+0xbce2)

SUMMARY: ThreadSanitizer: data race ../gobject/gclosure.c:294 in
closure_invoke_notifiers
==================


Some extra notes:

* ThreadSanitizer has a known issue with 3rd-party libraries that
perform their own locking, causing a lot of false positives; for this
reason the official solution is to build a custom glib version that is
also compiled against TSan, which I did. This is all running against a
custom-built glib-2.68.1 with GCC 10 and TSan.

So, here are my questions:

* Note how T15 (the main server thread) is running the signal callback
(#5 kms_ice_nice_agent_new_candidate_full) as a direct child call of
running `nice_agent_gather_candidates()` (#11
kms_ice_base_agent_start_gathering_candidates). Is this expected? If I
ran `nice_agent_gather_candidates()` from another random thread, I would
really like to get the "new-candidate" callback called only from the
helper thread, not the same thread that started the gathering. Is that
possible to do?

* Is it surprising at all that the 'nicesrc0:src' thread is also
reaching my callback function?


Kind regards,
Juan

