Hello,<br><br><div class="gmail_quote">On Wed, Dec 14, 2011 at 9:16 PM, Jaikumar Ganesh <span dir="ltr">&lt;<a href="mailto:jaikumarg@gmail.com">jaikumarg@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Folks,<div> There is a race condition in the basic pthread mutex lock code for dbus.</div><div><br></div><div>Thread 1:</div><div> Calls _dbus_connection_acquire_io_path -> grabs the mutex -> io_path_acquired = true -> releases the mutex -> exits the function.</div>
<div><br></div><div>Thread 2: </div><div> Calls _dbus_connection_acquire_io_path -> grabs the mutex -> io_path_acquired is already true -> _dbus_condvar_wait_timeout</div><div><br></div><div>Thread 1:</div><div>
 Calls _dbus_connection_release_io_path -> grabs the mutex -> sets io_path_acquired = false -> signals Thread 2 and releases the mutex.</div><div> </div><div> Then calls _dbus_connection_acquire_io_path again and tries to grab the mutex.</div>
<div> </div><div><br></div><div>Now, in _dbus_pthread_condvar_wait_timeout:</div><div><br></div><div> result = pthread_cond_timedwait (&pcond->cond, &pmutex->lock, &end_time);</div><div> ......</div>
<div> _dbus_assert (pmutex->count == 0);</div><div> pmutex->count = old_count;</div><div> pmutex->holder = pthread_self(); </div><div><br></div><div><br></div><div>We set the holder back to Thread 2 only after some time.</div>
<div>During this time Thread 1 can grab the mutex, since pmutex->holder is still set to Thread 1. In the _dbus_pthread_mutex_lock function, that check alone is enough to grab the lock.</div><div><br></div><div>So at this point, Thread 2 has woken up from pthread_cond_timedwait -> gone back to _dbus_connection_acquire_io_path and set io_path_acquired to true.</div>
<div><br></div><div>Thread 1 has been able to grab the mutex -> checks the io_path_acquired variable, sees that it is already true -> calls _dbus_pthread_condvar_wait_timeout,</div><div><br></div><div>which asserts that pmutex->holder and pthread_self() are the same. This assert will fail.</div>
<div><br></div><div>Hope the above makes sense. The attached patch fixes the problem for me.</div></blockquote><div><br></div><div><br></div><div>Another way to fix this is to swap the order of the following 2 statements in <span style="background-color:rgb(255,255,255);font-family:'Lucida Console','Lucida Sans Typewriter',Monaco,monospace;font-size:11px;white-space:pre">_dbus_pthread_condvar_wait_timeout and </span></div>
<table class="GJEA35ODJI" style="width:1436px;background-image:initial;background-color:rgb(255,255,255);color:rgb(0,0,0);font-family:'Arial Unicode MS',Arial,sans-serif;font-size:15px"><tbody><tr valign="top"><td class="GJEA35ODDF GJEA35ODEF" style="font-family:'Lucida Console','Lucida Sans Typewriter',Monaco,monospace;font-size:8pt;padding-left:0px;padding-right:0px;white-space:pre;background-image:initial;border-bottom-width:1px;border-bottom-style:solid;border-bottom-color:rgb(255,255,255);padding-top:0px;padding-bottom:0px">
_dbus_pthread_condvar_wait
<span style="background-color:rgb(255,238,238)">pmutex->count = old_count;</span>
pmutex->holder = pthread_self();
The above 2 statements need to be swapped.</td></tr></tbody></table><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><br>Thanks</div>
</blockquote></div><br>