[waffle] [PATCH v3] wcore_error_unittest: Fix race condition leading to forever loop

Chad Versace chad.versace at linux.intel.com
Thu May 31 16:14:04 PDT 2012



On 05/31/2012 09:49 AM, Pauli Nieminen wrote:
> Reading non-volatile value in loop without function calls allows C
> compiler to generate forever loop without checking the memory value. To
> avoid compiler optimization causing trouble take lock when reading the
> number of threads.

The commit's subject claims that the cause of the non-terminating loop is
a race condition. But the commit's message claims that the cause is an
aggressive optimization on a variable that is effectively volatile (it is
modified by other threads) but is not declared volatile. These two causes
are orthogonal.
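
To illustrate the optimization problem (a sketch only, not code from the
test): when a plain int is written only by other threads and the waiting
loop contains no function calls, the compiler is entitled to load the
value once and spin on the cached copy forever.

    /* Hypothetical sketch, not from wcore_error_unittest.c. */
    static int done = 0;   /* written only by worker threads */

    static void
    spin_wait(void)
    {
        /* Without 'volatile' (or a real synchronization primitive),
         * the compiler may legally compile this loop as
         *     if (!done) for (;;);
         * because nothing in the loop body can change 'done'. */
        while (!done)
            ;
    }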

> Signed-off-by: Pauli Nieminen <pauli.nieminen at linux.intel.com>
> ---
> 
> If Mac doesn't have barriers then we have to stick to simulating
> barriers, even though a barrier in Linux can be implemented with a
> single atomic int and a futex. We could probably do a pseudo barrier
> with the same principle using a mutex and an atomic int. But who cares
> as long as the test is guaranteed to work correctly.
> 
>  tests/unittests/waffle/core/wcore_error_unittest.c |   10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/unittests/waffle/core/wcore_error_unittest.c b/tests/unittests/waffle/core/wcore_error_unittest.c
> index 06ebb78..e9fdb94 100644
> --- a/tests/unittests/waffle/core/wcore_error_unittest.c
> +++ b/tests/unittests/waffle/core/wcore_error_unittest.c
> @@ -191,6 +191,7 @@ TEST(wcore_error, thread_local)
>      pthread_mutex_t mutex;
>      pthread_cond_t cond;
>      int num_threads_waiting = 0;
> +    int thread_count;
>  
>      pthread_t threads[NUM_THREADS];
>      struct thread_arg thread_args[NUM_THREADS];
> @@ -213,8 +214,13 @@ TEST(wcore_error, thread_local)
>  
>      // Wait for all threads to set their error codes, thus giving
>      // the threads opportunity to clobber each other's codes.
> -    while (num_threads_waiting < NUM_THREADS)
> -        ;;
> +    do {
> +        pthread_mutex_lock(&mutex); {
> +            thread_count = num_threads_waiting;
> +            pthread_mutex_unlock(&mutex);
> +        }
> +    } while (thread_count < NUM_THREADS);
> +

If the problem were a race condition, then this hunk would fix it.
But the real problem is aggressive constant-propagation on a variable
that is effectively volatile but not declared so. I don't believe the
hunk fixes the constant-propagation problem. Like num_threads_waiting,
thread_count is accessed only within the local scope, and therefore a
clever optimizer can infer that thread_count is also always 0.

If we're lucky, though, the additional activity in the neighborhood of
thread_count may sufficiently confuse the compiler and prevent
constant-propagation. But that would just be a lucky accident.

I discussed this with Kenneth Graunke and Jordan Justen, and they
came to the same conclusion.

I believe the solution is as simple as declaring num_threads_waiting as
volatile. I'll submit a patch to do that.
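
The change I have in mind is roughly the following (a sketch; the actual
patch may differ):

    // In TEST(wcore_error, thread_local), declare the counter volatile
    // so that each iteration of the busy-wait re-reads it from memory.
    volatile int num_threads_waiting = 0;

    ...

    // Wait for all threads to set their error codes, thus giving
    // the threads opportunity to clobber each other's codes.
    while (num_threads_waiting < NUM_THREADS)
        ;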

