[Mesa-dev] [PATCH 5/7] u_queue: add a futex-based implementation of fences

Grazvydas Ignotas notasas at gmail.com
Mon Oct 23 14:55:15 UTC 2017


On Mon, Oct 23, 2017 at 4:04 PM, Nicolai Hähnle <nicolai.haehnle at amd.com> wrote:
> On 23.10.2017 13:50, Grazvydas Ignotas wrote:
>>
>> On Sun, Oct 22, 2017 at 9:33 PM, Nicolai Hähnle <nhaehnle at gmail.com>
>> wrote:
>>>
>>> From: Nicolai Hähnle <nicolai.haehnle at amd.com>
>>>
>>> Fences are now 4 bytes instead of 96 bytes (on my 64-bit system).
>>>
>>> Signaling a fence is a single atomic operation in the fast case plus a
>>> syscall in the slow case.
>>>
>>> Testing if a fence is signaled is the same as before (a simple
>>> comparison),
>>> but waiting on a fence is now no more expensive than just testing it in
>>> the fast (already signaled) case.
>>> ---
>>>   src/util/futex.h   |  4 +++
>>>   src/util/u_queue.c |  2 ++
>>>   src/util/u_queue.h | 88
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>   3 files changed, 94 insertions(+)
>>>
>>> ...
>>>
>>> diff --git a/src/util/u_queue.h b/src/util/u_queue.h
>>> index a3e12260e30..3d9f19f4e6c 100644
>>> --- a/src/util/u_queue.h
>>> +++ b/src/util/u_queue.h
>>> @@ -28,30 +28,117 @@
>>>    *
>>>    * Jobs can be added from any thread. After that, the wait call can be
>>> used
>>>    * to wait for completion of the job.
>>>    */
>>>
>>>   #ifndef U_QUEUE_H
>>>   #define U_QUEUE_H
>>>
>>>   #include <string.h>
>>>
>>> +#include "util/futex.h"
>>>   #include "util/list.h"
>>> +#include "util/macros.h"
>>>   #include "util/u_thread.h"
>>>
>>>   #ifdef __cplusplus
>>>   extern "C" {
>>>   #endif
>>>
>>>   #define UTIL_QUEUE_INIT_USE_MINIMUM_PRIORITY      (1 << 0)
>>>   #define UTIL_QUEUE_INIT_RESIZE_IF_FULL            (1 << 1)
>>>
>>> +#if defined(__GNUC__) && defined(HAVE_FUTEX)
>>> +#define UTIL_QUEUE_FENCE_FUTEX
>>> +#else
>>> +#define UTIL_QUEUE_FENCE_STANDARD
>>> +#endif
>>> +
>>> +#ifdef UTIL_QUEUE_FENCE_FUTEX
>>> +/* Job completion fence.
>>> + * Put this into your job structure.
>>> + */
>>> +struct util_queue_fence {
>>> +   /* The fence can be in one of three states:
>>> +    *  0 - signaled
>>> +    *  1 - unsignaled
>>> +    *  2 - unsignaled, may have waiters
>>> +    */
>>> +   uint32_t val;
>>> +};
>>> +
>>> +static inline void
>>> +util_queue_fence_init(struct util_queue_fence *fence)
>>> +{
>>> +   fence->val = 0;
>>> +}
>>> +
>>> +static inline void
>>> +util_queue_fence_destroy(struct util_queue_fence *fence)
>>> +{
>>> +   assert(fence->val == 0);
>>> +   /* no-op */
>>> +}
>>> +
>>> +static inline void
>>> +util_queue_fence_wait(struct util_queue_fence *fence)
>>> +{
>>> +   uint32_t v = fence->val;
>>> +
>>> +   if (likely(v == 0))
>>> +      return;
>>> +
>>> +   do {
>>> +      if (v != 2)
>>> +         v = __sync_val_compare_and_swap(&fence->val, 1, 2);
>>> +
>>> +      futex_wait(&fence->val, 2);
>>> +      v = fence->val;
>>> +   } while(v != 0);
>>> +}
>>> +
>>> +static inline void
>>> +util_queue_fence_signal(struct util_queue_fence *fence)
>>> +{
>>> +   uint32_t val = __sync_lock_test_and_set(&fence->val, 0);
>>
>>
>> As this is _signal(), don't you need a full barrier here?
>
>
> You're right. It's a bit surprising that __sync_lock_test_and_set isn't one.
>
>
>>> +
>>> +   assert(val != 0);
>>> +
>>> +   if (val == 2)
>>
>>
>> The documentation says some architectures may only support a constant
>> of 1 here...
>
>
> Again, surprising, but you're right.
>
> I believe both of these could be addressed by using
> __sync_val_compare_and_swap instead, right?

I believe so too.

> Or we could move to new-style gcc atomic built-ins, but I'm not sure
> off-hand how widely available those are.

There's u_atomic.h that already uses some __atomic_*() built-ins,
p_atomic_cmpxchg is just __sync_val_compare_and_swap but could use it
for consistency with other mesa code.

Gražvydas


More information about the mesa-dev mailing list