[Mesa-dev] [PATCH 5/7] u_queue: add a futex-based implementation of fences

Nicolai Hähnle nicolai.haehnle at amd.com
Mon Oct 23 13:04:51 UTC 2017


On 23.10.2017 13:50, Grazvydas Ignotas wrote:
> On Sun, Oct 22, 2017 at 9:33 PM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
>> From: Nicolai Hähnle <nicolai.haehnle at amd.com>
>>
>> Fences are now 4 bytes instead of 96 bytes (on my 64-bit system).
>>
>> Signaling a fence is a single atomic operation in the fast case plus a
>> syscall in the slow case.
>>
>> Testing if a fence is signaled is the same as before (a simple comparison),
>> but waiting on a fence is now no more expensive than just testing it in
>> the fast (already signaled) case.
>> ---
>>   src/util/futex.h   |  4 +++
>>   src/util/u_queue.c |  2 ++
>>   src/util/u_queue.h | 88 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   3 files changed, 94 insertions(+)
>>
>> ...
>>
>> diff --git a/src/util/u_queue.h b/src/util/u_queue.h
>> index a3e12260e30..3d9f19f4e6c 100644
>> --- a/src/util/u_queue.h
>> +++ b/src/util/u_queue.h
>> @@ -28,30 +28,117 @@
>>    *
>>    * Jobs can be added from any thread. After that, the wait call can be used
>>    * to wait for completion of the job.
>>    */
>>
>>   #ifndef U_QUEUE_H
>>   #define U_QUEUE_H
>>
>>   #include <string.h>
>>
>> +#include "util/futex.h"
>>   #include "util/list.h"
>> +#include "util/macros.h"
>>   #include "util/u_thread.h"
>>
>>   #ifdef __cplusplus
>>   extern "C" {
>>   #endif
>>
>>   #define UTIL_QUEUE_INIT_USE_MINIMUM_PRIORITY      (1 << 0)
>>   #define UTIL_QUEUE_INIT_RESIZE_IF_FULL            (1 << 1)
>>
>> +#if defined(__GNUC__) && defined(HAVE_FUTEX)
>> +#define UTIL_QUEUE_FENCE_FUTEX
>> +#else
>> +#define UTIL_QUEUE_FENCE_STANDARD
>> +#endif
>> +
>> +#ifdef UTIL_QUEUE_FENCE_FUTEX
>> +/* Job completion fence.
>> + * Put this into your job structure.
>> + */
>> +struct util_queue_fence {
>> +   /* The fence can be in one of three states:
>> +    *  0 - signaled
>> +    *  1 - unsignaled
>> +    *  2 - unsignaled, may have waiters
>> +    */
>> +   uint32_t val;
>> +};
>> +
>> +static inline void
>> +util_queue_fence_init(struct util_queue_fence *fence)
>> +{
>> +   fence->val = 0;
>> +}
>> +
>> +static inline void
>> +util_queue_fence_destroy(struct util_queue_fence *fence)
>> +{
>> +   assert(fence->val == 0);
>> +   /* no-op */
>> +}
>> +
>> +static inline void
>> +util_queue_fence_wait(struct util_queue_fence *fence)
>> +{
>> +   uint32_t v = fence->val;
>> +
>> +   if (likely(v == 0))
>> +      return;
>> +
>> +   do {
>> +      if (v != 2)
>> +         v = __sync_val_compare_and_swap(&fence->val, 1, 2);
>> +
>> +      futex_wait(&fence->val, 2);
>> +      v = fence->val;
>> +   } while(v != 0);
>> +}
>> +
>> +static inline void
>> +util_queue_fence_signal(struct util_queue_fence *fence)
>> +{
>> +   uint32_t val = __sync_lock_test_and_set(&fence->val, 0);
> 
> As this is _signal(), don't you need a full barrier here?

You're right. It's a bit surprising that __sync_lock_test_and_set isn't one.


>> +
>> +   assert(val != 0);
>> +
>> +   if (val == 2)
> 
> The documentation says some architectures may only support a constant
> of 1 here...

Again, surprising, but you're right.

I believe both of these could be addressed by using 
__sync_val_compare_and_swap instead, right?

Or we could move to new-style gcc atomic built-ins, but I'm not sure 
off-hand how widely available those are.

Thanks,
Nicolai

> 
>> +      futex_wake_all(&fence->val);
>> +}
> 
> Gražvydas
> 



More information about the mesa-dev mailing list