[Mesa-dev] [PATCH] mesa: Add new fast mtx_t mutex type for basic use cases
Jose Fonseca
jfonseca at vmware.com
Fri Jan 30 02:52:26 PST 2015
On 29/01/15 17:14, Kristian Høgsberg wrote:
> On Thu, Jan 29, 2015 at 6:36 AM, Emil Velikov <emil.l.velikov at gmail.com> wrote:
>> On 28/01/15 05:08, Kristian Høgsberg wrote:
>>> While modern pthread mutexes are very fast, they still incur a call to an
>>> external DSO and overhead of the generality and features of pthread mutexes.
>>> Most mutexes in mesa only needs lock/unlock, and the idea here is that we can
>>> inline the atomic operation and make the fast case just two intructions.
>>> Mutexes are subtle and finicky to implement, so we carefully copy the
>>> implementation from Ulrich Dreppers well-written and well-reviewed paper:
>>>
>>> "Futexes Are Tricky"
>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__www.akkadia.org_drepper_futex.pdf&d=AwIGaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=zfmBZnnVGHeYde45pMKNnVyzeaZbdIqVLprmZCM2zzE&m=NS0xLkqIj43l--WADuy3EQa3yVe4rItSr1sBgtCZJ28&s=jUMBbUUMfsjTAo4ye4aoY9kqeuG10NtNEuSLKRsxPoc&e=
>>>
>>> We implement "mutex3", which gives us a mutex that has no syscalls on
>>> uncontended lock or unlock. Further, the uncontended case boils down to a
>>> cmpxchg and an untaken branch and the uncontended unlock is just a locked decr
>>> and an untaken branch. We use __builtin_expect() to indicate that contention
>>> is unlikely so that gcc will put the contention code out of the main code
>>> flow.
I don't oppose the idea of a faster mutex. But do you have some
performance figures with this patch? (It doesn't need to be a real-life
app -- an artificial demo/benchmark would suffice.
What I'd like to know is, is the performance improvement significant
enough to at least justify the complexity of maintaining a multiple
mutex type across our code?
Because I never had the impression that mutexes were a bottleneck.
Atomic reference counting is probably more of an problem.
>>> A fast mutex only supports lock/unlock, can't be recursive or used with
>>> condition variables. We keep the pthread mutex implementation around as
>>> full_mtx_t for the few places where we use condition variables or recursive
>>> locking. For platforms or compilers where futex and atomics aren't available,
>>> mtx_t falls back to the pthread mutex.
>>>
>>> The pthread mutex lock/unlock overhead shows up on benchmarks for CPU bound
>>> applications. Most CPU bound cases are helped and some of our internal
>>> bind_buffer_object heavy benchmarks gain up to 10%.
>>>
>> Hi Kristian,
>>
>> Can I humbly ask that you split this into two patches - one that
>> introduces the new functions/struct and another one that uses them ?
>> This way it'll be easier if/when things go crazy.
>>
>> Also the patch seems to wonder between posix and win32
>> + typedef full_mtx_t mtx_t;
>>
>> and
>> + typedef mtx_t fast_mtx_t;
>>
>> Looks like a left over from the "should I rename XX variables to fast*
>> or just one to full*" moment :)
>
> Yeah, that's how it progressed :) At first I called it fast_mtx_t and
> planned on replacing simple uses of mtx_t one by one. Jordan suggested
> that it'd be easier to make the regular mutex fast and then rename the
> couple of places that use more feature than we provide.
I'm however strongly against having a non-standard mutex using a
standard name like `mtx_t`.
The point of using C11 names for threading primitives was to enable us
to implement Mesa using standard-looking C code. The idea was that at
one point we'd only use our C11 threads.h emulation where needed.
Please keep in mind that if/when platforms start providing C11 threads.h
we might be forced to use them instead of our own, as system/3rd party
headers might start depending on them on their ABIs.
It is imperative that any non-standard mutexes use names that don't
collide with C11 threads names.
Jose
More information about the mesa-dev
mailing list