[PATCH v2 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes
Thomas Hellstrom
thellstrom at vmware.com
Thu Jun 14 11:10:07 UTC 2018
On 06/14/2018 12:38 PM, Andrea Parri wrote:
> Hi Thomas,
>
> On Thu, Jun 14, 2018 at 09:29:21AM +0200, Thomas Hellstrom wrote:
>> The current Wound-Wait mutex algorithm is actually not Wound-Wait but
>> Wait-Die. Implement also Wound-Wait as a per-ww-class choice. Wound-Wait
>> is, contrary to Wait-Die, a preemptive algorithm and is known to generate
>> fewer backoffs. Testing reveals that this is true if the
>> number of simultaneous contending transactions is small.
>> As the number of simultaneous contending threads increases, Wound-Wait
>> becomes inferior to Wait-Die in terms of elapsed time, possibly due to
>> the larger number of locks held by sleeping transactions.
>>
>> Update documentation and callers.
>>
>> Timings using git://people.freedesktop.org/~thomash/ww_mutex_test
>> tag patch-18-06-14
>>
>> Each thread runs 100000 batches of lock / unlock 800 ww mutexes randomly
>> chosen out of 100000. Four core Intel x86_64:
>>
>> Algorithm    #threads   Rollbacks   time
>> Wound-Wait   4          ~100        ~17s.
>> Wait-Die     4          ~150000     ~19s.
>> Wound-Wait   16         ~360000     ~109s.
>> Wait-Die     16         ~450000     ~82s.
>>
>> Cc: Peter Zijlstra <peterz at infradead.org>
>> Cc: Ingo Molnar <mingo at redhat.com>
>> Cc: Jonathan Corbet <corbet at lwn.net>
>> Cc: Gustavo Padovan <gustavo at padovan.org>
>> Cc: Maarten Lankhorst <maarten.lankhorst at linux.intel.com>
>> Cc: Sean Paul <seanpaul at chromium.org>
>> Cc: David Airlie <airlied at linux.ie>
>> Cc: Davidlohr Bueso <dave at stgolabs.net>
>> Cc: "Paul E. McKenney" <paulmck at linux.vnet.ibm.com>
>> Cc: Josh Triplett <josh at joshtriplett.org>
>> Cc: Thomas Gleixner <tglx at linutronix.de>
>> Cc: Kate Stewart <kstewart at linuxfoundation.org>
>> Cc: Philippe Ombredanne <pombredanne at nexb.com>
>> Cc: Greg Kroah-Hartman <gregkh at linuxfoundation.org>
>> Cc: linux-doc at vger.kernel.org
>> Cc: linux-media at vger.kernel.org
>> Cc: linaro-mm-sig at lists.linaro.org
>> Signed-off-by: Thomas Hellstrom <thellstrom at vmware.com>
>>
>> ---
>> v2:
>> * Update API according to review comment by Greg Kroah-Hartman.
>> * Address review comments by Peter Zijlstra:
>> - Avoid _Bool in composites
>> - Fix typo
>> - Use __mutex_owner() where applicable
>> - Rely on built-in barriers for the main loop exit condition,
>> struct ww_acquire_ctx::wounded. Update code comments.
>> - Explain unlocked use of list_empty().
>> ---
>> Documentation/locking/ww-mutex-design.txt | 54 ++++++++++++----
>> drivers/dma-buf/reservation.c | 2 +-
>> drivers/gpu/drm/drm_modeset_lock.c | 2 +-
>> include/linux/ww_mutex.h | 19 ++++--
>> kernel/locking/locktorture.c | 2 +-
>> kernel/locking/mutex.c | 103 +++++++++++++++++++++++++++---
>> kernel/locking/test-ww_mutex.c | 2 +-
>> lib/locking-selftest.c | 2 +-
>> 8 files changed, 156 insertions(+), 30 deletions(-)
>>
>> diff --git a/Documentation/locking/ww-mutex-design.txt b/Documentation/locking/ww-mutex-design.txt
>> index 34c3a1b50b9a..b9597def9581 100644
>> --- a/Documentation/locking/ww-mutex-design.txt
>> +++ b/Documentation/locking/ww-mutex-design.txt
>> @@ -1,4 +1,4 @@
>> -Wait/Wound Deadlock-Proof Mutex Design
>> +Wound/Wait Deadlock-Proof Mutex Design
>> ======================================
>>
>> Please read mutex-design.txt first, as it applies to wait/wound mutexes too.
>> @@ -32,10 +32,23 @@ the oldest task) wins, and the one with the higher reservation id (i.e. the
>> younger task) unlocks all of the buffers that it has already locked, and then
>> tries again.
>>
>> -In the RDBMS literature this deadlock handling approach is called wait/wound:
>> -The older tasks waits until it can acquire the contended lock. The younger tasks
>> -needs to back off and drop all the locks it is currently holding, i.e. the
>> -younger task is wounded.
>> +In the RDBMS literature, a reservation ticket is associated with a transaction,
>> +and the deadlock handling approach is called Wait-Die. The name is based on
>> +the actions of a locking thread when it encounters an already locked mutex.
>> +If the transaction holding the lock is younger, the locking transaction waits.
>> +If the transaction holding the lock is older, the locking transaction backs off
>> +and dies. Hence Wait-Die.
>> +There is also another algorithm called Wound-Wait:
>> +If the transaction holding the lock is younger, the locking transaction
>> +preempts the transaction holding the lock, requiring it to back off: it
>> +wounds the other transaction.
>> +If the transaction holding the lock is older, it waits for the other
>> +transaction. Hence Wound-Wait.
>> +The two algorithms are both fair in that a transaction will eventually succeed.
>> +However, the Wound-Wait algorithm is typically stated to generate fewer backoffs
>> +compared to Wait-Die, but is, on the other hand, associated with more work than
>> +Wait-Die when recovering from a backoff. Wound-Wait is also a preemptive
>> +algorithm which requires a reliable way to preempt another transaction.
>>
>> Concepts
>> --------
>> @@ -47,10 +60,12 @@ Acquire context: To ensure eventual forward progress it is important the a task
>> trying to acquire locks doesn't grab a new reservation id, but keeps the one it
>> acquired when starting the lock acquisition. This ticket is stored in the
>> acquire context. Furthermore the acquire context keeps track of debugging state
>> -to catch w/w mutex interface abuse.
>> +to catch w/w mutex interface abuse. An acquire context represents a
>> +transaction.
>>
>> W/w class: In contrast to normal mutexes the lock class needs to be explicit for
>> -w/w mutexes, since it is required to initialize the acquire context.
>> +w/w mutexes, since it is required to initialize the acquire context. The lock
>> +class also specifies what algorithm to use, Wound-Wait or Wait-Die.
>>
>> Furthermore there are three different class of w/w lock acquire functions:
>>
>> @@ -90,6 +105,12 @@ provided.
>> Usage
>> -----
>>
>> +The algorithm (Wait-Die vs Wound-Wait) is chosen by using either
>> +DEFINE_WW_CLASS_WDIE() for Wait-Die or DEFINE_WW_CLASS() for Wound-Wait.
>> +As a rough rule of thumb, use Wound-Wait iff you typically expect the number
>> +of simultaneous competing transactions to be small, and the rollback cost can
>> +be substantial.
>> +
>> Three different ways to acquire locks within the same w/w class. Common
>> definitions for methods #1 and #2:
>>
>> @@ -312,12 +333,23 @@ Design:
>> We maintain the following invariants for the wait list:
>> (1) Waiters with an acquire context are sorted by stamp order; waiters
>> without an acquire context are interspersed in FIFO order.
>> - (2) Among waiters with contexts, only the first one can have other locks
>> - acquired already (ctx->acquired > 0). Note that this waiter may come
>> - after other waiters without contexts in the list.
>> + (2) For Wait-Die, among waiters with contexts, only the first one can have
>> + other locks acquired already (ctx->acquired > 0). Note that this waiter
>> + may come after other waiters without contexts in the list.
>> +
>> + The Wound-Wait preemption is implemented with a lazy-preemption scheme:
>> + The wounded status of the transaction is checked only when there is
>> + contention for a new lock and hence a true chance of deadlock. In that
>> + situation, if the transaction is wounded, it backs off, clears the
>> + wounded status and retries. A great benefit of implementing preemption in
>> + this way is that the wounded transaction can identify a contending lock to
>> + wait for before restarting the transaction. Just blindly restarting the
>> + transaction would likely make the transaction end up in a situation where
>> + it would have to back off again.
>>
>> In general, not much contention is expected. The locks are typically used to
>> - serialize access to resources for devices.
>> + serialize access to resources for devices, and optimization focus should
>> + therefore be directed towards the uncontended cases.
>>
>> Lockdep:
>> Special care has been taken to warn for as many cases of api abuse
>> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
>> index 314eb1071cce..b94a4bab2ecd 100644
>> --- a/drivers/dma-buf/reservation.c
>> +++ b/drivers/dma-buf/reservation.c
>> @@ -46,7 +46,7 @@
>> * write-side updates.
>> */
>>
>> -DEFINE_WW_CLASS(reservation_ww_class);
>> +DEFINE_WW_CLASS_WDIE(reservation_ww_class);
>> EXPORT_SYMBOL(reservation_ww_class);
>>
>> struct lock_class_key reservation_seqcount_class;
>> diff --git a/drivers/gpu/drm/drm_modeset_lock.c b/drivers/gpu/drm/drm_modeset_lock.c
>> index 8a5100685875..ff00a814f617 100644
>> --- a/drivers/gpu/drm/drm_modeset_lock.c
>> +++ b/drivers/gpu/drm/drm_modeset_lock.c
>> @@ -70,7 +70,7 @@
>> * lists and lookup data structures.
>> */
>>
>> -static DEFINE_WW_CLASS(crtc_ww_class);
>> +static DEFINE_WW_CLASS_WDIE(crtc_ww_class);
>>
>> /**
>> * drm_modeset_lock_all - take all modeset locks
>> diff --git a/include/linux/ww_mutex.h b/include/linux/ww_mutex.h
>> index 39fda195bf78..3880813b7db5 100644
>> --- a/include/linux/ww_mutex.h
>> +++ b/include/linux/ww_mutex.h
>> @@ -8,6 +8,8 @@
>> *
>> * Wound/wait implementation:
>> * Copyright (C) 2013 Canonical Ltd.
>> + * Choice of algorithm:
>> + * Copyright (C) 2018 VMware Inc.
>> *
>> * This file contains the main data structure and API definitions.
>> */
>> @@ -23,15 +25,17 @@ struct ww_class {
>> struct lock_class_key mutex_key;
>> const char *acquire_name;
>> const char *mutex_name;
>> + unsigned int is_wait_die;
>> };
>>
>> struct ww_acquire_ctx {
>> struct task_struct *task;
>> unsigned long stamp;
>> unsigned acquired;
>> + unsigned int wounded;
>> + struct ww_class *ww_class;
>> #ifdef CONFIG_DEBUG_MUTEXES
>> unsigned done_acquire;
>> - struct ww_class *ww_class;
>> struct ww_mutex *contending_lock;
>> #endif
>> #ifdef CONFIG_DEBUG_LOCK_ALLOC
>> @@ -58,17 +62,21 @@ struct ww_mutex {
>> # define __WW_CLASS_MUTEX_INITIALIZER(lockname, class)
>> #endif
>>
>> -#define __WW_CLASS_INITIALIZER(ww_class) \
>> +#define __WW_CLASS_INITIALIZER(ww_class, _is_wait_die) \
>> { .stamp = ATOMIC_LONG_INIT(0) \
>> , .acquire_name = #ww_class "_acquire" \
>> - , .mutex_name = #ww_class "_mutex" }
>> + , .mutex_name = #ww_class "_mutex" \
>> + , .is_wait_die = _is_wait_die }
>>
>> #define __WW_MUTEX_INITIALIZER(lockname, class) \
>> { .base = __MUTEX_INITIALIZER(lockname.base) \
>> __WW_CLASS_MUTEX_INITIALIZER(lockname, class) }
>>
>> #define DEFINE_WW_CLASS(classname) \
>> - struct ww_class classname = __WW_CLASS_INITIALIZER(classname)
>> + struct ww_class classname = __WW_CLASS_INITIALIZER(classname, 0)
>> +
>> +#define DEFINE_WW_CLASS_WDIE(classname) \
>> + struct ww_class classname = __WW_CLASS_INITIALIZER(classname, 1)
>>
>> #define DEFINE_WW_MUTEX(mutexname, ww_class) \
>> struct ww_mutex mutexname = __WW_MUTEX_INITIALIZER(mutexname, ww_class)
>> @@ -123,8 +131,9 @@ static inline void ww_acquire_init(struct ww_acquire_ctx *ctx,
>> ctx->task = current;
>> ctx->stamp = atomic_long_inc_return_relaxed(&ww_class->stamp);
>> ctx->acquired = 0;
>> -#ifdef CONFIG_DEBUG_MUTEXES
>> ctx->ww_class = ww_class;
>> + ctx->wounded = false;
>> +#ifdef CONFIG_DEBUG_MUTEXES
>> ctx->done_acquire = 0;
>> ctx->contending_lock = NULL;
>> #endif
>> diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
>> index 6850ffd69125..e861c1bf0e1e 100644
>> --- a/kernel/locking/locktorture.c
>> +++ b/kernel/locking/locktorture.c
>> @@ -365,7 +365,7 @@ static struct lock_torture_ops mutex_lock_ops = {
>> };
>>
>> #include <linux/ww_mutex.h>
>> -static DEFINE_WW_CLASS(torture_ww_class);
>> +static DEFINE_WW_CLASS_WDIE(torture_ww_class);
>> static DEFINE_WW_MUTEX(torture_ww_mutex_0, &torture_ww_class);
>> static DEFINE_WW_MUTEX(torture_ww_mutex_1, &torture_ww_class);
>> static DEFINE_WW_MUTEX(torture_ww_mutex_2, &torture_ww_class);
>> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
>> index 2048359f33d2..ffa00b5aaf03 100644
>> --- a/kernel/locking/mutex.c
>> +++ b/kernel/locking/mutex.c
>> @@ -290,12 +290,49 @@ __ww_ctx_stamp_after(struct ww_acquire_ctx *a, struct ww_acquire_ctx *b)
>> (a->stamp != b->stamp || a > b);
>> }
>>
>> +/*
>> + * Wound the lock holder transaction if it's younger than the contending
>> + * transaction, and there is a possibility of a deadlock.
>> + * Also if the lock holder transaction isn't the current transaction,
>> + * make sure it's woken up in case it's sleeping on another ww mutex.
>> + */
>> +static bool __ww_mutex_wound(struct mutex *lock,
>> + struct ww_acquire_ctx *ww_ctx,
>> + struct ww_acquire_ctx *hold_ctx)
>> +{
>> + struct task_struct *owner = __mutex_owner(lock);
>> +
>> + lockdep_assert_held(&lock->wait_lock);
>> +
>> + if (owner && hold_ctx && __ww_ctx_stamp_after(hold_ctx, ww_ctx) &&
>> + ww_ctx->acquired > 0) {
>> + hold_ctx->wounded = 1;
>> +
>> + /*
>> + * wake_up_process() paired with set_current_state() inserts
>> + * sufficient barriers to make sure @owner either sees it's
>> + * wounded or has a wakeup pending to re-read the wounded
>> + * state.
> IIUC, "sufficient barriers" = full memory barriers (here). (You may
> want to be more specific.)
Thanks for reviewing!
OK. But what if someone relaxes that in the future? I mean, what we
care about in this code is just that there are sufficient barriers for
that statement to be true, regardless of what type of barriers those
really are.
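
To make the pattern concrete, here is a hedged userspace model of what
the code comment relies on (hypothetical names, C11 seq_cst atomics
standing in for the kernel's wake_up_process()/set_current_state()
barriers; not the kernel API): the wounding thread stores ->wounded and
then issues a wakeup, so the sleeper either sees the wound directly or
sees the pending wakeup and re-reads the wounded state.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace model, not the kernel implementation.  Static atomics are
 * zero-initialized, i.e. both flags start out false. */
static atomic_bool wounded;        /* stands in for hold_ctx->wounded */
static atomic_bool wakeup_pending; /* stands in for the pending wakeup */

/* The wounding side: store the wounded state, then issue the wakeup.
 * seq_cst ordering models the full barriers implied by
 * wake_up_process() paired with set_current_state(). */
static void *wounder(void *arg)
{
    (void)arg;
    atomic_store(&wounded, true);        /* hold_ctx->wounded = 1;   */
    atomic_store(&wakeup_pending, true); /* wake_up_process(owner);  */
    return NULL;
}

/* The sleeping side: once the wakeup is visible, re-read the wounded
 * state; the ordering guarantees the wound cannot be missed. */
static bool sleeper_sees_wound(void)
{
    while (!atomic_load(&wakeup_pending))
        ; /* spin, modelling the sleep/wakeup */
    return atomic_load(&wounded);
}
```

The point of the model is only the ordering argument: because the store
to the wounded flag is ordered before the wakeup, observing the wakeup
implies the subsequent re-read observes the wound.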
>
>> + *
>> + * The value of hold_ctx->wounded in
>> + * __ww_mutex_lock_check_stamp();
> Missing parts/incomplete sentence?
Oops. I'll fix in next version.
>
> Andrea
Thanks,
Thomas
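
For readers following the thread, the two backoff policies discussed in
the patch can be sketched in a few lines of plain userspace C
(hypothetical helper names, not the kernel API; as in the patched
documentation, a lower stamp means an older transaction):

```c
#include <stdbool.h>

/* Wait-Die: on contention, does the CONTENDER have to back off?
 * Yes iff it is the younger of the two transactions; an older
 * contender simply waits for the lock. */
static bool wait_die_contender_backs_off(unsigned long contender_stamp,
                                         unsigned long holder_stamp)
{
    return contender_stamp > holder_stamp;
}

/* Wound-Wait: on contention, does the HOLDER get wounded, i.e. have
 * to back off (lazily, at its next contention point)?  Yes iff the
 * contender is older; a younger contender just waits. */
static bool wound_wait_holder_backs_off(unsigned long contender_stamp,
                                        unsigned long holder_stamp)
{
    return contender_stamp < holder_stamp;
}
```

In both policies it is always the younger transaction that backs off,
which is what makes them deadlock-free; they differ only in whether the
older transaction waits (Wait-Die) or preempts (Wound-Wait).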