[PATCH 1/3] dma-buf: add dma_resv_ctx for deadlock handling

Christian König ckoenig.leichtzumerken at gmail.com
Wed Oct 9 13:21:06 UTC 2019


Am 08.10.19 um 17:16 schrieb Daniel Vetter:
> On Wed, Sep 18, 2019 at 07:55:23PM +0200, Christian König wrote:
>> The ww_mutex framework allows for detecting deadlocks when multiple
>> threads try to acquire the same set of locks in different order.
>>
>> The problem is that handling those deadlocks was the burden of the user of
>> the ww_mutex implementation and at least some users didn't got that right
>> on the first try.
>>
>> So introduce a new dma_resv_ctx object which can be used to
>> simplify the deadlock handling. This is done by tracking all locked
>> dma_resv objects in the context as well as the last contended object.
>>
>> When a deadlock occurse we now unlock all previously locked objects and
>> acquire the contended lock in the slow path. After this is done -EDEADLK
>> is still returned to signal that all other locks now need to be
>> re-acquired again.
>>
>> Signed-off-by: Christian König <christian.koenig at amd.com>
> I pondered this quite a bit, some thoughts:
>
> - doing this over dma-buf means we can only use this for the ww_mutx
>    dance, not for anything else. Which means we need another layer on top
>    for shared execbuf utils (which Gerd has started looking into, but I
>    felt like not quite the right approach yet in his first draft series).

Yes, and I actually realized that this won't work on the dma_resv layer 
as well.

We need to base this on GEM to be able to do the correct ref/unref the 
locked objects.

> - With the ttm/gem merger we could just put this into drm_gem_object, and
>    ttm/gem helpers could still use it. Plus we could build some shared
>    execbuf utils on top of this, by essentially merging ttm_operation_ctx
>    into drm_gem_operation_ctx. I think this would also simplify the ttm
>    parameters a bit, since currently the ttm_operation_ctx doesn't have an
>    explicit pointer to the ww_acquire_ctx (which feels like an oversight to
>    me).

That ttm_operation_ctx doesn't contain a ww_acquire_ctx is intentional 
and mandatory for correct operation.

See for swapping things out from the MM callbacks you can't work with a 
ww_acquire_ctx and instead need to use trylock.

When you need a ww_acquire_ctx you can get that from the buffer object 
you try to allocate space for.

> - Aside, quick question: Any reason why struct amdgpu_cs_parser doesn't
>    subclass ttm_operation_ctx? From my naive understanding this would make
>    tons of sense ...

amdgpu_cs_parser is already overloaded with to much crap which actually 
should be temporary allocated on the stack.

> - Maybe we could even build some lru/eviction helpers on top, perhaps with
>    two lists, one for the set of buffers in the execbuf, the other for
>    additional buffers we've reserved as part of the eviction dance (to
>    solve the TODO in ttm_mem_evict_wait_busy).

That's what I'm currently working on, but some driver still need the 
struct_mutex for GEM ref/unref which is complicating things a bit.

So prototyping that in TTM BOs first before I move on to using the 
underlying GEM object.

> - Only downside would be that drivers outside of drivers/gpu won't be able
>    to use these helpers. I don't see any immediate nor near-future need for
>    that. All other drivers use so little memory they don't need to
>    participate in the multi-lock dance, they just pin the few buffers they
>    need and call it a day.

I can live with that.

Regards,
Christian.

>
> Ofc not everything above would need to be done right away, that's more
> ideas for todo.rst entries to make sure we all agree on the rough
> direction.
>
> Thoughts?
>
> Also adding Gerd Hoffmann, since he looked into this.
>
> Cheers, Daniel
>> ---
>>   drivers/dma-buf/Makefile       |   2 +-
>>   drivers/dma-buf/dma-resv-ctx.c | 108 +++++++++++++++++++++++++++++++++
>>   include/linux/dma-resv-ctx.h   |  68 +++++++++++++++++++++
>>   include/linux/dma-resv.h       |   1 +
>>   4 files changed, 178 insertions(+), 1 deletion(-)
>>   create mode 100644 drivers/dma-buf/dma-resv-ctx.c
>>   create mode 100644 include/linux/dma-resv-ctx.h
>>
>> diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile
>> index 03479da06422..da7701c85de2 100644
>> --- a/drivers/dma-buf/Makefile
>> +++ b/drivers/dma-buf/Makefile
>> @@ -1,6 +1,6 @@
>>   # SPDX-License-Identifier: GPL-2.0-only
>>   obj-y := dma-buf.o dma-fence.o dma-fence-array.o dma-fence-chain.o \
>> -	 dma-resv.o seqno-fence.o
>> +	 dma-resv.o dma-resv-ctx.o seqno-fence.o
>>   obj-$(CONFIG_SYNC_FILE)		+= sync_file.o
>>   obj-$(CONFIG_SW_SYNC)		+= sw_sync.o sync_debug.o
>>   obj-$(CONFIG_UDMABUF)		+= udmabuf.o
>> diff --git a/drivers/dma-buf/dma-resv-ctx.c b/drivers/dma-buf/dma-resv-ctx.c
>> new file mode 100644
>> index 000000000000..cad10fa6f80b
>> --- /dev/null
>> +++ b/drivers/dma-buf/dma-resv-ctx.c
>> @@ -0,0 +1,108 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * Copyright 2019 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + *
>> + * Authors:
>> + *	Christian König <christian.koenig at amd.com>
>> + */
>> +
>> +#include <linux/dma-resv-ctx.h>
>> +
>> +/**
>> + * dma_resv_ctx_init - initialize a reservation context
>> + * @ctx: the context to initialize
>> + *
>> + * Start using this reservation context to lock reservation objects for update.
>> + */
>> +void dma_resv_ctx_init(struct dma_resv_ctx *ctx)
>> +{
>> +	ww_acquire_init(&ctx->base, &reservation_ww_class);
>> +	init_llist_head(&ctx->locked);
>> +	ctx->contended = NULL;
>> +}
>> +EXPORT_SYMBOL(dma_resv_ctx_init);
>> +
>> +/**
>> + * dma_resv_ctx_unlock_all - unlock all reservation objects
>> + * @ctx: the context which holds the reservation objects
>> + *
>> + * Unlocks all reservation objects locked with this context.
>> + */
>> +void dma_resv_ctx_unlock_all(struct dma_resv_ctx *ctx)
>> +{
>> +	struct dma_resv *obj, *next;
>> +
>> +	if (ctx->contended)
>> +		dma_resv_unlock(ctx->contended);
>> +	ctx->contended = NULL;
>> +
>> +	llist_for_each_entry_safe(obj, next, ctx->locked.first, locked)
>> +		ww_mutex_unlock(&obj->lock);
>> +	init_llist_head(&ctx->locked);
>> +}
>> +EXPORT_SYMBOL(dma_resv_ctx_unlock_all);
>> +
>> +/**
>> + * dma_resv_ctx_lock - lock a reservation object with deadlock handling
>> + * @ctx: the context which should be used to lock the object
>> + * @obj: the object which needs to be locked
>> + * @interruptible: if we should wait interruptible
>> + *
>> + * Use @ctx to lock the reservation object. If a deadlock is detected we backoff
>> + * by releasing all locked objects and use the slow path to lock the reservation
>> + * object. After successfully locking in the slow path -EDEADLK is returned to
>> + * signal that all other locks must be re-taken as well.
>> + */
>> +int dma_resv_ctx_lock(struct dma_resv_ctx *ctx, struct dma_resv *obj,
>> +		      bool interruptible)
>> +{
>> +	int ret = 0;
>> +
>> +	if (unlikely(ctx->contended == obj))
>> +		ctx->contended = NULL;
>> +	else if (interruptible)
>> +		ret = dma_resv_lock_interruptible(obj, &ctx->base);
>> +	else
>> +		ret = dma_resv_lock(obj, &ctx->base);
>> +
>> +	if (likely(!ret)) {
>> +		/* don't use llist_add here, we have separate locking */
>> +		obj->locked.next = ctx->locked.first;
>> +		ctx->locked.first = &obj->locked;
>> +		return 0;
>> +	}
>> +	if (unlikely(ret != -EDEADLK))
>> +		return ret;
>> +
>> +	dma_resv_ctx_unlock_all(ctx);
>> +
>> +	if (interruptible) {
>> +		ret = dma_resv_lock_slow_interruptible(obj, &ctx->base);
>> +		if (unlikely(ret))
>> +			return ret;
>> +	} else {
>> +		dma_resv_lock_slow(obj, &ctx->base);
>> +	}
>> +
>> +	ctx->contended = obj;
>> +	return -EDEADLK;
>> +}
>> +EXPORT_SYMBOL(dma_resv_ctx_lock);
>> diff --git a/include/linux/dma-resv-ctx.h b/include/linux/dma-resv-ctx.h
>> new file mode 100644
>> index 000000000000..86473de65167
>> --- /dev/null
>> +++ b/include/linux/dma-resv-ctx.h
>> @@ -0,0 +1,68 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * Copyright 2019 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + *
>> + * Authors:
>> + *	Christian König <christian.koenig at amd.com>
>> + */
>> +
>> +#ifndef _LINUX_DMA_RESV_CTX_H
>> +#define _LINUX_DMA_RESV_CTX_H
>> +
>> +#include <linux/llist.h>
>> +#include <linux/dma-resv.h>
>> +
>> +/**
>> + * struct dma_resv_ctx - context to lock reservation objects
>> + * @base: ww_acquire_ctx used for deadlock detection
>> + * @locked: list of dma_resv objects locked in this context
>> + * @contended: contended dma_resv object
>> + */
>> +struct dma_resv_ctx {
>> +	struct ww_acquire_ctx base;
>> +	struct llist_head locked;
>> +	struct dma_resv *contended;
>> +};
>> +
>> +/**
>> + * dma_resv_ctx_done - wrapper for ww_acquire_done
>> + * @ctx: the reservation context which is done with locking
>> + */
>> +static inline void dma_resv_ctx_done(struct dma_resv_ctx *ctx)
>> +{
>> +	ww_acquire_done(&ctx->base);
>> +}
>> +
>> +/**
>> + * dma_resv_ctx_fini - wrapper for ww_acquire_fini
>> + * @ctx: the reservation context which is finished
>> + */
>> +static inline void dma_resv_ctx_fini(struct dma_resv_ctx *ctx)
>> +{
>> +	ww_acquire_fini(&ctx->base);
>> +}
>> +
>> +void dma_resv_ctx_init(struct dma_resv_ctx *ctx);
>> +void dma_resv_ctx_unlock_all(struct dma_resv_ctx *ctx);
>> +int dma_resv_ctx_lock(struct dma_resv_ctx *ctx, struct dma_resv *obj,
>> +		      bool interruptible);
>> +
>> +#endif /* _LINUX_DMA_RESV_CTX_H */
>> diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
>> index ee50d10f052b..1267822c2669 100644
>> --- a/include/linux/dma-resv.h
>> +++ b/include/linux/dma-resv.h
>> @@ -71,6 +71,7 @@ struct dma_resv_list {
>>    */
>>   struct dma_resv {
>>   	struct ww_mutex lock;
>> +	struct llist_node locked;
>>   	seqcount_t seq;
>>   
>>   	struct dma_fence __rcu *fence_excl;
>> -- 
>> 2.17.1
>>



More information about the dri-devel mailing list