[PATCH 01/13] drm: execution context for GEM buffers v4
Thomas Hellström (Intel)
thomas_os at shipmail.org
Mon Jun 19 09:33:11 UTC 2023
Hi!
On 6/19/23 11:20, Christian König wrote:
> Hi guys,
>
> Am 19.06.23 um 10:59 schrieb Thomas Hellström (Intel):
>> [SNIP]
>>>>>
>>>>> I really need to find some time to work on that anyway.
>>> I've been playing with drm_exec for a couple weeks now, and I wanted
>>> to share something I hacked to try and make the API simpler and
>>> more robust against misuse (see the below diff, which is a slightly
>>> adjusted version of your work).
>>
>> It would be good if we could have someone taking charge of this
>> series and address all review comments, I see some of my comments
>> getting lost, we have multiple submitters and I can't find a
>> dri-devel patchwork entry for this. Anyway some comments below.
>
> I can try to find some time for the series this week (As long as
> nobody comes along and has any burning roof).
>
>>
>>>
>>> In this version, the user is no longer in control of the retry
>>> loop. Instead, it provides an expression (a call to a
>>> sub-function) to be re-evaluated each time a contention is
>>> detected. IMHO, this makes the 'prepare-objs' functions easier to
>>> apprehend, and avoids any mistake like calling
>>> drm_exec_continue_on_contention() in an inner loop, or breaking
>>> out of the drm_exec_while_all_locked() loop unintentionally.
>>
>> In i915 we've had a very similar helper to this, and while I agree
>> this newer version would probably help make code cleaner, but OTOH
>> there also are some places where the short
>> drm_exec_while_all_locked() -likeblock don't really motivate a
>> separate function. Porting i915 to the current version will take some
>> work, For the xe driver both versions would work fine.
>
> Yeah, this is actually what my first version of this looked like. But
> I abandoned that approach because we have a lot of cases were we just
> quickly want to lock a few GEM objects and don't want the extra
> overhead of putting all the state into some bag to forward it to a
> function.
>
>>
>> Some additional review comments not related to the interface change
>> below:
>>
>>>
>>> It also makes the internal management a bit simpler, since we
>>> no longer call drm_exec_cleanup() on the first attempt, and can
>>> thus get rid of the DRM_EXEC_DUMMY trick.
>>>
>>> In the below diff, I also re-introduced native support for
>>> duplicates as an opt-in, so we don't have to do things like:
>>>
>>> ret = drm_exec_prepare_obj(exec, obj, num_fences);
>>> if (ret == -EALREADY)
>>> ret = dma_resv_reserve_fences(obj->resv, num_fences);
>>> if (ret)
>>> return ret;
>>>
>>> and can just do:
>>>
>>> ret = drm_exec_prepare_obj(exec, obj, num_fences);
>>> if (ret)
>>> return;
>>>
>>> Of course drivers can open-code a wrapper doing the same thing, but
>>> given at least pvr and radeon need this, it'd be nice if the core
>>> could support it natively.
>>>
>>> That's mostly it. Just wanted to share what I had in case you're
>>> interested. If not, that's fine too.
>>>
>>> Regards,
>>>
>>> Boris
>>> ---
>>> Documentation/gpu/drm-mm.rst | 12 ++
>>> drivers/gpu/drm/Kconfig | 6 +
>>> drivers/gpu/drm/Makefile | 2 +
>>> drivers/gpu/drm/drm_exec.c | 274
>>> +++++++++++++++++++++++++++++++++++
>>> include/drm/drm_exec.h | 130 +++++++++++++++++
>>> 5 files changed, 424 insertions(+)
>>> create mode 100644 drivers/gpu/drm/drm_exec.c
>>> create mode 100644 include/drm/drm_exec.h
>>>
>>> diff --git a/Documentation/gpu/drm-mm.rst
>>> b/Documentation/gpu/drm-mm.rst
>>> index fe40ee686f6e..c9f120cfe730 100644
>>> --- a/Documentation/gpu/drm-mm.rst
>>> +++ b/Documentation/gpu/drm-mm.rst
>>> @@ -524,6 +524,18 @@ DRM Sync Objects
>>> .. kernel-doc:: drivers/gpu/drm/drm_syncobj.c
>>> :export:
>>> +DRM Execution context
>>> +=====================
>>> +
>>> +.. kernel-doc:: drivers/gpu/drm/drm_exec.c
>>> + :doc: Overview
>>> +
>>> +.. kernel-doc:: include/drm/drm_exec.h
>>> + :internal:
>>> +
>>> +.. kernel-doc:: drivers/gpu/drm/drm_exec.c
>>> + :export:
>>> +
>>> GPU Scheduler
>>> =============
>>> diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
>>> index 76991720637c..01a38fcdb1c4 100644
>>> --- a/drivers/gpu/drm/Kconfig
>>> +++ b/drivers/gpu/drm/Kconfig
>>> @@ -194,6 +194,12 @@ config DRM_TTM
>>> GPU memory types. Will be enabled automatically if a device
>>> driver
>>> uses it.
>>> +config DRM_EXEC
>>> + tristate
>>> + depends on DRM
>>> + help
>>> + Execution context for command submissions
>>> +
>>> config DRM_BUDDY
>>> tristate
>>> depends on DRM
>>> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
>>> index 1873f64db171..18a02eaf2d49 100644
>>> --- a/drivers/gpu/drm/Makefile
>>> +++ b/drivers/gpu/drm/Makefile
>>> @@ -79,6 +79,8 @@ obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) +=
>>> drm_panel_orientation_quirks.o
>>> #
>>> # Memory-management helpers
>>> #
>>> +#
>>> +obj-$(CONFIG_DRM_EXEC) += drm_exec.o
>>> obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o
>>> diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c
>>> new file mode 100644
>>> index 000000000000..e0ad1a3e1610
>>> --- /dev/null
>>> +++ b/drivers/gpu/drm/drm_exec.c
>>> @@ -0,0 +1,274 @@
>>> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
>>> +
>>> +#include <drm/drm_exec.h>
>>> +#include <drm/drm_gem.h>
>>> +#include <linux/dma-resv.h>
>>> +
>>> +/**
>>> + * DOC: Overview
>>> + *
>>> + * This component mainly abstracts the retry loop necessary for
>>> locking
>>> + * multiple GEM objects while preparing hardware operations (e.g.
>>> command
>>> + * submissions, page table updates etc..).
>>> + *
>>> + * If a contention is detected while locking a GEM object the
>>> cleanup procedure
>>> + * unlocks all previously locked GEM objects and locks the
>>> contended one first
>>> + * before locking any further objects.
>>> + *
>>> + * After an object is locked fences slots can optionally be
>>> reserved on the
>>> + * dma_resv object inside the GEM object.
>>> + *
>>> + * A typical usage pattern should look like this::
>>> + *
>>> + * int prepare_objs_func(struct drm_exec *exec, ...)
>>> + * {
>>> + * struct drm_gem_object *boA, *boB;
>>> + * int ret;
>>> + *
>>> + * <retrieve boA and boB here>
>>> + *
>>> + * ret = drm_exec_prepare_obj(&exec, boA, 1);
>>> + * if (ret)
>>> + * return ret;
>>> + *
>>> + * ret = drm_exec_prepare_obj(&exec, boB, 1);
>>> + * if (ret)
>>> + * return ret;
>>> + *
>>> + * return 0;
>>> + * }
>>> + *
>>> + * int some_func()
>>> + * {
>>> + * struct drm_exec exec;
>>> + * unsigned long index;
>>> + * int ret;
>>> + *
>>> + * drm_exec_init(&exec, true);
>>> + * ret = drm_exec_until_all_locked(&exec,
>>> prepare_objs_func(&exec, ...));
>>> + * if (ret)
>>> + * goto error;
>>> + *
>>> + * drm_exec_for_each_locked_object(&exec, index, obj) {
>>> + * dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ);
>>> + * ...
>>> + * }
>>> + * drm_exec_fini(&exec);
>>> + *
>>> + * See struct dma_exec for more details.
>>> + */
>>> +
>>> +/* Unlock all objects and drop references */
>>> +static void drm_exec_unlock_all(struct drm_exec *exec)
>>> +{
>>> + struct drm_gem_object *obj;
>>> + unsigned long index;
>>> +
>>> + drm_exec_for_each_locked_object(exec, index, obj) {
>>> + dma_resv_unlock(obj->resv);
>>> + drm_gem_object_put(obj);
>>> + }
>>> +
>>> + drm_gem_object_put(exec->prelocked);
>>> + exec->prelocked = NULL;
>>> +}
>>> +
>>> +/**
>>> + * drm_exec_init - initialize a drm_exec object
>>> + * @exec: the drm_exec object to initialize
>>> + * @interruptible: if locks should be acquired interruptible
>>> + *
>>> + * Initialize the object and make sure that we can track locked
>>> objects.
>>> + */
>>> +void drm_exec_init(struct drm_exec *exec, u32 flags)
>>> +{
>>> + exec->flags = flags;
>>> + exec->objects = kmalloc(PAGE_SIZE, GFP_KERNEL);
>>> +
>>> + /* If allocation here fails, just delay that till the first use */
>>> + exec->max_objects = exec->objects ? PAGE_SIZE / sizeof(void *)
>>> : 0;
>>> + exec->num_objects = 0;
>>> + exec->contended = NULL;
>>> + exec->prelocked = NULL;
>>> + ww_acquire_init(&exec->ticket, &reservation_ww_class);
>>> +}
>>> +EXPORT_SYMBOL(drm_exec_init);
>>> +
>>> +/**
>>> + * drm_exec_fini - finalize a drm_exec object
>>> + * @exec: the drm_exec object to finalize
>>> + *
>>> + * Unlock all locked objects, drop the references to objects and
>>> free all memory
>>> + * used for tracking the state.
>>> + */
>>> +void drm_exec_fini(struct drm_exec *exec)
>>> +{
>>> + drm_exec_unlock_all(exec);
>>> + kvfree(exec->objects);
>>> + if (exec->contended)
>>> + drm_gem_object_put(exec->contended);
>>> + ww_acquire_fini(&exec->ticket);
>>> +}
>>> +EXPORT_SYMBOL(drm_exec_fini);
>>> +
>>> +/**
>>> + * drm_exec_reset - reset a drm_exec object after a contention
>>> + * @exec: the drm_exec object to reset
>>> + *
>>> + * Unlock all locked objects and resets the number of objects locked.
>>> + */
>>> +void drm_exec_reset(struct drm_exec *exec)
>>> +{
>>> + WARN_ON(!exec->contended);
>>> + drm_exec_unlock_all(exec);
>>> + exec->num_objects = 0;
>>> +}
>>> +EXPORT_SYMBOL(drm_exec_reset);
>>> +
>>> +/* Track the locked object in the array */
>>> +static int drm_exec_obj_locked(struct drm_exec *exec,
>>> + struct drm_gem_object *obj)
>>> +{
>>> + if (unlikely(exec->num_objects == exec->max_objects)) {
>>> + size_t size = exec->max_objects * sizeof(void *);
>>> + void *tmp;
>>> +
>>> + tmp = kvrealloc(exec->objects, size, size + PAGE_SIZE,
>>> + GFP_KERNEL);
>>> + if (!tmp)
>>> + return -ENOMEM;
>>
>> Sometimes you need to just temporarily lock an object and then unlock
>> it again if it goes out of scope before reaching the end of
>> _until_all_locked(). In that case you might need to remove a lock
>> from the array. I *think* for all use-cases in i915 it would suffice
>> to take a snapshot of num_objects, and unlock everything above that,
>> having exec->objects behave like a stack, but was ever a list
>> considered instead of a realloced array?
>
> Yes, the problem is that linked lists really suck regarding their
> cache line locality. That's why I've came up with this approach here.
>
> What we could maybe do is to allow unlocking objects, but with the
> cost of linear backward searching for them in the array.
>
>>
>>> +
>>> + exec->objects = tmp;
>>> + exec->max_objects += PAGE_SIZE / sizeof(void *);
>>> + }
>>> + drm_gem_object_get(obj);
>>> + exec->objects[exec->num_objects++] = obj;
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +/* Make sure the contended object is locked first */
>>> +static int drm_exec_lock_contended(struct drm_exec *exec)
>>> +{
>>> + struct drm_gem_object *obj = exec->contended;
>>> + int ret;
>>> +
>>> + if (likely(!obj))
>>> + return 0;
>>> +
>>> + if (exec->flags & DRM_EXEC_FLAG_INTERRUPTIBLE) {
>>> + ret = dma_resv_lock_slow_interruptible(obj->resv,
>>> + &exec->ticket);
>>> + if (unlikely(ret))
>>> + goto error_dropref;
>>> + } else {
>>> + dma_resv_lock_slow(obj->resv, &exec->ticket);
>>> + }
>>> +
>>
>> Sometimes you want to just drop the contended lock after the above
>> relaxation. (Eviction would be one example), and not add as
>> prelocked, if the contended object goes out of scope. Eviction would
>> in some situations be one such example, -EDEADLOCK leading to an
>> error path where the object should otherwise be freed is another.
>> Perhaps we could add an argument to prepare_obj() as to whether the
>> object should be immediately put after relaxation.
>
> I was considering a try_prepare version as well, that should cover
> this use case.
That sounds a bit different from this use-case. The use-case above
would, on -EDEADLOCK actually unlock everything, then lock-slow the
contending lock and then immediately unlock it and drop. It sounds like
try_prepare would just skip locking and continue with everything locked
so far still locked?
>
>>
>>> + ret = drm_exec_obj_locked(exec, obj);
>>> + if (unlikely(ret)) {
>>> + dma_resv_unlock(obj->resv);
>>> + goto error_dropref;
>>> + }
>>> +
>>> + swap(exec->prelocked, obj);
>>> +
>>> +error_dropref:
>>> + /* Always cleanup the contention so that error handling can
>>> kick in */
>>> + drm_gem_object_put(obj);
>>> + exec->contended = NULL;
>>> + return ret;
>>> +}
>>> +
>>> +/**
>>> + * drm_exec_prepare_obj - prepare a GEM object for use
>>> + * @exec: the drm_exec object with the state
>>> + * @obj: the GEM object to prepare
>>> + * @num_fences: how many fences to reserve
>>> + *
>>> + * Prepare a GEM object for use by locking it and reserving fence
>>> slots. All
>>> + * successfully locked objects are put into the locked container.
>>> + *
>>> + * Returns: -EDEADLK if a contention is detected, -EALREADY when
>>> object is
>>> + * already locked, -ENOMEM when memory allocation failed and zero
>>> for success.
>>> + */
>>> +int drm_exec_prepare_obj(struct drm_exec *exec, struct
>>> drm_gem_object *obj,
>>> + unsigned int num_fences)
>>> +{
>>> + int ret;
>>> +
>>> + ret = drm_exec_lock_contended(exec);
>>> + if (unlikely(ret))
>>> + return ret;
>>> +
>>> + if (exec->prelocked == obj) {
>>> + drm_gem_object_put(exec->prelocked);
>>> + exec->prelocked = NULL;
>>> +
>>> + return dma_resv_reserve_fences(obj->resv, num_fences);
>>> + }
>>> +
>>> + if (exec->flags & DRM_EXEC_FLAG_INTERRUPTIBLE)
>>> + ret = dma_resv_lock_interruptible(obj->resv, &exec->ticket);
>>> + else
>>> + ret = dma_resv_lock(obj->resv, &exec->ticket);
>>> +
>>> + if (unlikely(ret == -EDEADLK)) {
>>> + drm_gem_object_get(obj);
>>> + exec->contended = obj;
>>> + return -EDEADLK;
>>> + }
>>> +
>>> + if (unlikely(ret == -EALREADY &&
>>> + (exec->flags & DRM_EXEC_FLAG_ALLOW_DUPLICATES)))
>>> + goto reserve_fences;
>>> +
>>> + if (unlikely(ret))
>>> + return ret;
>>> +
>>> + ret = drm_exec_obj_locked(exec, obj);
>>> + if (ret)
>>> + goto error_unlock;
>>> +
>>> +reserve_fences:
>>> + /* Keep locked when reserving fences fails */
>>> + return dma_resv_reserve_fences(obj->resv, num_fences);
>>
>> Ugh, what is the use-case for keeping things locked here? How would a
>> caller tell the difference between an error where everything is
>> locked and nothing is locked? IMO, we should unlock on error here. If
>> there indeed is a use-case we should add a separate function for
>> reserving fences for all locked objects, rather than returning
>> sometimes locked on error sometime not.
>
> We return the object locked here because it was to much churn to
> remove it again from the array and we are getting fully cleaned up at
> the end anyway.
OK, so if we add an unlock functionality, we could just have a
consistent locking state on error return?
Thanks,
Thomas
>
> Regards,
> Christian.
>
>>
>> Thanks,
>>
>> Thomas
>>
>>
>>> +
>>> +error_unlock:
>>> + dma_resv_unlock(obj->resv);
>>> + return ret;
>>> +}
>>> +EXPORT_SYMBOL(drm_exec_prepare_obj);
>>> +
>>> +/**
>>> + * drm_exec_prepare_array - helper to prepare an array of objects
>>> + * @exec: the drm_exec object with the state
>>> + * @objects: array of GEM object to prepare
>>> + * @num_objects: number of GEM objects in the array
>>> + * @num_fences: number of fences to reserve on each GEM object
>>> + *
>>> + * Prepares all GEM objects in an array, handles contention but
>>> aports on first
>>> + * error otherwise. Reserves @num_fences on each GEM object after
>>> locking it.
>>> + *
>>> + * Returns: -EALREADY when object is already locked, -ENOMEM when
>>> memory
>>> + * allocation failed and zero for success.
>>> + */
>>> +int drm_exec_prepare_array(struct drm_exec *exec,
>>> + struct drm_gem_object **objects,
>>> + unsigned int num_objects,
>>> + unsigned int num_fences)
>>> +{
>>> + int ret;
>>> +
>>> + for (unsigned int i = 0; i < num_objects; ++i) {
>>> + ret = drm_exec_prepare_obj(exec, objects[i], num_fences);
>>> + if (ret)
>>> + return ret;
>>> + }
>>> +
>>> + return 0;
>>> +}
>>> +EXPORT_SYMBOL(drm_exec_prepare_array);
>>> +
>>> +MODULE_DESCRIPTION("DRM execution context");
>>> +MODULE_LICENSE("Dual MIT/GPL");
>>> diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
>>> new file mode 100644
>>> index 000000000000..b1a5da0509c1
>>> --- /dev/null
>>> +++ b/include/drm/drm_exec.h
>>> @@ -0,0 +1,130 @@
>>> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
>>> +
>>> +#ifndef __DRM_EXEC_H__
>>> +#define __DRM_EXEC_H__
>>> +
>>> +#include <linux/ww_mutex.h>
>>> +
>>> +struct drm_gem_object;
>>> +
>>> +/**
>>> + * enum drm_exec_flags - Execution context flags
>>> + */
>>> +enum drm_exec_flags {
>>> + /**
>>> + * DRM_EXEC_FLAG_INTERRUPTIBLE: Set to true to use
>>> interruptible locking
>>> + * functions.
>>> + */
>>> + DRM_EXEC_FLAG_INTERRUPTIBLE = BIT(0),
>>> +
>>> + /**
>>> + * DRM_EXEC_FLAG_ALLOW_DUPLICATES: Set to true to allow
>>> EALREADY errors.
>>> + */
>>> + DRM_EXEC_FLAG_ALLOW_DUPLICATES = BIT(1),
>>> +};
>>> +
>>> +/**
>>> + * struct drm_exec - Execution context
>>> + */
>>> +struct drm_exec {
>>> + /**
>>> + * @flags: Combinations of DRM_EXEC_FLAG_* flags.
>>> + */
>>> + u32 flags;
>>> +
>>> + /**
>>> + * @ticket: WW ticket used for acquiring locks
>>> + */
>>> + struct ww_acquire_ctx ticket;
>>> +
>>> + /**
>>> + * @num_objects: number of objects locked
>>> + */
>>> + unsigned int num_objects;
>>> +
>>> + /**
>>> + * @max_objects: maximum objects in array
>>> + */
>>> + unsigned int max_objects;
>>> +
>>> + /**
>>> + * @objects: array of the locked objects
>>> + */
>>> + struct drm_gem_object **objects;
>>> +
>>> + /**
>>> + * @contended: contended GEM object we backed off for
>>> + */
>>> + struct drm_gem_object *contended;
>>> +
>>> + /**
>>> + * @prelocked: already locked GEM object due to contention
>>> + */
>>> + struct drm_gem_object *prelocked;
>>> +};
>>> +
>>> +/**
>>> + * drm_exec_for_each_locked_object - iterate over all the locked
>>> objects
>>> + * @exec: drm_exec object
>>> + * @index: unsigned long index for the iteration
>>> + * @obj: the current GEM object
>>> + *
>>> + * Iterate over all the locked GEM objects inside the drm_exec object.
>>> + */
>>> +#define drm_exec_for_each_locked_object(exec, index, obj) \
>>> + for (index = 0, obj = (exec)->objects[0]; \
>>> + index < (exec)->num_objects; \
>>> + ++index, obj = (exec)->objects[index])
>>> +
>>> +/**
>>> + * drm_exec_until_all_locked - retry objects preparation until all
>>> objects
>>> + * are locked
>>> + * @exec: drm_exec object
>>> + * @expr: expression to be evaluated on each attempt
>>> + *
>>> + * This helper tries to prepare objects and if a deadlock is detected,
>>> + * rollbacks and retries.
>>> + *
>>> + * @expr is typically a function that tries to prepare objects using
>>> + * drm_exec_prepare_obj().
>>> + *
>>> + * If we take drm_exec_prepare_array() as an example, you should do:
>>> + *
>>> + * ret = drm_exec_until_all_locked(exec,
>>> + * drm_exec_prepare_array(exec,
>>> + * objs,
>>> + * num_objs,
>>> + * num_fences));
>>> + * if (ret)
>>> + * goto error_path;
>>> + *
>>> + * ...
>>> + *
>>> + * Returns: 0 on success, a negative error code on failure.
>>> + */
>>> +#define drm_exec_until_all_locked(exec, expr) \
>>> + ({ \
>>> + __label__ retry; \
>>> + int __ret; \
>>> +retry: \
>>> + __ret = expr; \
>>> + if ((exec)->contended) { \
>>> + WARN_ON(__ret != -EDEADLK); \
>>> + drm_exec_reset(exec); \
>>> + goto retry; \
>>> + } \
>>> + ww_acquire_done(&(exec)->ticket); \
>>> + __ret; \
>>> + })
>>> +
>>> +void drm_exec_init(struct drm_exec *exec, u32 flags);
>>> +void drm_exec_fini(struct drm_exec *exec);
>>> +void drm_exec_reset(struct drm_exec *exec);
>>> +int drm_exec_prepare_obj(struct drm_exec *exec, struct
>>> drm_gem_object *obj,
>>> + unsigned int num_fences);
>>> +int drm_exec_prepare_array(struct drm_exec *exec,
>>> + struct drm_gem_object **objects,
>>> + unsigned int num_objects,
>>> + unsigned int num_fences);
>>> +
>>> +#endif
More information about the dri-devel
mailing list