[PATCH 01/13] drm: execution context for GEM buffers v4

Christian König christian.koenig at amd.com
Mon Jun 19 09:20:06 UTC 2023


Hi guys,

Am 19.06.23 um 10:59 schrieb Thomas Hellström (Intel):
> [SNIP]
>>>>
>>>> I really need to find some time to work on that anyway.
>> I've been playing with drm_exec for a couple weeks now, and I wanted
>> to share something I hacked to try and make the API simpler and
>> more robust against misuse (see the below diff, which is a slightly
>> adjusted version of your work).
>
> It would be good if we could have someone taking charge of this series 
> and address all review comments, I see some of my comments getting 
> lost, we have multiple submitters and I can't find a dri-devel 
> patchwork entry for this. Anyway some comments below.

I can try to find some time for the series this week (As long as nobody 
comes along and has any burning roof).

>
>>
>> In this version, the user is no longer in control of the retry
>> loop. Instead, it provides an expression (a call to a
>> sub-function) to be re-evaluated each time a contention is
>> detected. IMHO, this makes the 'prepare-objs' functions easier to
>> apprehend, and avoids any mistake like calling
>> drm_exec_continue_on_contention() in an inner loop, or breaking
>> out of the drm_exec_while_all_locked() loop unintentionally.
>
> In i915 we've had a very similar helper to this, and while I agree 
> this newer version would probably help make code cleaner, but OTOH 
> there also are some places where the short drm_exec_while_all_locked() 
> -likeblock don't really motivate a separate function. Porting i915 to 
> the current version will take some work, For  the xe driver both 
> versions would work fine.

Yeah, this is actually what my first version of this looked like. But I 
abandoned that approach because we have a lot of cases were we just 
quickly want to lock a few GEM objects and don't want the extra overhead 
of putting all the state into some bag to forward it to a function.

>
> Some additional review comments not related to the interface change 
> below:
>
>>
>> It also makes the internal management a bit simpler, since we
>> no longer call drm_exec_cleanup() on the first attempt, and can
>> thus get rid of the DRM_EXEC_DUMMY trick.
>>
>> In the below diff, I also re-introduced native support for
>> duplicates as an opt-in, so we don't have to do things like:
>>
>>     ret = drm_exec_prepare_obj(exec, obj, num_fences);
>>     if (ret == -EALREADY)
>>         ret = dma_resv_reserve_fences(obj->resv, num_fences);
>>     if (ret)
>>         return ret;
>>
>> and can just do:
>>
>>     ret = drm_exec_prepare_obj(exec, obj, num_fences);
>>     if (ret)
>>         return;
>>
>> Of course drivers can open-code a wrapper doing the same thing, but
>> given at least pvr and radeon need this, it'd be nice if the core
>> could support it natively.
>>
>> That's mostly it. Just wanted to share what I had in case you're
>> interested. If not, that's fine too.
>>
>> Regards,
>>
>> Boris
>> ---
>>   Documentation/gpu/drm-mm.rst |  12 ++
>>   drivers/gpu/drm/Kconfig      |   6 +
>>   drivers/gpu/drm/Makefile     |   2 +
>>   drivers/gpu/drm/drm_exec.c   | 274 +++++++++++++++++++++++++++++++++++
>>   include/drm/drm_exec.h       | 130 +++++++++++++++++
>>   5 files changed, 424 insertions(+)
>>   create mode 100644 drivers/gpu/drm/drm_exec.c
>>   create mode 100644 include/drm/drm_exec.h
>>
>> diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
>> index fe40ee686f6e..c9f120cfe730 100644
>> --- a/Documentation/gpu/drm-mm.rst
>> +++ b/Documentation/gpu/drm-mm.rst
>> @@ -524,6 +524,18 @@ DRM Sync Objects
>>   .. kernel-doc:: drivers/gpu/drm/drm_syncobj.c
>>      :export:
>>   +DRM Execution context
>> +=====================
>> +
>> +.. kernel-doc:: drivers/gpu/drm/drm_exec.c
>> +   :doc: Overview
>> +
>> +.. kernel-doc:: include/drm/drm_exec.h
>> +   :internal:
>> +
>> +.. kernel-doc:: drivers/gpu/drm/drm_exec.c
>> +   :export:
>> +
>>   GPU Scheduler
>>   =============
>>   diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
>> index 76991720637c..01a38fcdb1c4 100644
>> --- a/drivers/gpu/drm/Kconfig
>> +++ b/drivers/gpu/drm/Kconfig
>> @@ -194,6 +194,12 @@ config DRM_TTM
>>         GPU memory types. Will be enabled automatically if a device 
>> driver
>>         uses it.
>>   +config DRM_EXEC
>> +    tristate
>> +    depends on DRM
>> +    help
>> +      Execution context for command submissions
>> +
>>   config DRM_BUDDY
>>       tristate
>>       depends on DRM
>> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
>> index 1873f64db171..18a02eaf2d49 100644
>> --- a/drivers/gpu/drm/Makefile
>> +++ b/drivers/gpu/drm/Makefile
>> @@ -79,6 +79,8 @@ obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += 
>> drm_panel_orientation_quirks.o
>>   #
>>   # Memory-management helpers
>>   #
>> +#
>> +obj-$(CONFIG_DRM_EXEC) += drm_exec.o
>>     obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o
>>   diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c
>> new file mode 100644
>> index 000000000000..e0ad1a3e1610
>> --- /dev/null
>> +++ b/drivers/gpu/drm/drm_exec.c
>> @@ -0,0 +1,274 @@
>> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
>> +
>> +#include <drm/drm_exec.h>
>> +#include <drm/drm_gem.h>
>> +#include <linux/dma-resv.h>
>> +
>> +/**
>> + * DOC: Overview
>> + *
>> + * This component mainly abstracts the retry loop necessary for locking
>> + * multiple GEM objects while preparing hardware operations (e.g. 
>> command
>> + * submissions, page table updates etc..).
>> + *
>> + * If a contention is detected while locking a GEM object the 
>> cleanup procedure
>> + * unlocks all previously locked GEM objects and locks the contended 
>> one first
>> + * before locking any further objects.
>> + *
>> + * After an object is locked fences slots can optionally be reserved 
>> on the
>> + * dma_resv object inside the GEM object.
>> + *
>> + * A typical usage pattern should look like this::
>> + *
>> + * int prepare_objs_func(struct drm_exec *exec, ...)
>> + * {
>> + *    struct drm_gem_object *boA, *boB;
>> + *     int ret;
>> + *
>> + *    <retrieve boA and boB here>
>> + *
>> + *    ret = drm_exec_prepare_obj(&exec, boA, 1);
>> + *    if (ret)
>> + *        return ret;
>> + *
>> + *    ret = drm_exec_prepare_obj(&exec, boB, 1);
>> + *    if (ret)
>> + *        return ret;
>> + *
>> + *     return 0;
>> + * }
>> + *
>> + * int some_func()
>> + * {
>> + *    struct drm_exec exec;
>> + *    unsigned long index;
>> + *    int ret;
>> + *
>> + *    drm_exec_init(&exec, true);
>> + *    ret = drm_exec_until_all_locked(&exec, 
>> prepare_objs_func(&exec, ...));
>> + *    if (ret)
>> + *        goto error;
>> + *
>> + *    drm_exec_for_each_locked_object(&exec, index, obj) {
>> + *        dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ);
>> + *        ...
>> + *    }
>> + *    drm_exec_fini(&exec);
>> + *
>> + * See struct dma_exec for more details.
>> + */
>> +
>> +/* Unlock all objects and drop references */
>> +static void drm_exec_unlock_all(struct drm_exec *exec)
>> +{
>> +    struct drm_gem_object *obj;
>> +    unsigned long index;
>> +
>> +    drm_exec_for_each_locked_object(exec, index, obj) {
>> +        dma_resv_unlock(obj->resv);
>> +        drm_gem_object_put(obj);
>> +    }
>> +
>> +    drm_gem_object_put(exec->prelocked);
>> +    exec->prelocked = NULL;
>> +}
>> +
>> +/**
>> + * drm_exec_init - initialize a drm_exec object
>> + * @exec: the drm_exec object to initialize
>> + * @interruptible: if locks should be acquired interruptible
>> + *
>> + * Initialize the object and make sure that we can track locked 
>> objects.
>> + */
>> +void drm_exec_init(struct drm_exec *exec, u32 flags)
>> +{
>> +    exec->flags = flags;
>> +    exec->objects = kmalloc(PAGE_SIZE, GFP_KERNEL);
>> +
>> +    /* If allocation here fails, just delay that till the first use */
>> +    exec->max_objects = exec->objects ? PAGE_SIZE / sizeof(void *) : 0;
>> +    exec->num_objects = 0;
>> +    exec->contended = NULL;
>> +    exec->prelocked = NULL;
>> +    ww_acquire_init(&exec->ticket, &reservation_ww_class);
>> +}
>> +EXPORT_SYMBOL(drm_exec_init);
>> +
>> +/**
>> + * drm_exec_fini - finalize a drm_exec object
>> + * @exec: the drm_exec object to finalize
>> + *
>> + * Unlock all locked objects, drop the references to objects and 
>> free all memory
>> + * used for tracking the state.
>> + */
>> +void drm_exec_fini(struct drm_exec *exec)
>> +{
>> +    drm_exec_unlock_all(exec);
>> +    kvfree(exec->objects);
>> +    if (exec->contended)
>> +        drm_gem_object_put(exec->contended);
>> +    ww_acquire_fini(&exec->ticket);
>> +}
>> +EXPORT_SYMBOL(drm_exec_fini);
>> +
>> +/**
>> + * drm_exec_reset - reset a drm_exec object after a contention
>> + * @exec: the drm_exec object to reset
>> + *
>> + * Unlock all locked objects and resets the number of objects locked.
>> + */
>> +void drm_exec_reset(struct drm_exec *exec)
>> +{
>> +    WARN_ON(!exec->contended);
>> +    drm_exec_unlock_all(exec);
>> +    exec->num_objects = 0;
>> +}
>> +EXPORT_SYMBOL(drm_exec_reset);
>> +
>> +/* Track the locked object in the array */
>> +static int drm_exec_obj_locked(struct drm_exec *exec,
>> +                   struct drm_gem_object *obj)
>> +{
>> +    if (unlikely(exec->num_objects == exec->max_objects)) {
>> +        size_t size = exec->max_objects * sizeof(void *);
>> +        void *tmp;
>> +
>> +        tmp = kvrealloc(exec->objects, size, size + PAGE_SIZE,
>> +                GFP_KERNEL);
>> +        if (!tmp)
>> +            return -ENOMEM;
>
> Sometimes you need to just temporarily lock an object and then unlock 
> it again if it goes out of scope before reaching the end of 
> _until_all_locked(). In that case you might need to remove a lock from 
> the array. I *think* for all use-cases in i915 it would suffice to 
> take a snapshot of num_objects, and unlock everything above that, 
> having exec->objects behave like a stack, but was ever a list 
> considered instead of a realloced array?

Yes, the problem is that linked lists really suck regarding their cache 
line locality. That's why I've came up with this approach here.

What we could maybe do is to allow unlocking objects, but with the cost 
of linear backward searching for them in the array.

>
>> +
>> +        exec->objects = tmp;
>> +        exec->max_objects += PAGE_SIZE / sizeof(void *);
>> +    }
>> +    drm_gem_object_get(obj);
>> +    exec->objects[exec->num_objects++] = obj;
>> +
>> +    return 0;
>> +}
>> +
>> +/* Make sure the contended object is locked first */
>> +static int drm_exec_lock_contended(struct drm_exec *exec)
>> +{
>> +    struct drm_gem_object *obj = exec->contended;
>> +    int ret;
>> +
>> +    if (likely(!obj))
>> +        return 0;
>> +
>> +    if (exec->flags & DRM_EXEC_FLAG_INTERRUPTIBLE) {
>> +        ret = dma_resv_lock_slow_interruptible(obj->resv,
>> +                               &exec->ticket);
>> +        if (unlikely(ret))
>> +            goto error_dropref;
>> +    } else {
>> +        dma_resv_lock_slow(obj->resv, &exec->ticket);
>> +    }
>> +
>
> Sometimes you want to just drop the contended lock after the above 
> relaxation. (Eviction would be one example), and not add as prelocked, 
> if the contended object goes out of scope. Eviction would in some 
> situations be one such example, -EDEADLOCK leading to an error path 
> where the object should otherwise be freed is another. Perhaps we 
> could add an argument to prepare_obj() as to whether the object should 
> be immediately put after relaxation.

I was considering a try_prepare version as well, that should cover this 
use case.

>
>> +    ret = drm_exec_obj_locked(exec, obj);
>> +    if (unlikely(ret)) {
>> +        dma_resv_unlock(obj->resv);
>> +        goto error_dropref;
>> +    }
>> +
>> +    swap(exec->prelocked, obj);
>> +
>> +error_dropref:
>> +    /* Always cleanup the contention so that error handling can kick 
>> in */
>> +    drm_gem_object_put(obj);
>> +    exec->contended = NULL;
>> +    return ret;
>> +}
>> +
>> +/**
>> + * drm_exec_prepare_obj - prepare a GEM object for use
>> + * @exec: the drm_exec object with the state
>> + * @obj: the GEM object to prepare
>> + * @num_fences: how many fences to reserve
>> + *
>> + * Prepare a GEM object for use by locking it and reserving fence 
>> slots. All
>> + * successfully locked objects are put into the locked container.
>> + *
>> + * Returns: -EDEADLK if a contention is detected, -EALREADY when 
>> object is
>> + * already locked, -ENOMEM when memory allocation failed and zero 
>> for success.
>> + */
>> +int drm_exec_prepare_obj(struct drm_exec *exec, struct 
>> drm_gem_object *obj,
>> +             unsigned int num_fences)
>> +{
>> +    int ret;
>> +
>> +    ret = drm_exec_lock_contended(exec);
>> +    if (unlikely(ret))
>> +        return ret;
>> +
>> +    if (exec->prelocked == obj) {
>> +        drm_gem_object_put(exec->prelocked);
>> +        exec->prelocked = NULL;
>> +
>> +        return dma_resv_reserve_fences(obj->resv, num_fences);
>> +    }
>> +
>> +    if (exec->flags & DRM_EXEC_FLAG_INTERRUPTIBLE)
>> +        ret = dma_resv_lock_interruptible(obj->resv, &exec->ticket);
>> +    else
>> +        ret = dma_resv_lock(obj->resv, &exec->ticket);
>> +
>> +    if (unlikely(ret == -EDEADLK)) {
>> +        drm_gem_object_get(obj);
>> +        exec->contended = obj;
>> +        return -EDEADLK;
>> +    }
>> +
>> +    if (unlikely(ret == -EALREADY &&
>> +        (exec->flags & DRM_EXEC_FLAG_ALLOW_DUPLICATES)))
>> +        goto reserve_fences;
>> +
>> +    if (unlikely(ret))
>> +        return ret;
>> +
>> +    ret = drm_exec_obj_locked(exec, obj);
>> +    if (ret)
>> +        goto error_unlock;
>> +
>> +reserve_fences:
>> +    /* Keep locked when reserving fences fails */
>> +    return dma_resv_reserve_fences(obj->resv, num_fences);
>
> Ugh, what is the use-case for keeping things locked here? How would a 
> caller tell the difference between an error where everything is locked 
> and nothing is locked? IMO, we should unlock on error here. If there 
> indeed is a use-case we should add a separate function for reserving 
> fences for all locked objects, rather than returning sometimes locked 
> on error sometime not.

We return the object locked here because it was to much churn to remove 
it again from the array and we are getting fully cleaned up at the end 
anyway.

Regards,
Christian.

>
> Thanks,
>
> Thomas
>
>
>> +
>> +error_unlock:
>> +    dma_resv_unlock(obj->resv);
>> +    return ret;
>> +}
>> +EXPORT_SYMBOL(drm_exec_prepare_obj);
>> +
>> +/**
>> + * drm_exec_prepare_array - helper to prepare an array of objects
>> + * @exec: the drm_exec object with the state
>> + * @objects: array of GEM object to prepare
>> + * @num_objects: number of GEM objects in the array
>> + * @num_fences: number of fences to reserve on each GEM object
>> + *
>> + * Prepares all GEM objects in an array, handles contention but 
>> aports on first
>> + * error otherwise. Reserves @num_fences on each GEM object after 
>> locking it.
>> + *
>> + * Returns: -EALREADY when object is already locked, -ENOMEM when 
>> memory
>> + * allocation failed and zero for success.
>> + */
>> +int drm_exec_prepare_array(struct drm_exec *exec,
>> +               struct drm_gem_object **objects,
>> +               unsigned int num_objects,
>> +               unsigned int num_fences)
>> +{
>> +    int ret;
>> +
>> +    for (unsigned int i = 0; i < num_objects; ++i) {
>> +        ret = drm_exec_prepare_obj(exec, objects[i], num_fences);
>> +        if (ret)
>> +            return ret;
>> +    }
>> +
>> +    return 0;
>> +}
>> +EXPORT_SYMBOL(drm_exec_prepare_array);
>> +
>> +MODULE_DESCRIPTION("DRM execution context");
>> +MODULE_LICENSE("Dual MIT/GPL");
>> diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
>> new file mode 100644
>> index 000000000000..b1a5da0509c1
>> --- /dev/null
>> +++ b/include/drm/drm_exec.h
>> @@ -0,0 +1,130 @@
>> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
>> +
>> +#ifndef __DRM_EXEC_H__
>> +#define __DRM_EXEC_H__
>> +
>> +#include <linux/ww_mutex.h>
>> +
>> +struct drm_gem_object;
>> +
>> +/**
>> + * enum drm_exec_flags - Execution context flags
>> + */
>> +enum drm_exec_flags {
>> +    /**
>> +     * DRM_EXEC_FLAG_INTERRUPTIBLE: Set to true to use interruptible 
>> locking
>> +     * functions.
>> +     */
>> +    DRM_EXEC_FLAG_INTERRUPTIBLE = BIT(0),
>> +
>> +    /**
>> +     * DRM_EXEC_FLAG_ALLOW_DUPLICATES: Set to true to allow EALREADY 
>> errors.
>> +     */
>> +    DRM_EXEC_FLAG_ALLOW_DUPLICATES = BIT(1),
>> +};
>> +
>> +/**
>> + * struct drm_exec - Execution context
>> + */
>> +struct drm_exec {
>> +    /**
>> +     * @flags: Combinations of DRM_EXEC_FLAG_* flags.
>> +     */
>> +    u32 flags;
>> +
>> +    /**
>> +     * @ticket: WW ticket used for acquiring locks
>> +     */
>> +    struct ww_acquire_ctx    ticket;
>> +
>> +    /**
>> +     * @num_objects: number of objects locked
>> +     */
>> +    unsigned int        num_objects;
>> +
>> +    /**
>> +     * @max_objects: maximum objects in array
>> +     */
>> +    unsigned int        max_objects;
>> +
>> +    /**
>> +     * @objects: array of the locked objects
>> +     */
>> +    struct drm_gem_object    **objects;
>> +
>> +    /**
>> +     * @contended: contended GEM object we backed off for
>> +     */
>> +    struct drm_gem_object    *contended;
>> +
>> +    /**
>> +     * @prelocked: already locked GEM object due to contention
>> +     */
>> +    struct drm_gem_object *prelocked;
>> +};
>> +
>> +/**
>> + * drm_exec_for_each_locked_object - iterate over all the locked 
>> objects
>> + * @exec: drm_exec object
>> + * @index: unsigned long index for the iteration
>> + * @obj: the current GEM object
>> + *
>> + * Iterate over all the locked GEM objects inside the drm_exec object.
>> + */
>> +#define drm_exec_for_each_locked_object(exec, index, obj)    \
>> +    for (index = 0, obj = (exec)->objects[0];        \
>> +         index < (exec)->num_objects;            \
>> +         ++index, obj = (exec)->objects[index])
>> +
>> +/**
>> + * drm_exec_until_all_locked - retry objects preparation until all 
>> objects
>> + * are locked
>> + * @exec: drm_exec object
>> + * @expr: expression to be evaluated on each attempt
>> + *
>> + * This helper tries to prepare objects and if a deadlock is detected,
>> + * rollbacks and retries.
>> + *
>> + * @expr is typically a function that tries to prepare objects using
>> + * drm_exec_prepare_obj().
>> + *
>> + * If we take drm_exec_prepare_array() as an example, you should do:
>> + *
>> + *    ret = drm_exec_until_all_locked(exec,
>> + *                    drm_exec_prepare_array(exec,
>> + *                                   objs,
>> + *                                   num_objs,
>> + *                                   num_fences));
>> + *    if (ret)
>> + *        goto error_path;
>> + *
>> + *    ...
>> + *
>> + * Returns: 0 on success, a negative error code on failure.
>> + */
>> +#define drm_exec_until_all_locked(exec, expr)        \
>> +    ({                        \
>> +        __label__ retry;            \
>> +        int __ret;                \
>> +retry:                            \
>> +        __ret = expr;                \
>> +        if ((exec)->contended) {        \
>> +            WARN_ON(__ret != -EDEADLK);    \
>> +            drm_exec_reset(exec);        \
>> +            goto retry;            \
>> +        }                    \
>> +        ww_acquire_done(&(exec)->ticket);    \
>> +        __ret;                    \
>> +    })
>> +
>> +void drm_exec_init(struct drm_exec *exec, u32 flags);
>> +void drm_exec_fini(struct drm_exec *exec);
>> +void drm_exec_reset(struct drm_exec *exec);
>> +int drm_exec_prepare_obj(struct drm_exec *exec, struct 
>> drm_gem_object *obj,
>> +             unsigned int num_fences);
>> +int drm_exec_prepare_array(struct drm_exec *exec,
>> +               struct drm_gem_object **objects,
>> +               unsigned int num_objects,
>> +               unsigned int num_fences);
>> +
>> +#endif



More information about the amd-gfx mailing list