[Intel-gfx] [PATCH 37/40] drm/i915: Allow a context to define its set of engines

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Mon Oct 1 08:30:46 UTC 2018


On 28/09/2018 21:22, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-09-27 12:28:47)
>>
>> On 19/09/2018 20:55, Chris Wilson wrote:
>>> Over the last few years, we have debated how to extend the user API to
>>> support an increase in the number of engines, that may be sparse and
>>> even be heterogeneous within a class (not all video decoders created
>>> equal). We settled on using (class, instance) tuples to identify a
>>> specific engine, with an API for the user to construct a map of engines
>>> to capabilities. Into this picture, we then add a challenge of virtual
>>> engines; one user engine that maps behind the scenes to any number of
>>> physical engines. To keep it general, we want the user to have full
>>> control over that mapping. To that end, we allow the user to constrain a
>>> context to define the set of engines that it can access, order fully
>>> controlled by the user via (class, instance). With such precise control
>>> in context setup, we can continue to use the existing execbuf uABI of
>>> specifying a single index; only now it doesn't automagically map onto
>>> the engines, it uses the user defined engine map from the context.
>>>
>>> The I915_EXEC_DEFAULT slot is left empty, and invalid for use by
>>> execbuf. It's use will be revealed in the next patch.
>>>
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> ---
>>>    drivers/gpu/drm/i915/i915_gem_context.c    | 88 ++++++++++++++++++++++
>>>    drivers/gpu/drm/i915/i915_gem_context.h    |  4 +
>>>    drivers/gpu/drm/i915/i915_gem_execbuffer.c | 22 ++++--
>>>    include/uapi/drm/i915_drm.h                | 27 +++++++
>>>    4 files changed, 135 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
>>> index a8570a07b3b7..313471253f51 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_context.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
>>> @@ -90,6 +90,7 @@
>>>    #include <drm/i915_drm.h>
>>>    #include "i915_drv.h"
>>>    #include "i915_trace.h"
>>> +#include "i915_user_extensions.h"
>>>    #include "intel_workarounds.h"
>>>    
>>>    #define ALL_L3_SLICES(dev) (1 << NUM_L3_SLICES(dev)) - 1
>>> @@ -223,6 +224,8 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
>>>                        ce->ops->destroy(ce);
>>>        }
>>>    
>>> +     kfree(ctx->engines);
>>> +
>>>        if (ctx->timeline)
>>>                i915_timeline_put(ctx->timeline);
>>>    
>>> @@ -948,6 +951,87 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>>>        return ret;
>>>    }
>>>    
>>> +struct set_engines {
>>> +     struct i915_gem_context *ctx;
>>> +     struct intel_engine_cs **engines;
>>> +     unsigned int nengine;
>>> +};
>>> +
>>> +static const i915_user_extension_fn set_engines__extensions[] = {
>>> +};
>>
>> This is OK unless someone one day gets the desire to make the extension
>> namespace sparse. I was thinking on how to put some warnings in the code
>> to warn about it, but I think that's for later. Namespace is also per
>> parent ioctl, another thing which would perhaps need enforcing.
> 
> Unless you intend to go extremely sparse, I'd just leave it with the
> usual [NAME1] = func1, and skipping over NULLs. We can always add
> alternatives if need be (I'd actually been meaning to convert execbuf3
> over to this scheme to flesh it out a bit more).

It is fine for now, I was just thinking out loud how to protect us 
against the table accidentally growing huge one day.

>>> +static int set_engines(struct i915_gem_context *ctx,
>>> +                    struct drm_i915_gem_context_param *args)
>>> +{
>>> +     struct i915_context_param_engines __user *user;
>>> +     struct set_engines set = {
>>> +             .ctx = ctx,
>>> +             .engines = NULL,
>>> +             .nengine = -1,
>>> +     };
>>> +     unsigned int n;
>>> +     u64 size, extensions;
>>
>> Size is u32 in the uAPI, so either that or unsigned int would do here.
>>
>>> +     int err;
>>> +
>>> +     user = u64_to_user_ptr(args->value);
>>> +     size = args->size;
>>> +     if (!size)
>>
>> args->sizes = sizeof(*user);
>>
>> ... if you want to allow size probing via set param, or we add a
>> get_param (too much work for nothing?) and only allow probing from there?
> 
> It's a variable array, so a little trickier.

Indeed!

> 
>>> +             goto out;
>>> +
>>> +     if (size < sizeof(struct i915_context_param_engines))
>>> +             return -EINVAL;
>>> +
>>> +     size -= sizeof(struct i915_context_param_engines);
>>> +     if (size % sizeof(*user->class_instance))
>>> +             return -EINVAL;
>>> +
>>> +     set.nengine = size / sizeof(*user->class_instance);
>>> +     if (set.nengine == 0 || set.nengine >= I915_EXEC_RING_MASK)
>>> +             return -EINVAL;
>>> +
>>> +     set.engines = kmalloc_array(set.nengine + 1,
>>> +                                 sizeof(*set.engines),
>>> +                                 GFP_KERNEL);
>>> +     if (!set.engines)
>>> +             return -ENOMEM;
>>> +
>>> +     set.engines[0] = NULL;
>>
>> /* Reserve the I915_EXEC_DEFAULT slot. */
>>
>>> +     for (n = 0; n < set.nengine; n++) {
>>> +             u32 class, instance;
>>
>> I will later recommend we use u16 for class/instance. I think we settled
>> for that in the meantime.
>>
>>> +
>>> +             if (get_user(class, &user->class_instance[n].class) ||
>>> +                 get_user(instance, &user->class_instance[n].instance)) {
>>> +                     kfree(set.engines);
>>> +                     return -EFAULT;
>>> +             }
>>> +
>>> +             set.engines[n + 1] =
>>> +                     intel_engine_lookup_user(ctx->i915, class, instance);
>>> +             if (!set.engines[n + 1]) {
>>> +                     kfree(set.engines);
>>> +                     return -ENOENT;
>>> +             }
>>> +     }
>>> +
>>> +     err = -EFAULT;
>>> +     if (!get_user(extensions, &user->extensions))
>>> +             err = i915_user_extensions(u64_to_user_ptr(extensions),
>>> +                                        set_engines__extensions,
>>> +                                        ARRAY_SIZE(set_engines__extensions),
>>> +                                        &set);
>>> +     if (err) {
>>
>> Do we need an user extensions destructor table for a more
>> compartmentalized design? This actually applies to the
>> i915_user_extensions() implementation.. it would need to take two
>> function tables I think and unwind until the one that failed.
> 
> Yeah, dtor for unwind is intriguing.
> 
>>> +             kfree(set.engines);
>>> +             return err;
>>> +     }
>>> +
>>> +out:
>>> +     kfree(ctx->engines);
>>> +     ctx->engines = set.engines;
>>> +     ctx->nengine = set.nengine + 1;
>>> +
>>> +     return 0;
>>> +}
>>> +
>>>    int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>>>                                    struct drm_file *file)
>>>    {
>>> @@ -1011,6 +1095,10 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>>>                }
>>>                break;
>>>    
>>> +     case I915_CONTEXT_PARAM_ENGINES:
>>> +             ret = set_engines(ctx, args);
>>> +             break;
>>> +
>>>        default:
>>>                ret = -EINVAL;
>>>                break;
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
>>> index 770043449ba6..9f89119a6566 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_context.h
>>> +++ b/drivers/gpu/drm/i915/i915_gem_context.h
>>> @@ -67,6 +67,8 @@ struct i915_gem_context {
>>>        /** file_priv: owning file descriptor */
>>>        struct drm_i915_file_private *file_priv;
>>>    
>>> +     struct intel_engine_cs **engines;
>>> +
>>>        struct i915_timeline *timeline;
>>>    
>>>        /**
>>> @@ -135,6 +137,8 @@ struct i915_gem_context {
>>>    #define CONTEXT_CLOSED                      1
>>>    #define CONTEXT_FORCE_SINGLE_SUBMISSION     2
>>>    
>>> +     unsigned int nengine;
>>
>> Why put related fields apart?
> 
> Packing.

Hm it feels to far. It's one per execbuf lookup so not that hot. Move 
the engines and nengines down to after struct head_rcu? And add comments 
against the members of course! :)

> 
>>
>>> +
>>>        /**
>>>         * @hw_id: - unique identifier for the context
>>>         *
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>> index 109486a6ef07..faedfdca875d 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>> @@ -2014,13 +2014,23 @@ static const enum intel_engine_id user_ring_map[I915_USER_RINGS + 1] = {
>>>    };
>>>    
>>>    static struct intel_engine_cs *
>>> -eb_select_engine(struct drm_i915_private *dev_priv,
>>> +eb_select_engine(struct i915_execbuffer *eb,
>>>                 struct drm_file *file,
>>>                 struct drm_i915_gem_execbuffer2 *args)
>>>    {
>>>        unsigned int user_ring_id = args->flags & I915_EXEC_RING_MASK;
>>>        struct intel_engine_cs *engine;
>>>    
>>> +     if (eb->ctx->engines) {
>>> +             if (user_ring_id >= eb->ctx->nengine) {
>>> +                     DRM_DEBUG("execbuf with unknown ring: %u\n",
>>> +                               user_ring_id);
>>> +                     return NULL;
>>> +             }
>>> +
>>> +             return eb->ctx->engines[user_ring_id];
>>> +     }
>>> +
>>>        if (user_ring_id > I915_USER_RINGS) {
>>>                DRM_DEBUG("execbuf with unknown ring: %u\n", user_ring_id);
>>>                return NULL;
>>> @@ -2033,11 +2043,11 @@ eb_select_engine(struct drm_i915_private *dev_priv,
>>>                return NULL;
>>>        }
>>>    
>>> -     if (user_ring_id == I915_EXEC_BSD && HAS_BSD2(dev_priv)) {
>>> +     if (user_ring_id == I915_EXEC_BSD && HAS_BSD2(eb->i915)) {
>>>                unsigned int bsd_idx = args->flags & I915_EXEC_BSD_MASK;
>>>    
>>>                if (bsd_idx == I915_EXEC_BSD_DEFAULT) {
>>> -                     bsd_idx = gen8_dispatch_bsd_engine(dev_priv, file);
>>> +                     bsd_idx = gen8_dispatch_bsd_engine(eb->i915, file);
>>>                } else if (bsd_idx >= I915_EXEC_BSD_RING1 &&
>>>                           bsd_idx <= I915_EXEC_BSD_RING2) {
>>>                        bsd_idx >>= I915_EXEC_BSD_SHIFT;
>>> @@ -2048,9 +2058,9 @@ eb_select_engine(struct drm_i915_private *dev_priv,
>>>                        return NULL;
>>>                }
>>>    
>>> -             engine = dev_priv->engine[_VCS(bsd_idx)];
>>> +             engine = eb->i915->engine[_VCS(bsd_idx)];
>>>        } else {
>>> -             engine = dev_priv->engine[user_ring_map[user_ring_id]];
>>> +             engine = eb->i915->engine[user_ring_map[user_ring_id]];
>>>        }
>>>    
>>>        if (!engine) {
>>> @@ -2260,7 +2270,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
>>>        if (unlikely(err))
>>>                goto err_destroy;
>>>    
>>> -     eb.engine = eb_select_engine(eb.i915, file, args);
>>> +     eb.engine = eb_select_engine(&eb, file, args);
>>>        if (!eb.engine) {
>>>                err = -EINVAL;
>>>                goto err_engine;
>>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>>> index 65b0b84419f3..d41b4c673af4 100644
>>> --- a/include/uapi/drm/i915_drm.h
>>> +++ b/include/uapi/drm/i915_drm.h
>>> @@ -1508,9 +1508,36 @@ struct drm_i915_gem_context_param {
>>>    #define   I915_CONTEXT_MAX_USER_PRIORITY    1023 /* inclusive */
>>>    #define   I915_CONTEXT_DEFAULT_PRIORITY             0
>>>    #define   I915_CONTEXT_MIN_USER_PRIORITY    -1023 /* inclusive */
>>> +
>>> +/*
>>> + * I915_CONTEXT_PARAM_ENGINES:
>>> + *
>>> + * Bind this context to operate on this subset of available engines. Henceforth,
>>> + * the I915_EXEC_RING selector for DRM_IOCTL_I915_GEM_EXECBUFFER2 operates as
>>> + * an index into this array of engines; I915_EXEC_DEFAULT selecting engine[0]
>>> + * and upwards. The array created is offset by 1, such that by default
>>> + * I915_EXEC_DEFAULT is left empty, to be filled in as directed. Slots 1...N
>>> + * are then filled in using the specified (class, instance).
>>
>> Should we allow specifying the default index? I am wondering if that
>> would make life easier to any userspace.
> 
> Could do, certainly gets rid of the clumsy nengines+1, but the skip for
> unknown class may make up for the clumsyness :)

Yeah I don't know. Userspace has to create a context at which point 
there isn't much to gain with the default index capability. So it's 
probably pointless.

I guess what I am thinking of is how we'll make IGT use new engines 
without too much churn, since we dropped the plain class/instance 
execbuf. But the default index idea doesn't help that anyway.

Regards,

Tvrtko


More information about the Intel-gfx mailing list