[Intel-gfx] [PATCH v6 6/7] drm/i915: Expose RPCS (SSEU) configuration to userspace

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Thu May 24 10:43:27 UTC 2018


On 23/05/2018 18:12, Lionel Landwerlin wrote:
> On 23/05/18 16:13, Tvrtko Ursulin wrote:
>>
>> On 22/05/2018 19:00, Lionel Landwerlin wrote:
>>> From: Chris Wilson <chris at chris-wilson.co.uk>
>>>
>>> We want to allow userspace to reconfigure the subslice configuration for
>>> its own use case. To do so, we expose a context parameter to allow
>>> adjustment of the RPCS register stored within the context image (and
>>> currently not accessible via LRI). If the context is adjusted before
>>> first use, the adjustment is for "free"; otherwise if the context is
>>> active we flush the context off the GPU (stalling all users) and forcing
>>> the GPU to save the context to memory where we can modify it and so
>>> ensure that the register is reloaded on next execution.
>>>
>>> The overhead of managing additional EU subslices can be significant,
>>> especially in multi-context workloads. Non-GPGPU contexts should
>>> preferably disable the subslices it is not using, and others should
>>> fine-tune the number to match their workload.
>>>
>>> We expose complete control over the RPCS register, allowing
>>> configuration of slice/subslice, via masks packed into a u64 for
>>> simplicity. For example,
>>>
>>>     struct drm_i915_gem_context_param arg;
>>>     struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
>>>                                                     .instance = 0, };
>>>
>>>     memset(&arg, 0, sizeof(arg));
>>>     arg.ctx_id = ctx;
>>>     arg.param = I915_CONTEXT_PARAM_SSEU;
>>>     arg.value = (uintptr_t) &sseu;
>>>     if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
>>>         sseu.packed.subslice_mask = 0;
>>>
>>>         drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
>>>     }
>>>
>>> could be used to disable all subslices where supported.
>>>
>>> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() 
>>> (Lionel)
>>>
>>> v3: Add ability to program this per engine (Chris)
>>>
>>> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
>>>
>>> v5: Validate sseu configuration against the device's capabilities 
>>> (Lionel)
>>>
>>> v6: Change context powergating settings through MI_SDM on kernel 
>>> context (Chris)
>>>
>>> v7: Synchronize the requests following a powergating setting change 
>>> using a global
>>>      dependency (Chris)
>>>      Iterate timelines through dev_priv.gt.active_rings (Tvrtko)
>>>      Disable RPCS configuration setting for non capable users 
>>> (Lionel/Tvrtko)
>>>
>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
>>> c: Dmitry Rogozhkin <dmitry.v.rogozhkin at intel.com>
>>> CC: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>> CC: Zhipeng Gong <zhipeng.gong at intel.com>
>>> CC: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/i915_drv.h         |  13 ++
>>>   drivers/gpu/drm/i915/i915_gem.c         |   2 +
>>>   drivers/gpu/drm/i915/i915_gem_context.c | 167 ++++++++++++++++++++++++
>>>   drivers/gpu/drm/i915/i915_request.c     |  20 +++
>>>   drivers/gpu/drm/i915/intel_lrc.c        | 103 ++++++++++-----
>>>   drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +
>>>   drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
>>>   include/uapi/drm/i915_drm.h             |  38 ++++++
>>>   8 files changed, 314 insertions(+), 35 deletions(-)

[snip]

>>>   int i915_gem_context_getparam_ioctl(struct drm_device *dev, void 
>>> *data,
>>>                       struct drm_file *file)
>>>   {
>>> @@ -767,6 +864,37 @@ int i915_gem_context_getparam_ioctl(struct 
>>> drm_device *dev, void *data,
>>>       case I915_CONTEXT_PARAM_PRIORITY:
>>>           args->value = ctx->sched.priority;
>>>           break;
>>> +    case I915_CONTEXT_PARAM_SSEU: {
>>> +        struct drm_i915_gem_context_param_sseu param_sseu;
>>> +        struct intel_engine_cs *engine;
>>> +        struct intel_context *ce;
>>> +
>>> +        if (copy_from_user(&param_sseu, u64_to_user_ptr(args->value),
>>> +                   sizeof(param_sseu))) {
>>> +            ret = -EFAULT;
>>> +            break;
>>> +        }
>>> +
>>> +        engine = intel_engine_lookup_user(to_i915(dev),
>>> +                          param_sseu.class,
>>> +                          param_sseu.instance);
>>> +        if (!engine) {
>>> +            ret = -EINVAL;
>>> +            break;
>>> +        }
>>> +
>>> +        ce = to_intel_context(ctx, engine);
>>> +
>>> +        param_sseu.slice_mask = ce->sseu.slice_mask;
>>> +        param_sseu.subslice_mask = ce->sseu.subslice_mask;
>>> +        param_sseu.min_eus_per_subslice = 
>>> ce->sseu.min_eus_per_subslice;
>>> +        param_sseu.max_eus_per_subslice = 
>>> ce->sseu.max_eus_per_subslice;
>>> +
>>> +        if (copy_to_user(u64_to_user_ptr(args->value), &param_sseu,
>>> +                 sizeof(param_sseu)))
>>> +            ret = -EFAULT;
>>> +        break;
>>
>> Should we think about maybe not implementing the getter? I mean, is it 
>> useful or just code for the driver which could be dropped?
> 
> Well, render fd can be transfer between processes, so I think it makes 
> sense to have a way to tell what is the current configuration.

I was thinking that userspace can already get the default configuration 
via existing get params / topology query.

But if you change the ctx sseu config and pass that fd out its a bit 
evil. :)

Anyway, no strong feelings to keep it. Was just thinking whether we can 
save ourselves adding some code.

[snip]

>>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>>> index 7f5634ce8e88..24b90836ce1d 100644
>>> --- a/include/uapi/drm/i915_drm.h
>>> +++ b/include/uapi/drm/i915_drm.h
>>> @@ -1456,9 +1456,47 @@ struct drm_i915_gem_context_param {
>>>   #define   I915_CONTEXT_MAX_USER_PRIORITY    1023 /* inclusive */
>>>   #define   I915_CONTEXT_DEFAULT_PRIORITY        0
>>>   #define   I915_CONTEXT_MIN_USER_PRIORITY    -1023 /* inclusive */
>>> +    /*
>>> +     * When using the following param, value should be a pointer to
>>> +     * drm_i915_gem_context_param_sseu.
>>> +     */
>>> +#define I915_CONTEXT_PARAM_SSEU        0x7
>>>       __u64 value;
>>>   };
>>>   +struct drm_i915_gem_context_param_sseu {
>>> +    /*
>>> +     * Engine class & instance to be configured or queried.
>>> +     */
>>> +    __u32 class;
>>> +    __u32 instance;
>>
>> Chris and I were talking about whether u32 is overkill and we should 
>> settle for u16:u16 for class:instance. I think 16-bit should be 
>> enough. But it can also be u32, I don't think there are any real 
>> downsides here unless we want to be consistent in all places.
> 
> Let me know what you think is best.

I'd say u16:u16, Chris?

> 
>>
>>> +
>>> +    /*
>>> +     * Mask of slices to enable for the context. Valid values are a 
>>> subset
>>> +     * of the bitmask value returned for I915_PARAM_SLICE_MASK.
>>> +     */
>>> +    __u8 slice_mask;
>>> +
>>> +    /*
>>> +     * Mask of subslices to enable for the context. Valid values are a
>>> +     * subset of the bitmask value return by I915_PARAM_SUBSLICE_MASK.
>>> +     */
>>> +    __u8 subslice_mask;
>>
>> Is this future proof enough, say for Gen11?
> 
> As far as I can see, this fits.
> No objection to bump it to 16/32bits if you'd like.

Feel like I've asked you this before, sorry - nothing in the future will 
need per slice subslice mask?

Regards,

Tvrtko


More information about the Intel-gfx mailing list