[Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace

Chris Wilson chris at chris-wilson.co.uk
Tue Aug 14 14:59:53 UTC 2018


Quoting Tvrtko Ursulin (2018-08-14 15:40:58)
> From: Chris Wilson <chris at chris-wilson.co.uk>
> 
> We want to allow userspace to reconfigure the subslice configuration for
> its own use case. To do so, we expose a context parameter to allow
> adjustment of the RPCS register stored within the context image (and
> currently not accessible via LRI). If the context is adjusted before
> first use, the adjustment is for "free"; otherwise if the context is
> active we flush the context off the GPU (stalling all users), forcing
> the GPU to save the context to memory where we can modify it and so
> ensure that the register is reloaded on next execution.
> 
> The overhead of managing additional EU subslices can be significant,
> especially in multi-context workloads. Non-GPGPU contexts should
> preferably disable the subslices they are not using, and others should
> fine-tune the number to match their workload.
> 
> We expose complete control over the RPCS register, allowing
> configuration of slice/subslice, via masks packed into a u64 for
> simplicity. For example,
> 
>         struct drm_i915_gem_context_param arg;
>         struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
>                                                         .instance = 0, };
> 
>         memset(&arg, 0, sizeof(arg));
>         arg.ctx_id = ctx;
>         arg.param = I915_CONTEXT_PARAM_SSEU;
>         arg.value = (uintptr_t) &sseu;
>         if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
>                 sseu.packed.subslice_mask = 0;
> 
>                 drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
>         }
> 
> could be used to disable all subslices where supported.
> 
> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)
> 
> v3: Add ability to program this per engine (Chris)
> 
> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
> 
> v5: Validate sseu configuration against the device's capabilities (Lionel)
> 
> v6: Change context powergating settings through MI_SDM on kernel context (Chris)
> 
> v7: Synchronize the requests following a powergating setting change using a global
>     dependency (Chris)
>     Iterate timelines through dev_priv.gt.active_rings (Tvrtko)
>     Disable RPCS configuration setting for non capable users (Lionel/Tvrtko)
> 
> v8: s/union intel_sseu/struct intel_sseu/ (Lionel)
>     s/dev_priv/i915/ (Tvrtko)
>     Change uapi class/instance fields to u16 (Tvrtko)
>     Bump mask fields to 64bits (Lionel)
>     Don't return EPERM when dynamic sseu is disabled (Tvrtko)
> 
> v9: Import context image into kernel context's ppgtt only when
>     reconfiguring powergated slice/subslices (Chris)
>     Use aliasing ppgtt when needed (Michel)
> 
> Tvrtko Ursulin:
> 
> v10:
>  * Update for upstream changes.
>  * Request submit needs a RPM reference.
>  * Reject on !FULL_PPGTT for simplicity.
>  * Pull out get/set param to helpers for readability and less indent.
>  * Use i915_request_await_dma_fence in add_global_barrier to skip waits
>    on the same timeline and avoid GEM_BUG_ON.
>  * No need to explicitly assign a NULL pointer to engine in legacy mode.
>  * No need to move gen8_make_rpcs up.
>  * Factored out global barrier as prep patch.
>  * Allow to only CAP_SYS_ADMIN if !Gen11.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
> Issue: https://github.com/intel/media-driver/issues/267
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin at intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: Zhipeng Gong <zhipeng.gong at intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem_context.c | 187 +++++++++++++++++++++++-
>  drivers/gpu/drm/i915/intel_lrc.c        |  55 +++++++
>  drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
>  include/uapi/drm/i915_drm.h             |  43 ++++++
>  4 files changed, 288 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 8a12984e7495..6d6220634e9e 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -773,6 +773,91 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
>         return 0;
>  }
>  
> +static int
> +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
> +                                 struct intel_engine_cs *engine,
> +                                 struct intel_sseu sseu)
> +{
> +       struct drm_i915_private *i915 = ctx->i915;
> +       struct i915_request *rq;
> +       struct intel_ring *ring;
> +       int ret;
> +
> +       lockdep_assert_held(&i915->drm.struct_mutex);
> +
> +       /* Submitting requests etc needs the hw awake. */
> +       intel_runtime_pm_get(i915);
> +
> +       i915_retire_requests(i915);

?

> +
> +       /* Now use the RCS to actually reconfigure. */
> +       engine = i915->engine[RCS];

? Modifying registers stored in another engine's context image.

> +
> +       rq = i915_request_alloc(engine, i915->kernel_context);
> +       if (IS_ERR(rq)) {
> +               ret = PTR_ERR(rq);
> +               goto out_put;
> +       }
> +
> +       ret = engine->emit_rpcs_config(rq, ctx, sseu);

It's just an LRI, I'd rather we do it directly unless there's evidence
that there will be an explicit rpcs config instruction in future. It
just doesn't seem general enough.
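
For readers unfamiliar with the term, here is a minimal standalone sketch of
what an LRI (MI_LOAD_REGISTER_IMM) packet carrying an RPCS value could look
like in a command stream. The macro encodings and the 0x20c8 register offset
are assumed to match what i915 uses for the gen8+ render engine;
sketch_emit_rpcs_lri() is a hypothetical name, not a function from the driver
or from this patch, and the body of the patch's emit_rpcs_config() is not
shown in this excerpt.

```c
#include <stddef.h>
#include <stdint.h>

/* Command encodings, assumed to mirror i915's MI_* definitions. */
#define MI_INSTR(opcode, flags)  (((uint32_t)(opcode) << 23) | (uint32_t)(flags))
#define MI_NOOP                  MI_INSTR(0, 0)
#define MI_LOAD_REGISTER_IMM(n)  MI_INSTR(0x22, 2 * (n) - 1)

/* Assumed offset of the render engine's R_PWR_CLK_STATE (RPCS) register. */
#define R_PWR_CLK_STATE_OFFSET   0x20c8u

/*
 * Hypothetical helper: write a single-register LRI for RPCS into cs.
 * Emits exactly four dwords (header, register offset, value, padding NOOP)
 * and returns the pointer just past them.
 */
static uint32_t *sketch_emit_rpcs_lri(uint32_t *cs, uint32_t rpcs)
{
	*cs++ = MI_LOAD_REGISTER_IMM(1);  /* one (offset, value) pair follows */
	*cs++ = R_PWR_CLK_STATE_OFFSET;   /* which register to load */
	*cs++ = rpcs;                     /* new RPCS value */
	*cs++ = MI_NOOP;                  /* pad to an even number of dwords */

	return cs;
}
```

A packet this small is arguably easy to emit inline at the call site, which
seems to be the thrust of the comment above.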

> +       if (ret)
> +               goto out_add;
> +
> +       /* Queue this switch after all other activity */

Only needs to be after the target ctx.

> +       list_for_each_entry(ring, &i915->gt.active_rings, active_link) {
> +               struct i915_request *prev;
> +
> +               prev = last_request_on_engine(ring->timeline, engine);

As constructed above you need target-engine + RCS.

> +               if (prev)
> +                       i915_sw_fence_await_sw_fence_gfp(&rq->submit,
> +                                                        &prev->submit,
> +                                                        I915_FENCE_GFP);
> +       }
> +
> +       i915_gem_set_global_barrier(i915, rq);

This is just for a link from ctx-engine to this rq. Overkill much?
Presumably this stems from using the wrong engine.

> +
> +out_add:
> +       i915_request_add(rq);

And I'd still recommend not using indirect access if we can apply the
changes immediately.
-Chris

