[Intel-gfx] [PATCH v16 08/17] drm/i915/perf: Add OA unit support for Gen 8+

Mon Jun 12 17:23:53 UTC 2017

On 5 June 2017 at 15:48, Lionel Landwerlin
<lionel.g.landwerlin at intel.com> wrote:
> From: Robert Bragg <robert at sixbynine.org>
>
> Enables access to OA unit metrics for BDW, CHV, SKL and BXT which all
> share (more-or-less) the same OA unit design.
>
> Of particular note in comparison to Haswell: some OA unit HW config
> state has become per-context state and as a consequence it is somewhat
> more complicated to manage synchronous state changes from the cpu while
> there's no guarantee of what context (if any) is currently actively
> running on the gpu.
>
> The periodic sampling frequency which can be particularly useful for
> system-wide analysis (as opposed to command stream synchronised
> MI_REPORT_PERF_COUNT commands) is perhaps the most surprising state to
> have become per-context save and restored (while the OABUFFER
> destination is still a shared, system-wide resource).
>
> This support for gen8+ takes care to consider a number of timing
> challenges involved in synchronously updating per-context state
> primarily by programming all config state from the cpu and updating all
> current and saved contexts synchronously while the OA unit is still
> disabled.
>
> The driver intentionally avoids depending on command streamer
> programming to update OA state considering the lack of synchronization
> between the automatic loading of OACTXCONTROL state (that includes the
> periodic sampling state and enable state) on context restore and the
> parsing of any general purpose BB the driver can control. I.e. this
> implementation is careful to avoid the possibility of a context restore
> temporarily enabling any out-of-date periodic sampling state. In
> addition to the risk of transiently-out-of-date state being loaded
> automatically; there are also internal HW latencies involved in the
> loading of MUX configurations which would be difficult to account for
> from the command streamer (and we only want to enable the unit once
> the MUX configuration is complete).
>
> Since the Gen8+ OA unit design no longer supports clock gating the unit
> off for a single given context (which effectively stopped any progress
> of counters while any other context was running) and instead supports
> tagging OA reports with a context ID for filtering on the CPU, it means
> we can no longer hide the system-wide progress of counters from a
> non-privileged application only interested in metrics for its own
> context. Although we could theoretically try and subtract the progress
> of other contexts before forwarding reports via read() we aren't in a
> position to filter reports captured via MI_REPORT_PERF_COUNT commands.
> As a result, for Gen8+, we always require the
> dev.i915.perf_stream_paranoid to be unset for any access to OA metrics
> if not root.
>
> v5: Drain submitted requests when enabling metric set to ensure no
>     lite-restore erases the context image we just updated (Lionel)
>
> v6: In addition to drain, switch to kernel context & update all
>     contexts in place (Chris)
>
> v7: Add missing mutex_unlock() if switching to kernel context fails
>     (Matthew)
>
> v8: Simplify OA period/flex-eu-counters programming by using the
>     batchbuffer instead of modifying ctx-image (Lionel)
>
> v9: Back to updating the context image (due to erroneous testing,
>     batchbuffer programming the OA unit doesn't actually work)
>     (Lionel)
>     Pin context before updating context image (Chris)
>     Drop MMIO programming now that we switch to a kernel context with
>     right values in initial context image (Chris)
>
> v10: Just pin_map the contexts we want to modify or let the
>      configuration happen on first use (Chris)
>
> v11: Update kernel context OA config through the batchbuffer rather
>      than on the fly ctx-image update (Lionel)
>
> v12: Rework OA context registers update again by switching away from
>      user contexts and reconfiguring the kernel context through the
>      batchbuffer and updating all the other contexts' context image.
>      Also take care to lock slice/subslice configuration when OA is
>      on. (Lionel)
>
> v13: Request rpcs updates on all engines when updating the OA config
>      (Lionel)
>
> v14: Drop any kind of rpcs management now that we monitor sseu
>      configuration changes in a later patch (Lionel)
>      Remove usleep after programming the NOA configs on Gen8+, this
>      doesn't seem to be needed (Lionel)
>
> v15: Respect coding style for block comments (Chris)
>
> Signed-off-by: Robert Bragg <robert at sixbynine.org>
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
> Reviewed-by: Matthew Auld <matthew.auld at intel.com> \o/
> ---
>  drivers/gpu/drm/i915/i915_drv.h  |   46 +-
>  drivers/gpu/drm/i915/i915_perf.c | 1032 +++++++++++++++++++++++++++++++++++---
>  drivers/gpu/drm/i915/i915_reg.h  |   22 +
>  drivers/gpu/drm/i915/intel_lrc.c |    2 +
>  drivers/gpu/drm/i915/intel_lrc.h |    2 +
>  include/uapi/drm/i915_drm.h      |   19 +-
>  6 files changed, 1028 insertions(+), 95 deletions(-)

<SNIP>

> +
> +static int gen8_switch_to_updated_kernel_context(struct drm_i915_private *dev_priv)
> +{
> +       struct intel_engine_cs *engine = dev_priv->engine[RCS];
> +       struct i915_gem_timeline *timeline;
> +       struct drm_i915_gem_request *req;
> +       int ret;
> +
> +       lockdep_assert_held(&dev_priv->drm.struct_mutex);
> +
> +       i915_gem_retire_requests(dev_priv);
> +
> +       req = i915_gem_request_alloc(engine, dev_priv->kernel_context);
> +       if (IS_ERR(req))
> +               return PTR_ERR(req);
> +
> +       ret = gen8_emit_oa_config(req);
> +       if (ret)
> +               return ret;
If we do a request_alloc, aren't we always supposed to follow it up with
an add_request, even on this error path? Otherwise the kernel isn't
tracking the request.
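
To illustrate the pattern I have in mind, here is a minimal userspace
sketch. Everything prefixed with mock_ is a hypothetical stand-in I made
up for illustration, not the real i915 API, and the counters exist only
so that the tracking can be checked. The point is just that once the
allocation succeeds, the request gets added whether or not emitting the
OA config failed, and the emit error is still returned to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a GEM request; not the real driver type. */
struct mock_request {
	int tracked;	/* set once the request has been added */
};

static int mock_tracked_count;	/* requests that were added */
static int mock_untracked_count;	/* allocated but not (yet) added */

/* Assumed behaviour: after alloc, the caller owns an untracked request. */
static struct mock_request *mock_request_alloc(void)
{
	struct mock_request *req = calloc(1, sizeof(*req));

	if (req)
		mock_untracked_count++;
	return req;
}

/* Assumed behaviour: adding hands the request over to the tracker. */
static void mock_add_request(struct mock_request *req)
{
	assert(req != NULL);
	req->tracked = 1;
	mock_tracked_count++;
	mock_untracked_count--;
	free(req);	/* the mock "retires" the request immediately */
}

/* Stand-in for emitting the OA config; fails on demand. */
static int mock_emit_oa_config(struct mock_request *req, int fail)
{
	(void)req;
	return fail ? -ENOMEM : 0;
}

static int mock_switch_to_updated_kernel_context(int fail_emit)
{
	struct mock_request *req;
	int ret;

	req = mock_request_alloc();
	if (!req)
		return -ENOMEM;

	ret = mock_emit_oa_config(req, fail_emit);

	/* Add the request on both paths so it is never left untracked. */
	mock_add_request(req);

	return ret;
}
```

With this shape, a failing emit still returns its error, but no request
is left allocated without a matching add.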