[Intel-gfx] [PATCH 03/12] drm/i915: Framework for capturing command stream based OA reports and ctx id info.

Tue Aug 1 20:58:14 UTC 2017

On 01/08/17 19:05, sourab gupta wrote:
>
>
> On Tue, Aug 1, 2017 at 2:59 PM, Kamble, Sagar A 
> <sagar.a.kamble at intel.com <mailto:sagar.a.kamble at intel.com>> wrote:
>
>
>
>     -----Original Message-----
>     From: Landwerlin, Lionel G
>     Sent: Monday, July 31, 2017 9:16 PM
>     To: Kamble, Sagar A <sagar.a.kamble at intel.com
>     <mailto:sagar.a.kamble at intel.com>>;
>     intel-gfx at lists.freedesktop.org
>     <mailto:intel-gfx at lists.freedesktop.org>
>     Cc: Sourab Gupta <sourab.gupta at intel.com
>     <mailto:sourab.gupta at intel.com>>
>     Subject: Re: [Intel-gfx] [PATCH 03/12] drm/i915: Framework for
>     capturing command stream based OA reports and ctx id info.
>
>     On 31/07/17 08:59, Sagar Arun Kamble wrote:
>     > From: Sourab Gupta <sourab.gupta at intel.com
>     <mailto:sourab.gupta at intel.com>>
>     >
>     > This patch introduces a framework to capture OA counter reports
>     associated
>     > with Render command stream. We can then associate the reports
>     captured
>     > through this mechanism with their corresponding context id's.
>     This can be
>     > further extended to associate any other metadata information
>     with the
>     > corresponding samples (since the association with Render command
>     stream
>     > gives us the ability to capture these information while
>     inserting the
>     > corresponding capture commands into the command stream).
>     >
>     > The OA reports generated in this way are associated with a
>     corresponding
>     > workload, and thus can be used the delimit the workload (i.e.
>     sample the
>     > counters at the workload boundaries), within an ongoing stream
>     of periodic
>     > counter snapshots.
>     >
>     > There may be usecases wherein we need more than periodic OA
>     capture mode
>     > which is supported currently. This mode is primarily used for
>     two usecases:
>     >      - Ability to capture system wide metrics, alongwith the
>     ability to map
>     >        the reports back to individual contexts (particularly for
>     HSW).
>     >      - Ability to inject tags for work, into the reports. This
>     provides
>     >        visibility into the multiple stages of work within single
>     context.
>     >
>     > The userspace will be able to distinguish between the periodic
>     and CS based
>     > OA reports by the virtue of source_info sample field.
>     >
>     > The command MI_REPORT_PERF_COUNT can be used to capture
>     snapshots of OA
>     > counters, and is inserted at BB boundaries.
>     > The data thus captured will be stored in a separate buffer,
>     which will
>     > be different from the buffer used otherwise for periodic OA
>     capture mode.
>     > The metadata information pertaining to snapshot is maintained in
>     a list,
>     > which also has offsets into the gem buffer object per captured
>     snapshot.
>     > In order to track whether the gpu has completed processing the node,
>     > a field pertaining to corresponding gem request is added, which
>     is tracked
>     > for completion of the command.
>     >
>     > Both periodic and CS based reports are associated with a single
>     stream
>     > (corresponding to render engine), and it is expected to have the
>     samples
>     > in the sequential order according to their timestamps. Now,
>     since these
>     > reports are collected in separate buffers, these are merge
>     sorted at the
>     > time of forwarding to userspace during the read call.
>     >
>     > v2: Aligning with the non-perf interface (custom drm ioctl
>     based). Also,
>     > few related patches are squashed together for better readability
>     >
>     > v3: Updated perf sample capture emit hook name. Reserving space
>     upfront
>     > in the ring for emitting sample capture commands and using
>     > req->fence.seqno for tracking samples. Added SRCU protection for
>     streams.
>     > Changed the stream last_request tracking to resv object. (Chris)
>     > Updated perf.sample_lock spin_lock usage to avoid softlockups. Moved
>     > stream to global per-engine structure. (Sagar)
>     > Update unpin and put in the free routines to
>     i915_vma_unpin_and_release.
>     > Making use of perf stream cs_buffer vma resv instead of separate
>     resv obj.
>     > Pruned perf stream vma resv during gem_idle. (Chris)
>     > Changed payload field ctx_id to u64 to keep all sample data
>     aligned at 8
>     > bytes. (Lionel)
>     > stall/flush prior to sample capture is not added. Do we need to
>     give this
>     > control to user to select whether to stall/flush at each sample?
>     >
>     > Signed-off-by: Sourab Gupta <sourab.gupta at intel.com
>     <mailto:sourab.gupta at intel.com>>
>     > Signed-off-by: Robert Bragg <robert at sixbynine.org
>     <mailto:robert at sixbynine.org>>
>     > Signed-off-by: Sagar Arun Kamble <sagar.a.kamble at intel.com
>     <mailto:sagar.a.kamble at intel.com>>
>     > ---
>     >   drivers/gpu/drm/i915/i915_drv.h |  101 ++-
>     >   drivers/gpu/drm/i915/i915_gem.c |    1 +
>     >   drivers/gpu/drm/i915/i915_gem_execbuffer.c |    8 +
>     >   drivers/gpu/drm/i915/i915_perf.c  | 1185
>     ++++++++++++++++++++++------
>     >   drivers/gpu/drm/i915/intel_engine_cs.c  |    4 +
>     >   drivers/gpu/drm/i915/intel_ringbuffer.c |    2 +
>     >   drivers/gpu/drm/i915/intel_ringbuffer.h |    5 +
>     >   include/uapi/drm/i915_drm.h                |  15 +
>     >   8 files changed, 1073 insertions(+), 248 deletions(-)
>     >
>     > diff --git a/drivers/gpu/drm/i915/i915_drv.h
>     b/drivers/gpu/drm/i915/i915_drv.h
>     > index 2c7456f..8b1cecf 100644
>     > --- a/drivers/gpu/drm/i915/i915_drv.h
>     > +++ b/drivers/gpu/drm/i915/i915_drv.h
>     > @@ -1985,6 +1985,24 @@ struct i915_perf_stream_ops {
>     >        * The stream will always be disabled before this is called.
>     >        */
>     >       void (*destroy)(struct i915_perf_stream *stream);
>     > +
>     > +     /*
>     > +      * @emit_sample_capture: Emit the commands in the command
>     streamer
>     > +      * for a particular gpu engine.
>     > +      *
>     > +      * The commands are inserted to capture the perf sample
>     data at
>     > +      * specific points during workload execution, such as
>     before and after
>     > +      * the batch buffer.
>     > +      */
>     > +     void (*emit_sample_capture)(struct i915_perf_stream *stream,
>     > +                                 struct drm_i915_gem_request
>     *request,
>     > +                                 bool preallocate);
>     > +};
>     > +
>
>     It seems the motivation for this following enum is mostly to deal with
>     the fact that engine->perf_srcu is set before the OA unit is
>     configured.
>     Would it possible to set it later so that we get rid of the enum?
>
>     <Sagar> I will try to make this as just binary state. This enum is
>     defining the state of the stream. I too got confused with purpose
>     of IN_PROGRESS.
>     SRCU is used for synchronizing stream state check.
>     IN_PROGRESS will enable us to not advertently try to access the
>     stream vma for inserting the samples, but I guess depending on
>     disabled/enabled should
>     suffice.
>
>
> Hi Sagar/Lionel,

Hi Sourab,

Thanks again for your input on this.

>
> The purpose of the tristate was to workaround a particular kludge of
> working with just enabled/disabled boolean state. I'll explain below.
>
> Let's say we have only boolean state.
> i915_perf_emit_sample_capture() function would depend on
> stream->enabled in order to insert the MI_RPC command in RCS.
> If you see i915_perf_enable_locked(), stream->enabled is set before
> stream->ops->enable(). The stream->ops->enable() function actually
> enables the OA hardware to capture reports, and if MI_RPC commands
> are submitted before OA hw is enabled, it may hang the gpu.

Do you remember if this is documented anywhere?
I couldn't find anything in the MI_RPC instruction.

>
> Also, we can't change the order of calling these operations inside
> i915_perf_enable_locked() since gen7_update_oacontrol_locked()
> function depends on stream->enabled flag to enable the OA
> hw unit (i.e. it needs the flag to be true).

We can probably work around that by passing some arguments.

> To workaround this problem, I introduced a tristate here.
> If you can suggest some alternate solution to this problem,
> we can remove this tristate kludge here.
> Regards,
> Sourab
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/intel-gfx/attachments/20170801/f24178eb/attachment-0001.html>