[Intel-gfx] [PATCH 03/12] drm/i915: Framework for capturing command stream based OA reports and ctx id info.
Lionel Landwerlin
lionel.g.landwerlin at intel.com
Tue Aug 1 20:58:14 UTC 2017
On 01/08/17 19:05, sourab gupta wrote:
>
>
> On Tue, Aug 1, 2017 at 2:59 PM, Kamble, Sagar A
> <sagar.a.kamble at intel.com <mailto:sagar.a.kamble at intel.com>> wrote:
>
>
>
> -----Original Message-----
> From: Landwerlin, Lionel G
> Sent: Monday, July 31, 2017 9:16 PM
> To: Kamble, Sagar A <sagar.a.kamble at intel.com
> <mailto:sagar.a.kamble at intel.com>>;
> intel-gfx at lists.freedesktop.org
> <mailto:intel-gfx at lists.freedesktop.org>
> Cc: Sourab Gupta <sourab.gupta at intel.com
> <mailto:sourab.gupta at intel.com>>
> Subject: Re: [Intel-gfx] [PATCH 03/12] drm/i915: Framework for
> capturing command stream based OA reports and ctx id info.
>
> On 31/07/17 08:59, Sagar Arun Kamble wrote:
> > From: Sourab Gupta <sourab.gupta at intel.com
> <mailto:sourab.gupta at intel.com>>
> >
> > This patch introduces a framework to capture OA counter reports
> associated
> > with Render command stream. We can then associate the reports
> captured
> > through this mechanism with their corresponding context id's.
> This can be
> > further extended to associate any other metadata information
> with the
> > corresponding samples (since the association with Render command
> stream
> > gives us the ability to capture these information while
> inserting the
> > corresponding capture commands into the command stream).
> >
> > The OA reports generated in this way are associated with a
> corresponding
> > workload, and thus can be used the delimit the workload (i.e.
> sample the
> > counters at the workload boundaries), within an ongoing stream
> of periodic
> > counter snapshots.
> >
> > There may be usecases wherein we need more than periodic OA
> capture mode
> > which is supported currently. This mode is primarily used for
> two usecases:
> > - Ability to capture system wide metrics, alongwith the
> ability to map
> > the reports back to individual contexts (particularly for
> HSW).
> > - Ability to inject tags for work, into the reports. This
> provides
> > visibility into the multiple stages of work within single
> context.
> >
> > The userspace will be able to distinguish between the periodic
> and CS based
> > OA reports by the virtue of source_info sample field.
> >
> > The command MI_REPORT_PERF_COUNT can be used to capture
> snapshots of OA
> > counters, and is inserted at BB boundaries.
> > The data thus captured will be stored in a separate buffer,
> which will
> > be different from the buffer used otherwise for periodic OA
> capture mode.
> > The metadata information pertaining to snapshot is maintained in
> a list,
> > which also has offsets into the gem buffer object per captured
> snapshot.
> > In order to track whether the gpu has completed processing the node,
> > a field pertaining to corresponding gem request is added, which
> is tracked
> > for completion of the command.
> >
> > Both periodic and CS based reports are associated with a single
> stream
> > (corresponding to render engine), and it is expected to have the
> samples
> > in the sequential order according to their timestamps. Now,
> since these
> > reports are collected in separate buffers, these are merge
> sorted at the
> > time of forwarding to userspace during the read call.
> >
> > v2: Aligning with the non-perf interface (custom drm ioctl
> based). Also,
> > few related patches are squashed together for better readability
> >
> > v3: Updated perf sample capture emit hook name. Reserving space
> upfront
> > in the ring for emitting sample capture commands and using
> > req->fence.seqno for tracking samples. Added SRCU protection for
> streams.
> > Changed the stream last_request tracking to resv object. (Chris)
> > Updated perf.sample_lock spin_lock usage to avoid softlockups. Moved
> > stream to global per-engine structure. (Sagar)
> > Update unpin and put in the free routines to
> i915_vma_unpin_and_release.
> > Making use of perf stream cs_buffer vma resv instead of separate
> resv obj.
> > Pruned perf stream vma resv during gem_idle. (Chris)
> > Changed payload field ctx_id to u64 to keep all sample data
> aligned at 8
> > bytes. (Lionel)
> > stall/flush prior to sample capture is not added. Do we need to
> give this
> > control to user to select whether to stall/flush at each sample?
> >
> > Signed-off-by: Sourab Gupta <sourab.gupta at intel.com
> <mailto:sourab.gupta at intel.com>>
> > Signed-off-by: Robert Bragg <robert at sixbynine.org
> <mailto:robert at sixbynine.org>>
> > Signed-off-by: Sagar Arun Kamble <sagar.a.kamble at intel.com
> <mailto:sagar.a.kamble at intel.com>>
> > ---
> > drivers/gpu/drm/i915/i915_drv.h | 101 ++-
> > drivers/gpu/drm/i915/i915_gem.c | 1 +
> > drivers/gpu/drm/i915/i915_gem_execbuffer.c | 8 +
> > drivers/gpu/drm/i915/i915_perf.c | 1185
> ++++++++++++++++++++++------
> > drivers/gpu/drm/i915/intel_engine_cs.c | 4 +
> > drivers/gpu/drm/i915/intel_ringbuffer.c | 2 +
> > drivers/gpu/drm/i915/intel_ringbuffer.h | 5 +
> > include/uapi/drm/i915_drm.h | 15 +
> > 8 files changed, 1073 insertions(+), 248 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h
> b/drivers/gpu/drm/i915/i915_drv.h
> > index 2c7456f..8b1cecf 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -1985,6 +1985,24 @@ struct i915_perf_stream_ops {
> > * The stream will always be disabled before this is called.
> > */
> > void (*destroy)(struct i915_perf_stream *stream);
> > +
> > + /*
> > + * @emit_sample_capture: Emit the commands in the command
> streamer
> > + * for a particular gpu engine.
> > + *
> > + * The commands are inserted to capture the perf sample
> data at
> > + * specific points during workload execution, such as
> before and after
> > + * the batch buffer.
> > + */
> > + void (*emit_sample_capture)(struct i915_perf_stream *stream,
> > + struct drm_i915_gem_request
> *request,
> > + bool preallocate);
> > +};
> > +
>
> It seems the motivation for this following enum is mostly to deal with
> the fact that engine->perf_srcu is set before the OA unit is
> configured.
> Would it possible to set it later so that we get rid of the enum?
>
> <Sagar> I will try to make this as just binary state. This enum is
> defining the state of the stream. I too got confused with purpose
> of IN_PROGRESS.
> SRCU is used for synchronizing stream state check.
> IN_PROGRESS will enable us to not advertently try to access the
> stream vma for inserting the samples, but I guess depending on
> disabled/enabled should
> suffice.
>
>
> Hi Sagar/Lionel,
Hi Sourab,
Thanks again for your input on this.
>
> The purpose of the tristate was to workaround a particular kludge of
> working with just enabled/disabled boolean state. I'll explain below.
>
> Let's say we have only boolean state.
> i915_perf_emit_sample_capture() function would depend on
> stream->enabled in order to insert the MI_RPC command in RCS.
> If you see i915_perf_enable_locked(), stream->enabled is set before
> stream->ops->enable(). The stream->ops->enable() function actually
> enables the OA hardware to capture reports, and if MI_RPC commands
> are submitted before OA hw is enabled, it may hang the gpu.
Do you remember if this is documented anywhere?
I couldn't find anything in the MI_RPC instruction.
>
> Also, we can't change the order of calling these operations inside
> i915_perf_enable_locked() since gen7_update_oacontrol_locked()
> function depends on stream->enabled flag to enable the OA
> hw unit (i.e. it needs the flag to be true).
We can probably work around that by passing some arguments.
> To workaround this problem, I introduced a tristate here.
> If you can suggest some alternate solution to this problem,
> we can remove this tristate kludge here.
> Regards,
> Sourab
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/intel-gfx/attachments/20170801/f24178eb/attachment-0001.html>
More information about the Intel-gfx
mailing list