<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Aug 2, 2017 at 2:28 AM, Lionel Landwerlin <span dir="ltr"><<a href="mailto:lionel.g.landwerlin@intel.com" target="_blank">lionel.g.landwerlin@intel.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"><div><div class="h5">
<div class="m_-4330140404967524317moz-cite-prefix">On 01/08/17 19:05, sourab gupta wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Tue, Aug 1, 2017 at 2:59 PM,
Kamble, Sagar A <span dir="ltr"><<a href="mailto:sagar.a.kamble@intel.com" target="_blank">sagar.a.kamble@intel.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div class="m_-4330140404967524317gmail-HOEnZb">
<div class="m_-4330140404967524317gmail-h5"><br>
<br>
-----Original Message-----<br>
From: Landwerlin, Lionel G<br>
Sent: Monday, July 31, 2017 9:16 PM<br>
To: Kamble, Sagar A <<a href="mailto:sagar.a.kamble@intel.com" target="_blank">sagar.a.kamble@intel.com</a>>;
<a href="mailto:intel-gfx@lists.freedesktop.org" target="_blank">intel-gfx@lists.freedesktop.or<wbr>g</a><br>
Cc: Sourab Gupta <<a href="mailto:sourab.gupta@intel.com" target="_blank">sourab.gupta@intel.com</a>><br>
Subject: Re: [Intel-gfx] [PATCH 03/12] drm/i915:
Framework for capturing command stream based OA
reports and ctx id info.<br>
<br>
On 31/07/17 08:59, Sagar Arun Kamble wrote:<br>
> From: Sourab Gupta <<a href="mailto:sourab.gupta@intel.com" target="_blank">sourab.gupta@intel.com</a>><br>
><br>
> This patch introduces a framework to capture OA
counter reports associated<br>
> with Render command stream. We can then associate
the reports captured<br>
> through this mechanism with their corresponding
context id's. This can be<br>
> further extended to associate any other metadata
information with the<br>
> corresponding samples (since the association with
Render command stream<br>
> gives us the ability to capture these information
while inserting the<br>
> corresponding capture commands into the command
stream).<br>
><br>
> The OA reports generated in this way are
associated with a corresponding<br>
> workload, and thus can be used the delimit the
workload (i.e. sample the<br>
> counters at the workload boundaries), within an
ongoing stream of periodic<br>
> counter snapshots.<br>
><br>
> There may be usecases wherein we need more than
periodic OA capture mode<br>
> which is supported currently. This mode is
primarily used for two usecases:<br>
> - Ability to capture system wide metrics,
alongwith the ability to map<br>
> the reports back to individual contexts
(particularly for HSW).<br>
> - Ability to inject tags for work, into the
reports. This provides<br>
> visibility into the multiple stages of
work within single context.<br>
><br>
> The userspace will be able to distinguish between
the periodic and CS based<br>
> OA reports by the virtue of source_info sample
field.<br>
><br>
> The command MI_REPORT_PERF_COUNT can be used to
capture snapshots of OA<br>
> counters, and is inserted at BB boundaries.<br>
> The data thus captured will be stored in a
separate buffer, which will<br>
> be different from the buffer used otherwise for
periodic OA capture mode.<br>
> The metadata information pertaining to snapshot
is maintained in a list,<br>
> which also has offsets into the gem buffer object
per captured snapshot.<br>
> In order to track whether the gpu has completed
processing the node,<br>
> a field pertaining to corresponding gem request
is added, which is tracked<br>
> for completion of the command.<br>
><br>
> Both periodic and CS based reports are associated
with a single stream<br>
> (corresponding to render engine), and it is
expected to have the samples<br>
> in the sequential order according to their
timestamps. Now, since these<br>
> reports are collected in separate buffers, these
are merge sorted at the<br>
> time of forwarding to userspace during the read
call.<br>
><br>
> v2: Aligning with the non-perf interface (custom
drm ioctl based). Also,<br>
> few related patches are squashed together for
better readability<br>
><br>
> v3: Updated perf sample capture emit hook name.
Reserving space upfront<br>
> in the ring for emitting sample capture commands
and using<br>
> req->fence.seqno for tracking samples. Added
SRCU protection for streams.<br>
> Changed the stream last_request tracking to resv
object. (Chris)<br>
> Updated perf.sample_lock spin_lock usage to avoid
softlockups. Moved<br>
> stream to global per-engine structure. (Sagar)<br>
> Update unpin and put in the free routines to
i915_vma_unpin_and_release.<br>
> Making use of perf stream cs_buffer vma resv
instead of separate resv obj.<br>
> Pruned perf stream vma resv during gem_idle.
(Chris)<br>
> Changed payload field ctx_id to u64 to keep all
sample data aligned at 8<br>
> bytes. (Lionel)<br>
> stall/flush prior to sample capture is not added.
Do we need to give this<br>
> control to user to select whether to stall/flush
at each sample?<br>
><br>
> Signed-off-by: Sourab Gupta <<a href="mailto:sourab.gupta@intel.com" target="_blank">sourab.gupta@intel.com</a>><br>
> Signed-off-by: Robert Bragg <<a href="mailto:robert@sixbynine.org" target="_blank">robert@sixbynine.org</a>><br>
> Signed-off-by: Sagar Arun Kamble <<a href="mailto:sagar.a.kamble@intel.com" target="_blank">sagar.a.kamble@intel.com</a>><br>
> ---<br>
> drivers/gpu/drm/i915/i915_<wbr>drv.h
| 101 ++-<br>
> drivers/gpu/drm/i915/i915_<wbr>gem.c
| 1 +<br>
> drivers/gpu/drm/i915/i915_<wbr>gem_execbuffer.c
| 8 +<br>
> drivers/gpu/drm/i915/i915_per<wbr>f.c
| 1185 ++++++++++++++++++++++------<br>
> drivers/gpu/drm/i915/intel_en<wbr>gine_cs.c
| 4 +<br>
> drivers/gpu/drm/i915/intel_ri<wbr>ngbuffer.c
| 2 +<br>
> drivers/gpu/drm/i915/intel_ri<wbr>ngbuffer.h
| 5 +<br>
> include/uapi/drm/i915_drm.h |
15 +<br>
> 8 files changed, 1073 insertions(+), 248
deletions(-)<br>
><br>
> diff --git a/drivers/gpu/drm/i915/i915_dr<wbr>v.h
b/drivers/gpu/drm/i915/i915_dr<wbr>v.h<br>
> index 2c7456f..8b1cecf 100644<br>
> --- a/drivers/gpu/drm/i915/i915_dr<wbr>v.h<br>
> +++ b/drivers/gpu/drm/i915/i915_dr<wbr>v.h<br>
> @@ -1985,6 +1985,24 @@ struct
i915_perf_stream_ops {<br>
> * The stream will always be disabled
before this is called.<br>
> */<br>
> void (*destroy)(struct i915_perf_stream
*stream);<br>
> +<br>
> + /*<br>
> + * @emit_sample_capture: Emit the commands
in the command streamer<br>
> + * for a particular gpu engine.<br>
> + *<br>
> + * The commands are inserted to capture the
perf sample data at<br>
> + * specific points during workload
execution, such as before and after<br>
> + * the batch buffer.<br>
> + */<br>
> + void (*emit_sample_capture)(struct
i915_perf_stream *stream,<br>
> + struct
drm_i915_gem_request *request,<br>
> + bool
preallocate);<br>
> +};<br>
> +<br>
<br>
It seems the motivation for this following enum is
mostly to deal with<br>
the fact that engine->perf_srcu is set before the
OA unit is configured.<br>
Would it possible to set it later so that we get rid
of the enum?<br>
<br>
</div>
</div>
<Sagar> I will try to make this as just binary
state. This enum is defining the state of the stream. I
too got confused with purpose of IN_PROGRESS.<br>
SRCU is used for synchronizing stream state check.<br>
IN_PROGRESS will enable us to not advertently try to
access the stream vma for inserting the samples, but I
guess depending on disabled/enabled should<br>
suffice.<br>
</blockquote>
<div><br>
</div>
<div>Hi Sagar/Lionel,</div>
</div>
</div>
</div>
</blockquote>
<br></div></div>
Hi Sourab,<br>
<br>
Thanks again for your input on this.<span class=""><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>The purpose of the tristate was to workaround a
particular kludge of</div>
<div>working with just enabled/disabled boolean state. I'll
explain below.</div>
<div><br>
</div>
<div>Let's say we have only boolean state.</div>
<div><span style="color:rgb(80,0,80)">i915_perf_emit_sample_capture(<wbr>)
function would depend on</span></div>
<div><span style="color:rgb(80,0,80)">stream->enabled in
order to insert the MI_RPC command in RCS.</span></div>
<div><span style="color:rgb(80,0,80)">If you see </span>i915_perf_enable_locked(),
stream->enabled is set before</div>
<div>stream->ops->enable(). The
stream->ops->enable() function actually</div>
<div>enables the OA hardware to capture reports, and if
MI_RPC commands</div>
<div>are submitted before OA hw is enabled, it may hang the
gpu.<br>
</div>
</div>
</div>
</div>
</blockquote>
<br></span>
Do you remember if this is documented anywhere?<br>
I couldn't find anything in the MI_RPC instruction.<span class=""><br>
<br></span></div></blockquote><div>Sorry, I don't happen to remember any documentation. Probably, you can check this out by submitting MI_RPC without enabling OA.</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF"><span class="">
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>Also, we can't change the order of calling these
operations inside</div>
<div>i915_perf_enable_locked() since <span style="color:rgb(111,66,193);font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace;font-size:12px;white-space:pre-wrap">gen7_update_oacontrol_<wbr>locked()</span></div>
<div><span style="color:rgb(111,66,193);font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace;font-size:12px;white-space:pre-wrap">function depends on </span><span style="color:rgb(111,66,193);font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace;font-size:12px;white-space:pre-wrap">stream->enabled flag to enable the OA</span></div>
<div><span style="color:rgb(111,66,193);font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace;font-size:12px;white-space:pre-wrap">hw unit (i.e. it needs the flag to be true).</span></div>
</div>
</div>
</div>
</blockquote>
<br></span>
We can probably work around that by passing some arguments.<span class=""><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><span style="color:rgb(111,66,193);font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace;font-size:12px;white-space:pre-wrap">
</span></div>
<div><span style="color:rgb(111,66,193);font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace;font-size:12px;white-space:pre-wrap">To workaround this problem, I introduced a tristate here.</span></div>
<div><span style="color:rgb(111,66,193);font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace;font-size:12px;white-space:pre-wrap">If you can suggest some alternate solution to this problem,</span></div>
<div><span style="color:rgb(111,66,193);font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace;font-size:12px;white-space:pre-wrap">we can remove this tristate kludge here.</span></div>
<div><span style="color:rgb(111,66,193);font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace;font-size:12px;white-space:pre-wrap">
</span></div>
<div><span style="color:rgb(111,66,193);font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace;font-size:12px;white-space:pre-wrap">Regards,</span></div>
<div><span style="color:rgb(111,66,193);font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace;font-size:12px;white-space:pre-wrap">Sourab</span></div>
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<span class="m_-4330140404967524317gmail-"></span></blockquote>
</div>
</div>
</div>
</blockquote>
<br>
<br>
</span></div>
</blockquote></div><br></div></div>