<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Aug 2, 2017 at 2:28 AM, Lionel Landwerlin <span dir="ltr"><<a href="mailto:lionel.g.landwerlin@intel.com" target="_blank">lionel.g.landwerlin@intel.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div text="#000000" bgcolor="#FFFFFF"><div><div class="h5">
    <div class="m_-4330140404967524317moz-cite-prefix">On 01/08/17 19:05, sourab gupta wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div dir="ltr"><br>
        <div class="gmail_extra"><br>
          <div class="gmail_quote">On Tue, Aug 1, 2017 at 2:59 PM,
            Kamble, Sagar A <span dir="ltr"><<a href="mailto:sagar.a.kamble@intel.com" target="_blank">sagar.a.kamble@intel.com</a>></span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
              <div class="m_-4330140404967524317gmail-HOEnZb">
                <div class="m_-4330140404967524317gmail-h5"><br>
                  <br>
                  -----Original Message-----<br>
                  From: Landwerlin, Lionel G<br>
                  Sent: Monday, July 31, 2017 9:16 PM<br>
                  To: Kamble, Sagar A <<a href="mailto:sagar.a.kamble@intel.com" target="_blank">sagar.a.kamble@intel.com</a>>;
                  <a href="mailto:intel-gfx@lists.freedesktop.org" target="_blank">intel-gfx@lists.freedesktop.or<wbr>g</a><br>
                  Cc: Sourab Gupta <<a href="mailto:sourab.gupta@intel.com" target="_blank">sourab.gupta@intel.com</a>><br>
                  Subject: Re: [Intel-gfx] [PATCH 03/12] drm/i915:
                  Framework for capturing command stream based OA
                  reports and ctx id info.<br>
                  <br>
                  On 31/07/17 08:59, Sagar Arun Kamble wrote:<br>
                  > From: Sourab Gupta <<a href="mailto:sourab.gupta@intel.com" target="_blank">sourab.gupta@intel.com</a>><br>
                  ><br>
                  > This patch introduces a framework to capture OA
                  counter reports associated<br>
                  > with Render command stream. We can then associate
                  the reports captured<br>
                  > through this mechanism with their corresponding
                  context id's. This can be<br>
                  > further extended to associate any other metadata
                  information with the<br>
                  > corresponding samples (since the association with
                  Render command stream<br>
                  > gives us the ability to capture these information
                  while inserting the<br>
                  > corresponding capture commands into the command
                  stream).<br>
                  ><br>
                  > The OA reports generated in this way are
                  associated with a corresponding<br>
                  > workload, and thus can be used the delimit the
                  workload (i.e. sample the<br>
                  > counters at the workload boundaries), within an
                  ongoing stream of periodic<br>
                  > counter snapshots.<br>
                  ><br>
                  > There may be usecases wherein we need more than
                  periodic OA capture mode<br>
                  > which is supported currently. This mode is
                  primarily used for two usecases:<br>
                  >      - Ability to capture system wide metrics,
                  alongwith the ability to map<br>
                  >        the reports back to individual contexts
                  (particularly for HSW).<br>
                  >      - Ability to inject tags for work, into the
                  reports. This provides<br>
                  >        visibility into the multiple stages of
                  work within single context.<br>
                  ><br>
                  > The userspace will be able to distinguish between
                  the periodic and CS based<br>
                  > OA reports by the virtue of source_info sample
                  field.<br>
                  ><br>
                  > The command MI_REPORT_PERF_COUNT can be used to
                  capture snapshots of OA<br>
                  > counters, and is inserted at BB boundaries.<br>
                  > The data thus captured will be stored in a
                  separate buffer, which will<br>
                  > be different from the buffer used otherwise for
                  periodic OA capture mode.<br>
                  > The metadata information pertaining to snapshot
                  is maintained in a list,<br>
                  > which also has offsets into the gem buffer object
                  per captured snapshot.<br>
                  > In order to track whether the gpu has completed
                  processing the node,<br>
                  > a field pertaining to corresponding gem request
                  is added, which is tracked<br>
                  > for completion of the command.<br>
                  ><br>
                  > Both periodic and CS based reports are associated
                  with a single stream<br>
                  > (corresponding to render engine), and it is
                  expected to have the samples<br>
                  > in the sequential order according to their
                  timestamps. Now, since these<br>
                  > reports are collected in separate buffers, these
                  are merge sorted at the<br>
                  > time of forwarding to userspace during the read
                  call.<br>
                  ><br>
                  > v2: Aligning with the non-perf interface (custom
                  drm ioctl based). Also,<br>
                  > few related patches are squashed together for
                  better readability<br>
                  ><br>
                  > v3: Updated perf sample capture emit hook name.
                  Reserving space upfront<br>
                  > in the ring for emitting sample capture commands
                  and using<br>
                  > req->fence.seqno for tracking samples. Added
                  SRCU protection for streams.<br>
                  > Changed the stream last_request tracking to resv
                  object. (Chris)<br>
                  > Updated perf.sample_lock spin_lock usage to avoid
                  softlockups. Moved<br>
                  > stream to global per-engine structure. (Sagar)<br>
                  > Update unpin and put in the free routines to
                  i915_vma_unpin_and_release.<br>
                  > Making use of perf stream cs_buffer vma resv
                  instead of separate resv obj.<br>
                  > Pruned perf stream vma resv during gem_idle.
                  (Chris)<br>
                  > Changed payload field ctx_id to u64 to keep all
                  sample data aligned at 8<br>
                  > bytes. (Lionel)<br>
                  > stall/flush prior to sample capture is not added.
                  Do we need to give this<br>
                  > control to user to select whether to stall/flush
                  at each sample?<br>
                  ><br>
                  > Signed-off-by: Sourab Gupta <<a href="mailto:sourab.gupta@intel.com" target="_blank">sourab.gupta@intel.com</a>><br>
                  > Signed-off-by: Robert Bragg <<a href="mailto:robert@sixbynine.org" target="_blank">robert@sixbynine.org</a>><br>
                  > Signed-off-by: Sagar Arun Kamble <<a href="mailto:sagar.a.kamble@intel.com" target="_blank">sagar.a.kamble@intel.com</a>><br>
                  > ---<br>
                  >   drivers/gpu/drm/i915/i915_<wbr>drv.h           
                  |  101 ++-<br>
                  >   drivers/gpu/drm/i915/i915_<wbr>gem.c           
                  |    1 +<br>
                  >   drivers/gpu/drm/i915/i915_<wbr>gem_execbuffer.c
                  |    8 +<br>
                  >   drivers/gpu/drm/i915/i915_per<wbr>f.c         
                   | 1185 ++++++++++++++++++++++------<br>
                  >   drivers/gpu/drm/i915/intel_en<wbr>gine_cs.c   
                   |    4 +<br>
                  >   drivers/gpu/drm/i915/intel_ri<wbr>ngbuffer.c   
                  |    2 +<br>
                  >   drivers/gpu/drm/i915/intel_ri<wbr>ngbuffer.h   
                  |    5 +<br>
                  >   include/uapi/drm/i915_drm.h                | 
                   15 +<br>
                  >   8 files changed, 1073 insertions(+), 248
                  deletions(-)<br>
                  ><br>
                  > diff --git a/drivers/gpu/drm/i915/i915_dr<wbr>v.h
                  b/drivers/gpu/drm/i915/i915_dr<wbr>v.h<br>
                  > index 2c7456f..8b1cecf 100644<br>
                  > --- a/drivers/gpu/drm/i915/i915_dr<wbr>v.h<br>
                  > +++ b/drivers/gpu/drm/i915/i915_dr<wbr>v.h<br>
                  > @@ -1985,6 +1985,24 @@ struct
                  i915_perf_stream_ops {<br>
                  >        * The stream will always be disabled
                  before this is called.<br>
                  >        */<br>
                  >       void (*destroy)(struct i915_perf_stream
                  *stream);<br>
                  > +<br>
                  > +     /*<br>
                  > +      * @emit_sample_capture: Emit the commands
                  in the command streamer<br>
                  > +      * for a particular gpu engine.<br>
                  > +      *<br>
                  > +      * The commands are inserted to capture the
                  perf sample data at<br>
                  > +      * specific points during workload
                  execution, such as before and after<br>
                  > +      * the batch buffer.<br>
                  > +      */<br>
                  > +     void (*emit_sample_capture)(struct
                  i915_perf_stream *stream,<br>
                  > +                                 struct
                  drm_i915_gem_request *request,<br>
                  > +                                 bool
                  preallocate);<br>
                  > +};<br>
                  > +<br>
                  <br>
                  It seems the motivation for this following enum is
                  mostly to deal with<br>
                  the fact that engine->perf_srcu is set before the
                  OA unit is configured.<br>
                  Would it possible to set it later so that we get rid
                  of the enum?<br>
                  <br>
                </div>
              </div>
              <Sagar> I will try to make this as just binary
              state. This enum is defining the state of the stream. I
              too got confused with purpose of IN_PROGRESS.<br>
              SRCU is used for synchronizing stream state check.<br>
              IN_PROGRESS will enable us to not advertently try to
              access the stream vma for inserting the samples, but I
              guess depending on disabled/enabled should<br>
              suffice.<br>
            </blockquote>
            <div><br>
            </div>
            <div>Hi Sagar/Lionel,</div>
          </div>
        </div>
      </div>
    </blockquote>
    <br></div></div>
    Hi Sourab,<br>
    <br>
    Thanks again for your input on this.<span class=""><br>
    <br>
    <blockquote type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <div>The purpose of the tristate was to workaround a
              particular kludge of</div>
            <div>working with just enabled/disabled boolean state. I'll
              explain below.</div>
            <div><br>
            </div>
            <div>Let's say we have only boolean state.</div>
            <div><span style="color:rgb(80,0,80)">i915_perf_emit_sample_capture(<wbr>)
                function would depend on</span></div>
            <div><span style="color:rgb(80,0,80)">stream->enabled in
                order to insert the MI_RPC command in RCS.</span></div>
            <div><span style="color:rgb(80,0,80)">If you see </span>i915_perf_enable_locked(),
              stream->enabled is set before</div>
            <div>stream->ops->enable(). The
              stream->ops->enable() function actually</div>
            <div>enables the OA hardware to capture reports, and if
              MI_RPC commands</div>
            <div>are submitted before OA hw is enabled, it may hang the
              gpu.<br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br></span>
    Do you remember if this is documented anywhere?<br>
    I couldn't find anything in the MI_RPC instruction.<span class=""><br>
    <br></span></div></blockquote><div>Sorry, I don't happen to remember any documentation. Probably, you can check this out by submitting MI_RPC without enabling OA.</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF"><span class="">
    <blockquote type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <div>Also, we can't change the order of calling these
              operations inside</div>
            <div>i915_perf_enable_locked() since <span style="color:rgb(111,66,193);font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace;font-size:12px;white-space:pre-wrap">gen7_update_oacontrol_<wbr>locked()</span></div>
            <div><span style="color:rgb(111,66,193);font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace;font-size:12px;white-space:pre-wrap">function depends on </span><span style="color:rgb(111,66,193);font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace;font-size:12px;white-space:pre-wrap">stream->enabled flag to enable the OA</span></div>
            <div><span style="color:rgb(111,66,193);font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace;font-size:12px;white-space:pre-wrap">hw unit (i.e. it needs the flag to be true).</span></div>
          </div>
        </div>
      </div>
    </blockquote>
    <br></span>
    We can probably work around that by passing some arguments.<span class=""><br>
    <br>
    <blockquote type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><span style="color:rgb(111,66,193);font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace;font-size:12px;white-space:pre-wrap">
</span></div>
            <div><span style="color:rgb(111,66,193);font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace;font-size:12px;white-space:pre-wrap">To workaround this problem, I introduced a tristate here.</span></div>
            <div><span style="color:rgb(111,66,193);font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace;font-size:12px;white-space:pre-wrap">If you can suggest some alternate solution to this problem,</span></div>
            <div><span style="color:rgb(111,66,193);font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace;font-size:12px;white-space:pre-wrap">we can remove this tristate kludge here.</span></div>
            <div><span style="color:rgb(111,66,193);font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace;font-size:12px;white-space:pre-wrap">
</span></div>
            <div><span style="color:rgb(111,66,193);font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace;font-size:12px;white-space:pre-wrap">Regards,</span></div>
            <div><span style="color:rgb(111,66,193);font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace;font-size:12px;white-space:pre-wrap">Sourab</span></div>
            <div><br>
            </div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
              <span class="m_-4330140404967524317gmail-"></span></blockquote>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    <br>
  </span></div>

</blockquote></div><br></div></div>