<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jul 31, 2017 at 3:13 PM, Lionel Landwerlin <span dir="ltr"><<a href="mailto:lionel.g.landwerlin@intel.com" target="_blank">lionel.g.landwerlin@intel.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 31/07/17 08:59, Sagar Arun Kamble wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
From: Sourab Gupta <<a href="mailto:sourab.gupta@intel.com" target="_blank">sourab.gupta@intel.com</a>><br>
<br>
This patch introduces a framework to capture OA counter reports associated<br>
with the Render command stream. We can then associate the reports captured<br>
through this mechanism with their corresponding context IDs. This can be<br>
further extended to associate any other metadata with the<br>
corresponding samples (since the association with the Render command stream<br>
gives us the ability to capture this information while inserting the<br>
corresponding capture commands into the command stream).<br>
<br>
The OA reports generated in this way are associated with a corresponding<br>
workload, and thus can be used to delimit the workload (i.e. sample the<br>
counters at the workload boundaries) within an ongoing stream of periodic<br>
counter snapshots.<br>
<br>
There may be use cases wherein we need more than the periodic OA capture mode<br>
which is supported currently. The CS based mode is primarily used for two use cases:<br>
     - Ability to capture system-wide metrics, along with the ability to map<br>
       the reports back to individual contexts (particularly for HSW).<br>
     - Ability to inject tags for work into the reports. This provides<br>
       visibility into the multiple stages of work within a single context.<br>
<br>
Userspace will be able to distinguish between the periodic and CS based<br>
OA reports by virtue of the source_info sample field.<br>
<br>
The command MI_REPORT_PERF_COUNT can be used to capture snapshots of OA<br>
counters, and is inserted at BB boundaries.<br>
The data thus captured will be stored in a separate buffer, which will<br>
be different from the buffer used otherwise for periodic OA capture mode.<br>
The metadata pertaining to each snapshot is maintained in a list,<br>
which also holds the offset into the GEM buffer object for each captured snapshot.<br>
In order to track whether the GPU has completed processing the node,<br>
a field for the corresponding GEM request is added, which is tracked<br>
for completion of the command.<br>
<br>
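As a rough illustration of the per-snapshot offsets described above, here is a<br>
minimal userspace sketch of picking the next slot in a buffer of fixed-size<br>
samples. The helper name next_slot_offset() and its parameters are hypothetical;<br>
it deliberately ignores eviction of old entries, which the patch handles in<br>
insert_perf_sample(), and it assumes offsets small enough not to overflow.<br>

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: given the offset of the most
 * recently queued sample, return the offset for the next one. Samples are
 * fixed-size, and the offset wraps to 0 when a whole sample no longer fits
 * between the candidate offset and the end of the buffer.
 */
static uint32_t next_slot_offset(uint32_t last_offset, uint32_t sample_size,
				 uint32_t buf_size)
{
	uint32_t next = last_offset + sample_size;

	assert(sample_size > 0 && sample_size <= buf_size);

	/* Wrap to the start when the next sample would run past the end. */
	if (next + sample_size > buf_size)
		next = 0;

	return next;
}
```

A real implementation also has to make sure the slot being reused is no longer<br>
referenced by an older list entry before handing it out.<br>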
Both periodic and CS based reports are associated with a single stream<br>
(corresponding to the render engine), and the samples are expected to be<br>
delivered in sequential order of their timestamps. Since these<br>
reports are collected in separate buffers, they are merge sorted at the<br>
time of forwarding to userspace during the read call.<br>
<br>
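The merge step described above can be sketched in plain C as follows. This is<br>
only an illustration under stated assumptions: struct sample and<br>
merge_samples() are hypothetical names, both inputs are assumed to be already<br>
sorted by timestamp, and timestamp wraparound is ignored.<br>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified sample record; not the driver's structure. */
struct sample {
	uint32_t ts;	/* GPU timestamp of the report */
	int from_cs;	/* 1 if captured via the command stream, 0 if periodic */
};

/*
 * Merge two arrays that are each already sorted by timestamp into out[],
 * which must have room for n_periodic + n_cs entries. On equal timestamps
 * the periodic sample is emitted first. Returns the number of entries
 * written.
 */
static size_t merge_samples(const struct sample *periodic, size_t n_periodic,
			    const struct sample *cs, size_t n_cs,
			    struct sample *out)
{
	size_t i = 0, j = 0, k = 0;

	assert(out != NULL);

	/* Take the older head of the two inputs until one runs out. */
	while (i < n_periodic && j < n_cs) {
		if (periodic[i].ts <= cs[j].ts)
			out[k++] = periodic[i++];
		else
			out[k++] = cs[j++];
	}

	/* Drain whichever input still has samples left. */
	while (i < n_periodic)
		out[k++] = periodic[i++];
	while (j < n_cs)
		out[k++] = cs[j++];

	return k;
}
```

In the patch the same idea is applied incrementally at read time (the new ts<br>
argument limits how far the periodic buffer is drained) rather than by copying<br>
into a third array.<br>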
v2: Aligning with the non-perf interface (custom drm ioctl based). Also,<br>
a few related patches are squashed together for better readability<br>
<br>
v3: Updated perf sample capture emit hook name. Reserving space upfront<br>
in the ring for emitting sample capture commands and using<br>
req->fence.seqno for tracking samples. Added SRCU protection for streams.<br>
Changed the stream last_request tracking to resv object. (Chris)<br>
Updated perf.sample_lock spin_lock usage to avoid softlockups. Moved<br>
stream to global per-engine structure. (Sagar)<br>
Update unpin and put in the free routines to i915_vma_unpin_and_release.<br>
Making use of perf stream cs_buffer vma resv instead of separate resv obj.<br>
Pruned perf stream vma resv during gem_idle. (Chris)<br>
Changed payload field ctx_id to u64 to keep all sample data aligned at 8<br>
bytes. (Lionel)<br>
A stall/flush prior to sample capture is not added. Do we need to give the<br>
user control over whether to stall/flush at each sample?<br>
<br>
Signed-off-by: Sourab Gupta <<a href="mailto:sourab.gupta@intel.com" target="_blank">sourab.gupta@intel.com</a>><br>
Signed-off-by: Robert Bragg <<a href="mailto:robert@sixbynine.org" target="_blank">robert@sixbynine.org</a>><br>
Signed-off-by: Sagar Arun Kamble <<a href="mailto:sagar.a.kamble@intel.com" target="_blank">sagar.a.kamble@intel.com</a>><br>
---<br>
  drivers/gpu/drm/i915/i915_drv.h            |  101 ++-<br>
  drivers/gpu/drm/i915/i915_gem.c            |    1 +<br>
  drivers/gpu/drm/i915/i915_gem_execbuffer.c |    8 +<br>
  drivers/gpu/drm/i915/i915_perf.c           | 1185 ++++++++++++++++++++++------<br>
  drivers/gpu/drm/i915/intel_engine_cs.c     |    4 +<br>
  drivers/gpu/drm/i915/intel_ringbuffer.c    |    2 +<br>
  drivers/gpu/drm/i915/intel_ringbuffer.h    |    5 +<br>
  include/uapi/drm/i915_drm.h                |   15 +<br>
  8 files changed, 1073 insertions(+), 248 deletions(-)<br>
<br>
diff --git a/drivers/gpu/drm/i915/i915_dr<wbr>v.h b/drivers/gpu/drm/i915/i915_dr<wbr>v.h<br>
index 2c7456f..8b1cecf 100644<br>
--- a/drivers/gpu/drm/i915/i915_dr<wbr>v.h<br>
+++ b/drivers/gpu/drm/i915/i915_dr<wbr>v.h<br>
@@ -1985,6 +1985,24 @@ struct i915_perf_stream_ops {<br>
         * The stream will always be disabled before this is called.<br>
         */<br>
        void (*destroy)(struct i915_perf_stream *stream);<br>
+<br>
+       /*<br>
+        * @emit_sample_capture: Emit the commands in the command streamer<br>
+        * for a particular gpu engine.<br>
+        *<br>
+        * The commands are inserted to capture the perf sample data at<br>
+        * specific points during workload execution, such as before and after<br>
+        * the batch buffer.<br>
+        */<br>
+       void (*emit_sample_capture)(struct i915_perf_stream *stream,<br>
+                                   struct drm_i915_gem_request *request,<br>
+                                   bool preallocate);<br>
+};<br>
+<br>
+enum i915_perf_stream_state {<br>
+       I915_PERF_STREAM_DISABLED,<br>
+       I915_PERF_STREAM_ENABLE_IN_PR<wbr>OGRESS,<br>
+       I915_PERF_STREAM_ENABLED,<br>
  };<br>
    /**<br>
@@ -1997,9 +2015,9 @@ struct i915_perf_stream {<br>
        struct drm_i915_private *dev_priv;<br>
        /**<br>
-        * @link: Links the stream into ``&drm_i915_private->streams``<br>
+        * @engine: Engine to which this stream corresponds.<br>
         */<br>
-       struct list_head link;<br>
+       struct intel_engine_cs *engine;<br>
        /**<br>
         * @sample_flags: Flags representing the `DRM_I915_PERF_PROP_SAMPLE_*`<br>
@@ -2022,17 +2040,41 @@ struct i915_perf_stream {<br>
        struct i915_gem_context *ctx;<br>
        /**<br>
-        * @enabled: Whether the stream is currently enabled, considering<br>
-        * whether the stream was opened in a disabled state and based<br>
-        * on `I915_PERF_IOCTL_ENABLE` and `I915_PERF_IOCTL_DISABLE` calls.<br>
+        * @state: Current stream state, which can be either disabled, enabled,<br>
+        * or enable_in_progress, while considering whether the stream was<br>
+        * opened in a disabled state and based on `I915_PERF_IOCTL_ENABLE` and<br>
+        * `I915_PERF_IOCTL_DISABLE` calls.<br>
         */<br>
-       bool enabled;<br>
+       enum i915_perf_stream_state state;<br>
+<br>
+       /**<br>
+        * @cs_mode: Whether command stream based perf sample collection is<br>
+        * enabled for this stream<br>
+        */<br>
+       bool cs_mode;<br>
+<br>
+       /**<br>
+        * @using_oa: Whether OA unit is in use for this particular stream<br>
+        */<br>
+       bool using_oa;<br>
        /**<br>
         * @ops: The callbacks providing the implementation of this specific<br>
         * type of configured stream.<br>
         */<br>
        const struct i915_perf_stream_ops *ops;<br>
+<br>
+       /* Command stream based perf data buffer */<br>
+       struct {<br>
+               struct i915_vma *vma;<br>
+               u8 *vaddr;<br>
+       } cs_buffer;<br>
+<br>
+       struct list_head cs_samples;<br>
+       spinlock_t cs_samples_lock;<br>
+<br>
+       wait_queue_head_t poll_wq;<br>
+       bool pollin;<br>
  };<br>
    /**<br>
@@ -2095,7 +2137,8 @@ struct i915_oa_ops {<br>
        int (*read)(struct i915_perf_stream *stream,<br>
                    char __user *buf,<br>
                    size_t count,<br>
-                   size_t *offset);<br>
+                   size_t *offset,<br>
+                   u32 ts);<br>
        /**<br>
         * @oa_hw_tail_read: read the OA tail pointer register<br>
@@ -2107,6 +2150,36 @@ struct i915_oa_ops {<br>
        u32 (*oa_hw_tail_read)(struct drm_i915_private *dev_priv);<br>
  };<br>
  +/*<br>
+ * i915_perf_cs_sample - Sample element to hold info about a single perf<br>
+ * sample data associated with a particular GPU command stream.<br>
+ */<br>
+struct i915_perf_cs_sample {<br>
+       /**<br>
+        * @link: Links the sample into ``&stream->cs_samples``<br>
+        */<br>
+       struct list_head link;<br>
+<br>
+       /**<br>
+        * @request: GEM request associated with the sample. The commands to<br>
+        * capture the perf metrics are inserted into the command streamer in<br>
+        * context of this request.<br>
+        */<br>
+       struct drm_i915_gem_request *request;<br>
+<br>
+       /**<br>
+        * @offset: Offset into ``&stream->cs_buffer``<br>
+        * where the perf metrics will be collected, when the commands inserted<br>
+        * into the command stream are executed by GPU.<br>
+        */<br>
+       u32 offset;<br>
+<br>
+       /**<br>
+        * @ctx_id: Context ID associated with this perf sample<br>
+        */<br>
+       u32 ctx_id;<br>
+};<br>
+<br>
  struct intel_cdclk_state {<br>
        unsigned int cdclk, vco, ref;<br>
  };<br>
@@ -2431,17 +2504,10 @@ struct drm_i915_private {<br>
                struct ctl_table_header *sysctl_header;<br>
                struct mutex lock;<br>
-               struct list_head streams;<br>
-<br>
-               struct {<br>
-                       struct i915_perf_stream *exclusive_stream;<br>
  -                     u32 specific_ctx_id;<br>
-<br>
-                       struct hrtimer poll_check_timer;<br>
-                       wait_queue_head_t poll_wq;<br>
-                       bool pollin;<br>
+               struct hrtimer poll_check_timer;<br>
  +             struct {<br>
                        /**<br>
                         * For rate limiting any notifications of spurious<br>
                         * invalid OA reports<br>
@@ -3636,6 +3702,8 @@ int i915_perf_open_ioctl(struct drm_device *dev, void *data,<br>
  void i915_oa_init_reg_state(struct intel_engine_cs *engine,<br>
                            struct i915_gem_context *ctx,<br>
                            uint32_t *reg_state);<br>
+void i915_perf_emit_sample_capture(<wbr>struct drm_i915_gem_request *req,<br>
+                                  bool preallocate);<br>
    /* i915_gem_evict.c */<br>
  int __must_check i915_gem_evict_something(struc<wbr>t i915_address_space *vm,<br>
@@ -3795,6 +3863,7 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,<br>
  /* i915_perf.c */<br>
  extern void i915_perf_init(struct drm_i915_private *dev_priv);<br>
  extern void i915_perf_fini(struct drm_i915_private *dev_priv);<br>
+extern void i915_perf_streams_mark_idle(st<wbr>ruct drm_i915_private *dev_priv);<br>
  extern void i915_perf_register(struct drm_i915_private *dev_priv);<br>
  extern void i915_perf_unregister(struct drm_i915_private *dev_priv);<br>
  diff --git a/drivers/gpu/drm/i915/i915_ge<wbr>m.c b/drivers/gpu/drm/i915/i915_ge<wbr>m.c<br>
index 000a764..7b01548 100644<br>
--- a/drivers/gpu/drm/i915/i915_ge<wbr>m.c<br>
+++ b/drivers/gpu/drm/i915/i915_ge<wbr>m.c<br>
@@ -3220,6 +3220,7 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)<br>
        intel_engines_mark_idle(dev_pr<wbr>iv);<br>
        i915_gem_timelines_mark_idle(d<wbr>ev_priv);<br>
+       i915_perf_streams_mark_idle(d<wbr>ev_priv);<br>
        GEM_BUG_ON(!dev_priv->gt.awake<wbr>);<br>
        dev_priv->gt.awake = false;<br>
diff --git a/drivers/gpu/drm/i915/i915_ge<wbr>m_execbuffer.c b/drivers/gpu/drm/i915/i915_ge<wbr>m_execbuffer.c<br>
index 5fa4476..bfe546b 100644<br>
--- a/drivers/gpu/drm/i915/i915_ge<wbr>m_execbuffer.c<br>
+++ b/drivers/gpu/drm/i915/i915_ge<wbr>m_execbuffer.c<br>
@@ -1194,12 +1194,16 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,<br>
        if (err)<br>
                goto err_request;<br>
  +     i915_perf_emit_sample_<wbr>capture(rq, true);<br>
+<br>
        err = eb->engine->emit_bb_start(rq,<br>
                                        batch->node.start, PAGE_SIZE,<br>
                                        cache->gen > 5 ? 0 : I915_DISPATCH_SECURE);<br>
        if (err)<br>
                goto err_request;<br>
  +     i915_perf_emit_sample_<wbr>capture(rq, false);<br>
+<br>
        GEM_BUG_ON(!reservation_object<wbr>_test_signaled_rcu(batch-><wbr>resv, true));<br>
        i915_vma_move_to_active(batch, rq, 0);<br>
        reservation_object_lock(batch-<wbr>>resv, NULL);<br>
@@ -2029,6 +2033,8 @@ static int eb_submit(struct i915_execbuffer *eb)<br>
                        return err;<br>
        }<br>
  +     i915_perf_emit_sample_<wbr>capture(eb->request, true);<br>
+<br>
        err = eb->engine->emit_bb_start(eb-><wbr>request,<br>
                                        eb->batch->node.start +<br>
                                        eb->batch_start_offset,<br>
@@ -2037,6 +2043,8 @@ static int eb_submit(struct i915_execbuffer *eb)<br>
        if (err)<br>
                return err;<br>
  +     i915_perf_emit_sample_<wbr>capture(eb->request, false);<br>
+<br>
        return 0;<br>
  }<br>
  diff --git a/drivers/gpu/drm/i915/i915_pe<wbr>rf.c b/drivers/gpu/drm/i915/i915_pe<wbr>rf.c<br>
index b272653..57e1936 100644<br>
--- a/drivers/gpu/drm/i915/i915_pe<wbr>rf.c<br>
+++ b/drivers/gpu/drm/i915/i915_pe<wbr>rf.c<br>
@@ -193,6 +193,7 @@<br>
    #include <linux/anon_inodes.h><br>
  #include <linux/sizes.h><br>
+#include <linux/srcu.h><br>
    #include "i915_drv.h"<br>
  #include "i915_oa_hsw.h"<br>
@@ -288,6 +289,12 @@<br>
  #define OAREPORT_REASON_CTX_SWITCH     (1<<3)<br>
  #define OAREPORT_REASON_CLK_RATIO      (1<<5)<br>
  +/* Data common to periodic and RCS based OA samples */<br>
+struct i915_perf_sample_data {<br>
+       u64 source;<br>
+       u64 ctx_id;<br>
+       const u8 *report;<br>
+};<br>
    /* For sysctl proc_dointvec_minmax of i915_oa_max_sample_rate<br>
   *<br>
@@ -328,8 +335,19 @@<br>
        [I915_OA_FORMAT_C4_B8]              = { 7, 64 },<br>
  };<br>
  +/* Duplicated from similar static enum in i915_gem_execbuffer.c */<br>
+#define I915_USER_RINGS (4)<br>
+static const enum intel_engine_id user_ring_map[I915_USER_RINGS + 1] = {<br>
+       [I915_EXEC_DEFAULT]     = RCS,<br>
+       [I915_EXEC_RENDER]      = RCS,<br>
+       [I915_EXEC_BLT]         = BCS,<br>
+       [I915_EXEC_BSD]         = VCS,<br>
+       [I915_EXEC_VEBOX]       = VECS<br>
+};<br>
+<br>
  #define SAMPLE_OA_REPORT      (1<<0)<br>
  #define SAMPLE_OA_SOURCE      (1<<1)<br>
+#define SAMPLE_CTX_ID        (1<<2)<br>
    /**<br>
   * struct perf_open_properties - for validated properties given to open a stream<br>
@@ -340,6 +358,9 @@<br>
   * @oa_format: An OA unit HW report format<br>
   * @oa_periodic: Whether to enable periodic OA unit sampling<br>
   * @oa_period_exponent: The OA unit sampling period is derived from this<br>
+ * @cs_mode: Whether the stream is configured to enable collection of metrics<br>
+ * associated with command stream of a particular GPU engine<br>
+ * @engine: The GPU engine associated with the stream in case cs_mode is enabled<br>
   *<br>
   * As read_properties_unlocked() enumerates and validates the properties given<br>
   * to open a stream of metrics the configuration is built up in the structure<br>
@@ -356,6 +377,10 @@ struct perf_open_properties {<br>
        int oa_format;<br>
        bool oa_periodic;<br>
        int oa_period_exponent;<br>
+<br>
+       /* Command stream mode */<br>
+       bool cs_mode;<br>
+       enum intel_engine_id engine;<br>
  };<br>
    static u32 gen8_oa_hw_tail_read(struct drm_i915_private *dev_priv)<br>
@@ -371,6 +396,266 @@ static u32 gen7_oa_hw_tail_read(struct drm_i915_private *dev_priv)<br>
  }<br>
    /**<br>
+ * i915_perf_emit_sample_capture - Insert the commands to capture metrics into<br>
+ * the command stream of a GPU engine.<br>
+ * @request: request in whose context the metrics are being collected.<br>
+ * @preallocate: allocate space in ring for related sample.<br>
+ *<br>
+ * The function provides a hook through which the commands to capture perf<br>
+ * metrics, are inserted into the command stream of a GPU engine.<br>
+ */<br>
+void i915_perf_emit_sample_capture(<wbr>struct drm_i915_gem_request *request,<br>
+                                  bool preallocate)<br>
+{<br>
+       struct intel_engine_cs *engine = request->engine;<br>
+       struct drm_i915_private *dev_priv = engine->i915;<br>
+       struct i915_perf_stream *stream;<br>
+       int idx;<br>
+<br>
+       if (!dev_priv->perf.initialized)<br>
+               return;<br>
+<br>
+       idx = srcu_read_lock(&engine->perf_s<wbr>rcu);<br>
+       stream = srcu_dereference(engine->exclu<wbr>sive_stream, &engine->perf_srcu);<br>
+       if (stream && (stream->state == I915_PERF_STREAM_ENABLED) &&<br>
+                               stream->cs_mode)<br>
+               stream->ops->emit_sample_capt<wbr>ure(stream, request,<br>
+                                                preallocate);<br>
+       srcu_read_unlock(&engine->per<wbr>f_srcu, idx);<br>
+}<br>
+<br>
+/**<br>
+ * release_perf_samples - Release old perf samples to make space for new<br>
+ * sample data.<br>
+ * @stream: Stream from which space is to be freed up.<br>
+ * @target_size: Space required to be freed up.<br>
+ *<br>
+ * We also dereference the associated request before deleting the sample.<br>
+ * Also, no need to check whether the commands associated with old samples<br>
+ * have been completed. This is because these sample entries are anyways going<br>
+ * to be replaced by a new sample, and gpu will eventually overwrite the buffer<br>
+ * contents, when the request associated with new sample completes.<br>
+ */<br>
+static void release_perf_samples(struct i915_perf_stream *stream,<br>
+                                u32 target_size)<br>
+{<br>
+       struct drm_i915_private *dev_priv = stream->dev_priv;<br>
+       struct i915_perf_cs_sample *sample, *next;<br>
+       u32 sample_size = dev_priv->perf.oa.oa_buffer.format_size;<br>
+       u32 size = 0;<br>
+<br>
+       list_for_each_entry_safe<br>
+               (sample, next, &stream->cs_samples, link) {<br>
+               size += sample_size;<br>
+               i915_gem_request_put(sample-><wbr>request);<br>
+               list_del(&sample->link);<br>
+               kfree(sample);<br>
+<br>
+               if (size >= target_size)<br>
+                       break;<br>
+       }<br>
+}<br>
+<br>
+/**<br>
+ * insert_perf_sample - Insert a perf sample entry to the sample list.<br>
+ * @stream: Stream into which sample is to be inserted.<br>
+ * @sample: perf CS sample to be inserted into the list<br>
+ *<br>
+ * This function never fails, since it always manages to insert the sample.<br>
+ * If the space is exhausted in the buffer, it will remove the older<br>
+ * entries in order to make space.<br>
+ */<br>
+static void insert_perf_sample(struct i915_perf_stream *stream,<br>
+                               struct i915_perf_cs_sample *sample)<br>
+{<br>
+       struct drm_i915_private *dev_priv = stream->dev_priv;<br>
+       struct i915_perf_cs_sample *first, *last;<br>
+       int max_offset = stream->cs_buffer.vma->obj->base.size;<br>
+       u32 sample_size = dev_priv->perf.oa.oa_buffer.format_size;<br>
+       unsigned long flags;<br>
+<br>
+       spin_lock_irqsave(&stream-><wbr>cs_samples_lock, flags);<br>
+       if (list_empty(&stream->cs_sample<wbr>s)) {<br>
+               sample->offset = 0;<br>
+               list_add_tail(&sample->link, &stream->cs_samples);<br>
+               spin_unlock_irqrestore(&strea<wbr>m->cs_samples_lock, flags);<br>
+               return;<br>
+       }<br>
+<br>
+       first = list_first_entry(&stream->cs_s<wbr>amples, typeof(*first),<br>
+                               link);<br>
+       last = list_last_entry(&stream->cs_sa<wbr>mples, typeof(*last),<br>
+                               link);<br>
+<br>
+       if (last->offset >= first->offset) {<br>
+               /* Sufficient space available at the end of buffer? */<br>
+               if (last->offset + 2*sample_size < max_offset)<br>
+                       sample->offset = last->offset + sample_size;<br>
+               /*<br>
+                * Wraparound condition. Is sufficient space available at<br>
+                * beginning of buffer?<br>
+                */<br>
+               else if (sample_size < first->offset)<br>
+                       sample->offset = 0;<br>
+               /* Insufficient space. Overwrite existing old entries */<br>
+               else {<br>
+                       u32 target_size = sample_size - first->offset;<br>
+<br>
+                       release_perf_samples(stream, target_size);<br>
+                       sample->offset = 0;<br>
+               }<br>
+       } else {<br>
+               /* Sufficient space available? */<br>
+               if (last->offset + 2*sample_size < first->offset)<br>
+                       sample->offset = last->offset + sample_size;<br>
+               /* Insufficient space. Overwrite existing old entries */<br>
+               else {<br>
+                       u32 target_size = sample_size -<br>
+                               (first->offset - last->offset -<br>
+                               sample_size);<br>
+<br>
+                       release_perf_samples(stream, target_size);<br>
+                       sample->offset = last->offset + sample_size;<br>
+               }<br>
+       }<br>
+       list_add_tail(&sample->link, &stream->cs_samples);<br>
+       spin_unlock_irqrestore(&strea<wbr>m->cs_samples_lock, flags);<br>
+}<br>
+<br>
+/**<br>
+ * i915_emit_oa_report_capture - Insert the commands to capture OA<br>
+ * reports metrics into the render command stream<br>
+ * @request: request in whose context the metrics are being collected.<br>
+ * @preallocate: allocate space in ring for related sample.<br>
+ * @offset: command stream buffer offset where the OA metrics need to be<br>
+ * collected<br>
+ */<br>
+static int i915_emit_oa_report_capture(<br>
+                               struct drm_i915_gem_request *request,<br>
+                               bool preallocate,<br>
+                               u32 offset)<br>
+{<br>
+       struct drm_i915_private *dev_priv = request->i915;<br>
+       struct intel_engine_cs *engine = request->engine;<br>
+       struct i915_perf_stream *stream;<br>
+       u32 addr = 0;<br>
+       u32 cmd, len = 4, *cs;<br>
+       int idx;<br>
+<br>
+       idx = srcu_read_lock(&engine->perf_srcu);<br>
+       stream = srcu_dereference(engine->exclusive_stream, &engine->perf_srcu);<br>
+       addr = stream->cs_buffer.vma->node.start + offset;<br>
+       srcu_read_unlock(&engine->perf_srcu, idx);<br>
+<br>
+       if (WARN_ON(addr & 0x3f)) {<br>
+               DRM_ERROR("OA buffer address not aligned to 64 byte\n");<br>
+               return -EINVAL;<br>
+       }<br>
+<br>
+       if (preallocate)<br>
+               request->reserved_space += len;<br>
+       else<br>
+               request->reserved_space -= len;<br>
+<br>
+       cs = intel_ring_begin(request, 4);<br>
+       if (IS_ERR(cs))<br>
+               return PTR_ERR(cs);<br>
+<br>
+       cmd = MI_REPORT_PERF_COUNT | (1<<0);<br>
+       if (INTEL_GEN(dev_priv) >= 8)<br>
+               cmd |= (2<<0);<br>
+<br>
+       *cs++ = cmd;<br>
+       *cs++ = addr | MI_REPORT_PERF_COUNT_GGTT;<br>
+       *cs++ = request->fence.seqno;<br>
+<br>
+       if (INTEL_GEN(dev_priv) >= 8)<br>
+               *cs++ = 0;<br>
+       else<br>
+               *cs++ = MI_NOOP;<br>
+<br>
+       intel_ring_advance(request, cs);<br>
+<br>
+       return 0;<br>
+}<br>
+<br>
+/**<br>
+ * i915_perf_stream_emit_sample_c<wbr>apture - Insert the commands to capture perf<br>
+ * metrics into the GPU command stream<br>
+ * @stream: An i915-perf stream opened for GPU metrics<br>
+ * @request: request in whose context the metrics are being collected.<br>
+ * @preallocate: allocate space in ring for related sample.<br>
+ */<br>
+static void i915_perf_stream_emit_sample_c<wbr>apture(<br>
+                                       struct i915_perf_stream *stream,<br>
+                                       struct drm_i915_gem_request *request,<br>
+                                       bool preallocate)<br>
+{<br>
+       struct reservation_object *resv = stream->cs_buffer.vma->resv;<br>
+       struct i915_perf_cs_sample *sample;<br>
+       unsigned long flags;<br>
+       int ret;<br>
+<br>
+       sample = kzalloc(sizeof(*sample), GFP_KERNEL);<br>
+       if (sample == NULL) {<br>
+               DRM_ERROR("Perf sample alloc failed\n");<br>
+               return;<br>
+       }<br>
+<br>
+       sample->request = i915_gem_request_get(request);<br>
+       sample->ctx_id = request->ctx->hw_id;<br>
+<br>
+       insert_perf_sample(stream, sample);<br>
+<br>
+       if (stream->sample_flags & SAMPLE_OA_REPORT) {<br>
+               ret = i915_emit_oa_report_capture(re<wbr>quest,<br>
+                                                 preallocate,<br>
+                                                 sample->offset);<br>
+               if (ret)<br>
+                       goto err_unref;<br>
+       }<br>
+<br>
+       reservation_object_lock(resv, NULL);<br>
+       if (reservation_object_reserve_sh<wbr>ared(resv) == 0)<br>
+               reservation_object_add_<wbr>shared_fence(resv, &request->fence);<br>
+       reservation_object_unlock(res<wbr>v);<br>
+<br>
+       i915_vma_move_to_active(strea<wbr>m->cs_buffer.vma, request,<br>
+                                       EXEC_OBJECT_WRITE);<br>
+       return;<br>
+<br>
+err_unref:<br>
+       i915_gem_request_put(sample-><wbr>request);<br>
+       spin_lock_irqsave(&stream-><wbr>cs_samples_lock, flags);<br>
+       list_del(&sample->link);<br>
+       spin_unlock_irqrestore(&strea<wbr>m->cs_samples_lock, flags);<br>
+       kfree(sample);<br>
+}<br>
+<br>
+/**<br>
+ * i915_perf_stream_release_sampl<wbr>es - Release the perf command stream samples<br>
+ * @stream: Stream from which samples are to be released.<br>
+ *<br>
+ * Note: The associated requests should be completed before releasing the<br>
+ * references here.<br>
+ */<br>
+static void i915_perf_stream_release_sampl<wbr>es(struct i915_perf_stream *stream)<br>
+{<br>
+       struct i915_perf_cs_sample *entry, *next;<br>
+       unsigned long flags;<br>
+<br>
+       list_for_each_entry_safe<br>
+               (entry, next, &stream->cs_samples, link) {<br>
+               i915_gem_request_put(entry->r<wbr>equest);<br>
+<br>
+               spin_lock_irqsave(&stream-><wbr>cs_samples_lock, flags);<br>
+               list_del(&entry->link);<br>
+               spin_unlock_irqrestore(&strea<wbr>m->cs_samples_lock, flags);<br>
+               kfree(entry);<br>
+       }<br>
+}<br>
+<br>
+/**<br>
   * oa_buffer_check_unlocked - check for data and update tail ptr state<br>
   * @dev_priv: i915 device instance<br>
   *<br>
@@ -521,12 +806,13 @@ static int append_oa_status(struct i915_perf_stream *stream,<br>
  }<br>
    /**<br>
- * append_oa_sample - Copies single OA report into userspace read() buffer.<br>
- * @stream: An i915-perf stream opened for OA metrics<br>
+ * append_perf_sample - Copies single perf sample into userspace read() buffer.<br>
+ * @stream: An i915-perf stream opened for perf samples<br>
   * @buf: destination buffer given by userspace<br>
   * @count: the number of bytes userspace wants to read<br>
   * @offset: (inout): the current position for writing into @buf<br>
- * @report: A single OA report to (optionally) include as part of the sample<br>
+ * @data: perf sample data which contains (optionally) metrics configured<br>
+ * earlier when opening a stream<br>
   *<br>
   * The contents of a sample are configured through `DRM_I915_PERF_PROP_SAMPLE_*`<br>
   * properties when opening a stream, tracked as `stream->sample_flags`. This<br>
@@ -537,11 +823,11 @@ static int append_oa_status(struct i915_perf_stream *stream,<br>
   *<br>
   * Returns: 0 on success, negative error code on failure.<br>
   */<br>
-static int append_oa_sample(struct i915_perf_stream *stream,<br>
+static int append_perf_sample(struct i915_perf_stream *stream,<br>
                            char __user *buf,<br>
                            size_t count,<br>
                            size_t *offset,<br>
-                           const u8 *report)<br>
+                           const struct i915_perf_sample_data *data)<br>
  {<br>
        struct drm_i915_private *dev_priv = stream->dev_priv;<br>
        int report_size = dev_priv->perf.oa.oa_buffer.format_size;<br>
@@ -569,16 +855,21 @@ static int append_oa_sample(struct i915_perf_stream *stream,<br>
         * transition. These are considered as source 'OABUFFER'.<br>
         */<br>
        if (sample_flags & SAMPLE_OA_SOURCE) {<br>
-               u64 source = I915_PERF_SAMPLE_OA_SOURCE_OAB<wbr>UFFER;<br>
+               if (copy_to_user(buf, &data->source, 8))<br>
+                       return -EFAULT;<br>
+               buf += 8;<br>
+       }<br>
  -             if (copy_to_user(buf, &source, 8))<br>
+       if (sample_flags & SAMPLE_CTX_ID) {<br>
+               if (copy_to_user(buf, &data->ctx_id, 8))<br>
                        return -EFAULT;<br>
                buf += 8;<br>
        }<br>
        if (sample_flags & SAMPLE_OA_REPORT) {<br>
-               if (copy_to_user(buf, report, report_size))<br>
+               if (copy_to_user(buf, data->report, report_size))<br>
                        return -EFAULT;<br>
+               buf += report_size;<br>
        }<br>
        (*offset) += header.size;<br>
@@ -587,11 +878,54 @@ static int append_oa_sample(struct i915_perf_stream *stream,<br>
  }<br>
    /**<br>
+ * append_oa_buffer_sample - Copies single periodic OA report into userspace<br>
+ * read() buffer.<br>
+ * @stream: An i915-perf stream opened for OA metrics<br>
+ * @buf: destination buffer given by userspace<br>
+ * @count: the number of bytes userspace wants to read<br>
+ * @offset: (inout): the current position for writing into @buf<br>
+ * @report: A single OA report to (optionally) include as part of the sample<br>
+ *<br>
+ * Returns: 0 on success, negative error code on failure.<br>
+ */<br>
+static int append_oa_buffer_sample(struct i915_perf_stream *stream,<br>
+                               char __user *buf, size_t count,<br>
+                               size_t *offset, const u8 *report)<br>
+{<br>
+       struct drm_i915_private *dev_priv = stream->dev_priv;<br>
+       u32 sample_flags = stream->sample_flags;<br>
+       struct i915_perf_sample_data data = { 0 };<br>
+       u32 *report32 = (u32 *)report;<br>
+<br>
+       if (sample_flags & SAMPLE_OA_SOURCE)<br>
+               data.source = I915_PERF_SAMPLE_OA_SOURCE_OAB<wbr>UFFER;<br>
+<br>
+       if (sample_flags & SAMPLE_CTX_ID) {<br>
+               if (INTEL_INFO(dev_priv)->gen < 8)<br>
+                       data.ctx_id = 0;<br>
+               else {<br>
+                       /*<br>
+                        * XXX: Just keep the lower 21 bits for now since I'm<br>
+                        * not entirely sure if the HW touches any of the higher<br>
+                        * bits in this field<br>
+                        */<br>
+                       data.ctx_id = report32[2] & 0x1fffff;<br>
+               }<br>
+       }<br>
+<br>
+       if (sample_flags & SAMPLE_OA_REPORT)<br>
+               data.report = report;<br>
+<br>
+       return append_perf_sample(stream, buf, count, offset, &data);<br>
+}<br>
+<br>
+/**<br>
   * Copies all buffered OA reports into userspace read() buffer.<br>
   * @stream: An i915-perf stream opened for OA metrics<br>
   * @buf: destination buffer given by userspace<br>
   * @count: the number of bytes userspace wants to read<br>
   * @offset: (inout): the current position for writing into @buf<br>
+ * @ts: copy OA reports till this timestamp<br>
   *<br>
   * Notably any error condition resulting in a short read (-%ENOSPC or<br>
   * -%EFAULT) will be returned even though one or more records may<br>
@@ -609,7 +943,8 @@ static int append_oa_sample(struct i915_perf_stream *stream,<br>
  static int gen8_append_oa_reports(struct i915_perf_stream *stream,<br>
                                  char __user *buf,<br>
                                  size_t count,<br>
-                                 size_t *offset)<br>
+                                 size_t *offset,<br>
+                                 u32 ts)<br>
  {<br>
        struct drm_i915_private *dev_priv = stream->dev_priv;<br>
        int report_size = dev_priv->perf.oa.oa_buffer.format_size;<br>
@@ -623,7 +958,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,<br>
        u32 taken;<br>
        int ret = 0;<br>
  -     if (WARN_ON(!stream->enabled))<br>
+       if (WARN_ON(stream->state != I915_PERF_STREAM_ENABLED))<br>
                return -EIO;<br>
        spin_lock_irqsave(&dev_priv->p<wbr>erf.oa.oa_buffer.ptr_lock, flags);<br>
@@ -669,6 +1004,11 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,<br>
                u32 *report32 = (void *)report;<br>
                u32 ctx_id;<br>
                u32 reason;<br>
+               u32 report_ts = report32[1];<br>
+<br>
+               /* Report timestamp should not exceed the given ts */<br>
+               if (report_ts > ts)<br>
+                       break;<br>
                /*<br>
                 * All the report sizes factor neatly into the buffer<br>
@@ -750,23 +1090,23 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,<br>
                 * switches since it's not-uncommon for periodic samples to<br>
                 * identify a switch before any 'context switch' report.<br>
                 */<br>
-               if (!dev_priv->perf.oa.exclusive_<wbr>stream->ctx ||<br>
-                   dev_priv->perf.oa.specific_ct<wbr>x_id == ctx_id ||<br>
+               if (!stream->ctx ||<br>
+                   stream->engine->specific_ctx_<wbr>id == ctx_id ||<br>
                    (dev_priv->perf.oa.oa_buffer.l<wbr>ast_ctx_id ==<br>
-                    dev_priv->perf.oa.specific_ctx<wbr>_id) ||<br>
+                    stream->engine->specific_ctx_i<wbr>d) ||<br>
                    reason & OAREPORT_REASON_CTX_SWITCH) {<br>
                        /*<br>
                         * While filtering for a single context we avoid<br>
                         * leaking the IDs of other contexts.<br>
                         */<br>
-                       if (dev_priv->perf.oa.exclusive_s<wbr>tream->ctx &&<br>
-                           dev_priv->perf.oa.specific_ct<wbr>x_id != ctx_id) {<br>
+                       if (stream->ctx &&<br>
+                           stream->engine->specific_ctx_<wbr>id != ctx_id) {<br>
                                report32[2] = INVALID_CTX_ID;<br>
                        }<br>
  -                     ret = append_oa_sample(stream, buf, count, offset,<br>
-                                              report);<br>
+                       ret = append_oa_buffer_sample(stream<wbr>, buf, count,<br>
+                                                     offset, report);<br>
                        if (ret)<br>
                                break;<br>
  @@ -807,6 +1147,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,<br>
   * @buf: destination buffer given by userspace<br>
   * @count: the number of bytes userspace wants to read<br>
   * @offset: (inout): the current position for writing into @buf<br>
+ * @ts: copy OA reports till this timestamp<br>
   *<br>
   * Checks OA unit status registers and if necessary appends corresponding<br>
   * status records for userspace (such as for a buffer full condition) and then<br>
@@ -824,7 +1165,8 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,<br>
  static int gen8_oa_read(struct i915_perf_stream *stream,<br>
                        char __user *buf,<br>
                        size_t count,<br>
-                       size_t *offset)<br>
+                       size_t *offset,<br>
+                       u32 ts)<br>
  {<br>
        struct drm_i915_private *dev_priv = stream->dev_priv;<br>
        u32 oastatus;<br>
@@ -877,7 +1219,7 @@ static int gen8_oa_read(struct i915_perf_stream *stream,<br>
                           oastatus & ~GEN8_OASTATUS_REPORT_LOST);<br>
        }<br>
  -     return gen8_append_oa_reports(stream, buf, count, offset);<br>
+       return gen8_append_oa_reports(stream, buf, count, offset, ts);<br>
  }<br>
    /**<br>
@@ -886,6 +1228,7 @@ static int gen8_oa_read(struct i915_perf_stream *stream,<br>
   * @buf: destination buffer given by userspace<br>
   * @count: the number of bytes userspace wants to read<br>
   * @offset: (inout): the current position for writing into @buf<br>
+ * @ts: copy OA reports till this timestamp<br>
   *<br>
   * Notably any error condition resulting in a short read (-%ENOSPC or<br>
   * -%EFAULT) will be returned even though one or more records may<br>
@@ -903,7 +1246,8 @@ static int gen8_oa_read(struct i915_perf_stream *stream,<br>
  static int gen7_append_oa_reports(struct i915_perf_stream *stream,<br>
                                  char __user *buf,<br>
                                  size_t count,<br>
-                                 size_t *offset)<br>
+                                 size_t *offset,<br>
+                                 u32 ts)<br>
  {<br>
        struct drm_i915_private *dev_priv = stream->dev_priv;<br>
        int report_size = dev_priv->perf.oa.oa_buffer.format_size;<br>
@@ -917,7 +1261,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,<br>
        u32 taken;<br>
        int ret = 0;<br>
  -     if (WARN_ON(!stream->enabled))<br>
+       if (WARN_ON(stream->state != I915_PERF_STREAM_ENABLED))<br>
                return -EIO;<br>
        spin_lock_irqsave(&dev_priv->p<wbr>erf.oa.oa_buffer.ptr_lock, flags);<br>
@@ -984,7 +1328,12 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,<br>
                        continue;<br>
                }<br>
  -             ret = append_oa_sample(stream, buf, count, offset, report);<br>
+               /* Report timestamp should not exceed the given ts */<br>
+               if (report32[1] > ts)<br>
+                       break;<br>
+<br>
+               ret = append_oa_buffer_sample(stream<wbr>, buf, count, offset,<br>
+                                             report);<br>
                if (ret)<br>
                        break;<br>
  @@ -1022,6 +1371,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,<br>
   * @buf: destination buffer given by userspace<br>
   * @count: the number of bytes userspace wants to read<br>
   * @offset: (inout): the current position for writing into @buf<br>
+ * @ts: copy OA reports till this timestamp<br>
   *<br>
   * Checks Gen 7 specific OA unit status registers and if necessary appends<br>
   * corresponding status records for userspace (such as for a buffer full<br>
@@ -1035,7 +1385,8 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,<br>
  static int gen7_oa_read(struct i915_perf_stream *stream,<br>
                        char __user *buf,<br>
                        size_t count,<br>
-                       size_t *offset)<br>
+                       size_t *offset,<br>
+                       u32 ts)<br>
  {<br>
        struct drm_i915_private *dev_priv = stream->dev_priv;<br>
        u32 oastatus1;<br>
@@ -1097,16 +1448,172 @@ static int gen7_oa_read(struct i915_perf_stream *stream,<br>
                        GEN7_OASTATUS1_REPORT_LOST;<br>
        }<br>
  -     return gen7_append_oa_reports(stream, buf, count, offset);<br>
+       return gen7_append_oa_reports(stream, buf, count, offset, ts);<br>
+}<br>
+<br>
+/**<br>
+ * append_cs_buffer_sample - Copies single perf sample data associated with<br>
+ * GPU command stream, into userspace read() buffer.<br>
+ * @stream: An i915-perf stream opened for perf CS metrics<br>
+ * @buf: destination buffer given by userspace<br>
+ * @count: the number of bytes userspace wants to read<br>
+ * @offset: (inout): the current position for writing into @buf<br>
+ * @node: Sample data associated with perf metrics<br>
+ *<br>
+ * Returns: 0 on success, negative error code on failure.<br>
+ */<br>
+static int append_cs_buffer_sample(struct i915_perf_stream *stream,<br>
+                               char __user *buf,<br>
+                               size_t count,<br>
+                               size_t *offset,<br>
+                               struct i915_perf_cs_sample *node)<br>
+{<br>
+       struct drm_i915_private *dev_priv = stream->dev_priv;<br>
+       struct i915_perf_sample_data data = { 0 };<br>
+       u32 sample_flags = stream->sample_flags;<br>
+       int ret = 0;<br>
+<br>
+       if (sample_flags & SAMPLE_OA_REPORT) {<br>
+               const u8 *report = stream->cs_buffer.vaddr + node->offset;<br>
+               u32 sample_ts = *(u32 *)(report + 4);<br>
+<br>
+               data.report = report;<br>
+<br>
+               /* First, append the periodic OA samples having lower<br>
+                * timestamp values<br>
+                */<br>
+               ret = dev_priv->perf.oa.ops.read(str<wbr>eam, buf, count, offset,<br>
+                                                sample_ts);<br>
+               if (ret)<br>
+                       return ret;<br>
+       }<br>
+<br>
+       if (sample_flags & SAMPLE_OA_SOURCE)<br>
+               data.source = I915_PERF_SAMPLE_OA_SOURCE_CS;<br>
+<br>
+       if (sample_flags & SAMPLE_CTX_ID)<br>
+               data.ctx_id = node->ctx_id;<br>
+<br>
+       return append_perf_sample(stream, buf, count, offset, &data);<br>
  }<br>
    /**<br>
- * i915_oa_wait_unlocked - handles blocking IO until OA data available<br>
+ * append_cs_buffer_samples - Copies all command stream based perf samples<br>
+ * into userspace read() buffer.<br>
+ * @stream: An i915-perf stream opened for perf CS metrics<br>
+ * @buf: destination buffer given by userspace<br>
+ * @count: the number of bytes userspace wants to read<br>
+ * @offset: (inout): the current position for writing into @buf<br>
+ *<br>
+ * Notably any error condition resulting in a short read (-%ENOSPC or<br>
+ * -%EFAULT) will be returned even though one or more records may<br>
+ * have been successfully copied. In this case it's up to the caller<br>
+ * to decide if the error should be squashed before returning to<br>
+ * userspace.<br>
+ *<br>
+ * Returns: 0 on success, negative error code on failure.<br>
+ */<br>
+static int append_cs_buffer_samples(struc<wbr>t i915_perf_stream *stream,<br>
+                               char __user *buf,<br>
+                               size_t count,<br>
+                               size_t *offset)<br>
+{<br>
+       struct i915_perf_cs_sample *entry, *next;<br>
+       LIST_HEAD(free_list);<br>
+       int ret = 0;<br>
+       unsigned long flags;<br>
+<br>
+       spin_lock_irqsave(&stream-><wbr>cs_samples_lock, flags);<br>
+       if (list_empty(&stream->cs_sample<wbr>s)) {<br>
+               spin_unlock_irqrestore(&strea<wbr>m->cs_samples_lock, flags);<br>
+               return 0;<br>
+       }<br>
+       list_for_each_entry_safe(entr<wbr>y, next,<br>
+                                &stream->cs_samples, link) {<br>
+               if (!i915_gem_request_completed(e<wbr>ntry->request))<br>
+                       break;<br>
+               list_move_tail(&entry->link, &free_list);<br>
+       }<br>
+       spin_unlock_irqrestore(&strea<wbr>m->cs_samples_lock, flags);<br>
+<br>
+       if (list_empty(&free_list))<br>
+               return 0;<br>
+<br>
+       list_for_each_entry_safe(entr<wbr>y, next, &free_list, link) {<br>
+               ret = append_cs_buffer_sample(stream<wbr>, buf, count, offset,<br>
+                                             entry);<br>
+               if (ret)<br>
+                       break;<br>
+<br>
+               list_del(&entry->link);<br>
+               i915_gem_request_put(entry->r<wbr>equest);<br>
+               kfree(entry);<br>
+       }<br>
+<br>
+       /* Don't discard remaining entries, keep them for next read */<br>
+       spin_lock_irqsave(&stream-><wbr>cs_samples_lock, flags);<br>
+       list_splice(&free_list, &stream->cs_samples);<br>
+       spin_unlock_irqrestore(&strea<wbr>m->cs_samples_lock, flags);<br>
+<br>
+       return ret;<br>
+}<br>
+<br>
+/*<br>
+ * cs_buffer_is_empty - Checks whether the command stream buffer<br>
+ * associated with the stream has data available.<br>
   * @stream: An i915-perf stream opened for OA metrics<br>
   *<br>
+ * Returns: true if no sample is queued or the request of the first queued<br>
+ * sample has not completed yet, else returns false.<br>
+ */<br>
+static bool cs_buffer_is_empty(struct i915_perf_stream *stream)<br>
+<br>
+{<br>
+       struct i915_perf_cs_sample *entry = NULL;<br>
+       struct drm_i915_gem_request *request = NULL;<br>
+       unsigned long flags;<br>
+<br>
+       spin_lock_irqsave(&stream-><wbr>cs_samples_lock, flags);<br>
+       entry = list_first_entry_or_null(&stre<wbr>am->cs_samples,<br>
+                       struct i915_perf_cs_sample, link);<br>
+       if (entry)<br>
+               request = entry->request;<br>
+       spin_unlock_irqrestore(&strea<wbr>m->cs_samples_lock, flags);<br>
+<br>
+       if (!entry)<br>
+               return true;<br>
+       else if (!i915_gem_request_completed(r<wbr>equest))<br>
+               return true;<br>
+       else<br>
+               return false;<br>
+}<br>
+<br>
+/**<br>
+ * stream_have_data_unlocked - Checks whether the stream has data available<br>
+ * @stream: An i915-perf stream opened for OA metrics<br>
+ *<br>
+ * For command stream based streams, check if the command stream buffer has<br>
+ * at least one sample available; if not, return false, irrespective of whether<br>
+ * the periodic OA buffer has data or not.<br>
+ */<br>
+<br>
+static bool stream_have_data_unlocked(stru<wbr>ct i915_perf_stream *stream)<br>
+{<br>
+       struct drm_i915_private *dev_priv = stream->dev_priv;<br>
+<br>
+       if (stream->cs_mode)<br>
+               return !cs_buffer_is_empty(stream);<br>
+       else<br>
+               return oa_buffer_check_unlocked(dev_p<wbr>riv);<br>
+}<br>
+<br>
+/**<br>
+ * i915_perf_stream_wait_unlocked - handles blocking IO until data available<br>
+ * @stream: An i915-perf stream opened for GPU metrics<br>
+ *<br>
   * Called when userspace tries to read() from a blocking stream FD opened<br>
- * for OA metrics. It waits until the hrtimer callback finds a non-empty<br>
- * OA buffer and wakes us.<br>
+ * for perf metrics. It waits until the hrtimer callback finds a non-empty<br>
+ * command stream buffer / OA buffer and wakes us.<br>
   *<br>
   * Note: it's acceptable to have this return with some false positives<br>
   * since any subsequent read handling will return -EAGAIN if there isn't<br>
@@ -1114,7 +1621,7 @@ static int gen7_oa_read(struct i915_perf_stream *stream,<br>
   *<br>
   * Returns: zero on success or a negative error code<br>
   */<br>
-static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)<br>
+static int i915_perf_stream_wait_unlocked<wbr>(struct i915_perf_stream *stream)<br>
  {<br>
        struct drm_i915_private *dev_priv = stream->dev_priv;<br>
  @@ -1122,32 +1629,47 @@ static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)<br>
        if (!dev_priv->perf.oa.periodic)<br>
                return -EIO;<br>
  -     return wait_event_interruptible(dev_p<wbr>riv->perf.oa.poll_wq,<br>
-                                       oa_buffer_check_unlocked(dev_<wbr>priv));<br>
+       if (stream->cs_mode) {<br>
+               long int ret;<br>
+<br>
+               /* Wait for all the sampled requests. */<br>
+               ret = reservation_object_wait_timeou<wbr>t_rcu(<br>
+                                                   stream->cs_buffer.vma->resv,<br>
+                                                   true,<br>
+                                                   true,<br>
+                                                   MAX_SCHEDULE_TIMEOUT);<br>
+               if (unlikely(ret < 0)) {<br>
+                       DRM_DEBUG_DRIVER("Failed to wait for sampled requests: %li\n", ret);<br>
+                       return ret;<br>
+               }<br>
+       }<br>
+<br>
+       return wait_event_interruptible(strea<wbr>m->poll_wq,<br>
+                                       stream_have_data_unlocked(str<wbr>eam));<br>
  }<br>
    /**<br>
- * i915_oa_poll_wait - call poll_wait() for an OA stream poll()<br>
- * @stream: An i915-perf stream opened for OA metrics<br>
+ * i915_perf_stream_poll_wait - call poll_wait() for a stream poll()<br>
+ * @stream: An i915-perf stream opened for GPU metrics<br>
   * @file: An i915 perf stream file<br>
   * @wait: poll() state table<br>
   *<br>
- * For handling userspace polling on an i915 perf stream opened for OA metrics,<br>
+ * For handling userspace polling on an i915 perf stream opened for metrics,<br>
   * this starts a poll_wait with the wait queue that our hrtimer callback wakes<br>
- * when it sees data ready to read in the circular OA buffer.<br>
+ * when it sees data ready to read either in the command stream buffer or in<br>
+ * the circular OA buffer.<br>
   */<br>
-static void i915_oa_poll_wait(struct i915_perf_stream *stream,<br>
+static void i915_perf_stream_poll_wait(str<wbr>uct i915_perf_stream *stream,<br>
                              struct file *file,<br>
                              poll_table *wait)<br>
  {<br>
-       struct drm_i915_private *dev_priv = stream->dev_priv;<br>
-<br>
-       poll_wait(file, &dev_priv->perf.oa.poll_wq, wait);<br>
+       poll_wait(file, &stream->poll_wq, wait);<br>
  }<br>
    /**<br>
- * i915_oa_read - just calls through to &i915_oa_ops->read<br>
- * @stream: An i915-perf stream opened for OA metrics<br>
+ * i915_perf_stream_read - Reads perf metrics available into userspace read<br>
+ * buffer<br>
+ * @stream: An i915-perf stream opened for GPU metrics<br>
   * @buf: destination buffer given by userspace<br>
   * @count: the number of bytes userspace wants to read<br>
   * @offset: (inout): the current position for writing into @buf<br>
@@ -1157,14 +1679,21 @@ static void i915_oa_poll_wait(struct i915_perf_stream *stream,<br>
   *<br>
   * Returns: zero on success or a negative error code<br>
   */<br>
-static int i915_oa_read(struct i915_perf_stream *stream,<br>
+static int i915_perf_stream_read(struct i915_perf_stream *stream,<br>
                        char __user *buf,<br>
                        size_t count,<br>
                        size_t *offset)<br>
  {<br>
        struct drm_i915_private *dev_priv = stream->dev_priv;<br>
  -     return dev_priv->perf.oa.ops.read(str<wbr>eam, buf, count, offset);<br>
+<br>
</blockquote>
<br>
Does the following code mean that a perf stream is either in cs_mode or OA mode?<br>
I couldn't see that condition in the function processing the opening parameters.<br>
<br>
The patch description also says:<br>
<br>
"Both periodic and CS based reports are associated with a single stream"<br>
<br>
The following code seems to contradict that. Can you explain how it works?<br>
<br>
Thanks<br></blockquote><div><br></div><div>Hi Lionel,</div><div><br></div><div>If you look closely, the append_cs_buffer_sample() function merges OA reports </div><div>from two independent buffers (the OA buffer, which holds the periodic OA</div><div>samples, and the command stream buffer, which holds the RCS based OA reports).</div><div>The merge is done on the basis of the report timestamps.</div><div>Therefore, in the code below, if stream->cs_mode is enabled, the</div><div>append_cs_buffer_samples() function needs to be called; it takes care of</div><div>collating the samples from these two independent buffers and copying them to the</div><div>stream's buffer in timestamp order. If cs_mode is not enabled, we can simply</div><div>collect samples from the periodic OA buffer and forward them to userspace (done</div><div>via the perf.oa.ops.read() function).</div><div>Hope this addresses your question.</div><div><br></div><div>Regards,</div><div>Sourab</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
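</blockquote><div><br></div><div>A minimal sketch of the timestamp ordering described above, as a simplified model rather than the driver code. The struct sample type, its fields and merge_by_timestamp() are hypothetical names used only for illustration; the two buffers are modelled as plain arrays and timestamp wraparound is ignored:</div>
<pre>
```c
/* Hypothetical, simplified model of one sample from either source. */
struct sample {
	unsigned int ts;   /* report timestamp (wraparound ignored here) */
	int from_cs;       /* 0: periodic OA buffer, 1: command stream buffer */
};

/*
 * For each command stream sample, first emit the periodic samples whose
 * timestamp does not exceed it, then the command stream sample itself.
 * Periodic samples newer than the last command stream sample are left
 * for a later call; *oa_pos tracks how many have been consumed so far.
 * The out array must have room for n_oa + n_cs entries.
 * Returns the number of samples written to out.
 */
static unsigned long merge_by_timestamp(const struct sample *oa,
					unsigned long n_oa,
					unsigned long *oa_pos,
					const struct sample *cs,
					unsigned long n_cs,
					struct sample *out)
{
	unsigned long i, n_out = 0;

	for (i = 0; i < n_cs; i++) {
		/* Drain periodic samples up to this command stream timestamp. */
		while (*oa_pos < n_oa && oa[*oa_pos].ts <= cs[i].ts)
			out[n_out++] = oa[(*oa_pos)++];

		/* Then emit the command stream sample itself. */
		out[n_out++] = cs[i];
	}

	return n_out;
}
```
</pre>
<div>Under these assumptions, periodic samples at timestamps 10, 20, 30, 40 and command stream samples at 25, 35 come out as 10, 20, 25, 30, 35, with the periodic sample at 40 left for a later call.</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">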
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
+       if (stream->cs_mode)<br>
+               return append_cs_buffer_samples(strea<wbr>m, buf, count, offset);<br>
+       else if (stream->sample_flags & SAMPLE_OA_REPORT)<br>
+               return dev_priv->perf.oa.ops.read(str<wbr>eam, buf, count, offset,<br>
+                                               U32_MAX);<br>
+       else<br>
+               return -EINVAL;<br>
  }<br>
    /**<br>
@@ -1182,7 +1711,7 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)<br>
        struct drm_i915_private *dev_priv = stream->dev_priv;<br>
        if (i915.enable_execlists)<br>
-               dev_priv->perf.oa.specific_ct<wbr>x_id = stream->ctx->hw_id;<br>
+               stream->engine->specific_ctx_<wbr>id = stream->ctx->hw_id;<br>
        else {<br>
                struct intel_engine_cs *engine = dev_priv->engine[RCS];<br>
                struct intel_ring *ring;<br>
@@ -1209,7 +1738,7 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)<br>
                 * i915_ggtt_offset() on the fly) considering the difference<br>
                 * with gen8+ and execlists<br>
                 */<br>
-               dev_priv->perf.oa.specific_ct<wbr>x_id =<br>
+               stream->engine->specific_ctx_<wbr>id =<br>
                        i915_ggtt_offset(stream->ctx-><wbr>engine[engine->id].state);<br>
        }<br>
  @@ -1228,13 +1757,13 @@ static void oa_put_render_ctx_id(struct i915_perf_stream *stream)<br>
        struct drm_i915_private *dev_priv = stream->dev_priv;<br>
        if (i915.enable_execlists) {<br>
-               dev_priv->perf.oa.specific_ct<wbr>x_id = INVALID_CTX_ID;<br>
+               stream->engine->specific_ctx_<wbr>id = INVALID_CTX_ID;<br>
        } else {<br>
                struct intel_engine_cs *engine = dev_priv->engine[RCS];<br>
                mutex_lock(&dev_priv->drm.stru<wbr>ct_mutex);<br>
  -             dev_priv->perf.oa.specific_ct<wbr>x_id = INVALID_CTX_ID;<br>
+               stream->engine->specific_ctx_<wbr>id = INVALID_CTX_ID;<br>
                engine->context_unpin(engine, stream->ctx);<br>
                mutex_unlock(&dev_priv->drm.struct_mutex);<br>
@@ -1242,13 +1771,28 @@ static void oa_put_render_ctx_id(struct i915_perf_stream *stream)<br>
  }<br>
    static void<br>
+free_cs_buffer(struct i915_perf_stream *stream)<br>
+{<br>
+       struct drm_i915_private *dev_priv = stream->dev_priv;<br>
+<br>
+       mutex_lock(&dev_priv->drm.str<wbr>uct_mutex);<br>
+<br>
+       i915_gem_object_unpin_map(str<wbr>eam->cs_buffer.vma->obj);<br>
+       i915_vma_unpin_and_release(&s<wbr>tream->cs_buffer.vma);<br>
+<br>
+       stream->cs_buffer.vma = NULL;<br>
+       stream->cs_buffer.vaddr = NULL;<br>
+<br>
+       mutex_unlock(&dev_priv->drm.s<wbr>truct_mutex);<br>
+}<br>
+<br>
+static void<br>
  free_oa_buffer(struct drm_i915_private *i915)<br>
  {<br>
        mutex_lock(&i915->drm.struct_m<wbr>utex);<br>
        i915_gem_object_unpin_map(i915<wbr>->perf.oa.oa_buffer.vma->obj);<br>
-       i915_vma_unpin(i915->perf.oa.<wbr>oa_buffer.vma);<br>
-       i915_gem_object_put(i915->per<wbr>f.oa.oa_buffer.vma->obj);<br>
+       i915_vma_unpin_and_release(&i<wbr>915->perf.oa.oa_buffer.vma);<br>
        i915->perf.oa.oa_buffer.vma = NULL;<br>
        i915->perf.oa.oa_buffer.vaddr = NULL;<br>
@@ -1256,27 +1800,41 @@ static void oa_put_render_ctx_id(struct i915_perf_stream *stream)<br>
        mutex_unlock(&i915->drm.struct<wbr>_mutex);<br>
  }<br>
  -static void i915_oa_stream_destroy(struct i915_perf_stream *stream)<br>
+static void i915_perf_stream_destroy(struc<wbr>t i915_perf_stream *stream)<br>
  {<br>
        struct drm_i915_private *dev_priv = stream->dev_priv;<br>
-<br>
-       BUG_ON(stream != dev_priv->perf.oa.exclusive_st<wbr>ream);<br>
+       struct intel_engine_cs *engine = stream->engine;<br>
+       struct i915_perf_stream *engine_stream;<br>
+       int idx;<br>
+<br>
+       idx = srcu_read_lock(&engine->perf_s<wbr>rcu);<br>
+       engine_stream = srcu_dereference(engine->exclu<wbr>sive_stream,<br>
+                                        &engine->perf_srcu);<br>
+       srcu_read_unlock(&engine->perf_srcu, idx);<br>
+       if (WARN_ON(stream != engine_stream))<br>
+               return;<br>
        /*<br>
         * Unset exclusive_stream first, it might be checked while<br>
         * disabling the metric set on gen8+.<br>
         */<br>
-       dev_priv->perf.oa.exclusive_s<wbr>tream = NULL;<br>
+       rcu_assign_pointer(stream->en<wbr>gine->exclusive_stream, NULL);<br>
+       synchronize_srcu(&stream->eng<wbr>ine->perf_srcu);<br>
  -     dev_priv->perf.oa.ops.<wbr>disable_metric_set(dev_priv);<br>
+       if (stream->using_oa) {<br>
+               dev_priv->perf.oa.ops.<wbr>disable_metric_set(dev_priv);<br>
  -     free_oa_buffer(dev_priv);<br>
+               free_oa_buffer(dev_priv);<br>
  -     intel_uncore_forcewake_put(de<wbr>v_priv, FORCEWAKE_ALL);<br>
-       intel_runtime_pm_put(dev_<wbr>priv);<br>
+               intel_uncore_forcewake_put(de<wbr>v_priv, FORCEWAKE_ALL);<br>
+               intel_runtime_pm_put(dev_<wbr>priv);<br>
  -     if (stream->ctx)<br>
-               oa_put_render_ctx_id(stream);<br>
+               if (stream->ctx)<br>
+                       oa_put_render_ctx_id(stream);<br>
+       }<br>
+<br>
+       if (stream->cs_mode)<br>
+               free_cs_buffer(stream);<br>
        if (dev_priv->perf.oa.spurious_re<wbr>port_rs.missed) {<br>
                DRM_NOTE("%d spurious OA report notices suppressed due to ratelimiting\n",<br>
@@ -1325,11 +1883,6 @@ static void gen7_init_oa_buffer(struct drm_i915_private *dev_priv)<br>
         * memory...<br>
         */<br>
        memset(dev_priv->perf.oa.oa_bu<wbr>ffer.vaddr, 0, OA_BUFFER_SIZE);<br>
-<br>
-       /* Maybe make ->pollin per-stream state if we support multiple<br>
-        * concurrent streams in the future.<br>
-        */<br>
-       dev_priv->perf.oa.pollin = false;<br>
  }<br>
    static void gen8_init_oa_buffer(struct drm_i915_private *dev_priv)<br>
@@ -1383,33 +1936,26 @@ static void gen8_init_oa_buffer(struct drm_i915_private *dev_priv)<br>
         * memory...<br>
         */<br>
        memset(dev_priv->perf.oa.oa_bu<wbr>ffer.vaddr, 0, OA_BUFFER_SIZE);<br>
-<br>
-       /*<br>
-        * Maybe make ->pollin per-stream state if we support multiple<br>
-        * concurrent streams in the future.<br>
-        */<br>
-       dev_priv->perf.oa.pollin = false;<br>
  }<br>
  -static int alloc_oa_buffer(struct drm_i915_private *dev_priv)<br>
+static int alloc_obj(struct drm_i915_private *dev_priv,<br>
+                    struct i915_vma **vma, u8 **vaddr)<br>
  {<br>
        struct drm_i915_gem_object *bo;<br>
-       struct i915_vma *vma;<br>
        int ret;<br>
  -     if (WARN_ON(dev_priv->perf.oa.oa_<wbr>buffer.vma))<br>
-               return -ENODEV;<br>
+       intel_runtime_pm_get(dev_<wbr>priv);<br>
        ret = i915_mutex_lock_interruptible(<wbr>&dev_priv->drm);<br>
        if (ret)<br>
-               return ret;<br>
+               goto out;<br>
        BUILD_BUG_ON_NOT_POWER_OF_2(OA<wbr>_BUFFER_SIZE);<br>
        BUILD_BUG_ON(OA_BUFFER_SIZE < SZ_128K || OA_BUFFER_SIZE > SZ_16M);<br>
        bo = i915_gem_object_create(dev_pri<wbr>v, OA_BUFFER_SIZE);<br>
        if (IS_ERR(bo)) {<br>
-               DRM_ERROR("Failed to allocate OA buffer\n");<br>
+               DRM_ERROR("Failed to allocate i915 perf obj\n");<br>
                ret = PTR_ERR(bo);<br>
                goto unlock;<br>
        }<br>
@@ -1419,42 +1965,83 @@ static int alloc_oa_buffer(struct drm_i915_private *dev_priv)<br>
                goto err_unref;<br>
        /* PreHSW required 512K alignment, HSW requires 16M */<br>
-       vma = i915_gem_object_ggtt_pin(bo, NULL, 0, SZ_16M, 0);<br>
-       if (IS_ERR(vma)) {<br>
-               ret = PTR_ERR(vma);<br>
+       *vma = i915_gem_object_ggtt_pin(bo, NULL, 0, SZ_16M, 0);<br>
+       if (IS_ERR(*vma)) {<br>
+               ret = PTR_ERR(*vma);<br>
                goto err_unref;<br>
        }<br>
-       dev_priv->perf.oa.oa_buffer.v<wbr>ma = vma;<br>
  -     dev_priv->perf.oa.oa_buffer.v<wbr>addr =<br>
-               i915_gem_object_pin_map(bo, I915_MAP_WB);<br>
-       if (IS_ERR(dev_priv->perf.oa.oa_b<wbr>uffer.vaddr)) {<br>
-               ret = PTR_ERR(dev_priv->perf.oa.oa_b<wbr>uffer.vaddr);<br>
+       *vaddr = i915_gem_object_pin_map(bo, I915_MAP_WB);<br>
+       if (IS_ERR(*vaddr)) {<br>
+               ret = PTR_ERR(*vaddr);<br>
                goto err_unpin;<br>
        }<br>
  -     dev_priv->perf.oa.ops.init_<wbr>oa_buffer(dev_priv);<br>
-<br>
-       DRM_DEBUG_DRIVER("OA Buffer initialized, gtt offset = 0x%x, vaddr = %p\n",<br>
-                        i915_ggtt_offset(dev_priv->per<wbr>f.oa.oa_buffer.vma),<br>
-                        dev_priv->perf.oa.oa_buffer.vaddr);<br>
-<br>
        goto unlock;<br>
    err_unpin:<br>
-       __i915_vma_unpin(vma);<br>
+       i915_vma_unpin(*vma);<br>
    err_unref:<br>
        i915_gem_object_put(bo);<br>
  -     dev_priv->perf.oa.oa_buffer.v<wbr>addr = NULL;<br>
-       dev_priv->perf.oa.oa_buffer.v<wbr>ma = NULL;<br>
-<br>
  unlock:<br>
        mutex_unlock(&dev_priv->drm.struct_mutex);<br>
+out:<br>
+       intel_runtime_pm_put(dev_<wbr>priv);<br>
        return ret;<br>
  }<br>
  +static int alloc_oa_buffer(struct drm_i915_private *dev_priv)<br>
+{<br>
+       struct i915_vma *vma;<br>
+       u8 *vaddr;<br>
+       int ret;<br>
+<br>
+       if (WARN_ON(dev_priv->perf.oa.oa_<wbr>buffer.vma))<br>
+               return -ENODEV;<br>
+<br>
+       ret = alloc_obj(dev_priv, &vma, &vaddr);<br>
+       if (ret)<br>
+               return ret;<br>
+<br>
+       dev_priv->perf.oa.oa_buffer.v<wbr>ma = vma;<br>
+       dev_priv->perf.oa.oa_buffer.v<wbr>addr = vaddr;<br>
+<br>
+       dev_priv->perf.oa.ops.init_<wbr>oa_buffer(dev_priv);<br>
+<br>
+       DRM_DEBUG_DRIVER("OA Buffer initialized, gtt offset = 0x%x, vaddr = %p\n",<br>
+                        i915_ggtt_offset(dev_priv->per<wbr>f.oa.oa_buffer.vma),<br>
+                        dev_priv->perf.oa.oa_buffer.vaddr);<br>
+       return 0;<br>
+}<br>
+<br>
+static int alloc_cs_buffer(struct i915_perf_stream *stream)<br>
+{<br>
+       struct drm_i915_private *dev_priv = stream->dev_priv;<br>
+       struct i915_vma *vma;<br>
+       u8 *vaddr;<br>
+       int ret;<br>
+<br>
+       if (WARN_ON(stream->cs_buffer.vma<wbr>))<br>
+               return -ENODEV;<br>
+<br>
+       ret = alloc_obj(dev_priv, &vma, &vaddr);<br>
+       if (ret)<br>
+               return ret;<br>
+<br>
+       stream->cs_buffer.vma = vma;<br>
+       stream->cs_buffer.vaddr = vaddr;<br>
+       if (WARN_ON(!list_empty(&stream-><wbr>cs_samples)))<br>
+               INIT_LIST_HEAD(&stream->cs_sa<wbr>mples);<br>
+<br>
+       DRM_DEBUG_DRIVER("Command stream buf initialized, gtt offset = 0x%x, vaddr = %p",<br>
+                        i915_ggtt_offset(stream->cs_bu<wbr>ffer.vma),<br>
+                        stream->cs_buffer.vaddr);<br>
+<br>
+       return 0;<br>
+}<br>
+<br>
  static void config_oa_regs(struct drm_i915_private *dev_priv,<br>
                           const struct i915_oa_reg *regs,<br>
                           int n_regs)<br>
@@ -1859,6 +2446,10 @@ static void gen8_disable_metric_set(struct drm_i915_private *dev_priv)<br>
    static void gen7_oa_enable(struct drm_i915_private *dev_priv)<br>
  {<br>
+       struct i915_perf_stream *stream;<br>
+       struct intel_engine_cs *engine = dev_priv->engine[RCS];<br>
+       int idx;<br>
+<br>
        /*<br>
         * Reset buf pointers so we don't forward reports from before now.<br>
         *<br>
@@ -1870,11 +2461,11 @@ static void gen7_oa_enable(struct drm_i915_private *dev_priv)<br>
         */<br>
        gen7_init_oa_buffer(dev_priv);<br>
<br>
-       if (dev_priv->perf.oa.exclusive_s<wbr>tream->enabled) {<br>
-               struct i915_gem_context *ctx =<br>
-                       dev_priv->perf.oa.exclusive_s<wbr>tream->ctx;<br>
-               u32 ctx_id = dev_priv->perf.oa.specific_ctx<wbr>_id;<br>
-<br>
+       idx = srcu_read_lock(&engine->perf_s<wbr>rcu);<br>
+       stream = srcu_dereference(engine->exclu<wbr>sive_stream, &engine->perf_srcu);<br>
+       if (stream->state != I915_PERF_STREAM_DISABLED) {<br>
+               struct i915_gem_context *ctx = stream->ctx;<br>
+               u32 ctx_id = engine->specific_ctx_id;<br>
                bool periodic = dev_priv->perf.oa.periodic;<br>
                u32 period_exponent = dev_priv->perf.oa.period_expon<wbr>ent;<br>
                u32 report_format = dev_priv->perf.oa.oa_buffer.fo<wbr>rmat;<br>
@@ -1889,6 +2480,7 @@ static void gen7_oa_enable(struct drm_i915_private *dev_priv)<br>
                           GEN7_OACONTROL_ENABLE);<br>
        } else<br>
                I915_WRITE(GEN7_OACONTROL, 0);<br>
+       srcu_read_unlock(&engine->per<wbr>f_srcu, idx);<br>
  }<br>
    static void gen8_oa_enable(struct drm_i915_private *dev_priv)<br>
@@ -1917,22 +2509,23 @@ static void gen8_oa_enable(struct drm_i915_private *dev_priv)<br>
  }<br>
    /**<br>
- * i915_oa_stream_enable - handle `I915_PERF_IOCTL_ENABLE` for OA stream<br>
- * @stream: An i915 perf stream opened for OA metrics<br>
+ * i915_perf_stream_enable - handle `I915_PERF_IOCTL_ENABLE` for perf stream<br>
+ * @stream: An i915 perf stream opened for GPU metrics<br>
   *<br>
   * [Re]enables hardware periodic sampling according to the period configured<br>
   * when opening the stream. This also starts a hrtimer that will periodically<br>
   * check for data in the circular OA buffer for notifying userspace (e.g.<br>
   * during a read() or poll()).<br>
   */<br>
-static void i915_oa_stream_enable(struct i915_perf_stream *stream)<br>
+static void i915_perf_stream_enable(struct i915_perf_stream *stream)<br>
  {<br>
        struct drm_i915_private *dev_priv = stream->dev_priv;<br>
<br>
-       dev_priv->perf.oa.ops.oa_enab<wbr>le(dev_priv);<br>
+       if (stream->sample_flags & SAMPLE_OA_REPORT)<br>
+               dev_priv->perf.oa.ops.oa_enab<wbr>le(dev_priv);<br>
<br>
-       if (dev_priv->perf.oa.periodic)<br>
-               hrtimer_start(&dev_priv-><wbr>perf.oa.poll_check_timer,<br>
+       if (stream->cs_mode || dev_priv->perf.oa.periodic)<br>
+               hrtimer_start(&dev_priv-><wbr>perf.poll_check_timer,<br>
                              ns_to_ktime(POLL_PERIOD),<br>
                              HRTIMER_MODE_REL_PINNED);<br>
  }<br>
@@ -1948,34 +2541,39 @@ static void gen8_oa_disable(struct drm_i915_private *dev_priv)<br>
  }<br>
    /**<br>
- * i915_oa_stream_disable - handle `I915_PERF_IOCTL_DISABLE` for OA stream<br>
- * @stream: An i915 perf stream opened for OA metrics<br>
+ * i915_perf_stream_disable - handle `I915_PERF_IOCTL_DISABLE` for perf stream<br>
+ * @stream: An i915 perf stream opened for GPU metrics<br>
   *<br>
   * Stops the OA unit from periodically writing counter reports into the<br>
   * circular OA buffer. This also stops the hrtimer that periodically checks for<br>
   * data in the circular OA buffer, for notifying userspace.<br>
   */<br>
-static void i915_oa_stream_disable(struct i915_perf_stream *stream)<br>
+static void i915_perf_stream_disable(struc<wbr>t i915_perf_stream *stream)<br>
  {<br>
        struct drm_i915_private *dev_priv = stream->dev_priv;<br>
<br>
-       dev_priv->perf.oa.ops.oa_disa<wbr>ble(dev_priv);<br>
+       if (stream->cs_mode || dev_priv->perf.oa.periodic)<br>
+               hrtimer_cancel(&dev_priv->per<wbr>f.poll_check_timer);<br>
+<br>
+       if (stream->cs_mode)<br>
+               i915_perf_stream_release_samp<wbr>les(stream);<br>
<br>
-       if (dev_priv->perf.oa.periodic)<br>
-               hrtimer_cancel(&dev_priv->per<wbr>f.oa.poll_check_timer);<br>
+       if (stream->sample_flags & SAMPLE_OA_REPORT)<br>
+               dev_priv->perf.oa.ops.oa_disa<wbr>ble(dev_priv);<br>
  }<br>
<br>
-static const struct i915_perf_stream_ops i915_oa_stream_ops = {<br>
-       .destroy = i915_oa_stream_destroy,<br>
-       .enable = i915_oa_stream_enable,<br>
-       .disable = i915_oa_stream_disable,<br>
-       .wait_unlocked = i915_oa_wait_unlocked,<br>
-       .poll_wait = i915_oa_poll_wait,<br>
-       .read = i915_oa_read,<br>
+static const struct i915_perf_stream_ops perf_stream_ops = {<br>
+       .destroy = i915_perf_stream_destroy,<br>
+       .enable = i915_perf_stream_enable,<br>
+       .disable = i915_perf_stream_disable,<br>
+       .wait_unlocked = i915_perf_stream_wait_unlocked<wbr>,<br>
+       .poll_wait = i915_perf_stream_poll_wait,<br>
+       .read = i915_perf_stream_read,<br>
+       .emit_sample_capture = i915_perf_stream_emit_sample_c<wbr>apture,<br>
  };<br>
    /**<br>
- * i915_oa_stream_init - validate combined props for OA stream and init<br>
+ * i915_perf_stream_init - validate combined props for stream and init<br>
   * @stream: An i915 perf stream<br>
   * @param: The open parameters passed to `DRM_I915_PERF_OPEN`<br>
   * @props: The property state that configures stream (individually validated)<br>
@@ -1984,58 +2582,35 @@ static void i915_oa_stream_disable(struct i915_perf_stream *stream)<br>
   * doesn't ensure that the combination necessarily makes sense.<br>
   *<br>
   * At this point it has been determined that userspace wants a stream of</blockquote>
</blockquote></div><br></div></div>