<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Aug 1, 2017 at 2:59 PM, Kamble, Sagar A <span dir="ltr"><<a href="mailto:sagar.a.kamble@intel.com" target="_blank">sagar.a.kamble@intel.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail-HOEnZb"><div class="gmail-h5"><br>
<br>
-----Original Message-----<br>
From: Landwerlin, Lionel G<br>
Sent: Monday, July 31, 2017 9:16 PM<br>
To: Kamble, Sagar A <<a href="mailto:sagar.a.kamble@intel.com">sagar.a.kamble@intel.com</a>>; <a href="mailto:intel-gfx@lists.freedesktop.org">intel-gfx@lists.freedesktop.<wbr>org</a><br>
Cc: Sourab Gupta <<a href="mailto:sourab.gupta@intel.com">sourab.gupta@intel.com</a>><br>
Subject: Re: [Intel-gfx] [PATCH 03/12] drm/i915: Framework for capturing command stream based OA reports and ctx id info.<br>
<br>
On 31/07/17 08:59, Sagar Arun Kamble wrote:<br>
> From: Sourab Gupta <<a href="mailto:sourab.gupta@intel.com">sourab.gupta@intel.com</a>><br>
><br>
> This patch introduces a framework to capture OA counter reports associated<br>
> with Render command stream. We can then associate the reports captured<br>
> through this mechanism with their corresponding context id's. This can be<br>
> further extended to associate any other metadata information with the<br>
> corresponding samples (since the association with Render command stream<br>
> gives us the ability to capture this information while inserting the<br>
> corresponding capture commands into the command stream).<br>
><br>
> The OA reports generated in this way are associated with a corresponding<br>
> workload, and thus can be used to delimit the workload (i.e. sample the<br>
> counters at the workload boundaries), within an ongoing stream of periodic<br>
> counter snapshots.<br>
><br>
> There may be use cases wherein we need more than the periodic OA capture mode<br>
> which is supported currently. This mode is primarily used for two use cases:<br>
>      - Ability to capture system wide metrics, along with the ability to map<br>
>        the reports back to individual contexts (particularly for HSW).<br>
>      - Ability to inject tags for work, into the reports. This provides<br>
>        visibility into the multiple stages of work within single context.<br>
><br>
> The userspace will be able to distinguish between the periodic and CS based<br>
> OA reports by virtue of the source_info sample field.<br>
><br>
> The command MI_REPORT_PERF_COUNT can be used to capture snapshots of OA<br>
> counters, and is inserted at BB boundaries.<br>
> The data thus captured will be stored in a separate buffer, which will<br>
> be different from the buffer used otherwise for periodic OA capture mode.<br>
> The metadata information pertaining to snapshot is maintained in a list,<br>
> which also has offsets into the gem buffer object per captured snapshot.<br>
> In order to track whether the gpu has completed processing the node,<br>
> a field pertaining to corresponding gem request is added, which is tracked<br>
> for completion of the command.<br>
><br>
> Both periodic and CS based reports are associated with a single stream<br>
> (corresponding to render engine), and it is expected to have the samples<br>
> in the sequential order according to their timestamps. Now, since these<br>
> reports are collected in separate buffers, these are merge sorted at the<br>
> time of forwarding to userspace during the read call.<br>
><br>
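<div>The merge step described in the paragraph above can be sketched in plain user-space C. This is a minimal illustration and not the driver's code: struct model_sample and model_merge_samples() are hypothetical names, and the sketch assumes each input is already sorted by timestamp and that timestamps do not wrap within the merged window.</div>
<pre>
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified sample record; not the driver's layout. */
struct model_sample {
	uint32_t ts;   /* report timestamp */
	int from_cs;   /* 1 if captured via the command stream, 0 if periodic */
};

/*
 * Merge two arrays that are each sorted by timestamp into out, which must
 * have room for n_periodic + n_cs entries. On equal timestamps the periodic
 * report is emitted first. Returns the number of entries written.
 */
static size_t model_merge_samples(const struct model_sample *periodic,
				  size_t n_periodic,
				  const struct model_sample *cs,
				  size_t n_cs,
				  struct model_sample *out)
{
	size_t i = 0, j = 0, k = 0;

	while (i < n_periodic && j < n_cs) {
		if (periodic[i].ts <= cs[j].ts)
			out[k++] = periodic[i++];
		else
			out[k++] = cs[j++];
	}
	/* One input is exhausted; copy whatever is left of the other. */
	while (i < n_periodic)
		out[k++] = periodic[i++];
	while (j < n_cs)
		out[k++] = cs[j++];

	return k;
}
```
</pre>
<div>Giving ties to the periodic report is an arbitrary choice in this sketch; the patch may order equal timestamps differently.</div>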
> v2: Aligning with the non-perf interface (custom drm ioctl based). Also,<br>
> a few related patches are squashed together for better readability.<br>
><br>
> v3: Updated perf sample capture emit hook name. Reserving space upfront<br>
> in the ring for emitting sample capture commands and using<br>
> req->fence.seqno for tracking samples. Added SRCU protection for streams.<br>
> Changed the stream last_request tracking to resv object. (Chris)<br>
> Updated perf.sample_lock spin_lock usage to avoid softlockups. Moved<br>
> stream to global per-engine structure. (Sagar)<br>
> Update unpin and put in the free routines to i915_vma_unpin_and_release.<br>
> Making use of perf stream cs_buffer vma resv instead of separate resv obj.<br>
> Pruned perf stream vma resv during gem_idle. (Chris)<br>
> Changed payload field ctx_id to u64 to keep all sample data aligned at 8<br>
> bytes. (Lionel)<br>
> stall/flush prior to sample capture is not added. Do we need to give this<br>
> control to user to select whether to stall/flush at each sample?<br>
><br>
> Signed-off-by: Sourab Gupta <<a href="mailto:sourab.gupta@intel.com">sourab.gupta@intel.com</a>><br>
> Signed-off-by: Robert Bragg <<a href="mailto:robert@sixbynine.org">robert@sixbynine.org</a>><br>
> Signed-off-by: Sagar Arun Kamble <<a href="mailto:sagar.a.kamble@intel.com">sagar.a.kamble@intel.com</a>><br>
> ---<br>
>   drivers/gpu/drm/i915/i915_drv.<wbr>h            |  101 ++-<br>
>   drivers/gpu/drm/i915/i915_gem.<wbr>c            |    1 +<br>
>   drivers/gpu/drm/i915/i915_gem_<wbr>execbuffer.c |    8 +<br>
>   drivers/gpu/drm/i915/i915_<wbr>perf.c           | 1185 ++++++++++++++++++++++------<br>
>   drivers/gpu/drm/i915/intel_<wbr>engine_cs.c     |    4 +<br>
>   drivers/gpu/drm/i915/intel_<wbr>ringbuffer.c    |    2 +<br>
>   drivers/gpu/drm/i915/intel_<wbr>ringbuffer.h    |    5 +<br>
>   include/uapi/drm/i915_drm.h                |   15 +<br>
>   8 files changed, 1073 insertions(+), 248 deletions(-)<br>
><br>
> diff --git a/drivers/gpu/drm/i915/i915_<wbr>drv.h b/drivers/gpu/drm/i915/i915_<wbr>drv.h<br>
> index 2c7456f..8b1cecf 100644<br>
> --- a/drivers/gpu/drm/i915/i915_<wbr>drv.h<br>
> +++ b/drivers/gpu/drm/i915/i915_<wbr>drv.h<br>
> @@ -1985,6 +1985,24 @@ struct i915_perf_stream_ops {<br>
>        * The stream will always be disabled before this is called.<br>
>        */<br>
>       void (*destroy)(struct i915_perf_stream *stream);<br>
> +<br>
> +     /*<br>
> +      * @emit_sample_capture: Emit the commands in the command streamer<br>
> +      * for a particular gpu engine.<br>
> +      *<br>
> +      * The commands are inserted to capture the perf sample data at<br>
> +      * specific points during workload execution, such as before and after<br>
> +      * the batch buffer.<br>
> +      */<br>
> +     void (*emit_sample_capture)(struct i915_perf_stream *stream,<br>
> +                                 struct drm_i915_gem_request *request,<br>
> +                                 bool preallocate);<br>
> +};<br>
> +<br>
<br>
It seems the motivation for the following enum is mostly to deal with<br>
the fact that engine->perf_srcu is set before the OA unit is configured.<br>
Would it be possible to set it later so that we can get rid of the enum?<br>
<br>
</div></div><Sagar> I will try to make this just a binary state. This enum defines the state of the stream. I too got confused about the purpose of IN_PROGRESS.<br>
SRCU is used to synchronize the stream state check.<br>
IN_PROGRESS keeps us from inadvertently accessing the stream vma to insert samples, but I guess depending on disabled/enabled should<br>
suffice.<br></blockquote><div><br></div><div>Hi Sagar/Lionel,</div><div><br></div>
<div>The purpose of the tristate was to work around a particular kludge of</div>
<div>working with just an enabled/disabled boolean state. I'll explain below.</div><div><br></div>
<div>Let's say we have only a boolean state.</div>
<div>The i915_perf_emit_sample_capture() function would depend on</div>
<div>stream->enabled in order to insert the MI_RPC command in RCS.</div>
<div>If you look at i915_perf_enable_locked(), stream->enabled is set before</div>
<div>stream->ops->enable() is called. The stream->ops->enable() function actually</div>
<div>enables the OA hardware to capture reports, and if MI_RPC commands</div>
<div>are submitted before the OA hw is enabled, it may hang the GPU.</div><div><br></div>
<div>Also, we can't change the order of calling these operations inside</div>
<div>i915_perf_enable_locked(), since the gen7_update_oacontrol_locked()</div>
<div>function depends on the stream->enabled flag to enable the OA</div>
<div>hw unit (i.e. it needs the flag to be true).</div><div><br></div>
<div>To work around this problem, I introduced a tristate here.</div>
<div>If you can suggest some alternate solution to this problem,</div>
<div>we can remove this tristate kludge here.</div><div><br></div>
<div>Regards,</div>
<div>Sourab</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
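<div>The enable-ordering problem described in the reply above can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: struct model_stream and the model_*() helpers are hypothetical names, the racing submission is simulated by a direct call inside the enable path, and the hardware-enable step is assumed to accept the in-progress state (the reply only says that it needs the stream to be flagged as enabled).</div>
<pre>
```c
#include <stdbool.h>

/* Mirrors the three states named in the patch. */
enum model_state {
	MODEL_DISABLED,
	MODEL_ENABLE_IN_PROGRESS,
	MODEL_ENABLED,
};

/* Hypothetical stand-in for a stream plus the OA unit it drives. */
struct model_stream {
	enum model_state state;
	bool oa_hw_on;     /* true once the OA unit is capturing reports */
	int unsafe_emits;  /* capture commands emitted while the OA unit was off */
};

/* Emit hook: only a fully enabled stream may insert capture commands. */
static bool model_emit_sample_capture(struct model_stream *s)
{
	if (s->state != MODEL_ENABLED)
		return false;
	if (!s->oa_hw_on)
		s->unsafe_emits++;
	return true;
}

/* Hardware enable (assumption): refuses to run on a disabled stream. */
static void model_hw_enable(struct model_stream *s)
{
	if (s->state != MODEL_DISABLED)
		s->oa_hw_on = true;
}

/*
 * Enable path. Returns true if a submission racing with the enable path
 * managed to emit a capture command before the hardware was on.
 */
static bool model_enable(struct model_stream *s)
{
	bool raced;

	s->state = MODEL_ENABLE_IN_PROGRESS;
	raced = model_emit_sample_capture(s); /* simulated racing submission */
	model_hw_enable(s);
	s->state = MODEL_ENABLED;

	return raced;
}
```
</pre>
<div>With a plain boolean, the first assignment in model_enable() would already satisfy the emit hook's check, so the simulated racing submission would be counted in unsafe_emits. The intermediate state is what keeps the emit hook out until the hardware is on.</div>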
<span class="gmail-"><br>
> +enum i915_perf_stream_state {<br>
> +     I915_PERF_STREAM_DISABLED,<br>
> +     I915_PERF_STREAM_ENABLE_IN_<wbr>PROGRESS,<br>
> +     I915_PERF_STREAM_ENABLED,<br>
>   };<br>
><br>
>   /**<br>
> @@ -1997,9 +2015,9 @@ struct i915_perf_stream {<br>
>       struct drm_i915_private *dev_priv;<br>
><br>
>       /**<br>
> -      * @link: Links the stream into ``&drm_i915_private->streams``<br>
> +      * @engine: Engine to which this stream corresponds.<br>
>        */<br>
> -     struct list_head link;<br>
> +     struct intel_engine_cs *engine;<br>
<br>
This series only supports cs_mode on the RCS command stream.<br>
Does it really make sense to add an srcu on all the engines rather than<br>
keeping it part of dev_priv->perf ?<br>
<br>
We can always add that later if needed.<br>
<br>
</span><sagar> Yes. Will change this.<br>
<div><div class="gmail-h5">><br>
>       /**<br>
>        * @sample_flags: Flags representing the `DRM_I915_PERF_PROP_SAMPLE_*`<br>
> @@ -2022,17 +2040,41 @@ struct i915_perf_stream {<br>
>       struct i915_gem_context *ctx;<br>
><br>
>       /**<br>
> -      * @enabled: Whether the stream is currently enabled, considering<br>
> -      * whether the stream was opened in a disabled state and based<br>
> -      * on `I915_PERF_IOCTL_ENABLE` and `I915_PERF_IOCTL_DISABLE` calls.<br>
> +      * @state: Current stream state, which can be either disabled, enabled,<br>
> +      * or enable_in_progress, while considering whether the stream was<br>
> +      * opened in a disabled state and based on `I915_PERF_IOCTL_ENABLE` and<br>
> +      * `I915_PERF_IOCTL_DISABLE` calls.<br>
>        */<br>
> -     bool enabled;<br>
> +     enum i915_perf_stream_state state;<br>
> +<br>
> +     /**<br>
> +      * @cs_mode: Whether command stream based perf sample collection is<br>
> +      * enabled for this stream<br>
> +      */<br>
> +     bool cs_mode;<br>
> +<br>
> +     /**<br>
> +      * @using_oa: Whether OA unit is in use for this particular stream<br>
> +      */<br>
> +     bool using_oa;<br>
><br>
>       /**<br>
>        * @ops: The callbacks providing the implementation of this specific<br>
>        * type of configured stream.<br>
>        */<br>
>       const struct i915_perf_stream_ops *ops;<br>
> +<br>
> +     /* Command stream based perf data buffer */<br>
> +     struct {<br>
> +             struct i915_vma *vma;<br>
> +             u8 *vaddr;<br>
> +     } cs_buffer;<br>
> +<br>
> +     struct list_head cs_samples;<br>
> +     spinlock_t cs_samples_lock;<br>
> +<br>
> +     wait_queue_head_t poll_wq;<br>
> +     bool pollin;<br>
>   };<br>
><br>
>   /**<br>
> @@ -2095,7 +2137,8 @@ struct i915_oa_ops {<br>
>       int (*read)(struct i915_perf_stream *stream,<br>
>                   char __user *buf,<br>
>                   size_t count,<br>
> -                 size_t *offset);<br>
> +                 size_t *offset,<br>
> +                 u32 ts);<br>
><br>
>       /**<br>
>        * @oa_hw_tail_read: read the OA tail pointer register<br>
> @@ -2107,6 +2150,36 @@ struct i915_oa_ops {<br>
>       u32 (*oa_hw_tail_read)(struct drm_i915_private *dev_priv);<br>
>   };<br>
><br>
> +/*<br>
> + * i915_perf_cs_sample - Sample element to hold info about a single perf<br>
> + * sample data associated with a particular GPU command stream.<br>
> + */<br>
> +struct i915_perf_cs_sample {<br>
> +     /**<br>
> +      * @link: Links the sample into ``&stream->cs_samples``<br>
> +      */<br>
> +     struct list_head link;<br>
> +<br>
> +     /**<br>
> +      * @request: GEM request associated with the sample. The commands to<br>
> +      * capture the perf metrics are inserted into the command streamer in<br>
> +      * context of this request.<br>
> +      */<br>
> +     struct drm_i915_gem_request *request;<br>
> +<br>
> +     /**<br>
> +      * @offset: Offset into ``&stream->cs_buffer``<br>
> +      * where the perf metrics will be collected, when the commands inserted<br>
> +      * into the command stream are executed by GPU.<br>
> +      */<br>
> +     u32 offset;<br>
> +<br>
> +     /**<br>
> +      * @ctx_id: Context ID associated with this perf sample<br>
> +      */<br>
> +     u32 ctx_id;<br>
> +};<br>
> +<br>
>   struct intel_cdclk_state {<br>
>       unsigned int cdclk, vco, ref;<br>
>   };<br>
> @@ -2431,17 +2504,10 @@ struct drm_i915_private {<br>
>               struct ctl_table_header *sysctl_header;<br>
><br>
>               struct mutex lock;<br>
> -             struct list_head streams;<br>
> -<br>
> -             struct {<br>
> -                     struct i915_perf_stream *exclusive_stream;<br>
><br>
> -                     u32 specific_ctx_id;<br>
> -<br>
> -                     struct hrtimer poll_check_timer;<br>
> -                     wait_queue_head_t poll_wq;<br>
> -                     bool pollin;<br>
> +             struct hrtimer poll_check_timer;<br>
><br>
> +             struct {<br>
>                       /**<br>
>                        * For rate limiting any notifications of spurious<br>
>                        * invalid OA reports<br>
> @@ -3636,6 +3702,8 @@ int i915_perf_open_ioctl(struct drm_device *dev, void *data,<br>
>   void i915_oa_init_reg_state(struct intel_engine_cs *engine,<br>
>                           struct i915_gem_context *ctx,<br>
>                           uint32_t *reg_state);<br>
> +void i915_perf_emit_sample_capture(<wbr>struct drm_i915_gem_request *req,<br>
> +                                bool preallocate);<br>
><br>
>   /* i915_gem_evict.c */<br>
>   int __must_check i915_gem_evict_something(<wbr>struct i915_address_space *vm,<br>
> @@ -3795,6 +3863,7 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,<br>
>   /* i915_perf.c */<br>
>   extern void i915_perf_init(struct drm_i915_private *dev_priv);<br>
>   extern void i915_perf_fini(struct drm_i915_private *dev_priv);<br>
> +extern void i915_perf_streams_mark_idle(<wbr>struct drm_i915_private *dev_priv);<br>
>   extern void i915_perf_register(struct drm_i915_private *dev_priv);<br>
>   extern void i915_perf_unregister(struct drm_i915_private *dev_priv);<br>
><br>
> diff --git a/drivers/gpu/drm/i915/i915_<wbr>gem.c b/drivers/gpu/drm/i915/i915_<wbr>gem.c<br>
> index 000a764..7b01548 100644<br>
> --- a/drivers/gpu/drm/i915/i915_<wbr>gem.c<br>
> +++ b/drivers/gpu/drm/i915/i915_<wbr>gem.c<br>
> @@ -3220,6 +3220,7 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)<br>
><br>
>       intel_engines_mark_idle(dev_<wbr>priv);<br>
>       i915_gem_timelines_mark_idle(<wbr>dev_priv);<br>
> +     i915_perf_streams_mark_idle(<wbr>dev_priv);<br>
><br>
>       GEM_BUG_ON(!dev_priv->gt.<wbr>awake);<br>
>       dev_priv->gt.awake = false;<br>
> diff --git a/drivers/gpu/drm/i915/i915_<wbr>gem_execbuffer.c b/drivers/gpu/drm/i915/i915_<wbr>gem_execbuffer.c<br>
> index 5fa4476..bfe546b 100644<br>
> --- a/drivers/gpu/drm/i915/i915_<wbr>gem_execbuffer.c<br>
> +++ b/drivers/gpu/drm/i915/i915_<wbr>gem_execbuffer.c<br>
> @@ -1194,12 +1194,16 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,<br>
>       if (err)<br>
>               goto err_request;<br>
><br>
> +     i915_perf_emit_sample_capture(<wbr>rq, true);<br>
> +<br>
>       err = eb->engine->emit_bb_start(rq,<br>
>                                       batch->node.start, PAGE_SIZE,<br>
>                                       cache->gen > 5 ? 0 : I915_DISPATCH_SECURE);<br>
>       if (err)<br>
>               goto err_request;<br>
><br>
> +     i915_perf_emit_sample_capture(<wbr>rq, false);<br>
> +<br>
>       GEM_BUG_ON(!reservation_<wbr>object_test_signaled_rcu(<wbr>batch->resv, true));<br>
>       i915_vma_move_to_active(batch, rq, 0);<br>
>       reservation_object_lock(batch-<wbr>>resv, NULL);<br>
> @@ -2029,6 +2033,8 @@ static int eb_submit(struct i915_execbuffer *eb)<br>
>                       return err;<br>
>       }<br>
><br>
> +     i915_perf_emit_sample_capture(<wbr>eb->request, true);<br>
> +<br>
>       err = eb->engine->emit_bb_start(eb-><wbr>request,<br>
>                                       eb->batch->node.start +<br>
>                                       eb->batch_start_offset,<br>
> @@ -2037,6 +2043,8 @@ static int eb_submit(struct i915_execbuffer *eb)<br>
>       if (err)<br>
>               return err;<br>
><br>
> +     i915_perf_emit_sample_capture(<wbr>eb->request, false);<br>
> +<br>
>       return 0;<br>
>   }<br>
><br>
> diff --git a/drivers/gpu/drm/i915/i915_<wbr>perf.c b/drivers/gpu/drm/i915/i915_<wbr>perf.c<br>
> index b272653..57e1936 100644<br>
> --- a/drivers/gpu/drm/i915/i915_<wbr>perf.c<br>
> +++ b/drivers/gpu/drm/i915/i915_<wbr>perf.c<br>
> @@ -193,6 +193,7 @@<br>
><br>
>   #include <linux/anon_inodes.h><br>
>   #include <linux/sizes.h><br>
> +#include <linux/srcu.h><br>
><br>
>   #include "i915_drv.h"<br>
>   #include "i915_oa_hsw.h"<br>
> @@ -288,6 +289,12 @@<br>
>   #define OAREPORT_REASON_CTX_SWITCH     (1<<3)<br>
>   #define OAREPORT_REASON_CLK_RATIO      (1<<5)<br>
><br>
> +/* Data common to periodic and RCS based OA samples */<br>
> +struct i915_perf_sample_data {<br>
> +     u64 source;<br>
> +     u64 ctx_id;<br>
> +     const u8 *report;<br>
> +};<br>
><br>
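<div>The v3 changelog notes that ctx_id was widened to u64 so that all sample data stays aligned at 8 bytes. A minimal sketch of how a record size could be accumulated from the sample flags follows. The flag bit values are taken from the SAMPLE_* defines in this patch; MODEL_HEADER_SIZE (8 bytes) and model_sample_size() are assumptions made for illustration and are not taken from the driver.</div>
<pre>
```c
#include <stddef.h>
#include <stdint.h>

/* Bit values follow the SAMPLE_* defines in the patch. */
#define MODEL_SAMPLE_OA_REPORT (1u << 0)
#define MODEL_SAMPLE_OA_SOURCE (1u << 1)
#define MODEL_SAMPLE_CTX_ID    (1u << 2)

/* Assumed size of the per-record header, in bytes. */
#define MODEL_HEADER_SIZE 8u

/*
 * Size in bytes of one sample record for the given flags. Every optional
 * field is a u64, so the total stays a multiple of 8 as long as report_size
 * is itself a multiple of 8.
 */
static size_t model_sample_size(uint32_t flags, size_t report_size)
{
	size_t size = MODEL_HEADER_SIZE;

	if (flags & MODEL_SAMPLE_OA_SOURCE)
		size += sizeof(uint64_t);
	if (flags & MODEL_SAMPLE_CTX_ID)
		size += sizeof(uint64_t);
	if (flags & MODEL_SAMPLE_OA_REPORT)
		size += report_size;

	return size;
}
```
</pre>
<div>Had ctx_id remained a 32-bit field, a record carrying it would end on a 4-byte boundary and the next record in the read buffer would start misaligned.</div>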
>   /* For sysctl proc_dointvec_minmax of i915_oa_max_sample_rate<br>
>    *<br>
> @@ -328,8 +335,19 @@<br>
>       [I915_OA_FORMAT_C4_B8]              = { 7, 64 },<br>
>   };<br>
><br>
> +/* Duplicated from similar static enum in i915_gem_execbuffer.c */<br>
> +#define I915_USER_RINGS (4)<br>
> +static const enum intel_engine_id user_ring_map[I915_USER_RINGS + 1] = {<br>
> +     [I915_EXEC_DEFAULT]     = RCS,<br>
> +     [I915_EXEC_RENDER]      = RCS,<br>
> +     [I915_EXEC_BLT]         = BCS,<br>
> +     [I915_EXEC_BSD]         = VCS,<br>
> +     [I915_EXEC_VEBOX]       = VECS<br>
> +};<br>
> +<br>
>   #define SAMPLE_OA_REPORT      (1<<0)<br>
>   #define SAMPLE_OA_SOURCE      (1<<1)<br>
> +#define SAMPLE_CTX_ID              (1<<2)<br>
><br>
>   /**<br>
>    * struct perf_open_properties - for validated properties given to open a stream<br>
> @@ -340,6 +358,9 @@<br>
>    * @oa_format: An OA unit HW report format<br>
>    * @oa_periodic: Whether to enable periodic OA unit sampling<br>
>    * @oa_period_exponent: The OA unit sampling period is derived from this<br>
> + * @cs_mode: Whether the stream is configured to enable collection of metrics<br>
> + * associated with command stream of a particular GPU engine<br>
> + * @engine: The GPU engine associated with the stream in case cs_mode is enabled<br>
>    *<br>
>    * As read_properties_unlocked() enumerates and validates the properties given<br>
>    * to open a stream of metrics the configuration is built up in the structure<br>
> @@ -356,6 +377,10 @@ struct perf_open_properties {<br>
>       int oa_format;<br>
>       bool oa_periodic;<br>
>       int oa_period_exponent;<br>
> +<br>
> +     /* Command stream mode */<br>
> +     bool cs_mode;<br>
> +     enum intel_engine_id engine;<br>
>   };<br>
><br>
>   static u32 gen8_oa_hw_tail_read(struct drm_i915_private *dev_priv)<br>
> @@ -371,6 +396,266 @@ static u32 gen7_oa_hw_tail_read(struct drm_i915_private *dev_priv)<br>
>   }<br>
><br>
>   /**<br>
> + * i915_perf_emit_sample_capture - Insert the commands to capture metrics into<br>
> + * the command stream of a GPU engine.<br>
> + * @request: request in whose context the metrics are being collected.<br>
> + * @preallocate: allocate space in ring for related sample.<br>
> + *<br>
> + * The function provides a hook through which the commands to capture perf<br>
> + * metrics, are inserted into the command stream of a GPU engine.<br>
> + */<br>
> +void i915_perf_emit_sample_capture(<wbr>struct drm_i915_gem_request *request,<br>
> +                                bool preallocate)<br>
> +{<br>
> +     struct intel_engine_cs *engine = request->engine;<br>
> +     struct drm_i915_private *dev_priv = engine->i915;<br>
> +     struct i915_perf_stream *stream;<br>
> +     int idx;<br>
> +<br>
> +     if (!dev_priv->perf.initialized)<br>
> +             return;<br>
> +<br>
> +     idx = srcu_read_lock(&engine->perf_<wbr>srcu);<br>
> +     stream = srcu_dereference(engine-><wbr>exclusive_stream, &engine->perf_srcu);<br>
> +     if (stream && (stream->state == I915_PERF_STREAM_ENABLED) &&<br>
> +                             stream->cs_mode)<br>
> +             stream->ops->emit_sample_<wbr>capture(stream, request,<br>
> +                                              preallocate);<br>
> +     srcu_read_unlock(&engine-><wbr>perf_srcu, idx);<br>
> +}<br>
> +<br>
> +/**<br>
> + * release_perf_samples - Release old perf samples to make space for new<br>
> + * sample data.<br>
> + * @stream: Stream from which space is to be freed up.<br>
> + * @target_size: Space required to be freed up.<br>
> + *<br>
> + * We also dereference the associated request before deleting the sample.<br>
> + * Also, no need to check whether the commands associated with old samples<br>
> + * have been completed. This is because these sample entries are anyways going<br>
> + * to be replaced by a new sample, and gpu will eventually overwrite the buffer<br>
> + * contents, when the request associated with new sample completes.<br>
> + */<br>
> +static void release_perf_samples(struct i915_perf_stream *stream,<br>
> +                              u32 target_size)<br>
> +{<br>
> +     struct drm_i915_private *dev_priv = stream->dev_priv;<br>
> +     struct i915_perf_cs_sample *sample, *next;<br>
> +     u32 sample_size = dev_priv->perf.oa.oa_buffer.<wbr>format_size;<br>
> +     u32 size = 0;<br>
> +<br>
> +     list_for_each_entry_safe<br>
> +             (sample, next, &stream->cs_samples, link) {<br>
> +             size += sample_size;<br>
> +             i915_gem_request_put(sample-><wbr>request);<br>
> +             list_del(&sample->link);<br>
> +             kfree(sample);<br>
> +<br>
> +             if (size >= target_size)<br>
> +                     break;<br>
> +     }<br>
> +}<br>
> +<br>
> +/**<br>
> + * insert_perf_sample - Insert a perf sample entry to the sample list.<br>
> + * @stream: Stream into which sample is to be inserted.<br>
> + * @sample: perf CS sample to be inserted into the list<br>
> + *<br>
> + * This function never fails, since it always manages to insert the sample.<br>
> + * If the space is exhausted in the buffer, it will remove the older<br>
> + * entries in order to make space.<br>
> + */<br>
> +static void insert_perf_sample(struct i915_perf_stream *stream,<br>
> +                             struct i915_perf_cs_sample *sample)<br>
> +{<br>
> +     struct drm_i915_private *dev_priv = stream->dev_priv;<br>
> +     struct i915_perf_cs_sample *first, *last;<br>
> +     int max_offset = stream->cs_buffer.vma->obj-><wbr>base.size;<br>
> +     u32 sample_size = dev_priv->perf.oa.oa_buffer.<wbr>format_size;<br>
> +     unsigned long flags;<br>
> +<br>
> +     spin_lock_irqsave(&stream->cs_<wbr>samples_lock, flags);<br>
> +     if (list_empty(&stream->cs_<wbr>samples)) {<br>
> +             sample->offset = 0;<br>
> +             list_add_tail(&sample->link, &stream->cs_samples);<br>
> +             spin_unlock_irqrestore(&<wbr>stream->cs_samples_lock, flags);<br>
> +             return;<br>
> +     }<br>
> +<br>
> +     first = list_first_entry(&stream->cs_<wbr>samples, typeof(*first),<br>
> +                             link);<br>
> +     last = list_last_entry(&stream->cs_<wbr>samples, typeof(*last),<br>
> +                             link);<br>
> +<br>
> +     if (last->offset >= first->offset) {<br>
> +             /* Sufficient space available at the end of buffer? */<br>
> +             if (last->offset + 2*sample_size < max_offset)<br>
> +                     sample->offset = last->offset + sample_size;<br>
> +             /*<br>
> +              * Wraparound condition. Is sufficient space available at<br>
> +              * beginning of buffer?<br>
> +              */<br>
> +             else if (sample_size < first->offset)<br>
> +                     sample->offset = 0;<br>
> +             /* Insufficient space. Overwrite existing old entries */<br>
> +             else {<br>
> +                     u32 target_size = sample_size - first->offset;<br>
> +<br>
> +                     release_perf_samples(stream, target_size);<br>
> +                     sample->offset = 0;<br>
> +             }<br>
> +     } else {<br>
> +             /* Sufficient space available? */<br>
> +             if (last->offset + 2*sample_size < first->offset)<br>
> +                     sample->offset = last->offset + sample_size;<br>
> +             /* Insufficient space. Overwrite existing old entries */<br>
> +             else {<br>
> +                     u32 target_size = sample_size -<br>
> +                             (first->offset - last->offset -<br>
> +                             sample_size);<br>
> +<br>
> +                     release_perf_samples(stream, target_size);<br>
> +                     sample->offset = last->offset + sample_size;<br>
> +             }<br>
> +     }<br>
> +     list_add_tail(&sample->link, &stream->cs_samples);<br>
> +     spin_unlock_irqrestore(&<wbr>stream->cs_samples_lock, flags);<br>
> +}<br>
> +<br>
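<div>The "never fails, drops the oldest entries when full" behaviour documented for insert_perf_sample() can be illustrated with a simplified fixed-slot ring. This sketch deliberately does not reproduce the offset arithmetic of the patch: struct model_ring, MODEL_SLOTS and model_ring_insert() are hypothetical, and the buffer is assumed to hold a whole number of equally sized samples.</div>
<pre>
```c
#include <stdint.h>

#define MODEL_SLOTS 4u /* hypothetical capacity, in samples */

struct model_ring {
	uint32_t sample_size; /* bytes per sample */
	uint32_t head;        /* slot index of the oldest live sample */
	uint32_t count;       /* number of live samples */
	uint32_t dropped;     /* old samples released to make room */
};

/* Reserve a slot for a new sample; never fails. Returns its byte offset. */
static uint32_t model_ring_insert(struct model_ring *r)
{
	uint32_t slot;

	if (r->count == MODEL_SLOTS) {
		/* Full: release the oldest entry to make room for the new one. */
		r->head = (r->head + 1) % MODEL_SLOTS;
		r->count--;
		r->dropped++;
	}

	/* The new sample goes right after the newest live one, wrapping around. */
	slot = (r->head + r->count) % MODEL_SLOTS;
	r->count++;

	return slot * r->sample_size;
}
```
</pre>
<div>As in the patch, the dropped entry is not checked for completion; its slot is simply reused by the new sample.</div>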
> +/**<br>
> + * i915_emit_oa_report_capture - Insert the commands to capture OA<br>
> + * reports metrics into the render command stream<br>
> + * @request: request in whose context the metrics are being collected.<br>
> + * @preallocate: allocate space in ring for related sample.<br>
> + * @offset: command stream buffer offset where the OA metrics need to be<br>
> + * collected<br>
> + */<br>
> +static int i915_emit_oa_report_capture(<br>
> +                             struct drm_i915_gem_request *request,<br>
> +                             bool preallocate,<br>
> +                             u32 offset)<br>
> +{<br>
> +     struct drm_i915_private *dev_priv = request->i915;<br>
> +     struct intel_engine_cs *engine = request->engine;<br>
> +     struct i915_perf_stream *stream;<br>
> +     u32 addr = 0;<br>
> +     u32 cmd, len = 4, *cs;<br>
> +     int idx;<br>
> +<br>
> +     idx = srcu_read_lock(&engine->perf_<wbr>srcu);<br>
> +     stream = srcu_dereference(engine-><wbr>exclusive_stream, &engine->perf_srcu);<br>
> +     addr = stream->cs_buffer.vma->node.<wbr>start + offset;<br>
> +     srcu_read_unlock(&engine-><wbr>perf_srcu, idx);<br>
> +<br>
> +     if (WARN_ON(addr & 0x3f)) {<br>
> +             DRM_ERROR("OA buffer address not aligned to 64 byte\n");<br>
> +             return -EINVAL;<br>
> +     }<br>
> +<br>
> +     if (preallocate)<br>
> +             request->reserved_space += len;<br>
> +     else<br>
> +             request->reserved_space -= len;<br>
> +<br>
> +     cs = intel_ring_begin(request, 4);<br>
> +     if (IS_ERR(cs))<br>
> +             return PTR_ERR(cs);<br>
> +<br>
> +     cmd = MI_REPORT_PERF_COUNT | (1<<0);<br>
> +     if (INTEL_GEN(dev_priv) >= 8)<br>
> +             cmd |= (2<<0);<br>
> +<br>
> +     *cs++ = cmd;<br>
> +     *cs++ = addr | MI_REPORT_PERF_COUNT_GGTT;<br>
> +     *cs++ = request->fence.seqno;<br>
> +<br>
> +     if (INTEL_GEN(dev_priv) >= 8)<br>
> +             *cs++ = 0;<br>
> +     else<br>
> +             *cs++ = MI_NOOP;<br>
> +<br>
> +     intel_ring_advance(request, cs);<br>
> +<br>
> +     return 0;<br>
> +}<br>
> +<br>
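<div>The alignment check and the four-dword layout used by i915_emit_oa_report_capture() can be isolated in a small user-space helper. This is a sketch only: model_pack_report_cmd() is a hypothetical name, the opcode and GGTT bit are passed in by the caller so that no real hardware encoding is assumed, and the last dword is written as zero for every generation on the assumption that a no-op encodes as 0.</div>
<pre>
```c
#include <stdint.h>

/*
 * Fill cs[0..3] with a four-dword capture command, following the layout in
 * i915_emit_oa_report_capture(). Returns 0 on success, or -1 if addr is not
 * aligned to 64 bytes.
 */
static int model_pack_report_cmd(uint32_t *cs, uint32_t opcode,
				 uint32_t ggtt_bit, uint32_t addr,
				 uint32_t report_id, int gen)
{
	uint32_t cmd;

	/* The report destination must be 64-byte aligned. */
	if (addr & 0x3f)
		return -1;

	/* Same bit manipulation as the patch: bit 0 always, bit 1 on gen8+. */
	cmd = opcode | (1u << 0);
	if (gen >= 8)
		cmd |= (2u << 0);

	cs[0] = cmd;
	cs[1] = addr | ggtt_bit;
	cs[2] = report_id; /* the patch stores request->fence.seqno here */
	cs[3] = 0;         /* trailing zero dword (no-op assumed to be 0) */

	return 0;
}
```
</pre>
<div>Keeping the alignment test ahead of any write means a rejected address leaves the destination array untouched.</div>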
> +/**<br>
> + * i915_perf_stream_emit_sample_<wbr>capture - Insert the commands to capture perf<br>
> + * metrics into the GPU command stream<br>
> + * @stream: An i915-perf stream opened for GPU metrics<br>
> + * @request: request in whose context the metrics are being collected.<br>
> + * @preallocate: allocate space in ring for related sample.<br>
> + */<br>
> +static void i915_perf_stream_emit_sample_<wbr>capture(<br>
> +                                     struct i915_perf_stream *stream,<br>
> +                                     struct drm_i915_gem_request *request,<br>
> +                                     bool preallocate)<br>
> +{<br>
> +     struct reservation_object *resv = stream->cs_buffer.vma->resv;<br>
> +     struct i915_perf_cs_sample *sample;<br>
> +     unsigned long flags;<br>
> +     int ret;<br>
> +<br>
> +     sample = kzalloc(sizeof(*sample), GFP_KERNEL);<br>
> +     if (sample == NULL) {<br>
> +             DRM_ERROR("Perf sample alloc failed\n");<br>
> +             return;<br>
> +     }<br>
> +<br>
> +     sample->request = i915_gem_request_get(request);<br>
> +     sample->ctx_id = request->ctx->hw_id;<br>
> +<br>
> +     insert_perf_sample(stream, sample);<br>
> +<br>
> +     if (stream->sample_flags & SAMPLE_OA_REPORT) {<br>
> +             ret = i915_emit_oa_report_capture(<wbr>request,<br>
> +                                               preallocate,<br>
> +                                               sample->offset);<br>
> +             if (ret)<br>
> +                     goto err_unref;<br>
> +     }<br>
> +<br>
> +     reservation_object_lock(resv, NULL);<br>
> +     if (reservation_object_reserve_<wbr>shared(resv) == 0)<br>
> +             reservation_object_add_shared_<wbr>fence(resv, &request->fence);<br>
> +     reservation_object_unlock(<wbr>resv);<br>
> +<br>
> +     i915_vma_move_to_active(<wbr>stream->cs_buffer.vma, request,<br>
> +                                     EXEC_OBJECT_WRITE);<br>
> +     return;<br>
> +<br>
> +err_unref:<br>
> +     i915_gem_request_put(sample-><wbr>request);<br>
> +     spin_lock_irqsave(&stream->cs_<wbr>samples_lock, flags);<br>
> +     list_del(&sample->link);<br>
> +     spin_unlock_irqrestore(&<wbr>stream->cs_samples_lock, flags);<br>
> +     kfree(sample);<br>
> +}<br>
> +<br>
> +/**<br>
> + * i915_perf_stream_release_<wbr>samples - Release the perf command stream samples<br>
> + * @stream: Stream from which samples are to be released.<br>
> + *<br>
> + * Note: The associated requests should be completed before releasing the<br>
> + * references here.<br>
> + */<br>
> +static void i915_perf_stream_release_<wbr>samples(struct i915_perf_stream *stream)<br>
> +{<br>
> +     struct i915_perf_cs_sample *entry, *next;<br>
> +     unsigned long flags;<br>
> +<br>
> +     list_for_each_entry_safe<br>
> +             (entry, next, &stream->cs_samples, link) {<br>
> +             i915_gem_request_put(entry-><wbr>request);<br>
> +<br>
> +             spin_lock_irqsave(&stream->cs_<wbr>samples_lock, flags);<br>
> +             list_del(&entry->link);<br>
> +             spin_unlock_irqrestore(&<wbr>stream->cs_samples_lock, flags);<br>
> +             kfree(entry);<br>
> +     }<br>
> +}<br>
> +<br>
> +/**<br>
>    * oa_buffer_check_unlocked - check for data and update tail ptr state<br>
>    * @dev_priv: i915 device instance<br>
>    *<br>
> @@ -521,12 +806,13 @@ static int append_oa_status(struct i915_perf_stream *stream,<br>
>   }<br>
><br>
>   /**<br>
> - * append_oa_sample - Copies single OA report into userspace read() buffer.<br>
> - * @stream: An i915-perf stream opened for OA metrics<br>
> + * append_perf_sample - Copies single perf sample into userspace read() buffer.<br>
> + * @stream: An i915-perf stream opened for perf samples<br>
>    * @buf: destination buffer given by userspace<br>
>    * @count: the number of bytes userspace wants to read<br>
>    * @offset: (inout): the current position for writing into @buf<br>
> - * @report: A single OA report to (optionally) include as part of the sample<br>
> + * @data: perf sample data which contains (optionally) metrics configured<br>
> + * earlier when opening a stream<br>
>    *<br>
>    * The contents of a sample are configured through `DRM_I915_PERF_PROP_SAMPLE_*`<br>
>    * properties when opening a stream, tracked as `stream->sample_flags`. This<br>
> @@ -537,11 +823,11 @@ static int append_oa_status(struct i915_perf_stream *stream,<br>
>    *<br>
>    * Returns: 0 on success, negative error code on failure.<br>
>    */<br>
> -static int append_oa_sample(struct i915_perf_stream *stream,<br>
> +static int append_perf_sample(struct i915_perf_stream *stream,<br>
>                           char __user *buf,<br>
>                           size_t count,<br>
>                           size_t *offset,<br>
> -                         const u8 *report)<br>
> +                         const struct i915_perf_sample_data *data)<br>
>   {<br>
>       struct drm_i915_private *dev_priv = stream->dev_priv;<br>
>       int report_size = dev_priv->perf.oa.oa_buffer.<wbr>format_size;<br>
> @@ -569,16 +855,21 @@ static int append_oa_sample(struct i915_perf_stream *stream,<br>
>        * transition. These are considered as source 'OABUFFER'.<br>
>        */<br>
>       if (sample_flags & SAMPLE_OA_SOURCE) {<br>
> -             u64 source = I915_PERF_SAMPLE_OA_SOURCE_<wbr>OABUFFER;<br>
> +             if (copy_to_user(buf, &data->source, 8))<br>
> +                     return -EFAULT;<br>
> +             buf += 8;<br>
> +     }<br>
><br>
> -             if (copy_to_user(buf, &source, 8))<br>
> +     if (sample_flags & SAMPLE_CTX_ID) {<br>
> +             if (copy_to_user(buf, &data->ctx_id, 8))<br>
>                       return -EFAULT;<br>
>               buf += 8;<br>
>       }<br>
><br>
>       if (sample_flags & SAMPLE_OA_REPORT) {<br>
> -             if (copy_to_user(buf, report, report_size))<br>
> +             if (copy_to_user(buf, data->report, report_size))<br>
>                       return -EFAULT;<br>
> +             buf += report_size;<br>
>       }<br>
><br>
>       (*offset) += header.size;<br>
> @@ -587,11 +878,54 @@ static int append_oa_sample(struct i915_perf_stream *stream,<br>
>   }<br>
><br>
>   /**<br>
> + * append_oa_buffer_sample - Copies single periodic OA report into userspace<br>
> + * read() buffer.<br>
> + * @stream: An i915-perf stream opened for OA metrics<br>
> + * @buf: destination buffer given by userspace<br>
> + * @count: the number of bytes userspace wants to read<br>
> + * @offset: (inout): the current position for writing into @buf<br>
> + * @report: A single OA report to (optionally) include as part of the sample<br>
> + *<br>
> + * Returns: 0 on success, negative error code on failure.<br>
> + */<br>
> +static int append_oa_buffer_sample(struct i915_perf_stream *stream,<br>
> +                             char __user *buf, size_t count,<br>
> +                             size_t *offset, const u8 *report)<br>
> +{<br>
> +     struct drm_i915_private *dev_priv = stream->dev_priv;<br>
> +     u32 sample_flags = stream->sample_flags;<br>
> +     struct i915_perf_sample_data data = { 0 };<br>
> +     u32 *report32 = (u32 *)report;<br>
> +<br>
> +     if (sample_flags & SAMPLE_OA_SOURCE)<br>
> +             data.source = I915_PERF_SAMPLE_OA_SOURCE_<wbr>OABUFFER;<br>
> +<br>
> +     if (sample_flags & SAMPLE_CTX_ID) {<br>
> +             if (INTEL_INFO(dev_priv)->gen < 8)<br>
> +                     data.ctx_id = 0;<br>
> +             else {<br>
> +                     /*<br>
> +                      * XXX: Just keep the lower 21 bits for now since I'm<br>
> +                      * not entirely sure if the HW touches any of the higher<br>
> +                      * bits in this field<br>
> +                      */<br>
> +                     data.ctx_id = report32[2] & 0x1fffff;<br>
> +             }<br>
> +     }<br>
> +<br>
> +     if (sample_flags & SAMPLE_OA_REPORT)<br>
> +             data.report = report;<br>
> +<br>
> +     return append_perf_sample(stream, buf, count, offset, &data);<br>
> +}<br>
> +<br>
> +/**<br>
>    * Copies all buffered OA reports into userspace read() buffer.<br>
>    * @stream: An i915-perf stream opened for OA metrics<br>
>    * @buf: destination buffer given by userspace<br>
>    * @count: the number of bytes userspace wants to read<br>
>    * @offset: (inout): the current position for writing into @buf<br>
> + * @ts: copy OA reports till this timestamp<br>
>    *<br>
>    * Notably any error condition resulting in a short read (-%ENOSPC or<br>
>    * -%EFAULT) will be returned even though one or more records may<br>
> @@ -609,7 +943,8 @@ static int append_oa_sample(struct i915_perf_stream *stream,<br>
>   static int gen8_append_oa_reports(struct i915_perf_stream *stream,<br>
>                                 char __user *buf,<br>
>                                 size_t count,<br>
> -                               size_t *offset)<br>
> +                               size_t *offset,<br>
> +                               u32 ts)<br>
>   {<br>
>       struct drm_i915_private *dev_priv = stream->dev_priv;<br>
>       int report_size = dev_priv->perf.oa.oa_buffer.<wbr>format_size;<br>
> @@ -623,7 +958,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,<br>
>       u32 taken;<br>
>       int ret = 0;<br>
><br>
> -     if (WARN_ON(!stream->enabled))<br>
> +     if (WARN_ON(stream->state != I915_PERF_STREAM_ENABLED))<br>
>               return -EIO;<br>
><br>
>       spin_lock_irqsave(&dev_priv-><wbr>perf.oa.oa_buffer.ptr_lock, flags);<br>
> @@ -669,6 +1004,11 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,<br>
>               u32 *report32 = (void *)report;<br>
>               u32 ctx_id;<br>
>               u32 reason;<br>
> +             u32 report_ts = report32[1];<br>
> +<br>
> +             /* Report timestamp should not exceed the given ts */<br>
> +             if (report_ts > ts)<br>
> +                     break;<br>
><br>
>               /*<br>
>                * All the report sizes factor neatly into the buffer<br>
> @@ -750,23 +1090,23 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,<br>
>                * switches since it's not-uncommon for periodic samples to<br>
>                * identify a switch before any 'context switch' report.<br>
>                */<br>
> -             if (!dev_priv->perf.oa.exclusive_<wbr>stream->ctx ||<br>
> -                 dev_priv->perf.oa.specific_<wbr>ctx_id == ctx_id ||<br>
> +             if (!stream->ctx ||<br>
> +                 stream->engine->specific_ctx_<wbr>id == ctx_id ||<br>
>                   (dev_priv->perf.oa.oa_buffer.<wbr>last_ctx_id ==<br>
> -                  dev_priv->perf.oa.specific_<wbr>ctx_id) ||<br>
> +                  stream->engine->specific_ctx_<wbr>id) ||<br>
>                   reason & OAREPORT_REASON_CTX_SWITCH) {<br>
><br>
>                       /*<br>
>                        * While filtering for a single context we avoid<br>
>                        * leaking the IDs of other contexts.<br>
>                        */<br>
> -                     if (dev_priv->perf.oa.exclusive_<wbr>stream->ctx &&<br>
> -                         dev_priv->perf.oa.specific_<wbr>ctx_id != ctx_id) {<br>
> +                     if (stream->ctx &&<br>
> +                         stream->engine->specific_ctx_<wbr>id != ctx_id) {<br>
>                               report32[2] = INVALID_CTX_ID;<br>
>                       }<br>
><br>
> -                     ret = append_oa_sample(stream, buf, count, offset,<br>
> -                                            report);<br>
> +                     ret = append_oa_buffer_sample(<wbr>stream, buf, count,<br>
> +                                                   offset, report);<br>
>                       if (ret)<br>
>                               break;<br>
><br>
> @@ -807,6 +1147,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,<br>
>    * @buf: destination buffer given by userspace<br>
>    * @count: the number of bytes userspace wants to read<br>
>    * @offset: (inout): the current position for writing into @buf<br>
> + * @ts: copy OA reports till this timestamp<br>
>    *<br>
>    * Checks OA unit status registers and if necessary appends corresponding<br>
>    * status records for userspace (such as for a buffer full condition) and then<br>
> @@ -824,7 +1165,8 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,<br>
>   static int gen8_oa_read(struct i915_perf_stream *stream,<br>
>                       char __user *buf,<br>
>                       size_t count,<br>
> -                     size_t *offset)<br>
> +                     size_t *offset,<br>
> +                     u32 ts)<br>
>   {<br>
>       struct drm_i915_private *dev_priv = stream->dev_priv;<br>
>       u32 oastatus;<br>
> @@ -877,7 +1219,7 @@ static int gen8_oa_read(struct i915_perf_stream *stream,<br>
>                          oastatus & ~GEN8_OASTATUS_REPORT_LOST);<br>
>       }<br>
><br>
> -     return gen8_append_oa_reports(stream, buf, count, offset);<br>
> +     return gen8_append_oa_reports(stream, buf, count, offset, ts);<br>
>   }<br>
><br>
>   /**<br>
> @@ -886,6 +1228,7 @@ static int gen8_oa_read(struct i915_perf_stream *stream,<br>
>    * @buf: destination buffer given by userspace<br>
>    * @count: the number of bytes userspace wants to read<br>
>    * @offset: (inout): the current position for writing into @buf<br>
> + * @ts: copy OA reports till this timestamp<br>
>    *<br>
>    * Notably any error condition resulting in a short read (-%ENOSPC or<br>
>    * -%EFAULT) will be returned even though one or more records may<br>
> @@ -903,7 +1246,8 @@ static int gen8_oa_read(struct i915_perf_stream *stream,<br>
>   static int gen7_append_oa_reports(struct i915_perf_stream *stream,<br>
>                                 char __user *buf,<br>
>                                 size_t count,<br>
> -                               size_t *offset)<br>
> +                               size_t *offset,<br>
> +                               u32 ts)<br>
>   {<br>
>       struct drm_i915_private *dev_priv = stream->dev_priv;<br>
>       int report_size = dev_priv->perf.oa.oa_buffer.<wbr>format_size;<br>
> @@ -917,7 +1261,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,<br>
>       u32 taken;<br>
>       int ret = 0;<br>
><br>
> -     if (WARN_ON(!stream->enabled))<br>
> +     if (WARN_ON(stream->state != I915_PERF_STREAM_ENABLED))<br>
>               return -EIO;<br>
><br>
>       spin_lock_irqsave(&dev_priv-><wbr>perf.oa.oa_buffer.ptr_lock, flags);<br>
> @@ -984,7 +1328,12 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,<br>
>                       continue;<br>
>               }<br>
><br>
> -             ret = append_oa_sample(stream, buf, count, offset, report);<br>
> +             /* Report timestamp should not exceed the given ts */<br>
> +             if (report32[1] > ts)<br>
> +                     break;<br>
> +<br>
> +             ret = append_oa_buffer_sample(<wbr>stream, buf, count, offset,<br>
> +                                           report);<br>
>               if (ret)<br>
>                       break;<br>
><br>
> @@ -1022,6 +1371,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,<br>
>    * @buf: destination buffer given by userspace<br>
>    * @count: the number of bytes userspace wants to read<br>
>    * @offset: (inout): the current position for writing into @buf<br>
> + * @ts: copy OA reports till this timestamp<br>
>    *<br>
>    * Checks Gen 7 specific OA unit status registers and if necessary appends<br>
>    * corresponding status records for userspace (such as for a buffer full<br>
> @@ -1035,7 +1385,8 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,<br>
>   static int gen7_oa_read(struct i915_perf_stream *stream,<br>
>                       char __user *buf,<br>
>                       size_t count,<br>
> -                     size_t *offset)<br>
> +                     size_t *offset,<br>
> +                     u32 ts)<br>
>   {<br>
>       struct drm_i915_private *dev_priv = stream->dev_priv;<br>
>       u32 oastatus1;<br>
> @@ -1097,16 +1448,172 @@ static int gen7_oa_read(struct i915_perf_stream *stream,<br>
>                       GEN7_OASTATUS1_REPORT_LOST;<br>
>       }<br>
><br>
> -     return gen7_append_oa_reports(stream, buf, count, offset);<br>
> +     return gen7_append_oa_reports(stream, buf, count, offset, ts);<br>
> +}<br>
> +<br>
> +/**<br>
> + * append_cs_buffer_sample - Copies single perf sample data associated with<br>
> + * GPU command stream, into userspace read() buffer.<br>
> + * @stream: An i915-perf stream opened for perf CS metrics<br>
> + * @buf: destination buffer given by userspace<br>
> + * @count: the number of bytes userspace wants to read<br>
> + * @offset: (inout): the current position for writing into @buf<br>
> + * @node: Sample data associated with perf metrics<br>
> + *<br>
> + * Returns: 0 on success, negative error code on failure.<br>
> + */<br>
> +static int append_cs_buffer_sample(struct i915_perf_stream *stream,<br>
> +                             char __user *buf,<br>
> +                             size_t count,<br>
> +                             size_t *offset,<br>
> +                             struct i915_perf_cs_sample *node)<br>
> +{<br>
> +     struct drm_i915_private *dev_priv = stream->dev_priv;<br>
> +     struct i915_perf_sample_data data = { 0 };<br>
> +     u32 sample_flags = stream->sample_flags;<br>
> +     int ret = 0;<br>
> +<br>
> +     if (sample_flags & SAMPLE_OA_REPORT) {<br>
> +             const u8 *report = stream->cs_buffer.vaddr + node->offset;<br>
> +             u32 sample_ts = *(u32 *)(report + 4);<br>
> +<br>
> +             data.report = report;<br>
> +<br>
> +             /* First, append the periodic OA samples having lower<br>
> +              * timestamp values<br>
> +              */<br>
> +             ret = dev_priv->perf.oa.ops.read(<wbr>stream, buf, count, offset,<br>
> +                                              sample_ts);<br>
> +             if (ret)<br>
> +                     return ret;<br>
> +     }<br>
> +<br>
> +     if (sample_flags & SAMPLE_OA_SOURCE)<br>
> +             data.source = I915_PERF_SAMPLE_OA_SOURCE_CS;<br>
> +<br>
> +     if (sample_flags & SAMPLE_CTX_ID)<br>
> +             data.ctx_id = node->ctx_id;<br>
> +<br>
> +     return append_perf_sample(stream, buf, count, offset, &data);<br>
>   }<br>
><br>
>   /**<br>
> - * i915_oa_wait_unlocked - handles blocking IO until OA data available<br>
> + * append_cs_buffer_samples - Copies all command stream based perf samples<br>
> + * into userspace read() buffer.<br>
> + * @stream: An i915-perf stream opened for perf CS metrics<br>
> + * @buf: destination buffer given by userspace<br>
> + * @count: the number of bytes userspace wants to read<br>
> + * @offset: (inout): the current position for writing into @buf<br>
> + *<br>
> + * Notably any error condition resulting in a short read (-%ENOSPC or<br>
> + * -%EFAULT) will be returned even though one or more records may<br>
> + * have been successfully copied. In this case it's up to the caller<br>
> + * to decide if the error should be squashed before returning to<br>
> + * userspace.<br>
> + *<br>
> + * Returns: 0 on success, negative error code on failure.<br>
> + */<br>
> +static int append_cs_buffer_samples(<wbr>struct i915_perf_stream *stream,<br>
> +                             char __user *buf,<br>
> +                             size_t count,<br>
> +                             size_t *offset)<br>
> +{<br>
> +     struct i915_perf_cs_sample *entry, *next;<br>
> +     LIST_HEAD(free_list);<br>
> +     int ret = 0;<br>
> +     unsigned long flags;<br>
> +<br>
> +     spin_lock_irqsave(&stream->cs_<wbr>samples_lock, flags);<br>
> +     if (list_empty(&stream->cs_<wbr>samples)) {<br>
> +             spin_unlock_irqrestore(&<wbr>stream->cs_samples_lock, flags);<br>
> +             return 0;<br>
> +     }<br>
> +     list_for_each_entry_safe(<wbr>entry, next,<br>
> +                              &stream->cs_samples, link) {<br>
> +             if (!i915_gem_request_completed(<wbr>entry->request))<br>
> +                     break;<br>
> +             list_move_tail(&entry->link, &free_list);<br>
> +     }<br>
> +     spin_unlock_irqrestore(&<wbr>stream->cs_samples_lock, flags);<br>
> +<br>
> +     if (list_empty(&free_list))<br>
> +             return 0;<br>
> +<br>
> +     list_for_each_entry_safe(<wbr>entry, next, &free_list, link) {<br>
> +             ret = append_cs_buffer_sample(<wbr>stream, buf, count, offset,<br>
> +                                           entry);<br>
> +             if (ret)<br>
> +                     break;<br>
> +<br>
> +             list_del(&entry->link);<br>
> +             i915_gem_request_put(entry-><wbr>request);<br>
> +             kfree(entry);<br>
> +     }<br>
> +<br>
> +     /* Don't discard remaining entries, keep them for next read */<br>
> +     spin_lock_irqsave(&stream->cs_<wbr>samples_lock, flags);<br>
> +     list_splice(&free_list, &stream->cs_samples);<br>
> +     spin_unlock_irqrestore(&<wbr>stream->cs_samples_lock, flags);<br>
> +<br>
> +     return ret;<br>
> +}<br>
> +<br>
> +/*<br>
> + * cs_buffer_is_empty - Checks whether the command stream buffer<br>
> + * associated with the stream has data available.<br>
>    * @stream: An i915-perf stream opened for OA metrics<br>
>    *<br>
> + * Returns: false if at least one request associated with command stream is<br>
> + * completed, else returns true.<br>
> + */<br>
> +static bool cs_buffer_is_empty(struct i915_perf_stream *stream)<br>
> +<br>
> +{<br>
> +     struct i915_perf_cs_sample *entry = NULL;<br>
> +     struct drm_i915_gem_request *request = NULL;<br>
> +     unsigned long flags;<br>
> +<br>
> +     spin_lock_irqsave(&stream->cs_<wbr>samples_lock, flags);<br>
> +     entry = list_first_entry_or_null(&<wbr>stream->cs_samples,<br>
> +                     struct i915_perf_cs_sample, link);<br>
> +     if (entry)<br>
> +             request = entry->request;<br>
> +     spin_unlock_irqrestore(&<wbr>stream->cs_samples_lock, flags);<br>
> +<br>
> +     if (!entry)<br>
> +             return true;<br>
> +     else if (!i915_gem_request_completed(<wbr>request))<br>
> +             return true;<br>
> +     else<br>
> +             return false;<br>
> +}<br>
> +<br>
> +/**<br>
> + * stream_have_data_unlocked - Checks whether the stream has data available<br>
> + * @stream: An i915-perf stream opened for OA metrics<br>
> + *<br>
> + * For command stream based streams, check if the command stream buffer has<br>
> + * at least one sample available, if not return false, irrespective of periodic<br>
> + * oa buffer having the data or not.<br>
> + */<br>
> +<br>
> +static bool stream_have_data_unlocked(<wbr>struct i915_perf_stream *stream)<br>
> +{<br>
> +     struct drm_i915_private *dev_priv = stream->dev_priv;<br>
> +<br>
> +     if (stream->cs_mode)<br>
> +             return !cs_buffer_is_empty(stream);<br>
> +     else<br>
> +             return oa_buffer_check_unlocked(dev_<wbr>priv);<br>
> +}<br>
> +<br>
> +/**<br>
> + * i915_perf_stream_wait_unlocked - handles blocking IO until data available<br>
> + * @stream: An i915-perf stream opened for GPU metrics<br>
> + *<br>
>    * Called when userspace tries to read() from a blocking stream FD opened<br>
> - * for OA metrics. It waits until the hrtimer callback finds a non-empty<br>
> - * OA buffer and wakes us.<br>
> + * for perf metrics. It waits until the hrtimer callback finds a non-empty<br>
> + * command stream buffer / OA buffer and wakes us.<br>
>    *<br>
>    * Note: it's acceptable to have this return with some false positives<br>
>    * since any subsequent read handling will return -EAGAIN if there isn't<br>
> @@ -1114,7 +1621,7 @@ static int gen7_oa_read(struct i915_perf_stream *stream,<br>
>    *<br>
>    * Returns: zero on success or a negative error code<br>
>    */<br>
> -static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)<br>
> +static int i915_perf_stream_wait_<wbr>unlocked(struct i915_perf_stream *stream)<br>
>   {<br>
>       struct drm_i915_private *dev_priv = stream->dev_priv;<br>
><br>
> @@ -1122,32 +1629,47 @@ static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)<br>
>       if (!dev_priv->perf.oa.periodic)<br>
>               return -EIO;<br>
><br>
> -     return wait_event_interruptible(dev_<wbr>priv->perf.oa.poll_wq,<br>
> -                                     oa_buffer_check_unlocked(dev_<wbr>priv));<br>
> +     if (stream->cs_mode) {<br>
> +             long int ret;<br>
> +<br>
> +             /* Wait for all the sampled requests. */<br>
> +             ret = reservation_object_wait_<wbr>timeout_rcu(<br>
> +                                                 stream->cs_buffer.vma->resv,<br>
> +                                                 true,<br>
> +                                                 true,<br>
> +                                                 MAX_SCHEDULE_TIMEOUT);<br>
> +             if (unlikely(ret < 0)) {<br>
> +                     DRM_DEBUG_DRIVER("Failed to wait for sampled requests: %li\n", ret);<br>
> +                     return ret;<br>
> +             }<br>
> +     }<br>
> +<br>
> +     return wait_event_interruptible(<wbr>stream->poll_wq,<br>
> +                                     stream_have_data_unlocked(<wbr>stream));<br>
>   }<br>
><br>
>   /**<br>
> - * i915_oa_poll_wait - call poll_wait() for an OA stream poll()<br>
> - * @stream: An i915-perf stream opened for OA metrics<br>
> + * i915_perf_stream_poll_wait - call poll_wait() for a stream poll()<br>
> + * @stream: An i915-perf stream opened for GPU metrics<br>
>    * @file: An i915 perf stream file<br>
>    * @wait: poll() state table<br>
>    *<br>
> - * For handling userspace polling on an i915 perf stream opened for OA metrics,<br>
> + * For handling userspace polling on an i915 perf stream opened for metrics,<br>
>    * this starts a poll_wait with the wait queue that our hrtimer callback wakes<br>
> - * when it sees data ready to read in the circular OA buffer.<br>
> + * when it sees data ready to read either in command stream buffer or in the<br>
> + * circular OA buffer.<br>
>    */<br>
> -static void i915_oa_poll_wait(struct i915_perf_stream *stream,<br>
> +static void i915_perf_stream_poll_wait(<wbr>struct i915_perf_stream *stream,<br>
>                             struct file *file,<br>
>                             poll_table *wait)<br>
>   {<br>
> -     struct drm_i915_private *dev_priv = stream->dev_priv;<br>
> -<br>
> -     poll_wait(file, &dev_priv->perf.oa.poll_wq, wait);<br>
> +     poll_wait(file, &stream->poll_wq, wait);<br>
>   }<br>
><br>
>   /**<br>
> - * i915_oa_read - just calls through to &i915_oa_ops->read<br>
> - * @stream: An i915-perf stream opened for OA metrics<br>
> + * i915_perf_stream_read - Reads perf metrics available into userspace read<br>
> + * buffer<br>
> + * @stream: An i915-perf stream opened for GPU metrics<br>
>    * @buf: destination buffer given by userspace<br>
>    * @count: the number of bytes userspace wants to read<br>
>    * @offset: (inout): the current position for writing into @buf<br>
> @@ -1157,14 +1679,21 @@ static void i915_oa_poll_wait(struct i915_perf_stream *stream,<br>
>    *<br>
>    * Returns: zero on success or a negative error code<br>
>    */<br>
> -static int i915_oa_read(struct i915_perf_stream *stream,<br>
> +static int i915_perf_stream_read(struct i915_perf_stream *stream,<br>
>                       char __user *buf,<br>
>                       size_t count,<br>
>                       size_t *offset)<br>
>   {<br>
>       struct drm_i915_private *dev_priv = stream->dev_priv;<br>
><br>
> -     return dev_priv->perf.oa.ops.read(<wbr>stream, buf, count, offset);<br>
> +<br>
> +     if (stream->cs_mode)<br>
> +             return append_cs_buffer_samples(<wbr>stream, buf, count, offset);<br>
> +     else if (stream->sample_flags & SAMPLE_OA_REPORT)<br>
> +             return dev_priv->perf.oa.ops.read(<wbr>stream, buf, count, offset,<br>
> +                                             U32_MAX);<br>
> +     else<br>
> +             return -EINVAL;<br>
>   }<br>
><br>
>   /**<br>
> @@ -1182,7 +1711,7 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)<br>
>       struct drm_i915_private *dev_priv = stream->dev_priv;<br>
><br>
>       if (i915.enable_execlists)<br>
> -             dev_priv->perf.oa.specific_<wbr>ctx_id = stream->ctx->hw_id;<br>
> +             stream->engine->specific_ctx_<wbr>id = stream->ctx->hw_id;<br>
>       else {<br>
>               struct intel_engine_cs *engine = dev_priv->engine[RCS];<br>
>               struct intel_ring *ring;<br>
> @@ -1209,7 +1738,7 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)<br>
>                * i915_ggtt_offset() on the fly) considering the difference<br>
>                * with gen8+ and execlists<br>
>                */<br>
> -             dev_priv->perf.oa.specific_<wbr>ctx_id =<br>
> +             stream->engine->specific_ctx_<wbr>id =<br>
>                       i915_ggtt_offset(stream->ctx-><wbr>engine[engine->id].state);<br>
>       }<br>
><br>
> @@ -1228,13 +1757,13 @@ static void oa_put_render_ctx_id(struct i915_perf_stream *stream)<br>
>       struct drm_i915_private *dev_priv = stream->dev_priv;<br>
><br>
>       if (i915.enable_execlists) {<br>
> -             dev_priv->perf.oa.specific_<wbr>ctx_id = INVALID_CTX_ID;<br>
> +             stream->engine->specific_ctx_<wbr>id = INVALID_CTX_ID;<br>
>       } else {<br>
>               struct intel_engine_cs *engine = dev_priv->engine[RCS];<br>
><br>
>               mutex_lock(&dev_priv->drm.<wbr>struct_mutex);<br>
><br>
> -             dev_priv->perf.oa.specific_<wbr>ctx_id = INVALID_CTX_ID;<br>
> +             stream->engine->specific_ctx_<wbr>id = INVALID_CTX_ID;<br>
>               engine->context_unpin(engine, stream->ctx);<br>
><br>
>               mutex_unlock(&dev_priv->drm.<wbr>struct_mutex);<br>
> @@ -1242,13 +1771,28 @@ static void oa_put_render_ctx_id(struct i915_perf_stream *stream)<br>
>   }<br>
><br>
>   static void<br>
> +free_cs_buffer(struct i915_perf_stream *stream)<br>
> +{<br>
> +     struct drm_i915_private *dev_priv = stream->dev_priv;<br>
> +<br>
> +     mutex_lock(&dev_priv->drm.<wbr>struct_mutex);<br>
> +<br>
> +     i915_gem_object_unpin_map(<wbr>stream->cs_buffer.vma->obj);<br>
> +     i915_vma_unpin_and_release(&<wbr>stream->cs_buffer.vma);<br>
> +<br>
> +     stream->cs_buffer.vma = NULL;<br>
> +     stream->cs_buffer.vaddr = NULL;<br>
> +<br>
> +     mutex_unlock(&dev_priv->drm.<wbr>struct_mutex);<br>
> +}<br>
> +<br>
> +static void<br>
>   free_oa_buffer(struct drm_i915_private *i915)<br>
>   {<br>
>       mutex_lock(&i915->drm.struct_<wbr>mutex);<br>
><br>
>       i915_gem_object_unpin_map(<wbr>i915->perf.oa.oa_buffer.vma-><wbr>obj);<br>
> -     i915_vma_unpin(i915->perf.oa.<wbr>oa_buffer.vma);<br>
> -     i915_gem_object_put(i915-><wbr>perf.oa.oa_buffer.vma->obj);<br>
> +     i915_vma_unpin_and_release(&<wbr>i915->perf.oa.oa_buffer.vma);<br>
><br>
>       i915->perf.oa.oa_buffer.vma = NULL;<br>
>       i915->perf.oa.oa_buffer.vaddr = NULL;<br>
> @@ -1256,27 +1800,41 @@ static void oa_put_render_ctx_id(struct i915_perf_stream *stream)<br>
>       mutex_unlock(&i915->drm.<wbr>struct_mutex);<br>
>   }<br>
><br>
> -static void i915_oa_stream_destroy(struct i915_perf_stream *stream)<br>
> +static void i915_perf_stream_destroy(<wbr>struct i915_perf_stream *stream)<br>
>   {<br>
>       struct drm_i915_private *dev_priv = stream->dev_priv;<br>
> -<br>
> -     BUG_ON(stream != dev_priv->perf.oa.exclusive_<wbr>stream);<br>
> +     struct intel_engine_cs *engine = stream->engine;<br>
> +     struct i915_perf_stream *engine_stream;<br>
> +     int idx;<br>
> +<br>
> +     idx = srcu_read_lock(&engine->perf_<wbr>srcu);<br>
> +     engine_stream = srcu_dereference(engine-><wbr>exclusive_stream,<br>
> +                                      &engine->perf_srcu);<br>
> +     srcu_read_unlock(&engine-><wbr>perf_srcu, idx);<br>
> +     if (WARN_ON(stream != engine_stream))<br>
> +             return;<br>
><br>
>       /*<br>
>        * Unset exclusive_stream first, it might be checked while<br>
>        * disabling the metric set on gen8+.<br>
>        */<br>
> -     dev_priv->perf.oa.exclusive_<wbr>stream = NULL;<br>
> +     rcu_assign_pointer(stream-><wbr>engine->exclusive_stream, NULL);<br>
> +     synchronize_srcu(&stream-><wbr>engine->perf_srcu);<br>
><br>
> -     dev_priv->perf.oa.ops.disable_<wbr>metric_set(dev_priv);<br>
> +     if (stream->using_oa) {<br>
> +             dev_priv->perf.oa.ops.disable_<wbr>metric_set(dev_priv);<br>
><br>
> -     free_oa_buffer(dev_priv);<br>
> +             free_oa_buffer(dev_priv);<br>
><br>
> -     intel_uncore_forcewake_put(<wbr>dev_priv, FORCEWAKE_ALL);<br>
> -     intel_runtime_pm_put(dev_priv)<wbr>;<br>
> +             intel_uncore_forcewake_put(<wbr>dev_priv, FORCEWAKE_ALL);<br>
> +             intel_runtime_pm_put(dev_priv)<wbr>;<br>
><br>
> -     if (stream->ctx)<br>
> -             oa_put_render_ctx_id(stream);<br>
> +             if (stream->ctx)<br>
> +                     oa_put_render_ctx_id(stream);<br>
> +     }<br>
> +<br>
> +     if (stream->cs_mode)<br>
> +             free_cs_buffer(stream);<br>
><br>
>       if (dev_priv->perf.oa.spurious_<wbr>report_rs.missed) {<br>
>               DRM_NOTE("%d spurious OA report notices suppressed due to ratelimiting\n",<br>
> @@ -1325,11 +1883,6 @@ static void gen7_init_oa_buffer(struct drm_i915_private *dev_priv)<br>
>        * memory...<br>
>        */<br>
>       memset(dev_priv->perf.oa.oa_<wbr>buffer.vaddr, 0, OA_BUFFER_SIZE);<br>
> -<br>
> -     /* Maybe make ->pollin per-stream state if we support multiple<br>
> -      * concurrent streams in the future.<br>
> -      */<br>
> -     dev_priv->perf.oa.pollin = false;<br>
>   }<br>
><br>
>   static void gen8_init_oa_buffer(struct drm_i915_private *dev_priv)<br>
> @@ -1383,33 +1936,26 @@ static void gen8_init_oa_buffer(struct drm_i915_private *dev_priv)<br>
>        * memory...<br>
>        */<br>
>       memset(dev_priv->perf.oa.oa_<wbr>buffer.vaddr, 0, OA_BUFFER_SIZE);<br>
> -<br>
> -     /*<br>
> -      * Maybe make ->pollin per-stream state if we support multiple<br>
> -      * concurrent streams in the future.<br>
> -      */<br>
> -     dev_priv->perf.oa.pollin = false;<br>
>   }<br>
><br>
> -static int alloc_oa_buffer(struct drm_i915_private *dev_priv)<br>
> +static int alloc_obj(struct drm_i915_private *dev_priv,<br>
> +                  struct i915_vma **vma, u8 **vaddr)<br>
>   {<br>
>       struct drm_i915_gem_object *bo;<br>
> -     struct i915_vma *vma;<br>
>       int ret;<br>
><br>
> -     if (WARN_ON(dev_priv->perf.oa.oa_<wbr>buffer.vma))<br>
> -             return -ENODEV;<br>
> +     intel_runtime_pm_get(dev_priv)<wbr>;<br>
><br>
>       ret = i915_mutex_lock_interruptible(<wbr>&dev_priv->drm);<br>
>       if (ret)<br>
> -             return ret;<br>
> +             goto out;<br>
><br>
>       BUILD_BUG_ON_NOT_POWER_OF_2(<wbr>OA_BUFFER_SIZE);<br>
>       BUILD_BUG_ON(OA_BUFFER_SIZE < SZ_128K || OA_BUFFER_SIZE > SZ_16M);<br>
><br>
>       bo = i915_gem_object_create(dev_<wbr>priv, OA_BUFFER_SIZE);<br>
>       if (IS_ERR(bo)) {<br>
> -             DRM_ERROR("Failed to allocate OA buffer\n");<br>
> +             DRM_ERROR("Failed to allocate i915 perf obj\n");<br>
>               ret = PTR_ERR(bo);<br>
>               goto unlock;<br>
>       }<br>
> @@ -1419,42 +1965,83 @@ static int alloc_oa_buffer(struct drm_i915_private *dev_priv)<br>
>               goto err_unref;<br>
><br>
>       /* PreHSW required 512K alignment, HSW requires 16M */<br>
> -     vma = i915_gem_object_ggtt_pin(bo, NULL, 0, SZ_16M, 0);<br>
> -     if (IS_ERR(vma)) {<br>
> -             ret = PTR_ERR(vma);<br>
> +     *vma = i915_gem_object_ggtt_pin(bo, NULL, 0, SZ_16M, 0);<br>
> +     if (IS_ERR(*vma)) {<br>
> +             ret = PTR_ERR(*vma);<br>
>               goto err_unref;<br>
>       }<br>
> -     dev_priv->perf.oa.oa_buffer.<wbr>vma = vma;<br>
><br>
> -     dev_priv->perf.oa.oa_buffer.<wbr>vaddr =<br>
> -             i915_gem_object_pin_map(bo, I915_MAP_WB);<br>
> -     if (IS_ERR(dev_priv->perf.oa.oa_<wbr>buffer.vaddr)) {<br>
> -             ret = PTR_ERR(dev_priv->perf.oa.oa_<wbr>buffer.vaddr);<br>
> +     *vaddr = i915_gem_object_pin_map(bo, I915_MAP_WB);<br>
> +     if (IS_ERR(*vaddr)) {<br>
> +             ret = PTR_ERR(*vaddr);<br>
>               goto err_unpin;<br>
>       }<br>
><br>
> -     dev_priv->perf.oa.ops.init_oa_<wbr>buffer(dev_priv);<br>
> -<br>
> -     DRM_DEBUG_DRIVER("OA Buffer initialized, gtt offset = 0x%x, vaddr = %p\n",<br>
> -                      i915_ggtt_offset(dev_priv-><wbr>perf.oa.oa_buffer.vma),<br>
> -                      dev_priv->perf.oa.oa_buffer.<wbr>vaddr);<br>
> -<br>
>       goto unlock;<br>
><br>
>   err_unpin:<br>
> -     __i915_vma_unpin(vma);<br>
> +     i915_vma_unpin(*vma);<br>
><br>
>   err_unref:<br>
>       i915_gem_object_put(bo);<br>
><br>
> -     dev_priv->perf.oa.oa_buffer.<wbr>vaddr = NULL;<br>
> -     dev_priv->perf.oa.oa_buffer.<wbr>vma = NULL;<br>
> -<br>
>   unlock:<br>
>       mutex_unlock(&dev_priv->drm.<wbr>struct_mutex);<br>
> +out:<br>
> +     intel_runtime_pm_put(dev_priv)<wbr>;<br>
>       return ret;<br>
>   }<br>
><br>
> +static int alloc_oa_buffer(struct drm_i915_private *dev_priv)<br>
> +{<br>
> +     struct i915_vma *vma;<br>
> +     u8 *vaddr;<br>
> +     int ret;<br>
> +<br>
> +     if (WARN_ON(dev_priv->perf.oa.oa_<wbr>buffer.vma))<br>
> +             return -ENODEV;<br>
> +<br>
> +     ret = alloc_obj(dev_priv, &vma, &vaddr);<br>
> +     if (ret)<br>
> +             return ret;<br>
> +<br>
> +     dev_priv->perf.oa.oa_buffer.<wbr>vma = vma;<br>
> +     dev_priv->perf.oa.oa_buffer.<wbr>vaddr = vaddr;<br>
> +<br>
> +     dev_priv->perf.oa.ops.init_oa_<wbr>buffer(dev_priv);<br>
> +<br>
> +     DRM_DEBUG_DRIVER("OA Buffer initialized, gtt offset = 0x%x, vaddr = %p\n",<br>
> +                      i915_ggtt_offset(dev_priv-><wbr>perf.oa.oa_buffer.vma),<br>
> +                      dev_priv->perf.oa.oa_buffer.<wbr>vaddr);<br>
> +     return 0;<br>
> +}<br>
> +<br>
> +static int alloc_cs_buffer(struct i915_perf_stream *stream)<br>
> +{<br>
> +     struct drm_i915_private *dev_priv = stream->dev_priv;<br>
> +     struct i915_vma *vma;<br>
> +     u8 *vaddr;<br>
> +     int ret;<br>
> +<br>
> +     if (WARN_ON(stream->cs_buffer.<wbr>vma))<br>
> +             return -ENODEV;<br>
> +<br>
> +     ret = alloc_obj(dev_priv, &vma, &vaddr);<br>
> +     if (ret)<br>
> +             return ret;<br>
> +<br>
> +     stream->cs_buffer.vma = vma;<br>
> +     stream->cs_buffer.vaddr = vaddr;<br>
> +     if (WARN_ON(!list_empty(&stream-><wbr>cs_samples)))<br>
> +             INIT_LIST_HEAD(&stream->cs_<wbr>samples);<br>
> +<br>
> +     DRM_DEBUG_DRIVER("Command stream buf initialized, gtt offset = 0x%x, vaddr = %p\n",<br>
> +                      i915_ggtt_offset(stream->cs_<wbr>buffer.vma),<br>
> +                      stream->cs_buffer.vaddr);<br>
> +<br>
> +     return 0;<br>
> +}<br>
> +<br>
>   static void config_oa_regs(struct drm_i915_private *dev_priv,<br>
>                          const struct i915_oa_reg *regs,<br>
>                          int n_regs)<br>
> @@ -1859,6 +2446,10 @@ static void gen8_disable_metric_set(struct drm_i915_private *dev_priv)<br>
><br>
>   static void gen7_oa_enable(struct drm_i915_private *dev_priv)<br>
>   {<br>
> +     struct i915_perf_stream *stream;<br>
> +     struct intel_engine_cs *engine = dev_priv->engine[RCS];<br>
> +     int idx;<br>
> +<br>
>       /*<br>
>        * Reset buf pointers so we don't forward reports from before now.<br>
>        *<br>
> @@ -1870,11 +2461,11 @@ static void gen7_oa_enable(struct drm_i915_private *dev_priv)<br>
>        */<br>
>       gen7_init_oa_buffer(dev_priv);<br>
><br>
> -     if (dev_priv->perf.oa.exclusive_<wbr>stream->enabled) {<br>
> -             struct i915_gem_context *ctx =<br>
> -                     dev_priv->perf.oa.exclusive_<wbr>stream->ctx;<br>
> -             u32 ctx_id = dev_priv->perf.oa.specific_<wbr>ctx_id;<br>
> -<br>
> +     idx = srcu_read_lock(&engine->perf_<wbr>srcu);<br>
> +     stream = srcu_dereference(engine-><wbr>exclusive_stream, &engine->perf_srcu);<br>
> +     if (stream->state != I915_PERF_STREAM_DISABLED) {<br>
> +             struct i915_gem_context *ctx = stream->ctx;<br>
> +             u32 ctx_id = engine->specific_ctx_id;<br>
>               bool periodic = dev_priv->perf.oa.periodic;<br>
>               u32 period_exponent = dev_priv->perf.oa.period_<wbr>exponent;<br>
>               u32 report_format = dev_priv->perf.oa.oa_buffer.<wbr>format;<br>
> @@ -1889,6 +2480,7 @@ static void gen7_oa_enable(struct drm_i915_private *dev_priv)<br>
>                          GEN7_OACONTROL_ENABLE);<br>
>       } else<br>
>               I915_WRITE(GEN7_OACONTROL, 0);<br>
> +     srcu_read_unlock(&engine-><wbr>perf_srcu, idx);<br>
>   }<br>
><br>
>   static void gen8_oa_enable(struct drm_i915_private *dev_priv)<br>
> @@ -1917,22 +2509,23 @@ static void gen8_oa_enable(struct drm_i915_private *dev_priv)<br>
>   }<br>
><br>
>   /**<br>
> - * i915_oa_stream_enable - handle `I915_PERF_IOCTL_ENABLE` for OA stream<br>
> - * @stream: An i915 perf stream opened for OA metrics<br>
> + * i915_perf_stream_enable - handle `I915_PERF_IOCTL_ENABLE` for perf stream<br>
> + * @stream: An i915 perf stream opened for GPU metrics<br>
>    *<br>
>    * [Re]enables hardware periodic sampling according to the period configured<br>
>    * when opening the stream. This also starts a hrtimer that will periodically<br>
>    * check for data in the circular OA buffer for notifying userspace (e.g.<br>
>    * during a read() or poll()).<br>
>    */<br>
> -static void i915_oa_stream_enable(struct i915_perf_stream *stream)<br>
> +static void i915_perf_stream_enable(struct i915_perf_stream *stream)<br>
>   {<br>
>       struct drm_i915_private *dev_priv = stream->dev_priv;<br>
><br>
> -     dev_priv->perf.oa.ops.oa_<wbr>enable(dev_priv);<br>
> +     if (stream->sample_flags & SAMPLE_OA_REPORT)<br>
> +             dev_priv->perf.oa.ops.oa_<wbr>enable(dev_priv);<br>
><br>
> -     if (dev_priv->perf.oa.periodic)<br>
> -             hrtimer_start(&dev_priv->perf.<wbr>oa.poll_check_timer,<br>
> +     if (stream->cs_mode || dev_priv->perf.oa.periodic)<br>
> +             hrtimer_start(&dev_priv->perf.<wbr>poll_check_timer,<br>
>                             ns_to_ktime(POLL_PERIOD),<br>
>                             HRTIMER_MODE_REL_PINNED);<br>
>   }<br>
> @@ -1948,34 +2541,39 @@ static void gen8_oa_disable(struct drm_i915_private *dev_priv)<br>
>   }<br>
><br>
>   /**<br>
> - * i915_oa_stream_disable - handle `I915_PERF_IOCTL_DISABLE` for OA stream<br>
> - * @stream: An i915 perf stream opened for OA metrics<br>
> + * i915_perf_stream_disable - handle `I915_PERF_IOCTL_DISABLE` for perf stream<br>
> + * @stream: An i915 perf stream opened for GPU metrics<br>
>    *<br>
>    * Stops the OA unit from periodically writing counter reports into the<br>
>    * circular OA buffer. This also stops the hrtimer that periodically checks for<br>
>    * data in the circular OA buffer, for notifying userspace.<br>
>    */<br>
> -static void i915_oa_stream_disable(struct i915_perf_stream *stream)<br>
> +static void i915_perf_stream_disable(<wbr>struct i915_perf_stream *stream)<br>
>   {<br>
>       struct drm_i915_private *dev_priv = stream->dev_priv;<br>
><br>
> -     dev_priv->perf.oa.ops.oa_<wbr>disable(dev_priv);<br>
> +     if (stream->cs_mode || dev_priv->perf.oa.periodic)<br>
> +             hrtimer_cancel(&dev_priv-><wbr>perf.poll_check_timer);<br>
> +<br>
> +     if (stream->cs_mode)<br>
> +             i915_perf_stream_release_<wbr>samples(stream);<br>
><br>
> -     if (dev_priv->perf.oa.periodic)<br>
> -             hrtimer_cancel(&dev_priv-><wbr>perf.oa.poll_check_timer);<br>
> +     if (stream->sample_flags & SAMPLE_OA_REPORT)<br>
> +             dev_priv->perf.oa.ops.oa_<wbr>disable(dev_priv);<br>
>   }<br>
><br>
> -static const struct i915_perf_stream_ops i915_oa_stream_ops = {<br>
> -     .destroy = i915_oa_stream_destroy,<br>
> -     .enable = i915_oa_stream_enable,<br>
> -     .disable = i915_oa_stream_disable,<br>
> -     .wait_unlocked = i915_oa_wait_unlocked,<br>
> -     .poll_wait = i915_oa_poll_wait,<br>
> -     .read = i915_oa_read,<br>
> +static const struct i915_perf_stream_ops perf_stream_ops = {<br>
</div></div><span class="gmail-">> +     .destroy = i915_perf_stream_destroy,<br>
> +     .enable = i915_perf_stream_enable,<br>
> +     .disable = i915_perf_stream_disable,<br>
> +     .wait_unlocked = i915_perf_stream_wait_<wbr>unlocked,<br>
> +     .poll_wait = i915_perf_stream_poll_wait,<br>
> +     .read = i915_perf_stream_read,<br>
> +     .emit_sample_capture = i915_perf_stream_emit_sample_<wbr>capture,<br>
>   };<br>
><br>
>   /**<br>
> - * i915_oa_stream_init - validate combined props for OA stream and init<br>
> + * i915_perf_stream_init - validate combined props for stream and init<br>
</span><span class="gmail-">>    * @stream: An i915 perf stream<br>
</span><span class="gmail-">>    * @param: The open parameters passed to `DRM_I915_PERF_OPEN`<br>
>    * @props: The property state that configures stream (individually validated)<br>
> @@ -1984,58 +2582,35 @@ static void i915_oa_stream_disable(struct i915_perf_stream *stream)<br>
>    * doesn't ensure that the combination necessarily makes sense.<br>
>    *<br>
>    * At this point it has been determined that userspace wants a stream of<br>
> - * OA metrics, but still we need to further validate the combined<br>
> + * perf metrics, but still we need to further validate the combined<br>
>    * properties are OK.<br>
>    *<br>
>    * If the configuration makes sense then we can allocate memory for<br>
> - * a circular OA buffer and apply the requested metric set configuration.<br>
> + * a circular perf buffer and apply the requested metric set configuration.<br>
</span>>    *<br>
<span class="gmail-">>    * Returns: zero on success or a negative error code.<br>
>    */<br>
> -static int i915_oa_stream_init(struct i915_perf_stream *stream,<br>
> +static int i915_perf_stream_init(struct i915_perf_stream *stream,<br>
>                              struct drm_i915_perf_open_param *param,<br>
>                              struct perf_open_properties *props)<br>
>   {<br>
</span><span class="gmail-">>       struct drm_i915_private *dev_priv = stream->dev_priv;<br>
</span><div><div class="gmail-h5">> -     int format_size;<br>
> +     bool require_oa_unit = props->sample_flags & (SAMPLE_OA_REPORT |<br>
> +                                                   SAMPLE_OA_SOURCE);<br>
> +     bool cs_sample_data = props->sample_flags & SAMPLE_OA_REPORT;<br>
> +     struct i915_perf_stream *curr_stream;<br>
> +     struct intel_engine_cs *engine = NULL;<br>
> +     int idx;<br>
>       int ret;<br>
><br>
> -     /* If the sysfs metrics/ directory wasn't registered for some<br>
> -      * reason then don't let userspace try their luck with config<br>
> -      * IDs<br>
> -      */<br>
> -     if (!dev_priv->perf.metrics_kobj) {<br>
> -             DRM_DEBUG("OA metrics weren't advertised via sysfs\n");<br>
> -             return -EINVAL;<br>
> -     }<br>
> -<br>
> -     if (!(props->sample_flags & SAMPLE_OA_REPORT)) {<br>
> -             DRM_DEBUG("Only OA report sampling supported\n");<br>
> -             return -EINVAL;<br>
> -     }<br>
> -<br>
> -     if (!dev_priv->perf.oa.ops.init_<wbr>oa_buffer) {<br>
> -             DRM_DEBUG("OA unit not supported\n");<br>
> -             return -ENODEV;<br>
> -     }<br>
> -<br>
> -     /* To avoid the complexity of having to accurately filter<br>
> -      * counter reports and marshal to the appropriate client<br>
> -      * we currently only allow exclusive access<br>
> -      */<br>
> -     if (dev_priv->perf.oa.exclusive_<wbr>stream) {<br>
> -             DRM_DEBUG("OA unit already in use\n");<br>
> -             return -EBUSY;<br>
> -     }<br>
> -<br>
> -     if (!props->metrics_set) {<br>
> -             DRM_DEBUG("OA metric set not specified\n");<br>
> -             return -EINVAL;<br>
> -     }<br>
> -<br>
> -     if (!props->oa_format) {<br>
> -             DRM_DEBUG("OA report format not specified\n");<br>
> -             return -EINVAL;<br>
> +     if ((props->sample_flags & SAMPLE_CTX_ID) && !props->cs_mode) {<br>
> +             if (IS_HASWELL(dev_priv)) {<br>
</div></div>> +                     DRM_ERROR("On HSW, context ID sampling only supported via command stream\n");<br>
> +                     return -EINVAL;<br>
> +             } else if (!i915.enable_execlists) {<br>
> +                     DRM_ERROR("On Gen8+ without execlists, context ID sampling only supported via command stream\n");<br>
> +                     return -EINVAL;<br>
> +             }<br>
>       }<br>
><br>
<span class="gmail-">>       /* We set up some ratelimit state to potentially throttle any _NOTES<br>
> @@ -2060,70 +2635,167 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,<br>
><br>
>       stream->sample_size = sizeof(struct drm_i915_perf_record_header);<br>
><br>
> -     format_size = dev_priv->perf.oa.oa_formats[<wbr>props->oa_format].size;<br>
> +     if (require_oa_unit) {<br>
> +             int format_size;<br>
><br>
> -     stream->sample_flags |= SAMPLE_OA_REPORT;<br>
> -     stream->sample_size += format_size;<br>
> +             /* If the sysfs metrics/ directory wasn't registered for some<br>
> +              * reason then don't let userspace try their luck with config<br>
> +              * IDs<br>
> +              */<br>
> +             if (!dev_priv->perf.metrics_kobj) {<br>
</span>> +                     DRM_DEBUG("OA metrics weren't advertised via sysfs\n");<br>
> +                     return -EINVAL;<br>
> +             }<br>
><br>
<span class="gmail-">> -     if (props->sample_flags & SAMPLE_OA_SOURCE) {<br>
> -             stream->sample_flags |= SAMPLE_OA_SOURCE;<br>
> -             stream->sample_size += 8;<br>
> -     }<br>
> +             if (!dev_priv->perf.oa.ops.init_<wbr>oa_buffer) {<br>
> +                     DRM_DEBUG("OA unit not supported\n");<br>
> +                     return -ENODEV;<br>
> +             }<br>
><br>
> -     dev_priv->perf.oa.oa_buffer.<wbr>format_size = format_size;<br>
> -     if (WARN_ON(dev_priv->perf.oa.oa_<wbr>buffer.format_size == 0))<br>
> -             return -EINVAL;<br>
> +             if (!props->metrics_set) {<br>
</span>> +                     DRM_DEBUG("OA metric set not specified\n");<br>
<span class="gmail-">> +                     return -EINVAL;<br>
> +             }<br>
> +<br>
</span>> +             if (!props->oa_format) {<br>
> +                     DRM_DEBUG("OA report format not specified\n");<br>
<span class="gmail-">> +                     return -EINVAL;<br>
> +             }<br>
> +<br>
</span><span class="gmail-">> +             if (props->cs_mode && (props->engine != RCS)) {<br>
</span>> +                     DRM_ERROR("Command stream OA metrics only available via Render CS\n");<br>
<span class="gmail-">> +                     return -EINVAL;<br>
> +             }<br>
> +<br>
</span>> +             engine = dev_priv->engine[RCS];<br>
> +             stream->using_oa = true;<br>
<span class="gmail-">> +<br>
> +             idx = srcu_read_lock(&engine->perf_<wbr>srcu);<br>
</span>> +             curr_stream = srcu_dereference(engine-><wbr>exclusive_stream,<br>
> +                                            &engine->perf_srcu);<br>
<span class="gmail-">> +             if (curr_stream) {<br>
> +                     DRM_ERROR("Stream already opened\n");<br>
> +                     ret = -EINVAL;<br>
> +                     goto err_enable;<br>
> +             }<br>
</span><span class="gmail-">> +             srcu_read_unlock(&engine-><wbr>perf_srcu, idx);<br>
> +<br>
</span><span class="gmail-">> +             format_size =<br>
> +                     dev_priv->perf.oa.oa_formats[<wbr>props->oa_format].size;<br>
> +<br>
> +             if (props->sample_flags & SAMPLE_OA_REPORT) {<br>
> +                     stream->sample_flags |= SAMPLE_OA_REPORT;<br>
> +                     stream->sample_size += format_size;<br>
> +             }<br>
> +<br>
> +             if (props->sample_flags & SAMPLE_OA_SOURCE) {<br>
> +                     if (!(props->sample_flags & SAMPLE_OA_REPORT)) {<br>
</span>> +                             DRM_ERROR("OA source type can't be sampled without OA report\n");<br>
> +                             return -EINVAL;<br>
> +                     }<br>
<span class="gmail-">> +                     stream->sample_flags |= SAMPLE_OA_SOURCE;<br>
> +                     stream->sample_size += 8;<br>
> +             }<br>
> +<br>
> +             dev_priv->perf.oa.oa_buffer.<wbr>format_size = format_size;<br>
> +             if (WARN_ON(dev_priv->perf.oa.oa_<wbr>buffer.format_size == 0))<br>
</span>> +                     return -EINVAL;<br>
> +<br>
<span class="gmail-">> +             dev_priv->perf.oa.oa_buffer.<wbr>format =<br>
> +                     dev_priv->perf.oa.oa_formats[<wbr>props->oa_format].format;<br>
> +<br>
> +             dev_priv->perf.oa.metrics_set = props->metrics_set;<br>
><br>
> -     dev_priv->perf.oa.oa_buffer.<wbr>format =<br>
> -             dev_priv->perf.oa.oa_formats[<wbr>props->oa_format].format;<br>
> +             dev_priv->perf.oa.periodic = props->oa_periodic;<br>
> +             if (dev_priv->perf.oa.periodic)<br>
> +                     dev_priv->perf.oa.period_<wbr>exponent =<br>
> +                             props->oa_period_exponent;<br>
><br>
> -     dev_priv->perf.oa.metrics_set = props->metrics_set;<br>
</span>> +             if (stream->ctx) {<br>
> +                     ret = oa_get_render_ctx_id(stream);<br>
<span class="gmail-">> +                     if (ret)<br>
> +                             return ret;<br>
> +             }<br>
><br>
</span><span class="gmail-">> -     dev_priv->perf.oa.periodic = props->oa_periodic;<br>
</span>> -     if (dev_priv->perf.oa.periodic)<br>
<span class="gmail-">> -             dev_priv->perf.oa.period_<wbr>exponent = props->oa_period_exponent;<br>
> +             /* PRM - observability performance counters:<br>
> +              *<br>
> +              *   OACONTROL, performance counter enable, note:<br>
> +              *<br>
> +              *   "When this bit is set, in order to have coherent counts,<br>
> +              *   RC6 power state and trunk clock gating must be disabled.<br>
> +              *   This can be achieved by programming MMIO registers as<br>
> +              *   0xA094=0 and 0xA090[31]=1"<br>
> +              *<br>
> +              *   In our case we are expecting that taking pm + FORCEWAKE<br>
> +              *   references will effectively disable RC6.<br>
> +              */<br>
> +             intel_runtime_pm_get(dev_priv)<wbr>;<br>
> +             intel_uncore_forcewake_get(<wbr>dev_priv, FORCEWAKE_ALL);<br>
><br>
</span>> -     if (stream->ctx) {<br>
<span class="gmail-">> -             ret = oa_get_render_ctx_id(stream);<br>
> +             ret = alloc_oa_buffer(dev_priv);<br>
</span><span class="gmail-">>               if (ret)<br>
> -                     return ret;<br>
</span><span class="gmail-">> +                     goto err_oa_buf_alloc;<br>
> +<br>
> +             ret = dev_priv->perf.oa.ops.enable_<wbr>metric_set(dev_priv);<br>
</span>> +             if (ret)<br>
<div><div class="gmail-h5">> +                     goto err_enable;<br>
>       }<br>
><br>
> -     /* PRM - observability performance counters:<br>
> -      *<br>
> -      *   OACONTROL, performance counter enable, note:<br>
> -      *<br>
> -      *   "When this bit is set, in order to have coherent counts,<br>
> -      *   RC6 power state and trunk clock gating must be disabled.<br>
> -      *   This can be achieved by programming MMIO registers as<br>
> -      *   0xA094=0 and 0xA090[31]=1"<br>
> -      *<br>
> -      *   In our case we are expecting that taking pm + FORCEWAKE<br>
> -      *   references will effectively disable RC6.<br>
> -      */<br>
> -     intel_runtime_pm_get(dev_priv)<wbr>;<br>
> -     intel_uncore_forcewake_get(<wbr>dev_priv, FORCEWAKE_ALL);<br>
> +     if (props->sample_flags & SAMPLE_CTX_ID) {<br>
> +             stream->sample_flags |= SAMPLE_CTX_ID;<br>
> +             stream->sample_size += 8;<br>
> +     }<br>
><br>
> -     ret = alloc_oa_buffer(dev_priv);<br>
> -     if (ret)<br>
> -             goto err_oa_buf_alloc;<br>
> +     if (props->cs_mode) {<br>
> +             if (!cs_sample_data) {<br>
> +                     DRM_ERROR("Stream engine given without requesting any CS data to sample\n");<br>
> +                     ret = -EINVAL;<br>
> +                     goto err_enable;<br>
> +             }<br>
><br>
> -     ret = dev_priv->perf.oa.ops.enable_<wbr>metric_set(dev_priv);<br>
> -     if (ret)<br>
> -             goto err_enable;<br>
> +             if (!(props->sample_flags & SAMPLE_CTX_ID)) {<br>
> +                     DRM_ERROR("Stream engine given without requesting any CS specific property\n");<br>
> +                     ret = -EINVAL;<br>
> +                     goto err_enable;<br>
> +             }<br>
><br>
> -     stream->ops = &i915_oa_stream_ops;<br>
> +             engine = dev_priv->engine[props-><wbr>engine];<br>
><br>
> -     dev_priv->perf.oa.exclusive_<wbr>stream = stream;<br>
</div></div><span class="gmail-">> +             idx = srcu_read_lock(&engine->perf_<wbr>srcu);<br>
</span>> +             curr_stream = srcu_dereference(engine-><wbr>exclusive_stream,<br>
> +                                            &engine->perf_srcu);<br>
<span class="gmail-">> +             if (curr_stream) {<br>
> +                     DRM_ERROR("Stream already opened\n");<br>
> +                     ret = -EINVAL;<br>
> +                     goto err_enable;<br>
> +             }<br>
</span><span class="gmail-">> +             srcu_read_unlock(&engine-><wbr>perf_srcu, idx);<br>
> +<br>
</span><span class="gmail-">> +             INIT_LIST_HEAD(&stream->cs_<wbr>samples);<br>
> +             ret = alloc_cs_buffer(stream);<br>
</span>> +             if (ret)<br>
<span class="gmail-">> +                     goto err_enable;<br>
> +<br>
> +             stream->cs_mode = true;<br>
> +     }<br>
> +<br>
> +     init_waitqueue_head(&stream-><wbr>poll_wq);<br>
> +     stream->pollin = false;<br>
> +     stream->ops = &perf_stream_ops;<br>
> +     stream->engine = engine;<br>
> +     rcu_assign_pointer(engine-><wbr>exclusive_stream, stream);<br>
><br>
>       return 0;<br>
><br>
>   err_enable:<br>
> -     free_oa_buffer(dev_priv);<br>
> +     if (require_oa_unit)<br>
> +             free_oa_buffer(dev_priv);<br>
><br>
>   err_oa_buf_alloc:<br>
</span><span class="gmail-">> -     intel_uncore_forcewake_put(<wbr>dev_priv, FORCEWAKE_ALL);<br>
> -     intel_runtime_pm_put(dev_priv)<wbr>;<br>
</span>> +     if (require_oa_unit) {<br>
<span class="gmail-">> +             intel_uncore_forcewake_put(<wbr>dev_priv, FORCEWAKE_ALL);<br>
> +             intel_runtime_pm_put(dev_priv)<wbr>;<br>
</span><span class="gmail-">> +     }<br>
>       if (stream->ctx)<br>
>               oa_put_render_ctx_id(stream);<br>
><br>
> @@ -2219,7 +2891,7 @@ static ssize_t i915_perf_read(struct file *file,<br>
>        * disabled stream as an error. In particular it might otherwise lead<br>
>        * to a deadlock for blocking file descriptors...<br>
>        */<br>
> -     if (!stream->enabled)<br>
> +     if (stream->state == I915_PERF_STREAM_DISABLED)<br>
>               return -EIO;<br>
><br>
>       if (!(file->f_flags & O_NONBLOCK)) {<br>
> @@ -2254,25 +2926,32 @@ static ssize_t i915_perf_read(struct file *file,<br>
>        * effectively ensures we back off until the next hrtimer callback<br>
>        * before reporting another POLLIN event.<br>
>        */<br>
> -     if (ret >= 0 || ret == -EAGAIN) {<br>
</span><span class="gmail-">> -             /* Maybe make ->pollin per-stream state if we support multiple<br>
> -              * concurrent streams in the future.<br>
> -              */<br>
> -             dev_priv->perf.oa.pollin = false;<br>
</span><span class="gmail-">> -     }<br>
> +     if (ret >= 0 || ret == -EAGAIN)<br>
> +             stream->pollin = false;<br>
><br>
>       return ret;<br>
>   }<br>
><br>
> -static enum hrtimer_restart oa_poll_check_timer_cb(struct hrtimer *hrtimer)<br>
> +static enum hrtimer_restart poll_check_timer_cb(struct hrtimer *hrtimer)<br>
>   {<br>
> +     struct i915_perf_stream *stream;<br>
>       struct drm_i915_private *dev_priv =<br>
>               container_of(hrtimer, typeof(*dev_priv),<br>
> -                          perf.oa.poll_check_timer);<br>
> -<br>
> -     if (oa_buffer_check_unlocked(dev_<wbr>priv)) {<br>
> -             dev_priv->perf.oa.pollin = true;<br>
> -             wake_up(&dev_priv->perf.oa.<wbr>poll_wq);<br>
> +                          perf.poll_check_timer);<br>
> +     int idx;<br>
> +     struct intel_engine_cs *engine;<br>
> +     enum intel_engine_id id;<br>
> +<br>
> +     for_each_engine(engine, dev_priv, id) {<br>
</span><span class="gmail-">> +             idx = srcu_read_lock(&engine->perf_<wbr>srcu);<br>
> +             stream = srcu_dereference(engine-><wbr>exclusive_stream,<br>
</span>> +                                       &engine->perf_srcu);<br>
<span class="gmail-">> +             if (stream && (stream->state == I915_PERF_STREAM_ENABLED) &&<br>
> +                 stream_have_data_unlocked(<wbr>stream)) {<br>
> +                     stream->pollin = true;<br>
> +                     wake_up(&stream->poll_wq);<br>
> +             }<br>
> +             srcu_read_unlock(&engine-><wbr>perf_srcu, idx);<br>
>       }<br>
><br>
>       hrtimer_forward_now(hrtimer, ns_to_ktime(POLL_PERIOD));<br>
> @@ -2311,7 +2990,7 @@ static unsigned int i915_perf_poll_locked(struct drm_i915_private *dev_priv,<br>
>        * the hrtimer/oa_poll_check_timer_cb to notify us when there are<br>
>        * samples to read.<br>
>        */<br>
> -     if (dev_priv->perf.oa.pollin)<br>
> +     if (stream->pollin)<br>
>               events |= POLLIN;<br>
><br>
>       return events;<br>
> @@ -2355,14 +3034,16 @@ static unsigned int i915_perf_poll(struct file *file, poll_table *wait)<br>
>    */<br>
>   static void i915_perf_enable_locked(struct i915_perf_stream *stream)<br>
>   {<br>
> -     if (stream->enabled)<br>
</span><span class="gmail-">> +     if (stream->state != I915_PERF_STREAM_DISABLED)<br>
</span><div><div class="gmail-h5">>               return;<br>
><br>
>       /* Allow stream->ops->enable() to refer to this */<br>
> -     stream->enabled = true;<br>
> +     stream->state = I915_PERF_STREAM_ENABLE_IN_<wbr>PROGRESS;<br>
><br>
>       if (stream->ops->enable)<br>
>               stream->ops->enable(stream);<br>
> +<br>
> +     stream->state = I915_PERF_STREAM_ENABLED;<br>
>   }<br>
><br>
>   /**<br>
> @@ -2381,11 +3062,11 @@ static void i915_perf_enable_locked(struct i915_perf_stream *stream)<br>
>    */<br>
>   static void i915_perf_disable_locked(<wbr>struct i915_perf_stream *stream)<br>
>   {<br>
> -     if (!stream->enabled)<br>
> +     if (stream->state != I915_PERF_STREAM_ENABLED)<br>
>               return;<br>
><br>
>       /* Allow stream->ops->disable() to refer to this */<br>
> -     stream->enabled = false;<br>
> +     stream->state = I915_PERF_STREAM_DISABLED;<br>
><br>
>       if (stream->ops->disable)<br>
>               stream->ops->disable(stream);<br>
> @@ -2457,14 +3138,12 @@ static long i915_perf_ioctl(struct file *file,<br>
>    */<br>
>   static void i915_perf_destroy_locked(<wbr>struct i915_perf_stream *stream)<br>
>   {<br>
> -     if (stream->enabled)<br>
> +     if (stream->state == I915_PERF_STREAM_ENABLED)<br>
>               i915_perf_disable_locked(<wbr>stream);<br>
><br>
>       if (stream->ops->destroy)<br>
>               stream->ops->destroy(stream);<br>
><br>
> -     list_del(&stream->link);<br>
> -<br>
>       if (stream->ctx)<br>
>               i915_gem_context_put(stream-><wbr>ctx);<br>
><br>
> @@ -2524,7 +3203,7 @@ static int i915_perf_release(struct inode *inode, struct file *file)<br>
>    *<br>
>    * In the case where userspace is interested in OA unit metrics then further<br>
>    * config validation and stream initialization details will be handled by<br>
> - * i915_oa_stream_init(). The code here should only validate config state that<br>
> + * i915_perf_stream_init(). The code here should only validate config state that<br>
>    * will be relevant to all stream types / backends.<br>
</div></div>>    *<br>
<div><div class="gmail-h5">>    * Returns: zero on success or a negative error code.<br>
> @@ -2593,7 +3272,7 @@ static int i915_perf_release(struct inode *inode, struct file *file)<br>
>       stream->dev_priv = dev_priv;<br>
>       stream->ctx = specific_ctx;<br>
><br>
> -     ret = i915_oa_stream_init(stream, param, props);<br>
> +     ret = i915_perf_stream_init(stream, param, props);<br>
>       if (ret)<br>
>               goto err_alloc;<br>
><br>
> @@ -2606,8 +3285,6 @@ static int i915_perf_release(struct inode *inode, struct file *file)<br>
>               goto err_flags;<br>
>       }<br>
><br>
> -     list_add(&stream->link, &dev_priv->perf.streams);<br>
> -<br>
>       if (param->flags & I915_PERF_FLAG_FD_CLOEXEC)<br>
>               f_flags |= O_CLOEXEC;<br>
>       if (param->flags & I915_PERF_FLAG_FD_NONBLOCK)<br>
> @@ -2625,7 +3302,6 @@ static int i915_perf_release(struct inode *inode, struct file *file)<br>
>       return stream_fd;<br>
><br>
>   err_open:<br>
> -     list_del(&stream->link);<br>
>   err_flags:<br>
>       if (stream->ops->destroy)<br>
>               stream->ops->destroy(stream);<br>
> @@ -2774,6 +3450,29 @@ static int read_properties_unlocked(<wbr>struct drm_i915_private *dev_priv,<br>
>               case DRM_I915_PERF_PROP_SAMPLE_OA_<wbr>SOURCE:<br>
>                       props->sample_flags |= SAMPLE_OA_SOURCE;<br>
>                       break;<br>
> +             case DRM_I915_PERF_PROP_ENGINE: {<br>
> +                             unsigned int user_ring_id =<br>
> +                                     value & I915_EXEC_RING_MASK;<br>
> +                             enum intel_engine_id engine;<br>
> +<br>
> +                             if (user_ring_id > I915_USER_RINGS)<br>
</div></div>> +                                     return -EINVAL;<br>
> +<br>
<span class="gmail-">> +                             /* XXX: Currently only RCS is supported.<br>
> +                              * Remove this check when support for other<br>
> +                              * engines is added.<br>
> +                              */<br>
> +                             engine = user_ring_map[user_ring_id];<br>
> +                             if (engine != RCS)<br>
</span>> +                                     return -EINVAL;<br>
> +<br>
<span class="gmail-">> +                             props->cs_mode = true;<br>
> +                             props->engine = engine;<br>
> +                     }<br>
> +                     break;<br>
> +             case DRM_I915_PERF_PROP_SAMPLE_CTX_<wbr>ID:<br>
> +                     props->sample_flags |= SAMPLE_CTX_ID;<br>
> +                     break;<br>
>               case DRM_I915_PERF_PROP_MAX:<br>
>                       MISSING_CASE(id);<br>
>                       return -EINVAL;<br>
> @@ -3002,6 +3701,30 @@ void i915_perf_unregister(struct drm_i915_private *dev_priv)<br>
>       {}<br>
>   };<br>
><br>
</span>> +void i915_perf_streams_mark_idle(<wbr>struct drm_i915_private *dev_priv)<br>
> +{<br>
> +     struct intel_engine_cs *engine;<br>
> +     struct i915_perf_stream *stream;<br>
> +     enum intel_engine_id id;<br>
> +     int idx;<br>
> +<br>
> +     for_each_engine(engine, dev_priv, id) {<br>
<span class="gmail-">> +             idx = srcu_read_lock(&engine->perf_<wbr>srcu);<br>
> +             stream = srcu_dereference(engine-><wbr>exclusive_stream,<br>
</span>> +                                       &engine->perf_srcu);<br>
> +             if (stream && (stream->state == I915_PERF_STREAM_ENABLED) &&<br>
> +                                     stream->cs_mode) {<br>
<span class="gmail-">> +                     struct reservation_object *resv =<br>
> +                                             stream->cs_buffer.vma->resv;<br>
</span>> +<br>
> +                     reservation_object_lock(resv, NULL);<br>
<span class="gmail-">> +                     reservation_object_add_excl_<wbr>fence(resv, NULL);<br>
> +                     reservation_object_unlock(<wbr>resv);<br>
> +             }<br>
</span><span class="gmail-">> +             srcu_read_unlock(&engine-><wbr>perf_srcu, idx);<br>
> +     }<br>
> +}<br>
</span><span class="gmail-">> +<br>
>   /**<br>
>    * i915_perf_init - initialize i915-perf state on module load<br>
</span><span class="gmail-">>    * @dev_priv: i915 device instance<br>
</span><div><div class="gmail-h5">> @@ -3125,12 +3848,10 @@ void i915_perf_init(struct drm_i915_private *dev_priv)<br>
>       }<br>
><br>
>       if (dev_priv->perf.oa.n_builtin_<wbr>sets) {<br>
> -             hrtimer_init(&dev_priv->perf.<wbr>oa.poll_check_timer,<br>
> +             hrtimer_init(&dev_priv->perf.<wbr>poll_check_timer,<br>
>                               CLOCK_MONOTONIC, HRTIMER_MODE_REL);<br>
> -             dev_priv->perf.oa.poll_check_<wbr>timer.function = oa_poll_check_timer_cb;<br>
> -             init_waitqueue_head(&dev_priv-<wbr>>perf.oa.poll_wq);<br>
> +             dev_priv->perf.poll_check_<wbr>timer.function = poll_check_timer_cb;<br>
><br>
> -             INIT_LIST_HEAD(&dev_priv-><wbr>perf.streams);<br>
>               mutex_init(&dev_priv->perf.<wbr>lock);<br>
>               spin_lock_init(&dev_priv-><wbr>perf.oa.oa_buffer.ptr_lock);<br>
><br>
> diff --git a/drivers/gpu/drm/i915/intel_<wbr>engine_cs.c b/drivers/gpu/drm/i915/intel_<wbr>engine_cs.c<br>
> index 9ab5969..1a2e843 100644<br>
> --- a/drivers/gpu/drm/i915/intel_<wbr>engine_cs.c<br>
> +++ b/drivers/gpu/drm/i915/intel_<wbr>engine_cs.c<br>
> @@ -317,6 +317,10 @@ int intel_engines_init(struct drm_i915_private *dev_priv)<br>
>                       goto cleanup;<br>
><br>
>               GEM_BUG_ON(!engine->submit_<wbr>request);<br>
> +<br>
> +             /* Perf stream related initialization for Engine */<br>
> +             rcu_assign_pointer(engine-><wbr>exclusive_stream, NULL);<br>
> +             init_srcu_struct(&engine-><wbr>perf_srcu);<br>
>       }<br>
><br>
>       return 0;<br>
> diff --git a/drivers/gpu/drm/i915/intel_<wbr>ringbuffer.c b/drivers/gpu/drm/i915/intel_<wbr>ringbuffer.c<br>
> index cdf084e..4333623 100644<br>
> --- a/drivers/gpu/drm/i915/intel_<wbr>ringbuffer.c<br>
> +++ b/drivers/gpu/drm/i915/intel_<wbr>ringbuffer.c<br>
> @@ -1622,6 +1622,8 @@ void intel_engine_cleanup(struct intel_engine_cs *engine)<br>
><br>
>       intel_engine_cleanup_common(<wbr>engine);<br>
><br>
> +     cleanup_srcu_struct(&engine-><wbr>perf_srcu);<br>
> +<br>
>       dev_priv->engine[engine->id] = NULL;<br>
>       kfree(engine);<br>
>   }<br>
> diff --git a/drivers/gpu/drm/i915/intel_<wbr>ringbuffer.h b/drivers/gpu/drm/i915/intel_<wbr>ringbuffer.h<br>
> index d33c934..0ac8491 100644<br>
> --- a/drivers/gpu/drm/i915/intel_<wbr>ringbuffer.h<br>
> +++ b/drivers/gpu/drm/i915/intel_<wbr>ringbuffer.h<br>
> @@ -441,6 +441,11 @@ struct intel_engine_cs {<br>
>        * certain bits to encode the command length in the header).<br>
>        */<br>
>       u32 (*get_cmd_length_mask)(u32 cmd_header);<br>
> +<br>
> +     /* Global per-engine stream */<br>
> +     struct srcu_struct perf_srcu;<br>
> +     struct i915_perf_stream __rcu *exclusive_stream;<br>
> +     u32 specific_ctx_id;<br>
>   };<br>
><br>
>   static inline unsigned int<br>
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h<br>
> index a1314c5..768b1a5 100644<br>
> --- a/include/uapi/drm/i915_drm.h<br>
> +++ b/include/uapi/drm/i915_drm.h<br>
> @@ -1350,6 +1350,7 @@ enum drm_i915_oa_format {<br>
><br>
>   enum drm_i915_perf_sample_oa_source {<br>
>       I915_PERF_SAMPLE_OA_SOURCE_<wbr>OABUFFER,<br>
> +     I915_PERF_SAMPLE_OA_SOURCE_CS,<br>
>       I915_PERF_SAMPLE_OA_SOURCE_MAX  /* non-ABI */<br>
>   };<br>
><br>
> @@ -1394,6 +1395,19 @@ enum drm_i915_perf_property_id {<br>
>        */<br>
>       DRM_I915_PERF_PROP_SAMPLE_OA_<wbr>SOURCE,<br>
><br>
> +     /**<br>
> +      * The value of this property specifies the GPU engine for which<br>
> +      * samples are to be collected. Specifying this property also<br>
> +      * implies command stream based sample collection.<br>
> +      */<br>
> +     DRM_I915_PERF_PROP_ENGINE,<br>
</div></div>> +<br>
> +     /**<br>
<span class="gmail-im gmail-HOEnZb">> +      * Setting this property to 1 requests inclusion of the context ID<br>
> +      * in the perf sample data.<br>
> +      */<br>
> +     DRM_I915_PERF_PROP_SAMPLE_CTX_<wbr>ID,<br>
> +<br>
>       DRM_I915_PERF_PROP_MAX /* non-ABI */<br>
>   };<br>
><br>
> @@ -1460,6 +1474,7 @@ enum drm_i915_perf_record_type {<br>
>        *     struct drm_i915_perf_record_header header;<br>
>        *<br>
>        *     { u64 source; } && DRM_I915_PERF_PROP_SAMPLE_OA_<wbr>SOURCE<br>
> +      *     { u64 ctx_id; } && DRM_I915_PERF_PROP_SAMPLE_CTX_<wbr>ID<br>
>        *     { u32 oa_report[]; } && DRM_I915_PERF_PROP_SAMPLE_OA<br>
>        * };<br>
>        */<br>
<br>
<br>
</span><div class="gmail-HOEnZb"><div class="gmail-h5">
</div></div></blockquote></div><br></div></div>