<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jul 31, 2017 at 3:13 PM, Lionel Landwerlin <span dir="ltr"><<a href="mailto:lionel.g.landwerlin@intel.com" target="_blank">lionel.g.landwerlin@intel.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 31/07/17 08:59, Sagar Arun Kamble wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
From: Sourab Gupta <<a href="mailto:sourab.gupta@intel.com" target="_blank">sourab.gupta@intel.com</a>><br>
<br>
This patch introduces a framework to capture OA counter reports associated<br>
with the render command stream. We can then associate the reports captured<br>
through this mechanism with their corresponding context IDs. This can be<br>
further extended to associate any other metadata with the corresponding<br>
samples (since the association with the render command stream gives us the<br>
ability to capture this information while inserting the corresponding<br>
capture commands into the command stream).<br>
<br>
The OA reports generated in this way are associated with a corresponding<br>
workload, and thus can be used to delimit the workload (i.e. sample the<br>
counters at the workload boundaries) within an ongoing stream of periodic<br>
counter snapshots.<br>
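<br>
A minimal sketch of what delimiting a workload gives userspace, assuming a<br>
hypothetical, simplified report layout (struct oa_snapshot and<br>
oa_delimit_workload() are illustrative names, not part of this patch): the<br>
counter values for one workload are the difference between the report<br>
captured before its batch buffer and the one captured after it.<br>
<br>
```c
#include <stdint.h>

/*
 * Hypothetical, simplified view of two OA reports emitted around one batch
 * buffer. Real report layouts depend on the OA format; only the timestamp
 * (dword 1 of a report, as read by the patch) and a single 32-bit counter
 * are modelled here.
 */
struct oa_snapshot {
	uint32_t timestamp;
	uint32_t counter;
};

struct oa_workload_delta {
	uint32_t elapsed_ticks;
	uint32_t counter_delta;
};

/*
 * Subtract the snapshot taken before the batch from the one taken after
 * it. Unsigned subtraction wraps modulo 2^32, so a single wraparound of
 * either field between the two snapshots still gives the right difference.
 */
static struct oa_workload_delta
oa_delimit_workload(const struct oa_snapshot *begin,
		    const struct oa_snapshot *end)
{
	struct oa_workload_delta d;

	d.elapsed_ticks = end->timestamp - begin->timestamp;
	d.counter_delta = end->counter - begin->counter;
	return d;
}
```
<br>
Real OA formats carry many counters, some wider than 32 bits, so this only<br>
illustrates the subtraction for the timestamp and one 32-bit counter.<br>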
<br>
There may be use cases wherein we need more than the periodic OA capture<br>
mode which is currently supported. The command stream (CS) based mode added<br>
here is primarily used for two use cases:<br>
- Ability to capture system-wide metrics, along with the ability to map<br>
the reports back to individual contexts (particularly for HSW).<br>
- Ability to inject tags for work into the reports. This provides<br>
visibility into the multiple stages of work within a single context.<br>
<br>
Userspace will be able to distinguish between the periodic and CS based<br>
OA reports by virtue of the source_info sample field.<br>
<br>
The command MI_REPORT_PERF_COUNT can be used to capture snapshots of OA<br>
counters, and is inserted at batch buffer (BB) boundaries.<br>
The data thus captured is stored in a separate buffer, distinct from the<br>
buffer otherwise used for periodic OA capture mode.<br>
The metadata pertaining to each snapshot is maintained in a list, which<br>
also holds the offset into the GEM buffer object for each captured snapshot.<br>
To track whether the GPU has completed processing a node, a field holding<br>
the corresponding GEM request is added, and that request is tracked for<br>
completion of the command.<br>
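<br>
The offset bookkeeping described above can be modelled, roughly, as a ring<br>
of fixed-size slots. This is a simplified, hypothetical sketch (struct<br>
cs_ring and cs_ring_insert() are illustrative names, not part of this<br>
patch): insert_perf_sample() in the patch instead walks a list of<br>
i915_perf_cs_sample entries and releases the oldest ones when space runs<br>
out, but the effect is similar: each new sample gets an offset, and when the<br>
buffer is full the oldest sample is dropped so the GPU can overwrite it.<br>
<br>
```c
#include <stdint.h>

/*
 * Hypothetical model of the CS sample buffer: a fixed-size buffer split into
 * equal slots, one OA report per slot. Only the oldest slot and the number
 * of live samples are tracked. Assumes sample_size > 0 and
 * buf_size >= sample_size.
 */
struct cs_ring {
	uint32_t buf_size;	/* total buffer size in bytes */
	uint32_t sample_size;	/* size of one OA report in bytes */
	uint32_t first;		/* slot index of the oldest live sample */
	uint32_t count;		/* number of live samples */
	uint32_t dropped;	/* samples discarded to make room */
};

static uint32_t cs_ring_slots(const struct cs_ring *r)
{
	return r->buf_size / r->sample_size;
}

/* Reserve space for a new sample and return its byte offset. */
static uint32_t cs_ring_insert(struct cs_ring *r)
{
	uint32_t nslots = cs_ring_slots(r);
	uint32_t slot;

	if (r->count == nslots) {
		/* Full: forget the oldest sample; its slot is reused. */
		r->first = (r->first + 1) % nslots;
		r->count--;
		r->dropped++;
	}
	slot = (r->first + r->count) % nslots;
	r->count++;
	return slot * r->sample_size;
}
```
<br>
Because every offset is a multiple of the report size, offsets stay 64-byte<br>
aligned whenever the report size is a multiple of 64 bytes, which is what<br>
the alignment check in i915_emit_oa_report_capture() expects.<br>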
<br>
Both periodic and CS based reports are associated with a single stream<br>
(corresponding to the render engine), and the samples are expected to be<br>
in sequential order according to their timestamps. Since these reports<br>
are collected in separate buffers, they are merge sorted at the time of<br>
forwarding to userspace during the read call.<br>
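<br>
A minimal sketch of that merge step, assuming both inputs are already in<br>
timestamp order (which holds within each buffer) and using a hypothetical<br>
struct oa_sample that carries only a 32-bit timestamp and a source tag. The<br>
wraparound-safe comparison is a design choice of this sketch; the patch's<br>
read path compares report32[1] against ts with a plain &gt;.<br>
<br>
```c
#include <stddef.h>
#include <stdint.h>

/* One entry from either buffer, reduced to what the merge needs. */
struct oa_sample {
	uint32_t ts;	/* OA report timestamp (dword 1 of the report) */
	int source;	/* hypothetical tag: 0 = periodic, 1 = CS */
};

/*
 * "a is not later than b" for 32-bit timestamps that may wrap, valid as
 * long as the two values are less than 2^31 ticks apart.
 */
static int ts_before_eq(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) <= 0;
}

/*
 * Merge two individually time-ordered arrays into @out, which must hold
 * np + nc entries. On equal timestamps the periodic sample goes first.
 * Returns the number of entries written.
 */
static size_t merge_samples(const struct oa_sample *periodic, size_t np,
			    const struct oa_sample *cs, size_t nc,
			    struct oa_sample *out)
{
	size_t i = 0, j = 0, k = 0;

	while (i < np && j < nc) {
		if (ts_before_eq(periodic[i].ts, cs[j].ts))
			out[k++] = periodic[i++];
		else
			out[k++] = cs[j++];
	}
	while (i < np)
		out[k++] = periodic[i++];
	while (j < nc)
		out[k++] = cs[j++];
	return k;
}
```
<br>
Each element is visited once, so forwarding n samples this way costs O(n)<br>
on top of copying them out of the two buffers.<br>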
<br>
v2: Aligned with the non-perf interface (custom DRM ioctl based). Also,<br>
a few related patches are squashed together for better readability.<br>
<br>
v3: Updated perf sample capture emit hook name. Reserved space upfront<br>
in the ring for emitting sample capture commands and used<br>
req->fence.seqno for tracking samples. Added SRCU protection for streams.<br>
Changed the stream last_request tracking to resv object. (Chris)<br>
Updated perf.sample_lock spin_lock usage to avoid softlockups. Moved<br>
stream to global per-engine structure. (Sagar)<br>
Updated unpin and put in the free routines to i915_vma_unpin_and_release.<br>
Made use of perf stream cs_buffer vma resv instead of separate resv obj.<br>
Pruned perf stream vma resv during gem_idle. (Chris)<br>
Changed payload field ctx_id to u64 to keep all sample data aligned at 8<br>
bytes. (Lionel)<br>
A stall/flush prior to sample capture is not added. Do we need to give<br>
userspace control over whether to stall/flush at each sample?<br>
<br>
Signed-off-by: Sourab Gupta <<a href="mailto:sourab.gupta@intel.com" target="_blank">sourab.gupta@intel.com</a>><br>
Signed-off-by: Robert Bragg <<a href="mailto:robert@sixbynine.org" target="_blank">robert@sixbynine.org</a>><br>
Signed-off-by: Sagar Arun Kamble <<a href="mailto:sagar.a.kamble@intel.com" target="_blank">sagar.a.kamble@intel.com</a>><br>
---<br>
drivers/gpu/drm/i915/i915_drv.<wbr>h | 101 ++-<br>
drivers/gpu/drm/i915/i915_gem.<wbr>c | 1 +<br>
drivers/gpu/drm/i915/i915_gem_<wbr>execbuffer.c | 8 +<br>
drivers/gpu/drm/i915/i915_perf<wbr>.c | 1185 ++++++++++++++++++++++------<br>
drivers/gpu/drm/i915/intel_eng<wbr>ine_cs.c | 4 +<br>
drivers/gpu/drm/i915/intel_rin<wbr>gbuffer.c | 2 +<br>
drivers/gpu/drm/i915/intel_rin<wbr>gbuffer.h | 5 +<br>
include/uapi/drm/i915_drm.h | 15 +<br>
8 files changed, 1073 insertions(+), 248 deletions(-)<br>
<br>
diff --git a/drivers/gpu/drm/i915/i915_dr<wbr>v.h b/drivers/gpu/drm/i915/i915_dr<wbr>v.h<br>
index 2c7456f..8b1cecf 100644<br>
--- a/drivers/gpu/drm/i915/i915_dr<wbr>v.h<br>
+++ b/drivers/gpu/drm/i915/i915_dr<wbr>v.h<br>
@@ -1985,6 +1985,24 @@ struct i915_perf_stream_ops {<br>
* The stream will always be disabled before this is called.<br>
*/<br>
void (*destroy)(struct i915_perf_stream *stream);<br>
+<br>
+ /*<br>
+ * @emit_sample_capture: Emit the commands in the command streamer<br>
+ * for a particular gpu engine.<br>
+ *<br>
+ * The commands are inserted to capture the perf sample data at<br>
+ * specific points during workload execution, such as before and after<br>
+ * the batch buffer.<br>
+ */<br>
+ void (*emit_sample_capture)(struct i915_perf_stream *stream,<br>
+ struct drm_i915_gem_request *request,<br>
+ bool preallocate);<br>
+};<br>
+<br>
+enum i915_perf_stream_state {<br>
+ I915_PERF_STREAM_DISABLED,<br>
+ I915_PERF_STREAM_ENABLE_IN_PR<wbr>OGRESS,<br>
+ I915_PERF_STREAM_ENABLED,<br>
};<br>
/**<br>
@@ -1997,9 +2015,9 @@ struct i915_perf_stream {<br>
struct drm_i915_private *dev_priv;<br>
/**<br>
- * @link: Links the stream into ``&drm_i915_private->streams``<br>
+ * @engine: Engine to which this stream corresponds.<br>
*/<br>
- struct list_head link;<br>
+ struct intel_engine_cs *engine;<br>
/**<br>
* @sample_flags: Flags representing the `DRM_I915_PERF_PROP_SAMPLE_*`<br>
@@ -2022,17 +2040,41 @@ struct i915_perf_stream {<br>
struct i915_gem_context *ctx;<br>
/**<br>
- * @enabled: Whether the stream is currently enabled, considering<br>
- * whether the stream was opened in a disabled state and based<br>
- * on `I915_PERF_IOCTL_ENABLE` and `I915_PERF_IOCTL_DISABLE` calls.<br>
+ * @state: Current stream state, which can be either disabled, enabled,<br>
+ * or enable_in_progress, while considering whether the stream was<br>
+ * opened in a disabled state and based on `I915_PERF_IOCTL_ENABLE` and<br>
+ * `I915_PERF_IOCTL_DISABLE` calls.<br>
*/<br>
- bool enabled;<br>
+ enum i915_perf_stream_state state;<br>
+<br>
+ /**<br>
+ * @cs_mode: Whether command stream based perf sample collection is<br>
+ * enabled for this stream<br>
+ */<br>
+ bool cs_mode;<br>
+<br>
+ /**<br>
+ * @using_oa: Whether OA unit is in use for this particular stream<br>
+ */<br>
+ bool using_oa;<br>
/**<br>
* @ops: The callbacks providing the implementation of this specific<br>
* type of configured stream.<br>
*/<br>
const struct i915_perf_stream_ops *ops;<br>
+<br>
+ /* Command stream based perf data buffer */<br>
+ struct {<br>
+ struct i915_vma *vma;<br>
+ u8 *vaddr;<br>
+ } cs_buffer;<br>
+<br>
+ struct list_head cs_samples;<br>
+ spinlock_t cs_samples_lock;<br>
+<br>
+ wait_queue_head_t poll_wq;<br>
+ bool pollin;<br>
};<br>
/**<br>
@@ -2095,7 +2137,8 @@ struct i915_oa_ops {<br>
int (*read)(struct i915_perf_stream *stream,<br>
char __user *buf,<br>
size_t count,<br>
- size_t *offset);<br>
+ size_t *offset,<br>
+ u32 ts);<br>
/**<br>
* @oa_hw_tail_read: read the OA tail pointer register<br>
@@ -2107,6 +2150,36 @@ struct i915_oa_ops {<br>
u32 (*oa_hw_tail_read)(struct drm_i915_private *dev_priv);<br>
};<br>
+/*<br>
+ * i915_perf_cs_sample - Sample element holding info about a single perf<br>
+ * sample associated with a particular GPU command stream.<br>
+ */<br>
+struct i915_perf_cs_sample {<br>
+ /**<br>
+ * @link: Links the sample into ``&stream->cs_samples``<br>
+ */<br>
+ struct list_head link;<br>
+<br>
+ /**<br>
+ * @request: GEM request associated with the sample. The commands to<br>
+ * capture the perf metrics are inserted into the command streamer in<br>
+ * context of this request.<br>
+ */<br>
+ struct drm_i915_gem_request *request;<br>
+<br>
+ /**<br>
+ * @offset: Offset into ``&stream->cs_buffer``<br>
+ * where the perf metrics will be collected, when the commands inserted<br>
+ * into the command stream are executed by GPU.<br>
+ */<br>
+ u32 offset;<br>
+<br>
+ /**<br>
+ * @ctx_id: Context ID associated with this perf sample<br>
+ */<br>
+ u32 ctx_id;<br>
+};<br>
+<br>
struct intel_cdclk_state {<br>
unsigned int cdclk, vco, ref;<br>
};<br>
@@ -2431,17 +2504,10 @@ struct drm_i915_private {<br>
struct ctl_table_header *sysctl_header;<br>
struct mutex lock;<br>
- struct list_head streams;<br>
-<br>
- struct {<br>
- struct i915_perf_stream *exclusive_stream;<br>
- u32 specific_ctx_id;<br>
-<br>
- struct hrtimer poll_check_timer;<br>
- wait_queue_head_t poll_wq;<br>
- bool pollin;<br>
+ struct hrtimer poll_check_timer;<br>
+ struct {<br>
/**<br>
* For rate limiting any notifications of spurious<br>
* invalid OA reports<br>
@@ -3636,6 +3702,8 @@ int i915_perf_open_ioctl(struct drm_device *dev, void *data,<br>
void i915_oa_init_reg_state(struct intel_engine_cs *engine,<br>
struct i915_gem_context *ctx,<br>
uint32_t *reg_state);<br>
+void i915_perf_emit_sample_capture(<wbr>struct drm_i915_gem_request *req,<br>
+ bool preallocate);<br>
/* i915_gem_evict.c */<br>
int __must_check i915_gem_evict_something(struc<wbr>t i915_address_space *vm,<br>
@@ -3795,6 +3863,7 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,<br>
/* i915_perf.c */<br>
extern void i915_perf_init(struct drm_i915_private *dev_priv);<br>
extern void i915_perf_fini(struct drm_i915_private *dev_priv);<br>
+extern void i915_perf_streams_mark_idle(st<wbr>ruct drm_i915_private *dev_priv);<br>
extern void i915_perf_register(struct drm_i915_private *dev_priv);<br>
extern void i915_perf_unregister(struct drm_i915_private *dev_priv);<br>
diff --git a/drivers/gpu/drm/i915/i915_ge<wbr>m.c b/drivers/gpu/drm/i915/i915_ge<wbr>m.c<br>
index 000a764..7b01548 100644<br>
--- a/drivers/gpu/drm/i915/i915_ge<wbr>m.c<br>
+++ b/drivers/gpu/drm/i915/i915_ge<wbr>m.c<br>
@@ -3220,6 +3220,7 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)<br>
intel_engines_mark_idle(dev_pr<wbr>iv);<br>
i915_gem_timelines_mark_idle(d<wbr>ev_priv);<br>
+ i915_perf_streams_mark_idle(d<wbr>ev_priv);<br>
GEM_BUG_ON(!dev_priv->gt.awake<wbr>);<br>
dev_priv->gt.awake = false;<br>
diff --git a/drivers/gpu/drm/i915/i915_ge<wbr>m_execbuffer.c b/drivers/gpu/drm/i915/i915_ge<wbr>m_execbuffer.c<br>
index 5fa4476..bfe546b 100644<br>
--- a/drivers/gpu/drm/i915/i915_ge<wbr>m_execbuffer.c<br>
+++ b/drivers/gpu/drm/i915/i915_ge<wbr>m_execbuffer.c<br>
@@ -1194,12 +1194,16 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,<br>
if (err)<br>
goto err_request;<br>
+ i915_perf_emit_sample_<wbr>capture(rq, true);<br>
+<br>
err = eb->engine->emit_bb_start(rq,<br>
batch->node.start, PAGE_SIZE,<br>
cache->gen > 5 ? 0 : I915_DISPATCH_SECURE);<br>
if (err)<br>
goto err_request;<br>
+ i915_perf_emit_sample_<wbr>capture(rq, false);<br>
+<br>
GEM_BUG_ON(!reservation_object<wbr>_test_signaled_rcu(batch-><wbr>resv, true));<br>
i915_vma_move_to_active(batch, rq, 0);<br>
reservation_object_lock(batch-<wbr>>resv, NULL);<br>
@@ -2029,6 +2033,8 @@ static int eb_submit(struct i915_execbuffer *eb)<br>
return err;<br>
}<br>
+ i915_perf_emit_sample_<wbr>capture(eb->request, true);<br>
+<br>
err = eb->engine->emit_bb_start(eb-><wbr>request,<br>
eb->batch->node.start +<br>
eb->batch_start_offset,<br>
@@ -2037,6 +2043,8 @@ static int eb_submit(struct i915_execbuffer *eb)<br>
if (err)<br>
return err;<br>
+ i915_perf_emit_sample_<wbr>capture(eb->request, false);<br>
+<br>
return 0;<br>
}<br>
diff --git a/drivers/gpu/drm/i915/i915_pe<wbr>rf.c b/drivers/gpu/drm/i915/i915_pe<wbr>rf.c<br>
index b272653..57e1936 100644<br>
--- a/drivers/gpu/drm/i915/i915_pe<wbr>rf.c<br>
+++ b/drivers/gpu/drm/i915/i915_pe<wbr>rf.c<br>
@@ -193,6 +193,7 @@<br>
#include <linux/anon_inodes.h><br>
#include <linux/sizes.h><br>
+#include <linux/srcu.h><br>
#include "i915_drv.h"<br>
#include "i915_oa_hsw.h"<br>
@@ -288,6 +289,12 @@<br>
#define OAREPORT_REASON_CTX_SWITCH (1<<3)<br>
#define OAREPORT_REASON_CLK_RATIO (1<<5)<br>
+/* Data common to periodic and RCS based OA samples */<br>
+struct i915_perf_sample_data {<br>
+ u64 source;<br>
+ u64 ctx_id;<br>
+ const u8 *report;<br>
+};<br>
/* For sysctl proc_dointvec_minmax of i915_oa_max_sample_rate<br>
*<br>
@@ -328,8 +335,19 @@<br>
[I915_OA_FORMAT_C4_B8] = { 7, 64 },<br>
};<br>
+/* Duplicated from similar static enum in i915_gem_execbuffer.c */<br>
+#define I915_USER_RINGS (4)<br>
+static const enum intel_engine_id user_ring_map[I915_USER_RINGS + 1] = {<br>
+ [I915_EXEC_DEFAULT] = RCS,<br>
+ [I915_EXEC_RENDER] = RCS,<br>
+ [I915_EXEC_BLT] = BCS,<br>
+ [I915_EXEC_BSD] = VCS,<br>
+ [I915_EXEC_VEBOX] = VECS<br>
+};<br>
+<br>
#define SAMPLE_OA_REPORT (1<<0)<br>
#define SAMPLE_OA_SOURCE (1<<1)<br>
+#define SAMPLE_CTX_ID (1<<2)<br>
/**<br>
* struct perf_open_properties - for validated properties given to open a stream<br>
@@ -340,6 +358,9 @@<br>
* @oa_format: An OA unit HW report format<br>
* @oa_periodic: Whether to enable periodic OA unit sampling<br>
* @oa_period_exponent: The OA unit sampling period is derived from this<br>
+ * @cs_mode: Whether the stream is configured to enable collection of metrics<br>
+ * associated with command stream of a particular GPU engine<br>
+ * @engine: The GPU engine associated with the stream in case cs_mode is enabled<br>
*<br>
* As read_properties_unlocked() enumerates and validates the properties given<br>
* to open a stream of metrics the configuration is built up in the structure<br>
@@ -356,6 +377,10 @@ struct perf_open_properties {<br>
int oa_format;<br>
bool oa_periodic;<br>
int oa_period_exponent;<br>
+<br>
+ /* Command stream mode */<br>
+ bool cs_mode;<br>
+ enum intel_engine_id engine;<br>
};<br>
static u32 gen8_oa_hw_tail_read(struct drm_i915_private *dev_priv)<br>
@@ -371,6 +396,266 @@ static u32 gen7_oa_hw_tail_read(struct drm_i915_private *dev_priv)<br>
}<br>
/**<br>
+ * i915_perf_emit_sample_capture - Insert the commands to capture metrics into<br>
+ * the command stream of a GPU engine.<br>
+ * @request: request in whose context the metrics are being collected.<br>
+ * @preallocate: allocate space in ring for related sample.<br>
+ *<br>
+ * The function provides a hook through which the commands to capture perf<br>
+ * metrics, are inserted into the command stream of a GPU engine.<br>
+ */<br>
+void i915_perf_emit_sample_capture(<wbr>struct drm_i915_gem_request *request,<br>
+ bool preallocate)<br>
+{<br>
+ struct intel_engine_cs *engine = request->engine;<br>
+ struct drm_i915_private *dev_priv = engine->i915;<br>
+ struct i915_perf_stream *stream;<br>
+ int idx;<br>
+<br>
+ if (!dev_priv->perf.initialized)<br>
+ return;<br>
+<br>
+ idx = srcu_read_lock(&engine->perf_s<wbr>rcu);<br>
+ stream = srcu_dereference(engine->exclu<wbr>sive_stream, &engine->perf_srcu);<br>
+ if (stream && (stream->state == I915_PERF_STREAM_ENABLED) &&<br>
+ stream->cs_mode)<br>
+ stream->ops->emit_sample_capt<wbr>ure(stream, request,<br>
+ preallocate);<br>
+ srcu_read_unlock(&engine->per<wbr>f_srcu, idx);<br>
+}<br>
+<br>
+/**<br>
+ * release_perf_samples - Release old perf samples to make space for new<br>
+ * sample data.<br>
+ * @stream: Stream from which space is to be freed up.<br>
+ * @target_size: Space required to be freed up.<br>
+ *<br>
+ * We also drop our reference to the associated request before deleting the<br>
+ * sample. There is no need to check whether the commands associated with<br>
+ * old samples have completed, because these sample entries are going to be<br>
+ * replaced by a new sample anyway, and the GPU will eventually overwrite the<br>
+ * buffer contents when the request associated with the new sample completes.<br>
+ */<br>
+static void release_perf_samples(struct i915_perf_stream *stream,<br>
+ u32 target_size)<br>
+{<br>
+ struct drm_i915_private *dev_priv = stream->dev_priv;<br>
+ struct i915_perf_cs_sample *sample, *next;<br>
+ u32 sample_size = dev_priv->perf.oa.oa_buffer.format_size;<br>
+ u32 size = 0;<br>
+<br>
+ list_for_each_entry_safe<br>
+ (sample, next, &stream->cs_samples, link) {<br>
+ size += sample_size;<br>
+ i915_gem_request_put(sample-><wbr>request);<br>
+ list_del(&sample->link);<br>
+ kfree(sample);<br>
+<br>
+ if (size >= target_size)<br>
+ break;<br>
+ }<br>
+}<br>
+<br>
+/**<br>
+ * insert_perf_sample - Insert a perf sample entry to the sample list.<br>
+ * @stream: Stream into which sample is to be inserted.<br>
+ * @sample: perf CS sample to be inserted into the list<br>
+ *<br>
+ * This function never fails, since it always manages to insert the sample.<br>
+ * If the space is exhausted in the buffer, it will remove the older<br>
+ * entries in order to make space.<br>
+ */<br>
+static void insert_perf_sample(struct i915_perf_stream *stream,<br>
+ struct i915_perf_cs_sample *sample)<br>
+{<br>
+ struct drm_i915_private *dev_priv = stream->dev_priv;<br>
+ struct i915_perf_cs_sample *first, *last;<br>
+ int max_offset = stream->cs_buffer.vma->obj->ba<wbr>se.size;<br>
+ u32 sample_size = dev_priv->perf.oa.oa_buffer.format_size;<br>
+ unsigned long flags;<br>
+<br>
+ spin_lock_irqsave(&stream-><wbr>cs_samples_lock, flags);<br>
+ if (list_empty(&stream->cs_sample<wbr>s)) {<br>
+ sample->offset = 0;<br>
+ list_add_tail(&sample->link, &stream->cs_samples);<br>
+ spin_unlock_irqrestore(&strea<wbr>m->cs_samples_lock, flags);<br>
+ return;<br>
+ }<br>
+<br>
+ first = list_first_entry(&stream->cs_s<wbr>amples, typeof(*first),<br>
+ link);<br>
+ last = list_last_entry(&stream->cs_sa<wbr>mples, typeof(*last),<br>
+ link);<br>
+<br>
+ if (last->offset >= first->offset) {<br>
+ /* Sufficient space available at the end of buffer? */<br>
+ if (last->offset + 2*sample_size < max_offset)<br>
+ sample->offset = last->offset + sample_size;<br>
+ /*<br>
+ * Wraparound condition. Is sufficient space available at<br>
+ * beginning of buffer?<br>
+ */<br>
+ else if (sample_size < first->offset)<br>
+ sample->offset = 0;<br>
+ /* Insufficient space. Overwrite existing old entries */<br>
+ else {<br>
+ u32 target_size = sample_size - first->offset;<br>
+<br>
+ release_perf_samples(stream, target_size);<br>
+ sample->offset = 0;<br>
+ }<br>
+ } else {<br>
+ /* Sufficient space available? */<br>
+ if (last->offset + 2*sample_size < first->offset)<br>
+ sample->offset = last->offset + sample_size;<br>
+ /* Insufficient space. Overwrite existing old entries */<br>
+ else {<br>
+ u32 target_size = sample_size -<br>
+ (first->offset - last->offset -<br>
+ sample_size);<br>
+<br>
+ release_perf_samples(stream, target_size);<br>
+ sample->offset = last->offset + sample_size;<br>
+ }<br>
+ }<br>
+ list_add_tail(&sample->link, &stream->cs_samples);<br>
+ spin_unlock_irqrestore(&strea<wbr>m->cs_samples_lock, flags);<br>
+}<br>
+<br>
+/**<br>
+ * i915_emit_oa_report_capture - Insert the commands to capture OA<br>
+ * reports metrics into the render command stream<br>
+ * @request: request in whose context the metrics are being collected.<br>
+ * @preallocate: allocate space in ring for related sample.<br>
+ * @offset: command stream buffer offset where the OA metrics need to be<br>
+ * collected<br>
+ */<br>
+static int i915_emit_oa_report_capture(<br>
+ struct drm_i915_gem_request *request,<br>
+ bool preallocate,<br>
+ u32 offset)<br>
+{<br>
+ struct drm_i915_private *dev_priv = request->i915;<br>
+ struct intel_engine_cs *engine = request->engine;<br>
+ struct i915_perf_stream *stream;<br>
+ u32 addr = 0;<br>
+ u32 cmd, len = 4, *cs;<br>
+ int idx;<br>
+<br>
+ idx = srcu_read_lock(&engine->perf_s<wbr>rcu);<br>
+ stream = srcu_dereference(engine->exclu<wbr>sive_stream, &engine->perf_srcu);<br>
+ addr = stream->cs_buffer.vma->node.start + offset;<br>
+ srcu_read_unlock(&engine->per<wbr>f_srcu, idx);<br>
+<br>
+ if (WARN_ON(addr & 0x3f)) {<br>
+ DRM_ERROR("OA buffer address not aligned to 64 byte\n");<br>
+ return -EINVAL;<br>
+ }<br>
+<br>
+ if (preallocate)<br>
+ request->reserved_space += len;<br>
+ else<br>
+ request->reserved_space -= len;<br>
+<br>
+ cs = intel_ring_begin(request, 4);<br>
+ if (IS_ERR(cs))<br>
+ return PTR_ERR(cs);<br>
+<br>
+ cmd = MI_REPORT_PERF_COUNT | (1<<0);<br>
+ if (INTEL_GEN(dev_priv) >= 8)<br>
+ cmd |= (2<<0);<br>
+<br>
+ *cs++ = cmd;<br>
+ *cs++ = addr | MI_REPORT_PERF_COUNT_GGTT;<br>
+ *cs++ = request->fence.seqno;<br>
+<br>
+ if (INTEL_GEN(dev_priv) >= 8)<br>
+ *cs++ = 0;<br>
+ else<br>
+ *cs++ = MI_NOOP;<br>
+<br>
+ intel_ring_advance(request, cs);<br>
+<br>
+ return 0;<br>
+}<br>
+<br>
+/**<br>
+ * i915_perf_stream_emit_sample_c<wbr>apture - Insert the commands to capture perf<br>
+ * metrics into the GPU command stream<br>
+ * @stream: An i915-perf stream opened for GPU metrics<br>
+ * @request: request in whose context the metrics are being collected.<br>
+ * @preallocate: allocate space in ring for related sample.<br>
+ */<br>
+static void i915_perf_stream_emit_sample_c<wbr>apture(<br>
+ struct i915_perf_stream *stream,<br>
+ struct drm_i915_gem_request *request,<br>
+ bool preallocate)<br>
+{<br>
+ struct reservation_object *resv = stream->cs_buffer.vma->resv;<br>
+ struct i915_perf_cs_sample *sample;<br>
+ unsigned long flags;<br>
+ int ret;<br>
+<br>
+ sample = kzalloc(sizeof(*sample), GFP_KERNEL);<br>
+ if (sample == NULL) {<br>
+ DRM_ERROR("Perf sample alloc failed\n");<br>
+ return;<br>
+ }<br>
+<br>
+ sample->request = i915_gem_request_get(request);<br>
+ sample->ctx_id = request->ctx->hw_id;<br>
+<br>
+ insert_perf_sample(stream, sample);<br>
+<br>
+ if (stream->sample_flags & SAMPLE_OA_REPORT) {<br>
+ ret = i915_emit_oa_report_capture(re<wbr>quest,<br>
+ preallocate,<br>
+ sample->offset);<br>
+ if (ret)<br>
+ goto err_unref;<br>
+ }<br>
+<br>
+ reservation_object_lock(resv, NULL);<br>
+ if (reservation_object_reserve_sh<wbr>ared(resv) == 0)<br>
+ reservation_object_add_<wbr>shared_fence(resv, &request->fence);<br>
+ reservation_object_unlock(res<wbr>v);<br>
+<br>
+ i915_vma_move_to_active(strea<wbr>m->cs_buffer.vma, request,<br>
+ EXEC_OBJECT_WRITE);<br>
+ return;<br>
+<br>
+err_unref:<br>
+ i915_gem_request_put(sample-><wbr>request);<br>
+ spin_lock_irqsave(&stream-><wbr>cs_samples_lock, flags);<br>
+ list_del(&sample->link);<br>
+ spin_unlock_irqrestore(&strea<wbr>m->cs_samples_lock, flags);<br>
+ kfree(sample);<br>
+}<br>
+<br>
+/**<br>
+ * i915_perf_stream_release_sampl<wbr>es - Release the perf command stream samples<br>
+ * @stream: Stream from which samples are to be released.<br>
+ *<br>
+ * Note: The associated requests should be completed before releasing the<br>
+ * references here.<br>
+ */<br>
+static void i915_perf_stream_release_sampl<wbr>es(struct i915_perf_stream *stream)<br>
+{<br>
+ struct i915_perf_cs_sample *entry, *next;<br>
+ unsigned long flags;<br>
+<br>
+ list_for_each_entry_safe<br>
+ (entry, next, &stream->cs_samples, link) {<br>
+ i915_gem_request_put(entry->r<wbr>equest);<br>
+<br>
+ spin_lock_irqsave(&stream-><wbr>cs_samples_lock, flags);<br>
+ list_del(&entry->link);<br>
+ spin_unlock_irqrestore(&strea<wbr>m->cs_samples_lock, flags);<br>
+ kfree(entry);<br>
+ }<br>
+}<br>
+<br>
+/**<br>
* oa_buffer_check_unlocked - check for data and update tail ptr state<br>
* @dev_priv: i915 device instance<br>
*<br>
@@ -521,12 +806,13 @@ static int append_oa_status(struct i915_perf_stream *stream,<br>
}<br>
/**<br>
- * append_oa_sample - Copies single OA report into userspace read() buffer.<br>
- * @stream: An i915-perf stream opened for OA metrics<br>
+ * append_perf_sample - Copies single perf sample into userspace read() buffer.<br>
+ * @stream: An i915-perf stream opened for perf samples<br>
* @buf: destination buffer given by userspace<br>
* @count: the number of bytes userspace wants to read<br>
* @offset: (inout): the current position for writing into @buf<br>
- * @report: A single OA report to (optionally) include as part of the sample<br>
+ * @data: perf sample data which contains (optionally) metrics configured<br>
+ * earlier when opening a stream<br>
*<br>
* The contents of a sample are configured through `DRM_I915_PERF_PROP_SAMPLE_*`<br>
* properties when opening a stream, tracked as `stream->sample_flags`. This<br>
@@ -537,11 +823,11 @@ static int append_oa_status(struct i915_perf_stream *stream,<br>
*<br>
* Returns: 0 on success, negative error code on failure.<br>
*/<br>
-static int append_oa_sample(struct i915_perf_stream *stream,<br>
+static int append_perf_sample(struct i915_perf_stream *stream,<br>
char __user *buf,<br>
size_t count,<br>
size_t *offset,<br>
- const u8 *report)<br>
+ const struct i915_perf_sample_data *data)<br>
{<br>
struct drm_i915_private *dev_priv = stream->dev_priv;<br>
int report_size = dev_priv->perf.oa.oa_buffer.format_size;<br>
@@ -569,16 +855,21 @@ static int append_oa_sample(struct i915_perf_stream *stream,<br>
* transition. These are considered as source 'OABUFFER'.<br>
*/<br>
if (sample_flags & SAMPLE_OA_SOURCE) {<br>
- u64 source = I915_PERF_SAMPLE_OA_SOURCE_OAB<wbr>UFFER;<br>
+ if (copy_to_user(buf, &data->source, 8))<br>
+ return -EFAULT;<br>
+ buf += 8;<br>
+ }<br>
- if (copy_to_user(buf, &source, 8))<br>
+ if (sample_flags & SAMPLE_CTX_ID) {<br>
+ if (copy_to_user(buf, &data->ctx_id, 8))<br>
return -EFAULT;<br>
buf += 8;<br>
}<br>
if (sample_flags & SAMPLE_OA_REPORT) {<br>
- if (copy_to_user(buf, report, report_size))<br>
+ if (copy_to_user(buf, data->report, report_size))<br>
return -EFAULT;<br>
+ buf += report_size;<br>
}<br>
(*offset) += header.size;<br>
@@ -587,11 +878,54 @@ static int append_oa_sample(struct i915_perf_stream *stream,<br>
}<br>
/**<br>
+ * append_oa_buffer_sample - Copies single periodic OA report into userspace<br>
+ * read() buffer.<br>
+ * @stream: An i915-perf stream opened for OA metrics<br>
+ * @buf: destination buffer given by userspace<br>
+ * @count: the number of bytes userspace wants to read<br>
+ * @offset: (inout): the current position for writing into @buf<br>
+ * @report: A single OA report to (optionally) include as part of the sample<br>
+ *<br>
+ * Returns: 0 on success, negative error code on failure.<br>
+ */<br>
+static int append_oa_buffer_sample(struct i915_perf_stream *stream,<br>
+ char __user *buf, size_t count,<br>
+ size_t *offset, const u8 *report)<br>
+{<br>
+ struct drm_i915_private *dev_priv = stream->dev_priv;<br>
+ u32 sample_flags = stream->sample_flags;<br>
+ struct i915_perf_sample_data data = { 0 };<br>
+ u32 *report32 = (u32 *)report;<br>
+<br>
+ if (sample_flags & SAMPLE_OA_SOURCE)<br>
+ data.source = I915_PERF_SAMPLE_OA_SOURCE_OAB<wbr>UFFER;<br>
+<br>
+ if (sample_flags & SAMPLE_CTX_ID) {<br>
+ if (INTEL_INFO(dev_priv)->gen < 8)<br>
+ data.ctx_id = 0;<br>
+ else {<br>
+ /*<br>
+ * XXX: Just keep the lower 21 bits for now since I'm<br>
+ * not entirely sure if the HW touches any of the higher<br>
+ * bits in this field<br>
+ */<br>
+ data.ctx_id = report32[2] & 0x1fffff;<br>
+ }<br>
+ }<br>
+<br>
+ if (sample_flags & SAMPLE_OA_REPORT)<br>
+ data.report = report;<br>
+<br>
+ return append_perf_sample(stream, buf, count, offset, &data);<br>
+}<br>
+<br>
+/**<br>
* Copies all buffered OA reports into userspace read() buffer.<br>
* @stream: An i915-perf stream opened for OA metrics<br>
* @buf: destination buffer given by userspace<br>
* @count: the number of bytes userspace wants to read<br>
* @offset: (inout): the current position for writing into @buf<br>
+ * @ts: copy OA reports till this timestamp<br>
*<br>
* Notably any error condition resulting in a short read (-%ENOSPC or<br>
* -%EFAULT) will be returned even though one or more records may<br>
@@ -609,7 +943,8 @@ static int append_oa_sample(struct i915_perf_stream *stream,<br>
static int gen8_append_oa_reports(struct i915_perf_stream *stream,<br>
char __user *buf,<br>
size_t count,<br>
- size_t *offset)<br>
+ size_t *offset,<br>
+ u32 ts)<br>
{<br>
struct drm_i915_private *dev_priv = stream->dev_priv;<br>
int report_size = dev_priv->perf.oa.oa_buffer.format_size;<br>
@@ -623,7 +958,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,<br>
u32 taken;<br>
int ret = 0;<br>
- if (WARN_ON(!stream->enabled))<br>
+ if (WARN_ON(stream->state != I915_PERF_STREAM_ENABLED))<br>
return -EIO;<br>
spin_lock_irqsave(&dev_priv->p<wbr>erf.oa.oa_buffer.ptr_lock, flags);<br>
@@ -669,6 +1004,11 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,<br>
u32 *report32 = (void *)report;<br>
u32 ctx_id;<br>
u32 reason;<br>
+ u32 report_ts = report32[1];<br>
+<br>
+ /* Report timestamp should not exceed the given ts */<br>
+ if (report_ts > ts)<br>
+ break;<br>
/*<br>
* All the report sizes factor neatly into the buffer<br>
@@ -750,23 +1090,23 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,<br>
* switches since it's not-uncommon for periodic samples to<br>
* identify a switch before any 'context switch' report.<br>
*/<br>
- if (!dev_priv->perf.oa.exclusive_<wbr>stream->ctx ||<br>
- dev_priv->perf.oa.specific_ct<wbr>x_id == ctx_id ||<br>
+ if (!stream->ctx ||<br>
+ stream->engine->specific_ctx_<wbr>id == ctx_id ||<br>
(dev_priv->perf.oa.oa_buffer.l<wbr>ast_ctx_id ==<br>
- dev_priv->perf.oa.specific_ctx<wbr>_id) ||<br>
+ stream->engine->specific_ctx_i<wbr>d) ||<br>
reason & OAREPORT_REASON_CTX_SWITCH) {<br>
/*<br>
* While filtering for a single context we avoid<br>
* leaking the IDs of other contexts.<br>
*/<br>
- if (dev_priv->perf.oa.exclusive_s<wbr>tream->ctx &&<br>
- dev_priv->perf.oa.specific_ct<wbr>x_id != ctx_id) {<br>
+ if (stream->ctx &&<br>
+ stream->engine->specific_ctx_<wbr>id != ctx_id) {<br>
report32[2] = INVALID_CTX_ID;<br>
}<br>
- ret = append_oa_sample(stream, buf, count, offset,<br>
- report);<br>
+ ret = append_oa_buffer_sample(stream<wbr>, buf, count,<br>
+ offset, report);<br>
if (ret)<br>
break;<br>
@@ -807,6 +1147,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,<br>
* @buf: destination buffer given by userspace<br>
* @count: the number of bytes userspace wants to read<br>
* @offset: (inout): the current position for writing into @buf<br>
+ * @ts: copy OA reports till this timestamp<br>
*<br>
* Checks OA unit status registers and if necessary appends corresponding<br>
* status records for userspace (such as for a buffer full condition) and then<br>
@@ -824,7 +1165,8 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,<br>
static int gen8_oa_read(struct i915_perf_stream *stream,<br>
char __user *buf,<br>
size_t count,<br>
- size_t *offset)<br>
+ size_t *offset,<br>
+ u32 ts)<br>
{<br>
struct drm_i915_private *dev_priv = stream->dev_priv;<br>
u32 oastatus;<br>
@@ -877,7 +1219,7 @@ static int gen8_oa_read(struct i915_perf_stream *stream,<br>
oastatus & ~GEN8_OASTATUS_REPORT_LOST);<br>
}<br>
- return gen8_append_oa_reports(stream, buf, count, offset);<br>
+ return gen8_append_oa_reports(stream, buf, count, offset, ts);<br>
}<br>
/**<br>
@@ -886,6 +1228,7 @@ static int gen8_oa_read(struct i915_perf_stream *stream,<br>
* @buf: destination buffer given by userspace<br>
* @count: the number of bytes userspace wants to read<br>
* @offset: (inout): the current position for writing into @buf<br>
+ * @ts: copy OA reports up to this timestamp<br>
*<br>
* Notably any error condition resulting in a short read (-%ENOSPC or<br>
* -%EFAULT) will be returned even though one or more records may<br>
@@ -903,7 +1246,8 @@ static int gen8_oa_read(struct i915_perf_stream *stream,<br>
static int gen7_append_oa_reports(struct i915_perf_stream *stream,<br>
char __user *buf,<br>
size_t count,<br>
- size_t *offset)<br>
+ size_t *offset,<br>
+ u32 ts)<br>
{<br>
struct drm_i915_private *dev_priv = stream->dev_priv;<br>
int report_size = dev_priv->perf.oa.oa_buffer.format_size;<br>
@@ -917,7 +1261,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,<br>
u32 taken;<br>
int ret = 0;<br>
- if (WARN_ON(!stream->enabled))<br>
+ if (WARN_ON(stream->state != I915_PERF_STREAM_ENABLED))<br>
return -EIO;<br>
spin_lock_irqsave(&dev_priv->p<wbr>erf.oa.oa_buffer.ptr_lock, flags);<br>
@@ -984,7 +1328,12 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,<br>
continue;<br>
}<br>
- ret = append_oa_sample(stream, buf, count, offset, report);<br>
+ /* Report timestamp should not exceed the given ts */<br>
+ if (report32[1] > ts)<br>
+ break;<br>
+<br>
+ ret = append_oa_buffer_sample(stream<wbr>, buf, count, offset,<br>
+ report);<br>
if (ret)<br>
break;<br>
@@ -1022,6 +1371,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,<br>
* @buf: destination buffer given by userspace<br>
* @count: the number of bytes userspace wants to read<br>
* @offset: (inout): the current position for writing into @buf<br>
+ * @ts: copy OA reports up to this timestamp<br>
*<br>
* Checks Gen 7 specific OA unit status registers and if necessary appends<br>
* corresponding status records for userspace (such as for a buffer full<br>
@@ -1035,7 +1385,8 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,<br>
static int gen7_oa_read(struct i915_perf_stream *stream,<br>
char __user *buf,<br>
size_t count,<br>
- size_t *offset)<br>
+ size_t *offset,<br>
+ u32 ts)<br>
{<br>
struct drm_i915_private *dev_priv = stream->dev_priv;<br>
u32 oastatus1;<br>
@@ -1097,16 +1448,172 @@ static int gen7_oa_read(struct i915_perf_stream *stream,<br>
GEN7_OASTATUS1_REPORT_LOST;<br>
}<br>
- return gen7_append_oa_reports(stream, buf, count, offset);<br>
+ return gen7_append_oa_reports(stream, buf, count, offset, ts);<br>
+}<br>
+<br>
+/**<br>
+ * append_cs_buffer_sample - Copies a single perf sample associated with the<br>
+ * GPU command stream into the userspace read() buffer.<br>
+ * @stream: An i915-perf stream opened for perf CS metrics<br>
+ * @buf: destination buffer given by userspace<br>
+ * @count: the number of bytes userspace wants to read<br>
+ * @offset: (inout): the current position for writing into @buf<br>
+ * @node: Sample data associated with perf metrics<br>
+ *<br>
+ * Returns: 0 on success, negative error code on failure.<br>
+ */<br>
+static int append_cs_buffer_sample(struct i915_perf_stream *stream,<br>
+ char __user *buf,<br>
+ size_t count,<br>
+ size_t *offset,<br>
+ struct i915_perf_cs_sample *node)<br>
+{<br>
+ struct drm_i915_private *dev_priv = stream->dev_priv;<br>
+ struct i915_perf_sample_data data = { 0 };<br>
+ u32 sample_flags = stream->sample_flags;<br>
+ int ret = 0;<br>
+<br>
+ if (sample_flags & SAMPLE_OA_REPORT) {<br>
+ const u8 *report = stream->cs_buffer.vaddr + node->offset;<br>
+ u32 sample_ts = *(u32 *)(report + 4);<br>
+<br>
+ data.report = report;<br>
+<br>
+ /*<br>
+ * First, append the periodic OA samples whose timestamps do not<br>
+ * exceed this sample's timestamp.<br>
+ */<br>
+ ret = dev_priv->perf.oa.ops.read(str<wbr>eam, buf, count, offset,<br>
+ sample_ts);<br>
+ if (ret)<br>
+ return ret;<br>
+ }<br>
+<br>
+ if (sample_flags & SAMPLE_OA_SOURCE)<br>
+ data.source = I915_PERF_SAMPLE_OA_SOURCE_CS;<br>
+<br>
+ if (sample_flags & SAMPLE_CTX_ID)<br>
+ data.ctx_id = node->ctx_id;<br>
+<br>
+ return append_perf_sample(stream, buf, count, offset, &data);<br>
}<br>
/**<br>
- * i915_oa_wait_unlocked - handles blocking IO until OA data available<br>
+ * append_cs_buffer_samples - Copies all command stream based perf samples<br>
+ * whose requests have completed into the userspace read() buffer.<br>
+ * @stream: An i915-perf stream opened for perf CS metrics<br>
+ * @buf: destination buffer given by userspace<br>
+ * @count: the number of bytes userspace wants to read<br>
+ * @offset: (inout): the current position for writing into @buf<br>
+ *<br>
+ * Notably any error condition resulting in a short read (-%ENOSPC or<br>
+ * -%EFAULT) will be returned even though one or more records may<br>
+ * have been successfully copied. In this case it's up to the caller<br>
+ * to decide if the error should be squashed before returning to<br>
+ * userspace.<br>
+ *<br>
+ * Returns: 0 on success, negative error code on failure.<br>
+ */<br>
+static int append_cs_buffer_samples(struc<wbr>t i915_perf_stream *stream,<br>
+ char __user *buf,<br>
+ size_t count,<br>
+ size_t *offset)<br>
+{<br>
+ struct i915_perf_cs_sample *entry, *next;<br>
+ LIST_HEAD(free_list);<br>
+ int ret = 0;<br>
+ unsigned long flags;<br>
+<br>
+ spin_lock_irqsave(&stream-><wbr>cs_samples_lock, flags);<br>
+ if (list_empty(&stream->cs_sample<wbr>s)) {<br>
+ spin_unlock_irqrestore(&strea<wbr>m->cs_samples_lock, flags);<br>
+ return 0;<br>
+ }<br>
+ list_for_each_entry_safe(entr<wbr>y, next,<br>
+ &stream->cs_samples, link) {<br>
+ if (!i915_gem_request_completed(e<wbr>ntry->request))<br>
+ break;<br>
+ list_move_tail(&entry->link, &free_list);<br>
+ }<br>
+ spin_unlock_irqrestore(&strea<wbr>m->cs_samples_lock, flags);<br>
+<br>
+ if (list_empty(&free_list))<br>
+ return 0;<br>
+<br>
+ list_for_each_entry_safe(entr<wbr>y, next, &free_list, link) {<br>
+ ret = append_cs_buffer_sample(stream<wbr>, buf, count, offset,<br>
+ entry);<br>
+ if (ret)<br>
+ break;<br>
+<br>
+ list_del(&entry->link);<br>
+ i915_gem_request_put(entry->r<wbr>equest);<br>
+ kfree(entry);<br>
+ }<br>
+<br>
+ /* Don't discard remaining entries, keep them for next read */<br>
+ spin_lock_irqsave(&stream-><wbr>cs_samples_lock, flags);<br>
+ list_splice(&free_list, &stream->cs_samples);<br>
+ spin_unlock_irqrestore(&strea<wbr>m->cs_samples_lock, flags);<br>
+<br>
+ return ret;<br>
+}<br>
+<br>
+/**<br>
+ * cs_buffer_is_empty - Checks whether the command stream buffer<br>
+ * associated with the stream has no completed samples yet.<br>
* @stream: An i915-perf stream opened for OA metrics<br>
*<br>
+ * Returns: false if the oldest request associated with the command stream<br>
+ * has completed (i.e. at least one sample is ready), else true.<br>
+ */<br>
+static bool cs_buffer_is_empty(struct i915_perf_stream *stream)<br>
+{<br>
+ struct i915_perf_cs_sample *entry = NULL;<br>
+ struct drm_i915_gem_request *request = NULL;<br>
+ unsigned long flags;<br>
+<br>
+ spin_lock_irqsave(&stream-><wbr>cs_samples_lock, flags);<br>
+ entry = list_first_entry_or_null(&stre<wbr>am->cs_samples,<br>
+ struct i915_perf_cs_sample, link);<br>
+ if (entry)<br>
+ request = entry->request;<br>
+ spin_unlock_irqrestore(&strea<wbr>m->cs_samples_lock, flags);<br>
+<br>
+ if (!entry)<br>
+ return true;<br>
+ else if (!i915_gem_request_completed(r<wbr>equest))<br>
+ return true;<br>
+ else<br>
+ return false;<br>
+}<br>
+<br>
+/**<br>
+ * stream_have_data_unlocked - Checks whether the stream has data available<br>
+ * @stream: An i915-perf stream opened for OA metrics<br>
+ *<br>
+ * For command stream based streams, checks whether the command stream buffer<br>
+ * has at least one sample available; if not, returns false irrespective of<br>
+ * whether the periodic OA buffer has data.<br>
+ */<br>
+static bool stream_have_data_unlocked(stru<wbr>ct i915_perf_stream *stream)<br>
+{<br>
+ struct drm_i915_private *dev_priv = stream->dev_priv;<br>
+<br>
+ if (stream->cs_mode)<br>
+ return !cs_buffer_is_empty(stream);<br>
+ else<br>
+ return oa_buffer_check_unlocked(dev_p<wbr>riv);<br>
+}<br>
+<br>
+/**<br>
+ * i915_perf_stream_wait_unlocked - handles blocking IO until data available<br>
+ * @stream: An i915-perf stream opened for GPU metrics<br>
+ *<br>
* Called when userspace tries to read() from a blocking stream FD opened<br>
- * for OA metrics. It waits until the hrtimer callback finds a non-empty<br>
- * OA buffer and wakes us.<br>
+ * for perf metrics. It waits until the hrtimer callback finds a non-empty<br>
+ * command stream buffer or OA buffer and wakes us.<br>
*<br>
* Note: it's acceptable to have this return with some false positives<br>
* since any subsequent read handling will return -EAGAIN if there isn't<br>
@@ -1114,7 +1621,7 @@ static int gen7_oa_read(struct i915_perf_stream *stream,<br>
*<br>
* Returns: zero on success or a negative error code<br>
*/<br>
-static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)<br>
+static int i915_perf_stream_wait_unlocked<wbr>(struct i915_perf_stream *stream)<br>
{<br>
struct drm_i915_private *dev_priv = stream->dev_priv;<br>
@@ -1122,32 +1629,47 @@ static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)<br>
if (!dev_priv->perf.oa.periodic)<br>
return -EIO;<br>
- return wait_event_interruptible(dev_p<wbr>riv->perf.oa.poll_wq,<br>
- oa_buffer_check_unlocked(dev_<wbr>priv));<br>
+ if (stream->cs_mode) {<br>
+ long int ret;<br>
+<br>
+ /* Wait for all sampled requests to complete. */<br>
+ ret = reservation_object_wait_timeou<wbr>t_rcu(<br>
+ stream->cs_buffer.vma->resv,<br>
+ true,<br>
+ true,<br>
+ MAX_SCHEDULE_TIMEOUT);<br>
+ if (unlikely(ret < 0)) {<br>
+ DRM_DEBUG_DRIVER("Failed to wait for sampled requests: %li\n", ret);<br>
+ return ret;<br>
+ }<br>
+ }<br>
+<br>
+ return wait_event_interruptible(strea<wbr>m->poll_wq,<br>
+ stream_have_data_unlocked(str<wbr>eam));<br>
}<br>
/**<br>
- * i915_oa_poll_wait - call poll_wait() for an OA stream poll()<br>
- * @stream: An i915-perf stream opened for OA metrics<br>
+ * i915_perf_stream_poll_wait - call poll_wait() for a stream poll()<br>
+ * @stream: An i915-perf stream opened for GPU metrics<br>
* @file: An i915 perf stream file<br>
* @wait: poll() state table<br>
*<br>
- * For handling userspace polling on an i915 perf stream opened for OA metrics,<br>
+ * For handling userspace polling on an i915 perf stream opened for metrics,<br>
* this starts a poll_wait with the wait queue that our hrtimer callback wakes<br>
- * when it sees data ready to read in the circular OA buffer.<br>
+ * when it sees data ready to read either in command stream buffer or in the<br>
+ * circular OA buffer.<br>
*/<br>
-static void i915_oa_poll_wait(struct i915_perf_stream *stream,<br>
+static void i915_perf_stream_poll_wait(str<wbr>uct i915_perf_stream *stream,<br>
struct file *file,<br>
poll_table *wait)<br>
{<br>
- struct drm_i915_private *dev_priv = stream->dev_priv;<br>
-<br>
- poll_wait(file, &dev_priv->perf.oa.poll_wq, wait);<br>
+ poll_wait(file, &stream->poll_wq, wait);<br>
}<br>
/**<br>
- * i915_oa_read - just calls through to &i915_oa_ops->read<br>
- * @stream: An i915-perf stream opened for OA metrics<br>
+ * i915_perf_stream_read - Reads available perf metrics into the userspace<br>
+ * read() buffer<br>
+ * @stream: An i915-perf stream opened for GPU metrics<br>
* @buf: destination buffer given by userspace<br>
* @count: the number of bytes userspace wants to read<br>
* @offset: (inout): the current position for writing into @buf<br>
@@ -1157,14 +1679,21 @@ static void i915_oa_poll_wait(struct i915_perf_stream *stream,<br>
*<br>
* Returns: zero on success or a negative error code<br>
*/<br>
-static int i915_oa_read(struct i915_perf_stream *stream,<br>
+static int i915_perf_stream_read(struct i915_perf_stream *stream,<br>
char __user *buf,<br>
size_t count,<br>
size_t *offset)<br>
{<br>
struct drm_i915_private *dev_priv = stream->dev_priv;<br>
- return dev_priv->perf.oa.ops.read(str<wbr>eam, buf, count, offset);<br>
+<br>
</blockquote>
<br>
Does the following code mean that a perf stream is either in cs_mode or OA mode?<br>
I couldn't see that condition in the function processing the opening parameters.<br>
<br>
The comments in the patch description also say:<br>
<br>
"Both periodic and CS based reports are associated with a single stream"<br>
<br>
The following code seems to contradict that. Can you explain how it works?<br>
<br>
Thanks<br></blockquote><div><br></div><div>Hi Lionel,</div><div><br></div><div>If you look closely, the append_cs_buffer_sample() function merge-sorts OA reports from two independent buffers (the OA buffer, which holds the periodic OA samples, and the command stream buffer, which holds the RCS-based OA reports) on the basis of the report timestamps.</div><div>Therefore, in the code below, if stream->cs_mode is enabled, append_cs_buffer_samples() is called; it collates the samples from these two independent buffers and copies them into the stream's buffer in timestamp order. If cs_mode is not enabled, we simply collect samples from the periodic OA buffer and forward them to userspace (via the perf.oa.ops.read() function).</div><div>Hope this addresses your question.</div><div><br></div><div>Regards,</div><div>Sourab</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
+ if (stream->cs_mode)<br>
+ return append_cs_buffer_samples(strea<wbr>m, buf, count, offset);<br>
+ else if (stream->sample_flags & SAMPLE_OA_REPORT)<br>
+ return dev_priv->perf.oa.ops.read(str<wbr>eam, buf, count, offset,<br>
+ U32_MAX);<br>
+ else<br>
+ return -EINVAL;<br>
}<br>
/**<br>
@@ -1182,7 +1711,7 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)<br>
struct drm_i915_private *dev_priv = stream->dev_priv;<br>
if (i915.enable_execlists)<br>
- dev_priv->perf.oa.specific_ct<wbr>x_id = stream->ctx->hw_id;<br>
+ stream->engine->specific_ctx_<wbr>id = stream->ctx->hw_id;<br>
else {<br>
struct intel_engine_cs *engine = dev_priv->engine[RCS];<br>
struct intel_ring *ring;<br>
@@ -1209,7 +1738,7 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)<br>
* i915_ggtt_offset() on the fly) considering the difference<br>
* with gen8+ and execlists<br>
*/<br>
- dev_priv->perf.oa.specific_ct<wbr>x_id =<br>
+ stream->engine->specific_ctx_<wbr>id =<br>
i915_ggtt_offset(stream->ctx-><wbr>engine[engine->id].state);<br>
}<br>
@@ -1228,13 +1757,13 @@ static void oa_put_render_ctx_id(struct i915_perf_stream *stream)<br>
struct drm_i915_private *dev_priv = stream->dev_priv;<br>
if (i915.enable_execlists) {<br>
- dev_priv->perf.oa.specific_ct<wbr>x_id = INVALID_CTX_ID;<br>
+ stream->engine->specific_ctx_<wbr>id = INVALID_CTX_ID;<br>
} else {<br>
struct intel_engine_cs *engine = dev_priv->engine[RCS];<br>
mutex_lock(&dev_priv->drm.stru<wbr>ct_mutex);<br>
- dev_priv->perf.oa.specific_ct<wbr>x_id = INVALID_CTX_ID;<br>
+ stream->engine->specific_ctx_<wbr>id = INVALID_CTX_ID;<br>
engine->context_unpin(engine, stream->ctx);<br>
mutex_unlock(&dev_priv->drm.struct_mutex);<br>
@@ -1242,13 +1771,28 @@ static void oa_put_render_ctx_id(struct i915_perf_stream *stream)<br>
}<br>
static void<br>
+free_cs_buffer(struct i915_perf_stream *stream)<br>
+{<br>
+ struct drm_i915_private *dev_priv = stream->dev_priv;<br>
+<br>
+ mutex_lock(&dev_priv->drm.str<wbr>uct_mutex);<br>
+<br>
+ i915_gem_object_unpin_map(str<wbr>eam->cs_buffer.vma->obj);<br>
+ i915_vma_unpin_and_release(&s<wbr>tream->cs_buffer.vma);<br>
+<br>
+ stream->cs_buffer.vma = NULL;<br>
+ stream->cs_buffer.vaddr = NULL;<br>
+<br>
+ mutex_unlock(&dev_priv->drm.s<wbr>truct_mutex);<br>
+}<br>
+<br>
+static void<br>
free_oa_buffer(struct drm_i915_private *i915)<br>
{<br>
mutex_lock(&i915->drm.struct_m<wbr>utex);<br>
i915_gem_object_unpin_map(i915<wbr>->perf.oa.oa_buffer.vma->obj);<br>
- i915_vma_unpin(i915->perf.oa.<wbr>oa_buffer.vma);<br>
- i915_gem_object_put(i915->per<wbr>f.oa.oa_buffer.vma->obj);<br>
+ i915_vma_unpin_and_release(&i<wbr>915->perf.oa.oa_buffer.vma);<br>
i915->perf.oa.oa_buffer.vma = NULL;<br>
i915->perf.oa.oa_buffer.vaddr = NULL;<br>
@@ -1256,27 +1800,41 @@ static void oa_put_render_ctx_id(struct i915_perf_stream *stream)<br>
mutex_unlock(&i915->drm.struct<wbr>_mutex);<br>
}<br>
-static void i915_oa_stream_destroy(struct i915_perf_stream *stream)<br>
+static void i915_perf_stream_destroy(struc<wbr>t i915_perf_stream *stream)<br>
{<br>
struct drm_i915_private *dev_priv = stream->dev_priv;<br>
-<br>
- BUG_ON(stream != dev_priv->perf.oa.exclusive_st<wbr>ream);<br>
+ struct intel_engine_cs *engine = stream->engine;<br>
+ struct i915_perf_stream *engine_stream;<br>
+ int idx;<br>
+<br>
+ idx = srcu_read_lock(&engine->perf_srcu);<br>
+ engine_stream = srcu_dereference(engine->exclusive_stream,<br>
+ &engine->perf_srcu);<br>
+ srcu_read_unlock(&engine->perf_srcu, idx);<br>
+ if (WARN_ON(stream != engine_stream))<br>
+ return;<br>
/*<br>
* Unset exclusive_stream first, it might be checked while<br>
* disabling the metric set on gen8+.<br>
*/<br>
- dev_priv->perf.oa.exclusive_s<wbr>tream = NULL;<br>
+ rcu_assign_pointer(stream->en<wbr>gine->exclusive_stream, NULL);<br>
+ synchronize_srcu(&stream->eng<wbr>ine->perf_srcu);<br>
- dev_priv->perf.oa.ops.<wbr>disable_metric_set(dev_priv);<br>
+ if (stream->using_oa) {<br>
+ dev_priv->perf.oa.ops.<wbr>disable_metric_set(dev_priv);<br>
- free_oa_buffer(dev_priv);<br>
+ free_oa_buffer(dev_priv);<br>
- intel_uncore_forcewake_put(de<wbr>v_priv, FORCEWAKE_ALL);<br>
- intel_runtime_pm_put(dev_<wbr>priv);<br>
+ intel_uncore_forcewake_put(de<wbr>v_priv, FORCEWAKE_ALL);<br>
+ intel_runtime_pm_put(dev_<wbr>priv);<br>
- if (stream->ctx)<br>
- oa_put_render_ctx_id(stream);<br>
+ if (stream->ctx)<br>
+ oa_put_render_ctx_id(stream);<br>
+ }<br>
+<br>
+ if (stream->cs_mode)<br>
+ free_cs_buffer(stream);<br>
if (dev_priv->perf.oa.spurious_re<wbr>port_rs.missed) {<br>
DRM_NOTE("%d spurious OA report notices suppressed due to ratelimiting\n",<br>
@@ -1325,11 +1883,6 @@ static void gen7_init_oa_buffer(struct drm_i915_private *dev_priv)<br>
* memory...<br>
*/<br>
memset(dev_priv->perf.oa.oa_bu<wbr>ffer.vaddr, 0, OA_BUFFER_SIZE);<br>
-<br>
- /* Maybe make ->pollin per-stream state if we support multiple<br>
- * concurrent streams in the future.<br>
- */<br>
- dev_priv->perf.oa.pollin = false;<br>
}<br>
static void gen8_init_oa_buffer(struct drm_i915_private *dev_priv)<br>
@@ -1383,33 +1936,26 @@ static void gen8_init_oa_buffer(struct drm_i915_private *dev_priv)<br>
* memory...<br>
*/<br>
memset(dev_priv->perf.oa.oa_bu<wbr>ffer.vaddr, 0, OA_BUFFER_SIZE);<br>
-<br>
- /*<br>
- * Maybe make ->pollin per-stream state if we support multiple<br>
- * concurrent streams in the future.<br>
- */<br>
- dev_priv->perf.oa.pollin = false;<br>
}<br>
-static int alloc_oa_buffer(struct drm_i915_private *dev_priv)<br>
+static int alloc_obj(struct drm_i915_private *dev_priv,<br>
+ struct i915_vma **vma, u8 **vaddr)<br>
{<br>
struct drm_i915_gem_object *bo;<br>
- struct i915_vma *vma;<br>
int ret;<br>
- if (WARN_ON(dev_priv->perf.oa.oa_<wbr>buffer.vma))<br>
- return -ENODEV;<br>
+ intel_runtime_pm_get(dev_<wbr>priv);<br>
ret = i915_mutex_lock_interruptible(<wbr>&dev_priv->drm);<br>
if (ret)<br>
- return ret;<br>
+ goto out;<br>
BUILD_BUG_ON_NOT_POWER_OF_2(OA<wbr>_BUFFER_SIZE);<br>
BUILD_BUG_ON(OA_BUFFER_SIZE < SZ_128K || OA_BUFFER_SIZE > SZ_16M);<br>
bo = i915_gem_object_create(dev_pri<wbr>v, OA_BUFFER_SIZE);<br>
if (IS_ERR(bo)) {<br>
- DRM_ERROR("Failed to allocate OA buffer\n");<br>
+ DRM_ERROR("Failed to allocate i915 perf obj\n");<br>
ret = PTR_ERR(bo);<br>
goto unlock;<br>
}<br>
@@ -1419,42 +1965,83 @@ static int alloc_oa_buffer(struct drm_i915_private *dev_priv)<br>
goto err_unref;<br>
/* PreHSW required 512K alignment, HSW requires 16M */<br>
- vma = i915_gem_object_ggtt_pin(bo, NULL, 0, SZ_16M, 0);<br>
- if (IS_ERR(vma)) {<br>
- ret = PTR_ERR(vma);<br>
+ *vma = i915_gem_object_ggtt_pin(bo, NULL, 0, SZ_16M, 0);<br>
+ if (IS_ERR(*vma)) {<br>
+ ret = PTR_ERR(*vma);<br>
goto err_unref;<br>
}<br>
- dev_priv->perf.oa.oa_buffer.v<wbr>ma = vma;<br>
- dev_priv->perf.oa.oa_buffer.v<wbr>addr =<br>
- i915_gem_object_pin_map(bo, I915_MAP_WB);<br>
- if (IS_ERR(dev_priv->perf.oa.oa_b<wbr>uffer.vaddr)) {<br>
- ret = PTR_ERR(dev_priv->perf.oa.oa_b<wbr>uffer.vaddr);<br>
+ *vaddr = i915_gem_object_pin_map(bo, I915_MAP_WB);<br>
+ if (IS_ERR(*vaddr)) {<br>
+ ret = PTR_ERR(*vaddr);<br>
goto err_unpin;<br>
}<br>
- dev_priv->perf.oa.ops.init_<wbr>oa_buffer(dev_priv);<br>
-<br>
- DRM_DEBUG_DRIVER("OA Buffer initialized, gtt offset = 0x%x, vaddr = %p\n",<br>
- i915_ggtt_offset(dev_priv->per<wbr>f.oa.oa_buffer.vma),<br>
- dev_priv->perf.oa.oa_buffer.vaddr);<br>
-<br>
goto unlock;<br>
err_unpin:<br>
- __i915_vma_unpin(vma);<br>
+ i915_vma_unpin(*vma);<br>
err_unref:<br>
i915_gem_object_put(bo);<br>
- dev_priv->perf.oa.oa_buffer.v<wbr>addr = NULL;<br>
- dev_priv->perf.oa.oa_buffer.v<wbr>ma = NULL;<br>
-<br>
unlock:<br>
mutex_unlock(&dev_priv->drm.struct_mutex);<br>
+out:<br>
+ intel_runtime_pm_put(dev_<wbr>priv);<br>
return ret;<br>
}<br>
+static int alloc_oa_buffer(struct drm_i915_private *dev_priv)<br>
+{<br>
+ struct i915_vma *vma;<br>
+ u8 *vaddr;<br>
+ int ret;<br>
+<br>
+ if (WARN_ON(dev_priv->perf.oa.oa_<wbr>buffer.vma))<br>
+ return -ENODEV;<br>
+<br>
+ ret = alloc_obj(dev_priv, &vma, &vaddr);<br>
+ if (ret)<br>
+ return ret;<br>
+<br>
+ dev_priv->perf.oa.oa_buffer.v<wbr>ma = vma;<br>
+ dev_priv->perf.oa.oa_buffer.v<wbr>addr = vaddr;<br>
+<br>
+ dev_priv->perf.oa.ops.init_<wbr>oa_buffer(dev_priv);<br>
+<br>
+ DRM_DEBUG_DRIVER("OA Buffer initialized, gtt offset = 0x%x, vaddr = %p\n",<br>
+ i915_ggtt_offset(dev_priv->perf.oa.oa_buffer.vma),<br>
+ dev_priv->perf.oa.oa_buffer.vaddr);<br>
+ return 0;<br>
+}<br>
+<br>
+static int alloc_cs_buffer(struct i915_perf_stream *stream)<br>
+{<br>
+ struct drm_i915_private *dev_priv = stream->dev_priv;<br>
+ struct i915_vma *vma;<br>
+ u8 *vaddr;<br>
+ int ret;<br>
+<br>
+ if (WARN_ON(stream->cs_buffer.vma<wbr>))<br>
+ return -ENODEV;<br>
+<br>
+ ret = alloc_obj(dev_priv, &vma, &vaddr);<br>
+ if (ret)<br>
+ return ret;<br>
+<br>
+ stream->cs_buffer.vma = vma;<br>
+ stream->cs_buffer.vaddr = vaddr;<br>
+ if (WARN_ON(!list_empty(&stream-><wbr>cs_samples)))<br>
+ INIT_LIST_HEAD(&stream->cs_sa<wbr>mples);<br>
+<br>
+ DRM_DEBUG_DRIVER("Command stream buf initialized, gtt offset = 0x%x, vaddr = %p\n",<br>
+ i915_ggtt_offset(stream->cs_bu<wbr>ffer.vma),<br>
+ stream->cs_buffer.vaddr);<br>
+<br>
+ return 0;<br>
+}<br>
+<br>
static void config_oa_regs(struct drm_i915_private *dev_priv,<br>
const struct i915_oa_reg *regs,<br>
int n_regs)<br>
@@ -1859,6 +2446,10 @@ static void gen8_disable_metric_set(struct drm_i915_private *dev_priv)<br>
static void gen7_oa_enable(struct drm_i915_private *dev_priv)<br>
{<br>
+ struct i915_perf_stream *stream;<br>
+ struct intel_engine_cs *engine = dev_priv->engine[RCS];<br>
+ int idx;<br>
+<br>
/*<br>
* Reset buf pointers so we don't forward reports from before now.<br>
*<br>
@@ -1870,11 +2461,11 @@ static void gen7_oa_enable(struct drm_i915_private *dev_priv)<br>
*/<br>
gen7_init_oa_buffer(dev_priv);<br>
- if (dev_priv->perf.oa.exclusive_s<wbr>tream->enabled) {<br>
- struct i915_gem_context *ctx =<br>
- dev_priv->perf.oa.exclusive_s<wbr>tream->ctx;<br>
- u32 ctx_id = dev_priv->perf.oa.specific_ctx<wbr>_id;<br>
-<br>
+ idx = srcu_read_lock(&engine->perf_s<wbr>rcu);<br>
+ stream = srcu_dereference(engine->exclu<wbr>sive_stream, &engine->perf_srcu);<br>
+ if (stream->state != I915_PERF_STREAM_DISABLED) {<br>
+ struct i915_gem_context *ctx = stream->ctx;<br>
+ u32 ctx_id = engine->specific_ctx_id;<br>
bool periodic = dev_priv->perf.oa.periodic;<br>
u32 period_exponent = dev_priv->perf.oa.period_expon<wbr>ent;<br>
u32 report_format = dev_priv->perf.oa.oa_buffer.format;<br>
@@ -1889,6 +2480,7 @@ static void gen7_oa_enable(struct drm_i915_private *dev_priv)<br>
GEN7_OACONTROL_ENABLE);<br>
} else<br>
I915_WRITE(GEN7_OACONTROL, 0);<br>
+ srcu_read_unlock(&engine->per<wbr>f_srcu, idx);<br>
}<br>
static void gen8_oa_enable(struct drm_i915_private *dev_priv)<br>
@@ -1917,22 +2509,23 @@ static void gen8_oa_enable(struct drm_i915_private *dev_priv)<br>
}<br>
/**<br>
- * i915_oa_stream_enable - handle `I915_PERF_IOCTL_ENABLE` for OA stream<br>
- * @stream: An i915 perf stream opened for OA metrics<br>
+ * i915_perf_stream_enable - handle `I915_PERF_IOCTL_ENABLE` for perf stream<br>
+ * @stream: An i915 perf stream opened for GPU metrics<br>
*<br>
* [Re]enables hardware periodic sampling according to the period configured<br>
* when opening the stream. This also starts a hrtimer that will periodically<br>
* check for data in the circular OA buffer for notifying userspace (e.g.<br>
* during a read() or poll()).<br>
*/<br>
-static void i915_oa_stream_enable(struct i915_perf_stream *stream)<br>
+static void i915_perf_stream_enable(struct i915_perf_stream *stream)<br>
{<br>
struct drm_i915_private *dev_priv = stream->dev_priv;<br>
- dev_priv->perf.oa.ops.oa_enab<wbr>le(dev_priv);<br>
+ if (stream->sample_flags & SAMPLE_OA_REPORT)<br>
+ dev_priv->perf.oa.ops.oa_enab<wbr>le(dev_priv);<br>
- if (dev_priv->perf.oa.periodic)<br>
- hrtimer_start(&dev_priv-><wbr>perf.oa.poll_check_timer,<br>
+ if (stream->cs_mode || dev_priv->perf.oa.periodic)<br>
+ hrtimer_start(&dev_priv-><wbr>perf.poll_check_timer,<br>
ns_to_ktime(POLL_PERIOD),<br>
HRTIMER_MODE_REL_PINNED);<br>
}<br>
@@ -1948,34 +2541,39 @@ static void gen8_oa_disable(struct drm_i915_private *dev_priv)<br>
}<br>
/**<br>
- * i915_oa_stream_disable - handle `I915_PERF_IOCTL_DISABLE` for OA stream<br>
- * @stream: An i915 perf stream opened for OA metrics<br>
+ * i915_perf_stream_disable - handle `I915_PERF_IOCTL_DISABLE` for perf stream<br>
+ * @stream: An i915 perf stream opened for GPU metrics<br>
*<br>
* Stops the OA unit from periodically writing counter reports into the<br>
* circular OA buffer. This also stops the hrtimer that periodically checks for<br>
* data in the circular OA buffer, for notifying userspace.<br>
*/<br>
-static void i915_oa_stream_disable(struct i915_perf_stream *stream)<br>
+static void i915_perf_stream_disable(struc<wbr>t i915_perf_stream *stream)<br>
{<br>
struct drm_i915_private *dev_priv = stream->dev_priv;<br>
- dev_priv->perf.oa.ops.oa_disa<wbr>ble(dev_priv);<br>
+ if (stream->cs_mode || dev_priv->perf.oa.periodic)<br>
+ hrtimer_cancel(&dev_priv->per<wbr>f.poll_check_timer);<br>
+<br>
+ if (stream->cs_mode)<br>
+ i915_perf_stream_release_samp<wbr>les(stream);<br>
- if (dev_priv->perf.oa.periodic)<br>
- hrtimer_cancel(&dev_priv->per<wbr>f.oa.poll_check_timer);<br>
+ if (stream->sample_flags & SAMPLE_OA_REPORT)<br>
+ dev_priv->perf.oa.ops.oa_disa<wbr>ble(dev_priv);<br>
}<br>
-static const struct i915_perf_stream_ops i915_oa_stream_ops = {<br>
- .destroy = i915_oa_stream_destroy,<br>
- .enable = i915_oa_stream_enable,<br>
- .disable = i915_oa_stream_disable,<br>
- .wait_unlocked = i915_oa_wait_unlocked,<br>
- .poll_wait = i915_oa_poll_wait,<br>
- .read = i915_oa_read,<br>
+static const struct i915_perf_stream_ops perf_stream_ops = {<br>
+ .destroy = i915_perf_stream_destroy,<br>
+ .enable = i915_perf_stream_enable,<br>
+ .disable = i915_perf_stream_disable,<br>
+ .wait_unlocked = i915_perf_stream_wait_unlocked<wbr>,<br>
+ .poll_wait = i915_perf_stream_poll_wait,<br>
+ .read = i915_perf_stream_read,<br>
+ .emit_sample_capture = i915_perf_stream_emit_sample_c<wbr>apture,<br>
};<br>
/**<br>
- * i915_oa_stream_init - validate combined props for OA stream and init<br>
+ * i915_perf_stream_init - validate combined props for stream and init<br>
* @stream: An i915 perf stream<br>
* @param: The open parameters passed to `DRM_I915_PERF_OPEN`<br>
* @props: The property state that configures stream (individually validated)<br>
@@ -1984,58 +2582,35 @@ static void i915_oa_stream_disable(struct i915_perf_stream *stream)<br>
* doesn't ensure that the combination necessarily makes sense.<br>
*<br>
* At this point it has been determined that userspace wants a stream of</blockquote>
</blockquote></div><br></div></div>
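<div>Sourab's reply describes a timestamp-ordered merge: before each command-stream (CS) report is copied, append_cs_buffer_sample() first calls perf.oa.ops.read() with the CS report's timestamp, so every periodic OA report that is not newer is copied ahead of it. The following is a minimal, self-contained C sketch of that ordering outside the kernel. It is not the i915 code: struct sample and merge_by_timestamp() are hypothetical names, the buffers are plain arrays, and, like the patch's report32[1] > ts comparison, the sketch ignores 32-bit timestamp wraparound.</div>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified sample: only the fields needed for ordering. */
struct sample {
	uint32_t ts;  /* report timestamp (the patch reads it from dword 1) */
	int from_cs;  /* 1 = command-stream report, 0 = periodic OA report */
};

/*
 * Before each CS sample, copy every periodic sample whose timestamp is
 * not newer than it, then copy the CS sample itself.
 *
 * *periodic_pos is advanced past each periodic sample consumed, so periodic
 * samples newer than the last CS sample stay pending for a later call.
 * @out must have room for (n_periodic - *periodic_pos) + n_cs entries.
 * Returns the number of samples written to @out.
 */
static size_t merge_by_timestamp(const struct sample *periodic,
				 size_t n_periodic, size_t *periodic_pos,
				 const struct sample *cs, size_t n_cs,
				 struct sample *out)
{
	size_t n_out = 0;

	for (size_t i = 0; i < n_cs; i++) {
		/* CS samples are expected in completion (timestamp) order. */
		assert(i == 0 || cs[i - 1].ts <= cs[i].ts);

		/* Stop at the first periodic report newer than this CS report. */
		while (*periodic_pos < n_periodic &&
		       periodic[*periodic_pos].ts <= cs[i].ts)
			out[n_out++] = periodic[(*periodic_pos)++];

		out[n_out++] = cs[i];
	}
	return n_out;
}
```

<div>With periodic timestamps 10, 20, 30, 40 and CS timestamps 15, 30, the output order is 10, 15 (CS), 20, 30, 30 (CS), and the periodic report at 40 stays pending. This appears to mirror the patch, where in cs_mode the OA buffer is only drained up to the newest completed CS sample on each read().</div>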