<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Nov 4, 2016 at 8:59 AM, sourab gupta <span dir="ltr"><<a href="mailto:sourab.gupta@intel.com" target="_blank">sourab.gupta@intel.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On Thu, 2016-10-27 at 19:14 -0700, Robert Bragg wrote:<br>
> Adds base i915 perf infrastructure for Gen performance metrics.<br>
><br>
> This adds a DRM_IOCTL_I915_PERF_OPEN ioctl that takes an array of uint64<br>
> properties to configure a stream of metrics and returns a new fd usable<br>
> with standard VFS system calls including read() to read typed and sized<br>
> records; ioctl() to enable or disable capture and poll() to wait for<br>
> data.<br>
><br>
> A stream is opened something like:<br>
><br>
> uint64_t properties[] = {<br>
> /* Single context sampling */<br>
> DRM_I915_PERF_PROP_CTX_HANDLE, ctx_handle,<br>
><br>
> /* Include OA reports in samples */<br>
> DRM_I915_PERF_PROP_SAMPLE_OA, true,<br>
><br>
> /* OA unit configuration */<br>
> DRM_I915_PERF_PROP_OA_METRICS_<wbr>SET, metrics_set_id,<br>
> DRM_I915_PERF_PROP_OA_FORMAT, report_format,<br>
> DRM_I915_PERF_PROP_OA_<wbr>EXPONENT, period_exponent,<br>
> };<br>
> struct drm_i915_perf_open_param param = {<br>
> .flags = I915_PERF_FLAG_FD_CLOEXEC |<br>
> I915_PERF_FLAG_FD_NONBLOCK |<br>
> I915_PERF_FLAG_DISABLED,<br>
> .properties_ptr = (uintptr_t)properties,<br>
> .num_properties = sizeof(properties) / (2 * sizeof(uint64_t)),<br>
> };<br>
> int fd = drmIoctl(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &amp;param);<br>
><br>
> Records read all start with a common { type, size } header with<br>
> DRM_I915_PERF_RECORD_SAMPLE being of most interest. Sample records<br>
> contain an extensible number of fields and it's the<br>
> DRM_I915_PERF_PROP_SAMPLE_xyz properties given when opening that<br>
> determine what's included in every sample.<br>
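Because every record starts with a { type, size } header, a reader can walk the buffer returned by one read() record by record and skip types it does not understand. A minimal sketch in C follows. The header layout and the sample type value are assumptions for illustration (the authoritative definitions belong in include/uapi/drm/i915_drm.h), and example_record_header, EXAMPLE_RECORD_SAMPLE and count_samples() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ASSUMED record header layout, for illustration only; check the
 * real definition in include/uapi/drm/i915_drm.h. */
struct example_record_header {
	uint32_t type;
	uint16_t pad;
	uint16_t size; /* total record size in bytes, header included */
};

/* ASSUMED value standing in for DRM_I915_PERF_RECORD_SAMPLE. */
#define EXAMPLE_RECORD_SAMPLE 1

/* Walk the records returned by a single read() and count the samples.
 * Returns -1 if a header is truncated or declares an impossible size. */
static int count_samples(const uint8_t *buf, size_t len)
{
	size_t offset = 0;
	int samples = 0;

	while (offset < len) {
		struct example_record_header hdr;

		/* The kernel never splits records, so a partial header
		 * means the buffer is corrupt. */
		if (len - offset < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + offset, sizeof(hdr));
		if (hdr.size < sizeof(hdr) || hdr.size > len - offset)
			return -1;

		if (hdr.type == EXAMPLE_RECORD_SAMPLE)
			samples++;
		/* Unknown record types are skipped using their size. */
		offset += hdr.size;
	}
	return samples;
}
```

Skipping by size rather than by type is what lets new record types be added later without breaking older readers.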
><br>
> No specific streams are supported yet so any attempt to open a stream<br>
> will return an error.<br>
><br>
> v2:<br>
> use i915_gem_context_get() - Chris Wilson<br>
> v3:<br>
> update read() interface to avoid passing state struct - Chris Wilson<br>
> fix some rebase fallout, with i915-perf init/deinit<br>
> v4:<br>
> s/DRM_IORW/DRM_IOW/ - Emil Velikov<br>
><br>
> Signed-off-by: Robert Bragg <<a href="mailto:robert@sixbynine.org">robert@sixbynine.org</a>><br>
> ---<br>
> drivers/gpu/drm/i915/Makefile | 3 +<br>
> drivers/gpu/drm/i915/i915_drv.<wbr>c | 4 +<br>
> drivers/gpu/drm/i915/i915_drv.<wbr>h | 91 ++++++++<br>
> drivers/gpu/drm/i915/i915_<wbr>perf.c | 443 ++++++++++++++++++++++++++++++<wbr>+++++++++<br>
> include/uapi/drm/i915_drm.h | 67 ++++++<br>
> 5 files changed, 608 insertions(+)<br>
> create mode 100644 drivers/gpu/drm/i915/i915_<wbr>perf.c<br>
><br>
> diff --git a/drivers/gpu/drm/i915/<wbr>Makefile b/drivers/gpu/drm/i915/<wbr>Makefile<br>
> index 6123400..8d4e25f 100644<br>
> --- a/drivers/gpu/drm/i915/<wbr>Makefile<br>
> +++ b/drivers/gpu/drm/i915/<wbr>Makefile<br>
> @@ -113,6 +113,9 @@ i915-$(CONFIG_DRM_I915_<wbr>CAPTURE_ERROR) += i915_gpu_error.o<br>
> # virtual gpu code<br>
> i915-y += i915_vgpu.o<br>
><br>
> +# perf code<br>
> +i915-y += i915_perf.o<br>
> +<br>
> ifeq ($(CONFIG_DRM_I915_GVT),y)<br>
> i915-y += intel_gvt.o<br>
> include $(src)/gvt/Makefile<br>
> diff --git a/drivers/gpu/drm/i915/i915_<wbr>drv.c b/drivers/gpu/drm/i915/i915_<wbr>drv.c<br>
> index af3559d..685c96e 100644<br>
> --- a/drivers/gpu/drm/i915/i915_<wbr>drv.c<br>
> +++ b/drivers/gpu/drm/i915/i915_<wbr>drv.c<br>
> @@ -836,6 +836,8 @@ static int i915_driver_init_early(struct drm_i915_private *dev_priv,<br>
><br>
> intel_detect_preproduction_hw(<wbr>dev_priv);<br>
><br>
> + i915_perf_init(dev_priv);<br>
> +<br>
> return 0;<br>
><br>
> err_workqueues:<br>
> @@ -849,6 +851,7 @@ static int i915_driver_init_early(struct drm_i915_private *dev_priv,<br>
> */<br>
> static void i915_driver_cleanup_early(<wbr>struct drm_i915_private *dev_priv)<br>
> {<br>
> + i915_perf_fini(dev_priv);<br>
> i915_gem_load_cleanup(&dev_<wbr>priv->drm);<br>
> i915_workqueues_cleanup(dev_<wbr>priv);<br>
> }<br>
> @@ -2556,6 +2559,7 @@ static const struct drm_ioctl_desc i915_ioctls[] = {<br>
> DRM_IOCTL_DEF_DRV(I915_GEM_<wbr>USERPTR, i915_gem_userptr_ioctl, DRM_RENDER_ALLOW),<br>
> DRM_IOCTL_DEF_DRV(I915_GEM_<wbr>CONTEXT_GETPARAM, i915_gem_context_getparam_<wbr>ioctl, DRM_RENDER_ALLOW),<br>
> DRM_IOCTL_DEF_DRV(I915_GEM_<wbr>CONTEXT_SETPARAM, i915_gem_context_setparam_<wbr>ioctl, DRM_RENDER_ALLOW),<br>
> + DRM_IOCTL_DEF_DRV(I915_PERF_<wbr>OPEN, i915_perf_open_ioctl, DRM_RENDER_ALLOW),<br>
> };<br>
><br>
> static struct drm_driver driver = {<br>
> diff --git a/drivers/gpu/drm/i915/i915_<wbr>drv.h b/drivers/gpu/drm/i915/i915_<wbr>drv.h<br>
> index 5a260db..7a65c0b 100644<br>
> --- a/drivers/gpu/drm/i915/i915_<wbr>drv.h<br>
> +++ b/drivers/gpu/drm/i915/i915_<wbr>drv.h<br>
> @@ -1767,6 +1767,84 @@ struct intel_wm_config {<br>
> bool sprites_scaled;<br>
> };<br>
><br>
> +struct i915_perf_stream;<br>
> +<br>
> +struct i915_perf_stream_ops {<br>
> + /* Enables the collection of HW samples, either in response to<br>
> + * I915_PERF_IOCTL_ENABLE or implicitly called when stream is<br>
> + * opened without I915_PERF_FLAG_DISABLED.<br>
> + */<br>
> + void (*enable)(struct i915_perf_stream *stream);<br>
> +<br>
> + /* Disables the collection of HW samples, either in response to<br>
> + * I915_PERF_IOCTL_DISABLE or implicitly called before<br>
> + * destroying the stream.<br>
> + */<br>
> + void (*disable)(struct i915_perf_stream *stream);<br>
> +<br>
> + /* Return: true if any i915 perf records are ready to read()<br>
> + * for this stream.<br>
> + */<br>
> + bool (*can_read)(struct i915_perf_stream *stream);<br>
> +<br>
> + /* Call poll_wait, passing a wait queue that will be woken<br>
> + * once there is something ready to read() for the stream<br>
> + */<br>
> + void (*poll_wait)(struct i915_perf_stream *stream,<br>
> + struct file *file,<br>
> + poll_table *wait);<br>
> +<br>
> + /* For handling a blocking read, wait until there is something<br>
> + * ready to read() for the stream. E.g. wait on the same<br>
> + * wait queue that would be passed to poll_wait() until<br>
> + * ->can_read() returns true (if it's safe to call ->can_read()<br>
> + * without the i915 perf lock held).<br>
> + */<br>
> + int (*wait_unlocked)(struct i915_perf_stream *stream);<br>
> +<br>
> + /* read - Copy buffered metrics as records to userspace<br>
> + * @buf: the destination buffer in userspace<br>
> + * @count: the number of bytes to copy, requested by userspace<br>
> + * @offset: zero at the start of the read, updated as the read<br>
> + * proceeds, it represents how many bytes have been<br>
> + * copied so far and the buffer offset for copying the<br>
> + * next record.<br>
> + *<br>
> + * Copy as many buffered i915 perf samples and records for<br>
> + * this stream to userspace as will fit in the given buffer.<br>
> + *<br>
> + * Only write complete records; returning -ENOSPC if there<br>
> + * isn't room for a complete record.<br>
> + *<br>
> + * Return any error condition that results in a short read<br>
> + * such as -ENOSPC or -EFAULT, even though these may be<br>
> + * squashed before returning to userspace.<br>
> + */<br>
> + int (*read)(struct i915_perf_stream *stream,<br>
> + char __user *buf,<br>
> + size_t count,<br>
> + size_t *offset);<br>
> +<br>
> + /* Clean up any stream-specific resources.<br>
> + *<br>
> + * The stream will always be disabled before this is called.<br>
> + */<br>
> + void (*destroy)(struct i915_perf_stream *stream);<br>
> +};<br>
> +<br>
> +struct i915_perf_stream {<br>
> + struct drm_i915_private *dev_priv;<br>
> +<br>
> + struct list_head link;<br>
> +<br>
> + u32 sample_flags;<br>
> +<br>
> + struct i915_gem_context *ctx;<br>
> + bool enabled;<br>
> +<br>
> + struct i915_perf_stream_ops *ops;<br>
> +};<br>
> +<br>
> struct drm_i915_private {<br>
> struct drm_device drm;<br>
><br>
> @@ -2069,6 +2147,12 @@ struct drm_i915_private {<br>
><br>
> struct i915_runtime_pm pm;<br>
><br>
> + struct {<br>
> + bool initialized;<br>
> + struct mutex lock;<br>
> + struct list_head streams;<br>
> + } perf;<br>
> +<br>
> /* Abstract the submission mechanism (legacy ringbuffer or execlists) away */<br>
> struct {<br>
> void (*resume)(struct drm_i915_private *);<br>
> @@ -3482,6 +3566,9 @@ int i915_gem_context_setparam_<wbr>ioctl(struct drm_device *dev, void *data,<br>
> int i915_gem_context_reset_stats_<wbr>ioctl(struct drm_device *dev, void *data,<br>
> struct drm_file *file);<br>
><br>
> +int i915_perf_open_ioctl(struct drm_device *dev, void *data,<br>
> + struct drm_file *file);<br>
> +<br>
> /* i915_gem_evict.c */<br>
> int __must_check i915_gem_evict_something(<wbr>struct i915_address_space *vm,<br>
> u64 min_size, u64 alignment,<br>
> @@ -3607,6 +3694,10 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,<br>
> u32 batch_len,<br>
> bool is_master);<br>
><br>
> +/* i915_perf.c */<br>
> +extern void i915_perf_init(struct drm_i915_private *dev_priv);<br>
> +extern void i915_perf_fini(struct drm_i915_private *dev_priv);<br>
> +<br>
> /* i915_suspend.c */<br>
> extern int i915_save_state(struct drm_device *dev);<br>
> extern int i915_restore_state(struct drm_device *dev);<br>
> diff --git a/drivers/gpu/drm/i915/i915_<wbr>perf.c b/drivers/gpu/drm/i915/i915_<wbr>perf.c<br>
> new file mode 100644<br>
> index 0000000..c45cf92<br>
> --- /dev/null<br>
> +++ b/drivers/gpu/drm/i915/i915_<wbr>perf.c<br>
> @@ -0,0 +1,443 @@<br>
> +/*<br>
> + * Copyright © 2015-2016 Intel Corporation<br>
> + *<br>
> + * Permission is hereby granted, free of charge, to any person obtaining a<br>
> + * copy of this software and associated documentation files (the "Software"),<br>
> + * to deal in the Software without restriction, including without limitation<br>
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,<br>
> + * and/or sell copies of the Software, and to permit persons to whom the<br>
> + * Software is furnished to do so, subject to the following conditions:<br>
> + *<br>
> + * The above copyright notice and this permission notice (including the next<br>
> + * paragraph) shall be included in all copies or substantial portions of the<br>
> + * Software.<br>
> + *<br>
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR<br>
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,<br>
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL<br>
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER<br>
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING<br>
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS<br>
> + * IN THE SOFTWARE.<br>
> + *<br>
> + * Authors:<br>
> + * Robert Bragg <<a href="mailto:robert@sixbynine.org">robert@sixbynine.org</a>><br>
> + */<br>
> +<br>
> +#include &lt;linux/anon_inodes.h&gt;<br>
> +<br>
> +#include "i915_drv.h"<br>
> +<br>
> +struct perf_open_properties {<br>
> + u32 sample_flags;<br>
> +<br>
> + u64 single_context:1;<br>
> + u64 ctx_handle;<br>
> +};<br>
> +<br>
> +static ssize_t i915_perf_read_locked(struct i915_perf_stream *stream,<br>
> + struct file *file,<br>
> + char __user *buf,<br>
> + size_t count,<br>
> + loff_t *ppos)<br>
> +{<br>
> + /* Note we keep the offset (aka bytes read) separate from any<br>
> + * error status so that the final check for whether we return<br>
> + * the bytes read with a higher precedence than any error (see<br>
> + * comment below) doesn't need to be handled/duplicated in<br>
> + * stream->ops->read() implementations.<br>
> + */<br>
> + size_t offset = 0;<br>
> + int ret = stream->ops->read(stream, buf, count, &offset);<br>
> +<br>
> + /* If we've successfully copied any data then reporting that<br>
> + * takes precedence over any internal error status, so the<br>
> + * data isn't lost.<br>
> + *<br>
> + * For example ret will be -ENOSPC whenever there is more<br>
> + * buffered data than can be copied to userspace, but that's<br>
> + * only interesting if we weren't able to copy any data<br>
> + * because it implies the userspace buffer is too small to<br>
> + * receive a single record (and we never split records).<br>
> + *<br>
> + * Another case with ret == -EFAULT is more of a grey area<br>
> + * since it would seem like bad form for userspace to ask us<br>
> + * to overrun its buffer, but the user knows best:<br>
> + *<br>
> + * <a href="http://yarchive.net/comp/linux/partial_reads_writes.html" rel="noreferrer" target="_blank">http://yarchive.net/comp/<wbr>linux/partial_reads_writes.<wbr>html</a><br>
> + */<br>
> + return offset ?: (ret ?: -EAGAIN);<br>
> +}<br>
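The return expression above relies on the GNU C conditional with an omitted middle operand (x ?: y yields x if it is non-zero, otherwise y). As a sketch, here is the same precedence spelled out in portable C; read_result() is a hypothetical helper for illustration, not part of the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Portable equivalent of "offset ?: (ret ?: -EAGAIN)":
 *  1. any bytes copied are reported, even if ->read() also hit an error;
 *  2. otherwise the error from ->read() (e.g. -ENOSPC, -EFAULT) is returned;
 *  3. otherwise nothing was available, so report -EAGAIN. */
static ssize_t read_result(size_t offset, int ret)
{
	if (offset)
		return (ssize_t)offset;
	if (ret)
		return ret;
	return -EAGAIN;
}
```

Keeping this rule in the common read path means individual stream->ops->read() implementations only need to report their own status.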
> +<br>
> +static ssize_t i915_perf_read(struct file *file,<br>
> + char __user *buf,<br>
> + size_t count,<br>
> + loff_t *ppos)<br>
> +{<br>
> + struct i915_perf_stream *stream = file->private_data;<br>
> + struct drm_i915_private *dev_priv = stream->dev_priv;<br>
> + ssize_t ret;<br>
> +<br>
> + if (!(file->f_flags & O_NONBLOCK)) {<br>
> + /* Allow false positives from stream->ops->wait_unlocked.<br>
> + */<br>
> + do {<br>
> + ret = stream->ops->wait_unlocked(<wbr>stream);<br>
> + if (ret)<br>
> + return ret;<br>
> +<br>
> + mutex_lock(&dev_priv->perf.<wbr>lock);<br>
<br>
</div></div>Should the interruptible version, mutex_lock_interruptible(), be used here, so that<br>
reads can be interrupted?<br></blockquote><div><br></div><div>Now that we don't have the context pin hook on Haswell we could /almost/ get away without this lock, except for its use to synchronize i915_perf_register with i915_perf_open_ioctl.<br><br></div><div>Most of the i915-perf state access is synchronized as a result of being fops-driven, so this perf.lock was added to deal with a few entrypoints outside of fops, such as the context pinning hook we used to have (though we avoid it in the hrtimer callback).<br></div><div><br></div><div>Although the recent change to remove the pin hook has made the lock look a bit redundant for now, I think I'd prefer to leave the locks as they are to avoid churn with the gen8+ patches, where we do have some other entrypoints into i915-perf outside of the fops.<br><br></div><div>Given that though, there's currently not really much argument either way for them being interruptible. The expectation I have atm is that there shouldn't be anything running async within i915-perf outside of fops that's expected to be long-running. If this is interruptible, we will probably also want to consider the risk of bouncing lots of reads, starving userspace and increasing the risk of a buffer overflow.<br><br></div><div>- Robert<br></div></div></div></div>
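Whichever lock variant is chosen, userspace reading the stream fd should already tolerate read() failing with EINTR (interrupted by a signal) or EAGAIN (nothing ready on an O_NONBLOCK fd). A minimal sketch using only POSIX read() and poll(), assuming the stream fd behaves like any other pollable file descriptor as the commit message describes; read_stream() and its timeout parameter are hypothetical.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Read once from a stream fd, retrying when a signal interrupts read()
 * (EINTR) and, for an O_NONBLOCK fd, waiting with poll() while no data
 * is ready (EAGAIN). Returns bytes read, or -1 with errno set. */
static ssize_t read_stream(int fd, void *buf, size_t len, int timeout_ms)
{
	for (;;) {
		ssize_t n = read(fd, buf, len);

		if (n >= 0)
			return n;
		if (errno == EINTR)
			continue; /* interrupted before any data was copied: retry */
		if (errno != EAGAIN && errno != EWOULDBLOCK)
			return -1; /* e.g. ENOSPC: buffer too small for one record */

		struct pollfd pfd = { .fd = fd, .events = POLLIN };
		int r = poll(&pfd, 1, timeout_ms);

		if (r < 0 && errno != EINTR)
			return -1;
		if (r == 0) {
			errno = EAGAIN; /* timed out with nothing to read */
			return -1;
		}
		/* Readable (or poll was interrupted): try read() again. */
	}
}
```

Each EINTR retry still costs an extra syscall, so frequent interruptions would show up as the extra read traffic discussed above rather than as lost data.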