[Intel-gfx] [PATCH] drm/i915/perf: More documentation hooked to i915.rst

Matthew Auld matthew.william.auld at gmail.com
Wed Nov 30 16:30:12 UTC 2016


On 29 November 2016 at 17:00, Robert Bragg <robert at sixbynine.org> wrote:
> This adds a 'Perf' section to i915.rst with the following sub sections:
> - Overview
> - Comparison with Core Perf
> - i915 Driver Entry Points
> - i915 Perf Stream
> - i915 Perf Observation Architecture Stream
> - All i915 Perf Internals
>
> Signed-off-by: Robert Bragg <robert at sixbynine.org>
> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> ---
>  Documentation/gpu/i915.rst       |  92 +++++++++++++
>  drivers/gpu/drm/i915/i915_drv.h  | 151 ++++++++++++++++----
>  drivers/gpu/drm/i915/i915_perf.c | 289 +++++++++++++++++++++++++++++++++++----
>  3 files changed, 478 insertions(+), 54 deletions(-)
>
> diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
> index 117d2ab..714bd4b 100644
> --- a/Documentation/gpu/i915.rst
> +++ b/Documentation/gpu/i915.rst
> @@ -356,4 +356,96 @@ switch_mm
>  .. kernel-doc:: drivers/gpu/drm/i915/i915_trace.h
>     :doc: switch_mm tracepoint
>
> +Perf
> +====
> +
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> +   :doc: i915 Perf Overview
> +
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> +   :doc: i915 Perf History and Comparison with Core Perf
> +
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> +   :doc: i915 Perf File Operations
I see no such :DOC entry in i915_perf.c.

> +
> +i915 Driver Entry Points
> +------------------------
> +
> +This section covers the entrypoints exported outside of i915_perf.c to
> +integrate with drm/i915 and to handle the `DRM_I915_PERF_OPEN` ioctl.
> +
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> +   :functions: i915_perf_init
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> +   :functions: i915_perf_fini
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> +   :functions: i915_perf_register
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> +   :functions: i915_perf_unregister
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> +   :functions: i915_perf_open_ioctl
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> +   :functions: i915_perf_release
I see no kernel doc for this.

> +
> +i915 Perf Stream
> +----------------
> +
> +This section covers the stream-semantics-agnostic structures and functions
> +for representing an i915 perf stream FD and associated file operations.
> +
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.h
> +   :functions: i915_perf_stream
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.h
> +   :functions: i915_perf_stream_ops
> +
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> +   :functions: read_properties_unlocked
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> +   :functions: i915_perf_open_ioctl_unlocked
i915_perf_open_ioctl_locked, and there is no kernel doc for it.

> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> +   :functions: i915_perf_destroy_unlocked
i915_perf_destroy_locked, and there is no kernel doc for it.

> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> +   :functions: i915_perf_read
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> +   :functions: i915_perf_ioctl
I see no kernel doc for this.

> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> +   :functions: i915_perf_enable_unlocked
i915_perf_enable_locked, and there is no kernel doc for it.

> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> +   :functions: i915_perf_disable_unlocked
i915_perf_disable_locked, and there is no kernel doc for it.

> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> +   :functions: i915_perf_poll
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> +   :functions: i915_perf_poll_unlocked
i915_perf_poll_locked, and there is no kernel doc for it

> +
> +i915 Perf Observation Architecture Stream
> +-----------------------------------------
> +
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> +   :functions: OA_BUFFER_SIZE
hmm, but this is not a function, enum, union, struct or typedef, so
can we actually add kernel doc for preprocessor directives ? Also
OA_BUFFER_SIZE lacks the proper formatting to get picked up anyway.

> +.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.h
> +   :functions: i915_oa_ops
> +
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> +   :functions: i915_oa_stream_init
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> +   :functions: i915_oa_read
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> +   :functions: i915_oa_stream_enable
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> +   :functions: i915_oa_stream_disable
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> +   :functions: i915_oa_wait_unlocked
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> +   :functions: i915_oa_poll_wait
> +
> +All i915 Perf Internals
> +-----------------------
> +
> +This section simply includes all currently documented i915 perf internals, in
> +no particular order, but may include some more minor utilities or platform
> +specific details than found in the more high-level sections.
> +
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> +   :internal:
> +
>  .. WARNING: DOCPROC directive not supported: !Cdrivers/gpu/drm/i915/i915_irq.c
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 1ec9619..9f92755 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1827,89 +1827,186 @@ struct i915_oa_reg {
>
>  struct i915_perf_stream;
>
> +/**
> + * struct i915_perf_stream_ops - the OPs to support a specific stream type
> + */
>  struct i915_perf_stream_ops {
> -       /* Enables the collection of HW samples, either in response to
> -        * I915_PERF_IOCTL_ENABLE or implicitly called when stream is
> -        * opened without I915_PERF_FLAG_DISABLED.
> +       /**
> +        * @enable: Enables the collection of HW samples, either in response to
> +        * I915_PERF_IOCTL_ENABLE or implicitly called when stream is opened
> +        * without I915_PERF_FLAG_DISABLED.
>          */
>         void (*enable)(struct i915_perf_stream *stream);
>
> -       /* Disables the collection of HW samples, either in response to
> -        * I915_PERF_IOCTL_DISABLE or implicitly called before
> -        * destroying the stream.
> +       /**
> +        * @disable: Disables the collection of HW samples, either in response
> +        * to I915_PERF_IOCTL_DISABLE or implicitly called before destroying
> +        * the stream.
>          */
>         void (*disable)(struct i915_perf_stream *stream);
>
> -       /* Call poll_wait, passing a wait queue that will be woken
> +       /**
> +        * @poll_wait: Call poll_wait, passing a wait queue that will be woken
>          * once there is something ready to read() for the stream
>          */
>         void (*poll_wait)(struct i915_perf_stream *stream,
>                           struct file *file,
>                           poll_table *wait);
>
> -       /* For handling a blocking read, wait until there is something
> -        * to ready to read() for the stream. E.g. wait on the same
> +       /**
> +        * @wait_unlocked: For handling a blocking read, wait until there is
> +        * something to ready to read() for the stream. E.g. wait on the same
>          * wait queue that would be passed to poll_wait().
>          */
>         int (*wait_unlocked)(struct i915_perf_stream *stream);
>
> -       /* read - Copy buffered metrics as records to userspace
> -        * @buf: the userspace, destination buffer
> -        * @count: the number of bytes to copy, requested by userspace
> -        * @offset: zero at the start of the read, updated as the read
> -        *          proceeds, it represents how many bytes have been
> -        *          copied so far and the buffer offset for copying the
> -        *          next record.
> +       /**
> +        * @read: Copy buffered metrics as records to userspace
> +        * **buf**: the userspace, destination buffer
> +        * **count**: the number of bytes to copy, requested by userspace
> +        * **offset**: zero at the start of the read, updated as the read
> +        * proceeds, it represents how many bytes have been copied so far and
> +        * the buffer offset for copying the next record.
So there's no proper way of documenting the parameters of a function
pointer which happens to also be a member of a struct ? I guess you
would have to do a typedef, but then that's a bit ugly...

>          *
> -        * Copy as many buffered i915 perf samples and records for
> -        * this stream to userspace as will fit in the given buffer.
> +        * Copy as many buffered i915 perf samples and records for this stream
> +        * to userspace as will fit in the given buffer.
>          *
> -        * Only write complete records; returning -ENOSPC if there
> -        * isn't room for a complete record.
> +        * Only write complete records; returning -ENOSPC if there isn't room
> +        * for a complete record.
>          *
> -        * Return any error condition that results in a short read
> -        * such as -ENOSPC or -EFAULT, even though these may be
> -        * squashed before returning to userspace.
> +        * Return any error condition that results in a short read such as
> +        * -ENOSPC or -EFAULT, even though these may be squashed before
> +        * returning to userspace.
>          */
>         int (*read)(struct i915_perf_stream *stream,
>                     char __user *buf,
>                     size_t count,
>                     size_t *offset);
>
> -       /* Cleanup any stream specific resources.
> +       /**
> +        * @destroy: Cleanup any stream specific resources.
>          *
>          * The stream will always be disabled before this is called.
>          */
>         void (*destroy)(struct i915_perf_stream *stream);
>  };
>
> +/**
> + * struct i915_perf_stream - state for a single open stream FD
> + */
>  struct i915_perf_stream {
> +       /**
> +        * @dev_priv: i915 drm device
> +        */
>         struct drm_i915_private *dev_priv;
>
> +       /**
> +        * @link: Links the stream into ``&drm_i915_private->streams``
> +        */
>         struct list_head link;
>
> +       /**
> +        * @sample_flags: Flags representing the `DRM_I915_PERF_PROP_SAMPLE_*`
> +        * properties given when opening a stream, representing the contents
> +        * of a single sample as read() by userspace.
> +        */
>         u32 sample_flags;
> +
> +       /**
> +        * @sample_size: Considering the configured contents of a sample
> +        * combined with the required header size, this is the total size
> +        * of a single sample record.
> +        */
>         int sample_size;
>
> +       /**
> +        * @ctx: %NULL if measuring system-wide across all contexts or a
> +        * specific context that is being monitored.
> +        */
>         struct i915_gem_context *ctx;
> +
> +       /**
> +        * @enabled: Whether the stream is currently enabled, considering
> +        * whether the stream was opened in a disabled state and based
> +        * on `I915_PERF_IOCTL_ENABLE` and `I915_PERF_IOCTL_DISABLE` calls.
> +        */
>         bool enabled;
>
> +       /**
> +        * @ops: The callbacks providing the implementation of this specific
> +        * type of configured stream.
> +        */
>         const struct i915_perf_stream_ops *ops;
>  };
>
> +/**
> + * struct i915_oa_ops - Gen specific implementation of an OA unit stream
> + */
>  struct i915_oa_ops {
> +       /**
> +        * @init_oa_buffer: Resets the head and tail pointers of the
> +        * circular buffer for periodic OA reports.
> +        *
> +        * Called when first opening a stream for OA metrics, but also may be
> +        * called in response to an OA buffer overflow or other error
> +        * condition.
> +        *
> +        * Note it may be necessary to clear the full OA buffer here as part of
> +        * maintaining the invariable that new reports must be written to
> +        * zeroed memory for us to be able to reliable detect if an expected
> +        * report has not yet landed in memory.  (At least on Haswell the OA
> +        * buffer tail pointer is not synchronized with reports being visible
> +        * to the CPU)
> +        */
>         void (*init_oa_buffer)(struct drm_i915_private *dev_priv);
> +
> +       /**
> +        * @enable_metric_set: Applies any MUX configuration to set up the
> +        * Boolean and Custom (B/C) counters that are part of the counter
> +        * reports being sampled. May apply system constraints such as
> +        * disabling EU clock gating as required.
> +        */
>         int (*enable_metric_set)(struct drm_i915_private *dev_priv);
> +
> +       /**
> +        * @disable_metric_set: Remove system constraints associated with using
> +        * the OA unit.
> +        */
>         void (*disable_metric_set)(struct drm_i915_private *dev_priv);
> +
> +       /**
> +        * @oa_enable: Enable periodic sampling
> +        */
>         void (*oa_enable)(struct drm_i915_private *dev_priv);
> +
> +       /**
> +        * @oa_disable: Disable periodic sampling
> +        */
>         void (*oa_disable)(struct drm_i915_private *dev_priv);
> -       void (*update_oacontrol)(struct drm_i915_private *dev_priv);
> -       void (*update_hw_ctx_id_locked)(struct drm_i915_private *dev_priv,
> -                                       u32 ctx_id);
> +
> +       /**
> +        * @read: Copy data from the circular OA buffer into a given userspace
> +        * buffer.
> +        */
>         int (*read)(struct i915_perf_stream *stream,
>                     char __user *buf,
>                     size_t count,
>                     size_t *offset);
> +
> +       /**
> +        * @oa_buffer_is_empty: Check if OA buffer empty (false positives OK)
> +        *
> +        * This is either called via fops or the poll check hrtimer (atomic
> +        * ctx) without any locks taken.
> +        *
> +        * It's safe to read OA config state here unlocked, assuming that this
> +        * is only called while the stream is enabled, while the global OA
> +        * configuration can't be modified.
> +        *
> +        * Efficiency is more important than avoiding some false positives
> +        * here, which will be handled gracefully - likely resulting in an
> +        * EAGAIN error for userspace.
> +        */
>         bool (*oa_buffer_is_empty)(struct drm_i915_private *dev_priv);
>  };
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 68b7c27..0be8cef 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -26,7 +26,10 @@
>
>
>  /**
> - * DOC: i915 Perf, streaming API for GPU metrics
> + * DOC: i915 Perf Overview
> + *
> + * Overview
> + * --------
>   *
>   * Gen graphics supports a large number of performance counters that can help
>   * driver and application developers understand and optimize their use of the
> @@ -45,6 +48,13 @@
>   * privileges by default, unless changed via the dev.i915.perf_event_paranoid
>   * sysctl option.
>   *
> + */
> +
> +/**
> + * DOC: i915 Perf History and Comparison with Core Perf
> + *
> + * Comparison with Core Perf
> + * -------------------------
>   *
>   * The interface was initially inspired by the core Perf infrastructure but
>   * some notable differences are:
> @@ -75,8 +85,8 @@
>   * gets copied from the GPU mapped buffers to userspace buffers.
>   *
>   *
> - * Some notes regarding Linux Perf:
> - * --------------------------------
> + * Issues hit with first prototype based on Core Perf
> + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>   *
>   * The first prototype of this driver was based on the core perf
>   * infrastructure, and while we did make that mostly work, with some changes to
> @@ -135,7 +145,7 @@
>   *   for combining with the side-band raw reports it captures using
>   *   MI_REPORT_PERF_COUNT commands.
>   *
> - *   _ As a side note on perf's grouping feature; there was also some concern
> + *   - As a side note on perf's grouping feature; there was also some concern
>   *     that using PERF_FORMAT_GROUP as a way to pack together counter values
>   *     would quite drastically inflate our sample sizes, which would likely
>   *     lower the effective sampling resolutions we could use when the available
> @@ -277,6 +287,20 @@ static struct i915_oa_format hsw_oa_formats[I915_OA_FORMAT_MAX] = {
>
>  #define SAMPLE_OA_REPORT      (1<<0)
>
> +/**
> + * struct perf_open_properties - for validated properties given to open a stream
> + * @sample_flags: `DRM_I915_PERF_PROP_SAMPLE_*` properties are tracked as flags
> + * @single_context: Whether a single or all gpu contexts should be monitored
> + * @ctx_handle: A gem ctx handle for use with @single_context
> + * @metrics_set: An ID for an OA unit metric set advertised via sysfs
> + * @oa_format: An OA unit HW report format
> + * @oa_periodic: Whether to enable periodic OA unit sampling
> + * @oa_period_exponent: The OA unit sampling period is derived from this
> + *
> + * As read_properties_unlocked() enumerates and validates the properties given
> + * to open a stream of metrics the configuration is built up in the structure
> + * which starts out zero initialized.
> + */
>  struct perf_open_properties {
>         u32 sample_flags;
>
> @@ -314,7 +338,19 @@ static bool gen7_oa_buffer_is_empty_fop_unlocked(struct drm_i915_private *dev_pr
>  }
>
>  /**
> - * Appends a status record to a userspace read() buffer.
> + * append_oa_status - Appends a status record to a userspace read() buffer.
> + * @stream: An i915-perf stream opened for OA metrics
> + * @buf: destination buffer given by userspace
> + * @count: the number of bytes userspace wants to read
> + * @offset: (inout): the current position for writing into @buf
> + * @type: The kind of status to report to userspace
> + *
> + * Writes a status record (such as `DRM_I915_PERF_RECORD_OA_REPORT_LOST`)
> + * into the userspace read() buffer.
> + *
> + * The @buf @offset will only be updated on success.
> + *
> + * Returns: 0 on success, negative error code on failure.
>   */
>  static int append_oa_status(struct i915_perf_stream *stream,
>                             char __user *buf,
> @@ -336,7 +372,21 @@ static int append_oa_status(struct i915_perf_stream *stream,
>  }
>
>  /**
> - * Copies single OA report into userspace read() buffer.
> + * append_oa_sample - Copies single OA report into userspace read() buffer.
> + * @stream: An i915-perf stream opened for OA metrics
> + * @buf: destination buffer given by userspace
> + * @count: the number of bytes userspace wants to read
> + * @offset: (inout): the current position for writing into @buf
> + * @report: A single OA report to (optionally) include as part of the sample
> + *
> + * The contents of a sample are configured through `DRM_I915_PERF_PROP_SAMPLE_*`
> + * properties when opening a stream, tracked as `stream->sample_flags`. This
> + * function copies the requested components of a single sample to the given
> + * read() @buf.
> + *
> + * The @buf @offset will only be updated on success.
> + *
> + * Returns: 0 on success, negative error code on failure.
>   */
>  static int append_oa_sample(struct i915_perf_stream *stream,
>                             char __user *buf,
> @@ -380,8 +430,6 @@ static int append_oa_sample(struct i915_perf_stream *stream,
>   * @head_ptr: (inout): the current oa buffer cpu read position
>   * @tail: the current oa buffer gpu write position
>   *
> - * Returns 0 on success, negative error code on failure.
> - *
>   * Notably any error condition resulting in a short read (-ENOSPC or
>   * -EFAULT) will be returned even though one or more records may
>   * have been successfully copied. In this case it's up to the caller
> @@ -392,6 +440,8 @@ static int append_oa_sample(struct i915_perf_stream *stream,
>   * tail, so the head chases the tail?... If you think that's mad
>   * and back-to-front you're not alone, but this follows the
>   * Gen PRM naming convention.
> + *
> + * Returns: 0 on success, negative error code on failure.
>   */
>  static int gen7_append_oa_reports(struct i915_perf_stream *stream,
>                                   char __user *buf,
> @@ -496,6 +546,22 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
>         return ret;
>  }
>
> +/**
> + * gen7_oa_read - copy status records then buffered OA reports
> + * @stream: An i915-perf stream opened for OA metrics
> + * @buf: destination buffer given by userspace
> + * @count: the number of bytes userspace wants to read
> + * @offset: (inout): the current position for writing into @buf
> + *
> + * Checks Gen 7 specific OA unit status registers and if necessary appends
> + * corresponding status records for userspace (such as for a buffer full
> + * condition) and then initiate appending any buffered OA reports.
> + *
> + * Updates @offset according to the number of bytes successfully copied into
> + * the userspace buffer.
> + *
> + * Returns: zero on success or a negative error code
> + */
>  static int gen7_oa_read(struct i915_perf_stream *stream,
>                         char __user *buf,
>                         size_t count,
> @@ -597,6 +663,20 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
>         return ret;
>  }
>
> +/**
> + * i915_oa_wait_unlocked - handles blocking IO until OA data available
> + * @stream: An i915-perf stream opened for OA metrics
> + *
> + * Called when userspace tries to read() from a blocking stream FD opened
> + * for OA metrics. It waits until the hrtimer callback finds a non-empty
> + * OA buffer and wakes us.
> + *
> + * Note: it's acceptable to have this return with some false positives
> + * since any subsequent read handling will return EAGAIN if there isn't
s/EAGAIN/-EAGAIN/

> + * really data ready for userspace yet.
> + *
> + * Returns: zero on success or a negative error code
> + */
>  static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
>  {
>         struct drm_i915_private *dev_priv = stream->dev_priv;
> @@ -615,6 +695,16 @@ static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
>                                         !dev_priv->perf.oa.ops.oa_buffer_is_empty(dev_priv));
>  }
>
> +/**
> + * i915_oa_poll_wait - call poll_wait() for an OA stream poll()
> + * @stream: An i915-perf stream opened for OA metrics
> + * @file: An i915 perf stream file
> + * @wait: poll() state table
> + *
> + * For handling userspace polling on an i915 perf stream opened for OA metrics,
> + * this starts a poll_wait with the wait queue that our hrtimer callback wakes
> + * when it sees data ready to read in the circular OA buffer.
> + */
>  static void i915_oa_poll_wait(struct i915_perf_stream *stream,
>                               struct file *file,
>                               poll_table *wait)
> @@ -624,6 +714,18 @@ static void i915_oa_poll_wait(struct i915_perf_stream *stream,
>         poll_wait(file, &dev_priv->perf.oa.poll_wq, wait);
>  }
>
> +/**
> + * i915_oa_read - just calls through to ``&i915_oa_ops->read``
I've got no idea what ``&i915_oa_ops->read`` is supposed to generate,
but it spews out :c:type:`i915_oa_ops->read <i915_oa_ops>` for me...

> + * @stream: An i915-perf stream opened for OA metrics
> + * @buf: destination buffer given by userspace
> + * @count: the number of bytes userspace wants to read
> + * @offset: (inout): the current position for writing into @buf
> + *
> + * Updates @offset according to the number of bytes successfully copied into
> + * the userspace buffer.
> + *
> + * Returns: zero on success or a negative error code
> + */
>  static int i915_oa_read(struct i915_perf_stream *stream,
>                         char __user *buf,
>                         size_t count,
> @@ -634,9 +736,15 @@ static int i915_oa_read(struct i915_perf_stream *stream,
>         return dev_priv->perf.oa.ops.read(stream, buf, count, offset);
>  }
>
> -/* Determine the render context hw id, and ensure it remains fixed for the
> +/**
> + * oa_get_render_ctx_id - determine and hold ctx hw id
> + * @stream: An i915-perf stream opened for OA metrics
> + *
> + * Determine the render context hw id, and ensure it remains fixed for the
>   * lifetime of the stream. This ensures that we don't have to worry about
>   * updating the context ID in OACONTROL on the fly.
> + *
> + * Returns: zero on success or a negative error code
>   */
>  static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
>  {
> @@ -673,6 +781,13 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
>         return ret;
>  }
>
> +/**
> + * oa_put_render_ctx_id - counterpart to oa_get_render_ctx_id releases hold
> + * @stream: An i915-perf stream opened for OA metrics
> + *
> + * In case anything needed doing to ensure the context HW ID would remain valid
> + * for the lifetime of the stream, then that can be undone here.
> + */
>  static void oa_put_render_ctx_id(struct i915_perf_stream *stream)
>  {
>         struct drm_i915_private *dev_priv = stream->dev_priv;
> @@ -945,6 +1060,15 @@ static void gen7_oa_enable(struct drm_i915_private *dev_priv)
>         spin_unlock_irqrestore(&dev_priv->perf.hook_lock, flags);
>  }
>
> +/**
> + * i915_oa_stream_enable - handle I915_PERF_IOCTL_ENABLE for OA stream
> + * @stream: An i915 perf stream opened for OA metrics
> + *
> + * [Re]enables hardware periodic sampling according to the period configured
> + * when opening the stream. This also starts a hrtimer that will periodically
> + * check for data in the circular OA buffer for notifying userspace (e.g.
> + * during a read() or poll()).
> + */
>  static void i915_oa_stream_enable(struct i915_perf_stream *stream)
>  {
>         struct drm_i915_private *dev_priv = stream->dev_priv;
> @@ -962,6 +1086,14 @@ static void gen7_oa_disable(struct drm_i915_private *dev_priv)
>         I915_WRITE(GEN7_OACONTROL, 0);
>  }
>
> +/**
> + * i915_oa_stream_disable - handle I915_PERF_IOCTL_DISABLE for OA stream
> + * @stream: An i915 perf stream opened for OA metrics
> + *
> + * Stops the OA unit from periodically writing counter reports into the
> + * circular OA buffer. This also stops the hrtimer that periodically checks for
> + * data in the circular OA buffer, for notifying userspace.
> + */
>  static void i915_oa_stream_disable(struct i915_perf_stream *stream)
>  {
>         struct drm_i915_private *dev_priv = stream->dev_priv;
> @@ -987,6 +1119,24 @@ static const struct i915_perf_stream_ops i915_oa_stream_ops = {
>         .read = i915_oa_read,
>  };
>
> +/**
> + * i915_oa_stream_init - validate combined props for OA stream and init
> + * @stream: An i915 perf stream
> + * @param: The open parameters passed to 'DRM_I915_PERF_OPEN`
> + * @props: The property state that configures stream (individually validated)
> + *
> + * While read_properties_unlocked() validates properties in isolation it
> + * doesn't ensure that the combination necessarily makes sense.
> + *
> + * At this point it has been determined that userspace wants a stream of
> + * OA metrics, but still we need to further validate the combined
> + * properties are OK.
> + *
> + * If the configuration makes sense then we can allocate memory for
> + * a circular OA buffer and apply the requested metric set configuration.
> + *
> + * Returns: zero on success or a negative error code.
> + */
>  static int i915_oa_stream_init(struct i915_perf_stream *stream,
>                                struct drm_i915_perf_open_param *param,
>                                struct perf_open_properties *props)
> @@ -1110,6 +1260,31 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
>         return ret;
>  }
>
> +/**
> + * i915_perf_read_locked - wrap stream->ops->read with error notmalisation
notmalisation? normalisation? nationalisation?

> + * @stream: An i915 perf stream
> + * @file: An i915 perf stream file
> + * @buf: destination buffer given by userspace
> + * @count: the number of bytes userspace wants to read
> + * @ppos: (inout) file seek position (unused)
> + *
> + * Besides wrapping stream->ops->read() this provides a common place to ensure
This generates stream->ops->:c:func:read() for me...hmm, so maybe we
don't want the brackets ?

> + * that if we've successfully copied any data then reporting that takes
> + * precedence over any internal error status, so the data isn't lost.
> + *
> + * For example ret will be -ENOSPC whenever there is more buffered data than
> + * can be copied to userspace, but that's only interesting if we weren't able
> + * to copy some data because it implies the userspace buffer is too small to
> + * receive a single record (and we never split records).
> + *
> + * Another case with ret == -EFAULT is more of a grey area since it would seem
> + * like bad form for userspace to ask us to overrun its buffer, but the user
> + * knows best:
> + *
> + *   http://yarchive.net/comp/linux/partial_reads_writes.html
> + *
> + * Returns: The number of bytes copied or a negative error code on failure.
> + */
>  static ssize_t i915_perf_read_locked(struct i915_perf_stream *stream,
>                                      struct file *file,
>                                      char __user *buf,
> @@ -1125,25 +1300,27 @@ static ssize_t i915_perf_read_locked(struct i915_perf_stream *stream,
>         size_t offset = 0;
>         int ret = stream->ops->read(stream, buf, count, &offset);
>
> -       /* If we've successfully copied any data then reporting that
> -        * takes precedence over any internal error status, so the
> -        * data isn't lost.
> -        *
> -        * For example ret will be -ENOSPC whenever there is more
> -        * buffered data than can be copied to userspace, but that's
> -        * only interesting if we weren't able to copy some data
> -        * because it implies the userspace buffer is too small to
> -        * receive a single record (and we never split records).
> -        *
> -        * Another case with ret == -EFAULT is more of a grey area
> -        * since it would seem like bad form for userspace to ask us
> -        * to overrun its buffer, but the user knows best:
> -        *
> -        *   http://yarchive.net/comp/linux/partial_reads_writes.html
> -        */
>         return offset ?: (ret ?: -EAGAIN);
>  }
>
> +/**
> + * i915_perf_read - handles read() FOP for i915 perf stream FDs
> + * @file: An i915 perf stream file
> + * @buf: destination buffer given by userspace
> + * @count: the number of bytes userspace wants to read
> + * @ppos: (inout) file seek position (unused)
> + *
> + * The entry point for handling a read() on a stream file descriptor from
> + * userspace. Most of the work is left to the i915_perf_read_locked() and
> + * stream->op->read() but to save having stream implementations (of which
s/stream->op->read()/stream->ops->read

> + * we might have multiple later) we handle blocking read here.
> + *
> + * We can also consistently treat trying to read from a disable stream
s/disable/disabled

> + * as an IO error so implementations can assume the stream is enabled
> + * while reading.
> + *
> + * Returns: The number of bytes copied or a negative error code on failure.
> + */
>  static ssize_t i915_perf_read(struct file *file,
>                               char __user *buf,
>                               size_t count,
> @@ -1458,12 +1635,20 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv,
>         return ret;
>  }
>
> -/* Note we copy the properties from userspace outside of the i915 perf
> - * mutex to avoid an awkward lockdep with mmap_sem.
> +/**
> + * read_properties_unlocked - validate + copy userspace stream open properties
> + * @dev_priv: i915 device instance
> + * @uprops: The array of u64 key value pairs given by userspace
> + * @n_props: The number of key value pairs expected in @uprops
> + * @props: The stream configuration built up while validating properties
>   *
>   * Note this function only validates properties in isolation it doesn't
>   * validate that the combination of properties makes sense or that all
>   * properties necessary for a particular kind of stream have been set.
> + *
> + * Note that there currently aren't any ordering requirements for properties so
> + * we shouldn't validate or assume anything about ordering here. This doesn't
> + * rule out defining new properties with ordering requirements in the future.
>   */
>  static int read_properties_unlocked(struct drm_i915_private *dev_priv,
>                                     u64 __user *uprops,
> @@ -1585,6 +1770,26 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
>         return 0;
>  }
>
> +/**
> + * i915_perf_open_ioctl - DRM ioctl() for userspace to open a stream FD
> + * @dev: drm device
> + * @data: ioctl data copied from userspace (unvalidated)
> + * @file: drm file
> + *
> + * Validates the stream open parameters given by userspace including flags
> + * and an array of u64 key, value pair properties.
> + *
> + * Very little is assumed up front about the nature of the stream being
> + * opened (for instance we don't assume it's for periodic OA unit metrics). An
> + * i915-perf stream is expected to be a suitable interface for other forms of
> + * buffered data written by the GPU besides periodic OA metrics.
> + *
> + * Note we copy the properties from userspace outside of the i915 perf
> + * mutex to avoid an awkward lockdep with mmap_sem.
> + *
> + * Return: A newly opened i915 Perf stream file descriptor or negative
> + * error code on failure.
> + */
>  int i915_perf_open_ioctl(struct drm_device *dev, void *data,
>                          struct drm_file *file)
>  {
> @@ -1621,6 +1826,14 @@ int i915_perf_open_ioctl(struct drm_device *dev, void *data,
>         return ret;
>  }
>
> +/**
> + * i915_perf_register - exposes i915-perf to userspace
> + * @dev_priv: i915 device instance
> + *
> + * In particular OA metric sets are advertised under a sysfs metrics/
> + * directory allowing userspace to enumerate valid IDs that can be
> + * used to open an i915-perf stream.
> + */
>  void i915_perf_register(struct drm_i915_private *dev_priv)
>  {
>         if (!IS_HASWELL(dev_priv))
> @@ -1650,6 +1863,15 @@ void i915_perf_register(struct drm_i915_private *dev_priv)
>         mutex_unlock(&dev_priv->perf.lock);
>  }
>
> +/**
> + * i915_perf_register - hide i915-perf from userspace
i915_perf_unregister

> + * @dev_priv: i915 device instance
> + *
> + * i915-perf state cleanup is split up into an 'unregister' and
> + * 'deinit' phase where the interface is first hidden from
> + * userspace by i915_perf_unregister() before cleaning up
> + * remaining state in i915_perf_fini().
> + */
>  void i915_perf_unregister(struct drm_i915_private *dev_priv)
>  {
>         if (!IS_HASWELL(dev_priv))
> @@ -1706,6 +1928,15 @@ static struct ctl_table dev_root[] = {
>         {}
>  };
>
> +/**
> + * i915_perf_init - initialize i915-perf state on module load
> + * @dev_priv: i915 device instance
> + *
> + * Initializes state i915-perf state without exposing anything to userspace.
s/state i915-perf state/i915-perf state/

Other than that it all looks good.


More information about the Intel-gfx mailing list