[Intel-gfx] [PATCH] drm/i915/perf: More documentation hooked to i915.rst

Robert Bragg robert at sixbynine.org
Fri Dec 2 15:05:50 UTC 2016


On Nov 30, 2016 19:41, "Daniel Vetter" <daniel at ffwll.ch> wrote:
>
> On Tue, Nov 29, 2016 at 05:00:55PM +0000, Robert Bragg wrote:
> > This adds a 'Perf' section to i915.rst with the following sub sections:
> > - Overview
> > - Comparison with Core Perf
> > - i915 Driver Entry Points
> > - i915 Perf Stream
> > - i915 Perf Observation Architecture Stream
> > - All i915 Perf Internals
> >
> > Signed-off-by: Robert Bragg <robert at sixbynine.org>
> > Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
>
> Two style bikesheds below, feel free to ignore.
>
> > ---
> >  Documentation/gpu/i915.rst       |  92 +++++++++++++
> >  drivers/gpu/drm/i915/i915_drv.h  | 151 ++++++++++++++++----
> >  drivers/gpu/drm/i915/i915_perf.c | 289 ++++++++++++++++++++++++++++++
+++++----
> >  3 files changed, 478 insertions(+), 54 deletions(-)
> >
> > diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
> > index 117d2ab..714bd4b 100644
> > --- a/Documentation/gpu/i915.rst
> > +++ b/Documentation/gpu/i915.rst
> > @@ -356,4 +356,96 @@ switch_mm
> >  .. kernel-doc:: drivers/gpu/drm/i915/i915_trace.h
> >     :doc: switch_mm tracepoint
> >
> > +Perf
> > +====
> > +
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> > +   :doc: i915 Perf Overview
> > +
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> > +   :doc: i915 Perf History and Comparison with Core Perf
> > +
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> > +   :doc: i915 Perf File Operations
>
> You have the headings in the DOC comments itself, which works until
> someone reorganizes stuff. Then it tends to fall apart badly.

Yeah, could be better.

>
> > +
> > +i915 Driver Entry Points
> > +------------------------
> > +
> > +This section covers the entrypoints exported outside of i915_perf.c to
> > +integrate with drm/i915 and to handle the `DRM_I915_PERF_OPEN` ioctl.
> > +
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> > +   :functions: i915_perf_init
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> > +   :functions: i915_perf_fini
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> > +   :functions: i915_perf_register
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> > +   :functions: i915_perf_unregister
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> > +   :functions: i915_perf_open_ioctl
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> > +   :functions: i915_perf_release
>
> One potential issue with listing everything explicitly is that if someone
> ever (and this is bound to happen) adds a new function, they'll forget to
> add it. Hence we just pull them all in, and if you want to refernce some
> specifically, do that in the overview sections. And also sprinkle lots of
> cross-references all over to make groups of functions easier to discover.
Without any structure it just didn't seem like documentation; just a dump
of internals info which didn't look that helpful to me.

There are some fairly noteworthy separations of responsibilities between
functions providing the i915 perf stream infrastructure and then the code
for OA unit streams, and then code specifically for Haswell. Splitting them
into sections seems worthwhile to me.

I don't think it's that big a deal listing symbols individually and
maintaining this when adding new symbols. Having maintained Cogl using
gtk-doc where symbols had to be explicitly listed in a -sections.txt file
to added them to the documentation, that wasn't too bad. It's true that
slip ups happen, but having control over the order symbols are presented
seems good to me. It could be nice if an ordered list could be passed to
:functions: to reduce how many times the corresponding perl script runs.

It could maybe be good if there was some way to tag/label symbols in
kerneldoc for selection in restructured text. It would also be nice if the
tooling understood what symbols were in the document so far, so it could
somehow be possible to have a dumping ground section for 'everything else'
at the end.

Regards,
- Robert

>
> But in the end your docs, your turf ;-)
> -Daniel
>
> > +
> > +i915 Perf Stream
> > +----------------
> > +
> > +This section covers the stream-semantics-agnostic structures and
functions
> > +for representing an i915 perf stream FD and associated file operations.
> > +
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.h
> > +   :functions: i915_perf_stream
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.h
> > +   :functions: i915_perf_stream_ops
> > +
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> > +   :functions: read_properties_unlocked
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> > +   :functions: i915_perf_open_ioctl_unlocked
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> > +   :functions: i915_perf_destroy_unlocked
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> > +   :functions: i915_perf_read
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> > +   :functions: i915_perf_ioctl
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> > +   :functions: i915_perf_enable_unlocked
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> > +   :functions: i915_perf_disable_unlocked
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> > +   :functions: i915_perf_poll
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> > +   :functions: i915_perf_poll_unlocked
> > +
> > +i915 Perf Observation Architecture Stream
> > +-----------------------------------------
> > +
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> > +   :functions: OA_BUFFER_SIZE
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.h
> > +   :functions: i915_oa_ops
> > +
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> > +   :functions: i915_oa_stream_init
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> > +   :functions: i915_oa_read
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> > +   :functions: i915_oa_stream_enable
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> > +   :functions: i915_oa_stream_disable
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> > +   :functions: i915_oa_wait_unlocked
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> > +   :functions: i915_oa_poll_wait
> > +
> > +All i915 Perf Internals
> > +-----------------------
> > +
> > +This section simply includes all currently documented i915 perf
internals, in
> > +no particular order, but may include some more minor utilities or
platform
> > +specific details than found in the more high-level sections.
> > +
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> > +   :internal:
> > +
> >  .. WARNING: DOCPROC directive not supported:
!Cdrivers/gpu/drm/i915/i915_irq.c
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h
b/drivers/gpu/drm/i915/i915_drv.h
> > index 1ec9619..9f92755 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -1827,89 +1827,186 @@ struct i915_oa_reg {
> >
> >  struct i915_perf_stream;
> >
> > +/**
> > + * struct i915_perf_stream_ops - the OPs to support a specific stream
type
> > + */
> >  struct i915_perf_stream_ops {
> > -     /* Enables the collection of HW samples, either in response to
> > -      * I915_PERF_IOCTL_ENABLE or implicitly called when stream is
> > -      * opened without I915_PERF_FLAG_DISABLED.
> > +     /**
> > +      * @enable: Enables the collection of HW samples, either in
response to
> > +      * I915_PERF_IOCTL_ENABLE or implicitly called when stream is
opened
> > +      * without I915_PERF_FLAG_DISABLED.
> >        */
> >       void (*enable)(struct i915_perf_stream *stream);
> >
> > -     /* Disables the collection of HW samples, either in response to
> > -      * I915_PERF_IOCTL_DISABLE or implicitly called before
> > -      * destroying the stream.
> > +     /**
> > +      * @disable: Disables the collection of HW samples, either in
response
> > +      * to I915_PERF_IOCTL_DISABLE or implicitly called before
destroying
> > +      * the stream.
> >        */
> >       void (*disable)(struct i915_perf_stream *stream);
> >
> > -     /* Call poll_wait, passing a wait queue that will be woken
> > +     /**
> > +      * @poll_wait: Call poll_wait, passing a wait queue that will be
woken
> >        * once there is something ready to read() for the stream
> >        */
> >       void (*poll_wait)(struct i915_perf_stream *stream,
> >                         struct file *file,
> >                         poll_table *wait);
> >
> > -     /* For handling a blocking read, wait until there is something
> > -      * to ready to read() for the stream. E.g. wait on the same
> > +     /**
> > +      * @wait_unlocked: For handling a blocking read, wait until there
is
> > +      * something to ready to read() for the stream. E.g. wait on the
same
> >        * wait queue that would be passed to poll_wait().
> >        */
> >       int (*wait_unlocked)(struct i915_perf_stream *stream);
> >
> > -     /* read - Copy buffered metrics as records to userspace
> > -      * @buf: the userspace, destination buffer
> > -      * @count: the number of bytes to copy, requested by userspace
> > -      * @offset: zero at the start of the read, updated as the read
> > -      *          proceeds, it represents how many bytes have been
> > -      *          copied so far and the buffer offset for copying the
> > -      *          next record.
> > +     /**
> > +      * @read: Copy buffered metrics as records to userspace
> > +      * **buf**: the userspace, destination buffer
> > +      * **count**: the number of bytes to copy, requested by userspace
> > +      * **offset**: zero at the start of the read, updated as the read
> > +      * proceeds, it represents how many bytes have been copied so far
and
> > +      * the buffer offset for copying the next record.
> >        *
> > -      * Copy as many buffered i915 perf samples and records for
> > -      * this stream to userspace as will fit in the given buffer.
> > +      * Copy as many buffered i915 perf samples and records for this
stream
> > +      * to userspace as will fit in the given buffer.
> >        *
> > -      * Only write complete records; returning -ENOSPC if there
> > -      * isn't room for a complete record.
> > +      * Only write complete records; returning -ENOSPC if there isn't
room
> > +      * for a complete record.
> >        *
> > -      * Return any error condition that results in a short read
> > -      * such as -ENOSPC or -EFAULT, even though these may be
> > -      * squashed before returning to userspace.
> > +      * Return any error condition that results in a short read such as
> > +      * -ENOSPC or -EFAULT, even though these may be squashed before
> > +      * returning to userspace.
> >        */
> >       int (*read)(struct i915_perf_stream *stream,
> >                   char __user *buf,
> >                   size_t count,
> >                   size_t *offset);
> >
> > -     /* Cleanup any stream specific resources.
> > +     /**
> > +      * @destroy: Cleanup any stream specific resources.
> >        *
> >        * The stream will always be disabled before this is called.
> >        */
> >       void (*destroy)(struct i915_perf_stream *stream);
> >  };
> >
> > +/**
> > + * struct i915_perf_stream - state for a single open stream FD
> > + */
> >  struct i915_perf_stream {
> > +     /**
> > +      * @dev_priv: i915 drm device
> > +      */
> >       struct drm_i915_private *dev_priv;
> >
> > +     /**
> > +      * @link: Links the stream into ``&drm_i915_private->streams``
> > +      */
> >       struct list_head link;
> >
> > +     /**
> > +      * @sample_flags: Flags representing the
`DRM_I915_PERF_PROP_SAMPLE_*`
> > +      * properties given when opening a stream, representing the
contents
> > +      * of a single sample as read() by userspace.
> > +      */
> >       u32 sample_flags;
> > +
> > +     /**
> > +      * @sample_size: Considering the configured contents of a sample
> > +      * combined with the required header size, this is the total size
> > +      * of a single sample record.
> > +      */
> >       int sample_size;
> >
> > +     /**
> > +      * @ctx: %NULL if measuring system-wide across all contexts or a
> > +      * specific context that is being monitored.
> > +      */
> >       struct i915_gem_context *ctx;
> > +
> > +     /**
> > +      * @enabled: Whether the stream is currently enabled, considering
> > +      * whether the stream was opened in a disabled state and based
> > +      * on `I915_PERF_IOCTL_ENABLE` and `I915_PERF_IOCTL_DISABLE`
calls.
> > +      */
> >       bool enabled;
> >
> > +     /**
> > +      * @ops: The callbacks providing the implementation of this
specific
> > +      * type of configured stream.
> > +      */
> >       const struct i915_perf_stream_ops *ops;
> >  };
> >
> > +/**
> > + * struct i915_oa_ops - Gen specific implementation of an OA unit
stream
> > + */
> >  struct i915_oa_ops {
> > +     /**
> > +      * @init_oa_buffer: Resets the head and tail pointers of the
> > +      * circular buffer for periodic OA reports.
> > +      *
> > +      * Called when first opening a stream for OA metrics, but also
may be
> > +      * called in response to an OA buffer overflow or other error
> > +      * condition.
> > +      *
> > +      * Note it may be necessary to clear the full OA buffer here as
part of
> > +      * maintaining the invariable that new reports must be written to
> > +      * zeroed memory for us to be able to reliable detect if an
expected
> > +      * report has not yet landed in memory.  (At least on Haswell the
OA
> > +      * buffer tail pointer is not synchronized with reports being
visible
> > +      * to the CPU)
> > +      */
> >       void (*init_oa_buffer)(struct drm_i915_private *dev_priv);
> > +
> > +     /**
> > +      * @enable_metric_set: Applies any MUX configuration to set up the
> > +      * Boolean and Custom (B/C) counters that are part of the counter
> > +      * reports being sampled. May apply system constraints such as
> > +      * disabling EU clock gating as required.
> > +      */
> >       int (*enable_metric_set)(struct drm_i915_private *dev_priv);
> > +
> > +     /**
> > +      * @disable_metric_set: Remove system constraints associated with
using
> > +      * the OA unit.
> > +      */
> >       void (*disable_metric_set)(struct drm_i915_private *dev_priv);
> > +
> > +     /**
> > +      * @oa_enable: Enable periodic sampling
> > +      */
> >       void (*oa_enable)(struct drm_i915_private *dev_priv);
> > +
> > +     /**
> > +      * @oa_disable: Disable periodic sampling
> > +      */
> >       void (*oa_disable)(struct drm_i915_private *dev_priv);
> > -     void (*update_oacontrol)(struct drm_i915_private *dev_priv);
> > -     void (*update_hw_ctx_id_locked)(struct drm_i915_private *dev_priv,
> > -                                     u32 ctx_id);
> > +
> > +     /**
> > +      * @read: Copy data from the circular OA buffer into a given
userspace
> > +      * buffer.
> > +      */
> >       int (*read)(struct i915_perf_stream *stream,
> >                   char __user *buf,
> >                   size_t count,
> >                   size_t *offset);
> > +
> > +     /**
> > +      * @oa_buffer_is_empty: Check if OA buffer empty (false positives
OK)
> > +      *
> > +      * This is either called via fops or the poll check hrtimer
(atomic
> > +      * ctx) without any locks taken.
> > +      *
> > +      * It's safe to read OA config state here unlocked, assuming that
this
> > +      * is only called while the stream is enabled, while the global OA
> > +      * configuration can't be modified.
> > +      *
> > +      * Efficiency is more important than avoiding some false positives
> > +      * here, which will be handled gracefully - likely resulting in an
> > +      * EAGAIN error for userspace.
> > +      */
> >       bool (*oa_buffer_is_empty)(struct drm_i915_private *dev_priv);
> >  };
> >
> > diff --git a/drivers/gpu/drm/i915/i915_perf.c
b/drivers/gpu/drm/i915/i915_perf.c
> > index 68b7c27..0be8cef 100644
> > --- a/drivers/gpu/drm/i915/i915_perf.c
> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> > @@ -26,7 +26,10 @@
> >
> >
> >  /**
> > - * DOC: i915 Perf, streaming API for GPU metrics
> > + * DOC: i915 Perf Overview
> > + *
> > + * Overview
> > + * --------
> >   *
> >   * Gen graphics supports a large number of performance counters that
can help
> >   * driver and application developers understand and optimize their use
of the
> > @@ -45,6 +48,13 @@
> >   * privileges by default, unless changed via the
dev.i915.perf_event_paranoid
> >   * sysctl option.
> >   *
> > + */
> > +
> > +/**
> > + * DOC: i915 Perf History and Comparison with Core Perf
> > + *
> > + * Comparison with Core Perf
> > + * -------------------------
> >   *
> >   * The interface was initially inspired by the core Perf
infrastructure but
> >   * some notable differences are:
> > @@ -75,8 +85,8 @@
> >   * gets copied from the GPU mapped buffers to userspace buffers.
> >   *
> >   *
> > - * Some notes regarding Linux Perf:
> > - * --------------------------------
> > + * Issues hit with first prototype based on Core Perf
> > + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >   *
> >   * The first prototype of this driver was based on the core perf
> >   * infrastructure, and while we did make that mostly work, with some
changes to
> > @@ -135,7 +145,7 @@
> >   *   for combining with the side-band raw reports it captures using
> >   *   MI_REPORT_PERF_COUNT commands.
> >   *
> > - *   _ As a side note on perf's grouping feature; there was also some
concern
> > + *   - As a side note on perf's grouping feature; there was also some
concern
> >   *     that using PERF_FORMAT_GROUP as a way to pack together counter
values
> >   *     would quite drastically inflate our sample sizes, which would
likely
> >   *     lower the effective sampling resolutions we could use when the
available
> > @@ -277,6 +287,20 @@ static struct i915_oa_format
hsw_oa_formats[I915_OA_FORMAT_MAX] = {
> >
> >  #define SAMPLE_OA_REPORT      (1<<0)
> >
> > +/**
> > + * struct perf_open_properties - for validated properties given to
open a stream
> > + * @sample_flags: `DRM_I915_PERF_PROP_SAMPLE_*` properties are tracked
as flags
> > + * @single_context: Whether a single or all gpu contexts should be
monitored
> > + * @ctx_handle: A gem ctx handle for use with @single_context
> > + * @metrics_set: An ID for an OA unit metric set advertised via sysfs
> > + * @oa_format: An OA unit HW report format
> > + * @oa_periodic: Whether to enable periodic OA unit sampling
> > + * @oa_period_exponent: The OA unit sampling period is derived from
this
> > + *
> > + * As read_properties_unlocked() enumerates and validates the
properties given
> > + * to open a stream of metrics the configuration is built up in the
structure
> > + * which starts out zero initialized.
> > + */
> >  struct perf_open_properties {
> >       u32 sample_flags;
> >
> > @@ -314,7 +338,19 @@ static bool gen7_oa_buffer_is_empty_fop_unlocked(struct
drm_i915_private *dev_pr
> >  }
> >
> >  /**
> > - * Appends a status record to a userspace read() buffer.
> > + * append_oa_status - Appends a status record to a userspace read()
buffer.
> > + * @stream: An i915-perf stream opened for OA metrics
> > + * @buf: destination buffer given by userspace
> > + * @count: the number of bytes userspace wants to read
> > + * @offset: (inout): the current position for writing into @buf
> > + * @type: The kind of status to report to userspace
> > + *
> > + * Writes a status record (such as `DRM_I915_PERF_RECORD_OA_
REPORT_LOST`)
> > + * into the userspace read() buffer.
> > + *
> > + * The @buf @offset will only be updated on success.
> > + *
> > + * Returns: 0 on success, negative error code on failure.
> >   */
> >  static int append_oa_status(struct i915_perf_stream *stream,
> >                           char __user *buf,
> > @@ -336,7 +372,21 @@ static int append_oa_status(struct
i915_perf_stream *stream,
> >  }
> >
> >  /**
> > - * Copies single OA report into userspace read() buffer.
> > + * append_oa_sample - Copies single OA report into userspace read()
buffer.
> > + * @stream: An i915-perf stream opened for OA metrics
> > + * @buf: destination buffer given by userspace
> > + * @count: the number of bytes userspace wants to read
> > + * @offset: (inout): the current position for writing into @buf
> > + * @report: A single OA report to (optionally) include as part of the
sample
> > + *
> > + * The contents of a sample are configured through
`DRM_I915_PERF_PROP_SAMPLE_*`
> > + * properties when opening a stream, tracked as
`stream->sample_flags`. This
> > + * function copies the requested components of a single sample to the
given
> > + * read() @buf.
> > + *
> > + * The @buf @offset will only be updated on success.
> > + *
> > + * Returns: 0 on success, negative error code on failure.
> >   */
> >  static int append_oa_sample(struct i915_perf_stream *stream,
> >                           char __user *buf,
> > @@ -380,8 +430,6 @@ static int append_oa_sample(struct i915_perf_stream
*stream,
> >   * @head_ptr: (inout): the current oa buffer cpu read position
> >   * @tail: the current oa buffer gpu write position
> >   *
> > - * Returns 0 on success, negative error code on failure.
> > - *
> >   * Notably any error condition resulting in a short read (-ENOSPC or
> >   * -EFAULT) will be returned even though one or more records may
> >   * have been successfully copied. In this case it's up to the caller
> > @@ -392,6 +440,8 @@ static int append_oa_sample(struct i915_perf_stream
*stream,
> >   * tail, so the head chases the tail?... If you think that's mad
> >   * and back-to-front you're not alone, but this follows the
> >   * Gen PRM naming convention.
> > + *
> > + * Returns: 0 on success, negative error code on failure.
> >   */
> >  static int gen7_append_oa_reports(struct i915_perf_stream *stream,
> >                                 char __user *buf,
> > @@ -496,6 +546,22 @@ static int gen7_append_oa_reports(struct
i915_perf_stream *stream,
> >       return ret;
> >  }
> >
> > +/**
> > + * gen7_oa_read - copy status records then buffered OA reports
> > + * @stream: An i915-perf stream opened for OA metrics
> > + * @buf: destination buffer given by userspace
> > + * @count: the number of bytes userspace wants to read
> > + * @offset: (inout): the current position for writing into @buf
> > + *
> > + * Checks Gen 7 specific OA unit status registers and if necessary
appends
> > + * corresponding status records for userspace (such as for a buffer
full
> > + * condition) and then initiate appending any buffered OA reports.
> > + *
> > + * Updates @offset according to the number of bytes successfully
copied into
> > + * the userspace buffer.
> > + *
> > + * Returns: zero on success or a negative error code
> > + */
> >  static int gen7_oa_read(struct i915_perf_stream *stream,
> >                       char __user *buf,
> >                       size_t count,
> > @@ -597,6 +663,20 @@ static int gen7_oa_read(struct i915_perf_stream
*stream,
> >       return ret;
> >  }
> >
> > +/**
> > + * i915_oa_wait_unlocked - handles blocking IO until OA data available
> > + * @stream: An i915-perf stream opened for OA metrics
> > + *
> > + * Called when userspace tries to read() from a blocking stream FD
opened
> > + * for OA metrics. It waits until the hrtimer callback finds a
non-empty
> > + * OA buffer and wakes us.
> > + *
> > + * Note: it's acceptable to have this return with some false positives
> > + * since any subsequent read handling will return EAGAIN if there isn't
> > + * really data ready for userspace yet.
> > + *
> > + * Returns: zero on success or a negative error code
> > + */
> >  static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
> >  {
> >       struct drm_i915_private *dev_priv = stream->dev_priv;
> > @@ -615,6 +695,16 @@ static int i915_oa_wait_unlocked(struct
i915_perf_stream *stream)
> >                                       !dev_priv->perf.oa.ops.oa_
buffer_is_empty(dev_priv));
> >  }
> >
> > +/**
> > + * i915_oa_poll_wait - call poll_wait() for an OA stream poll()
> > + * @stream: An i915-perf stream opened for OA metrics
> > + * @file: An i915 perf stream file
> > + * @wait: poll() state table
> > + *
> > + * For handling userspace polling on an i915 perf stream opened for OA
metrics,
> > + * this starts a poll_wait with the wait queue that our hrtimer
callback wakes
> > + * when it sees data ready to read in the circular OA buffer.
> > + */
> >  static void i915_oa_poll_wait(struct i915_perf_stream *stream,
> >                             struct file *file,
> >                             poll_table *wait)
> > @@ -624,6 +714,18 @@ static void i915_oa_poll_wait(struct
i915_perf_stream *stream,
> >       poll_wait(file, &dev_priv->perf.oa.poll_wq, wait);
> >  }
> >
> > +/**
> > + * i915_oa_read - just calls through to ``&i915_oa_ops->read``
> > + * @stream: An i915-perf stream opened for OA metrics
> > + * @buf: destination buffer given by userspace
> > + * @count: the number of bytes userspace wants to read
> > + * @offset: (inout): the current position for writing into @buf
> > + *
> > + * Updates @offset according to the number of bytes successfully
copied into
> > + * the userspace buffer.
> > + *
> > + * Returns: zero on success or a negative error code
> > + */
> >  static int i915_oa_read(struct i915_perf_stream *stream,
> >                       char __user *buf,
> >                       size_t count,
> > @@ -634,9 +736,15 @@ static int i915_oa_read(struct i915_perf_stream
*stream,
> >       return dev_priv->perf.oa.ops.read(stream, buf, count, offset);
> >  }
> >
> > -/* Determine the render context hw id, and ensure it remains fixed for
the
> > +/**
> > + * oa_get_render_ctx_id - determine and hold ctx hw id
> > + * @stream: An i915-perf stream opened for OA metrics
> > + *
> > + * Determine the render context hw id, and ensure it remains fixed for
the
> >   * lifetime of the stream. This ensures that we don't have to worry
about
> >   * updating the context ID in OACONTROL on the fly.
> > + *
> > + * Returns: zero on success or a negative error code
> >   */
> >  static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
> >  {
> > @@ -673,6 +781,13 @@ static int oa_get_render_ctx_id(struct
i915_perf_stream *stream)
> >       return ret;
> >  }
> >
> > +/**
> > + * oa_put_render_ctx_id - counterpart to oa_get_render_ctx_id releases
hold
> > + * @stream: An i915-perf stream opened for OA metrics
> > + *
> > + * In case anything needed doing to ensure the context HW ID would
remain valid
> > + * for the lifetime of the stream, then that can be undone here.
> > + */
> >  static void oa_put_render_ctx_id(struct i915_perf_stream *stream)
> >  {
> >       struct drm_i915_private *dev_priv = stream->dev_priv;
> > @@ -945,6 +1060,15 @@ static void gen7_oa_enable(struct
drm_i915_private *dev_priv)
> >       spin_unlock_irqrestore(&dev_priv->perf.hook_lock, flags);
> >  }
> >
> > +/**
> > + * i915_oa_stream_enable - handle I915_PERF_IOCTL_ENABLE for OA stream
> > + * @stream: An i915 perf stream opened for OA metrics
> > + *
> > + * [Re]enables hardware periodic sampling according to the period
configured
> > + * when opening the stream. This also starts a hrtimer that will
periodically
> > + * check for data in the circular OA buffer for notifying userspace
(e.g.
> > + * during a read() or poll()).
> > + */
> >  static void i915_oa_stream_enable(struct i915_perf_stream *stream)
> >  {
> >       struct drm_i915_private *dev_priv = stream->dev_priv;
> > @@ -962,6 +1086,14 @@ static void gen7_oa_disable(struct
drm_i915_private *dev_priv)
> >       I915_WRITE(GEN7_OACONTROL, 0);
> >  }
> >
> > +/**
> > + * i915_oa_stream_disable - handle I915_PERF_IOCTL_DISABLE for OA
stream
> > + * @stream: An i915 perf stream opened for OA metrics
> > + *
> > + * Stops the OA unit from periodically writing counter reports into the
> > + * circular OA buffer. This also stops the hrtimer that periodically
checks for
> > + * data in the circular OA buffer, for notifying userspace.
> > + */
> >  static void i915_oa_stream_disable(struct i915_perf_stream *stream)
> >  {
> >       struct drm_i915_private *dev_priv = stream->dev_priv;
> > @@ -987,6 +1119,24 @@ static const struct i915_perf_stream_ops
i915_oa_stream_ops = {
> >       .read = i915_oa_read,
> >  };
> >
> > +/**
> > + * i915_oa_stream_init - validate combined props for OA stream and init
> > + * @stream: An i915 perf stream
> > + * @param: The open parameters passed to 'DRM_I915_PERF_OPEN`
> > + * @props: The property state that configures stream (individually
validated)
> > + *
> > + * While read_properties_unlocked() validates properties in isolation
it
> > + * doesn't ensure that the combination necessarily makes sense.
> > + *
> > + * At this point it has been determined that userspace wants a stream
of
> > + * OA metrics, but still we need to further validate the combined
> > + * properties are OK.
> > + *
> > + * If the configuration makes sense then we can allocate memory for
> > + * a circular OA buffer and apply the requested metric set
configuration.
> > + *
> > + * Returns: zero on success or a negative error code.
> > + */
> >  static int i915_oa_stream_init(struct i915_perf_stream *stream,
> >                              struct drm_i915_perf_open_param *param,
> >                              struct perf_open_properties *props)
> > @@ -1110,6 +1260,31 @@ static int i915_oa_stream_init(struct
i915_perf_stream *stream,
> >       return ret;
> >  }
> >
> > +/**
> > + * i915_perf_read_locked - wrap stream->ops->read with error
notmalisation
> > + * @stream: An i915 perf stream
> > + * @file: An i915 perf stream file
> > + * @buf: destination buffer given by userspace
> > + * @count: the number of bytes userspace wants to read
> > + * @ppos: (inout) file seek position (unused)
> > + *
> > + * Besides wrapping stream->ops->read() this provides a common place
to ensure
> > + * that if we've successfully copied any data then reporting that takes
> > + * precedence over any internal error status, so the data isn't lost.
> > + *
> > + * For example ret will be -ENOSPC whenever there is more buffered
data than
> > + * can be copied to userspace, but that's only interesting if we
weren't able
> > + * to copy some data because it implies the userspace buffer is too
small to
> > + * receive a single record (and we never split records).
> > + *
> > + * Another case with ret == -EFAULT is more of a grey area since it
would seem
> > + * like bad form for userspace to ask us to overrun its buffer, but
the user
> > + * knows best:
> > + *
> > + *   http://yarchive.net/comp/linux/partial_reads_writes.html
> > + *
> > + * Returns: The number of bytes copied or a negative error code on
failure.
> > + */
> >  static ssize_t i915_perf_read_locked(struct i915_perf_stream *stream,
> >                                    struct file *file,
> >                                    char __user *buf,
> > @@ -1125,25 +1300,27 @@ static ssize_t i915_perf_read_locked(struct
i915_perf_stream *stream,
> >       size_t offset = 0;
> >       int ret = stream->ops->read(stream, buf, count, &offset);
> >
> > -     /* If we've successfully copied any data then reporting that
> > -      * takes precedence over any internal error status, so the
> > -      * data isn't lost.
> > -      *
> > -      * For example ret will be -ENOSPC whenever there is more
> > -      * buffered data than can be copied to userspace, but that's
> > -      * only interesting if we weren't able to copy some data
> > -      * because it implies the userspace buffer is too small to
> > -      * receive a single record (and we never split records).
> > -      *
> > -      * Another case with ret == -EFAULT is more of a grey area
> > -      * since it would seem like bad form for userspace to ask us
> > -      * to overrun its buffer, but the user knows best:
> > -      *
> > -      *   http://yarchive.net/comp/linux/partial_reads_writes.html
> > -      */
> >       return offset ?: (ret ?: -EAGAIN);
> >  }
> >
> > +/**
> > + * i915_perf_read - handles read() FOP for i915 perf stream FDs
> > + * @file: An i915 perf stream file
> > + * @buf: destination buffer given by userspace
> > + * @count: the number of bytes userspace wants to read
> > + * @ppos: (inout) file seek position (unused)
> > + *
> > + * The entry point for handling a read() on a stream file descriptor
from
> > + * userspace. Most of the work is left to the i915_perf_read_locked()
and
> > + * stream->op->read() but to save having stream implementations (of
which
> > + * we might have multiple later) we handle blocking read here.
> > + *
> > + * We can also consistently treat trying to read from a disable stream
> > + * as an IO error so implementations can assume the stream is enabled
> > + * while reading.
> > + *
> > + * Returns: The number of bytes copied or a negative error code on
failure.
> > + */
> >  static ssize_t i915_perf_read(struct file *file,
> >                             char __user *buf,
> >                             size_t count,
> > @@ -1458,12 +1635,20 @@ i915_perf_open_ioctl_locked(struct
drm_i915_private *dev_priv,
> >       return ret;
> >  }
> >
> > -/* Note we copy the properties from userspace outside of the i915 perf
> > - * mutex to avoid an awkward lockdep with mmap_sem.
> > +/**
> > + * read_properties_unlocked - validate + copy userspace stream open
properties
> > + * @dev_priv: i915 device instance
> > + * @uprops: The array of u64 key value pairs given by userspace
> > + * @n_props: The number of key value pairs expected in @uprops
> > + * @props: The stream configuration built up while validating
properties
> >   *
> >   * Note this function only validates properties in isolation it doesn't
> >   * validate that the combination of properties makes sense or that all
> >   * properties necessary for a particular kind of stream have been set.
> > + *
> > + * Note that there currently aren't any ordering requirements for
properties so
> > + * we shouldn't validate or assume anything about ordering here. This
doesn't
> > + * rule out defining new properties with ordering requirements in the
future.
> >   */
> >  static int read_properties_unlocked(struct drm_i915_private *dev_priv,
> >                                   u64 __user *uprops,
> > @@ -1585,6 +1770,26 @@ static int read_properties_unlocked(struct
drm_i915_private *dev_priv,
> >       return 0;
> >  }
> >
> > +/**
> > + * i915_perf_open_ioctl - DRM ioctl() for userspace to open a stream FD
> > + * @dev: drm device
> > + * @data: ioctl data copied from userspace (unvalidated)
> > + * @file: drm file
> > + *
> > + * Validates the stream open parameters given by userspace including
flags
> > + * and an array of u64 key, value pair properties.
> > + *
> > + * Very little is assumed up front about the nature of the stream being
> > + * opened (for instance we don't assume it's for periodic OA unit
metrics). An
> > + * i915-perf stream is expected to be a suitable interface for other
forms of
> > + * buffered data written by the GPU besides periodic OA metrics.
> > + *
> > + * Note we copy the properties from userspace outside of the i915 perf
> > + * mutex to avoid an awkward lockdep with mmap_sem.
> > + *
> > + * Return: A newly opened i915 Perf stream file descriptor or negative
> > + * error code on failure.
> > + */
> >  int i915_perf_open_ioctl(struct drm_device *dev, void *data,
> >                        struct drm_file *file)
> >  {
> > @@ -1621,6 +1826,14 @@ int i915_perf_open_ioctl(struct drm_device *dev,
void *data,
> >       return ret;
> >  }
> >
> > +/**
> > + * i915_perf_register - exposes i915-perf to userspace
> > + * @dev_priv: i915 device instance
> > + *
> > + * In particular OA metric sets are advertised under a sysfs metrics/
> > + * directory allowing userspace to enumerate valid IDs that can be
> > + * used to open an i915-perf stream.
> > + */
> >  void i915_perf_register(struct drm_i915_private *dev_priv)
> >  {
> >       if (!IS_HASWELL(dev_priv))
> > @@ -1650,6 +1863,15 @@ void i915_perf_register(struct drm_i915_private
*dev_priv)
> >       mutex_unlock(&dev_priv->perf.lock);
> >  }
> >
> > +/**
> > + * i915_perf_register - hide i915-perf from userspace
> > + * @dev_priv: i915 device instance
> > + *
> > + * i915-perf state cleanup is split up into an 'unregister' and
> > + * 'deinit' phase where the interface is first hidden from
> > + * userspace by i915_perf_unregister() before cleaning up
> > + * remaining state in i915_perf_fini().
> > + */
> >  void i915_perf_unregister(struct drm_i915_private *dev_priv)
> >  {
> >       if (!IS_HASWELL(dev_priv))
> > @@ -1706,6 +1928,15 @@ static struct ctl_table dev_root[] = {
> >       {}
> >  };
> >
> > +/**
> > + * i915_perf_init - initialize i915-perf state on module load
> > + * @dev_priv: i915 device instance
> > + *
> > + * Initializes state i915-perf state without exposing anything to
userspace.
> > + *
> > + * Note: i915-perf initialization is split into an 'init' and
'register'
> > + * phase with the i915_perf_register() exposing state to userspace.
> > + */
> >  void i915_perf_init(struct drm_i915_private *dev_priv)
> >  {
> >       if (!IS_HASWELL(dev_priv))
> > @@ -1741,6 +1972,10 @@ void i915_perf_init(struct drm_i915_private
*dev_priv)
> >       dev_priv->perf.initialized = true;
> >  }
> >
> > +/**
> > + * i915_perf_fini - Counter part to i915_perf_init()
> > + * @dev_priv: i915 device instance
> > + */
> >  void i915_perf_fini(struct drm_i915_private *dev_priv)
> >  {
> >       if (!dev_priv->perf.initialized)
> > --
> > 2.10.2
> >
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/intel-gfx/attachments/20161202/ee0cbd28/attachment-0001.html>


More information about the Intel-gfx mailing list