[Intel-gfx] [PATCH 06/11] drm/i915: Framework for capturing command stream based OA reports
sourab gupta
sourab.gupta at intel.com
Fri Feb 19 06:51:13 UTC 2016
On Wed, 2016-02-17 at 23:00 +0530, Robert Bragg wrote:
> Hi Sourab,
>
>
> As Sergio Martinez has started experimenting with this in gputop and
> reported seeing lots of ENOSPC errors being reported when reading I
> had a look into this and saw a few issues with how we check that
> there's data available to read in command stream mode, and a I think
> there's a possibility of incorrectly sorting the samples sometimes...
Hi Robert,
Thanks for spotting this anomaly. I'll have this fixed in the next
version of patch set.
>
> On Tue, Feb 16, 2016 at 5:27 AM, <sourab.gupta at intel.com> wrote:
> From: Sourab Gupta <sourab.gupta at intel.com>
>
>
> -static bool i915_oa_can_read(struct i915_perf_stream *stream)
> +static bool append_oa_rcs_sample(struct i915_perf_stream
> *stream,
> + struct i915_perf_read_state
> *read_state,
> + struct i915_perf_cs_data_node
> *node)
> +{
> + struct drm_i915_private *dev_priv = stream->dev_priv;
> + struct oa_sample_data data = { 0 };
> + const u8 *report =
> dev_priv->perf.command_stream_buf.addr +
> + node->offset;
> + u32 sample_flags = stream->sample_flags;
> + u32 report_ts;
> +
> + /*
> + * Forward the periodic OA samples which have the
> timestamp lower
> + * than timestamp of this sample, before forwarding
> this sample.
> + * This ensures samples read by user are order acc. to
> their timestamps
> + */
> + report_ts = *(u32 *)(report + 4);
> + dev_priv->perf.oa.ops.read(stream, read_state,
> report_ts);
> +
> + if (sample_flags & SAMPLE_OA_SOURCE_INFO)
> + data.source = I915_PERF_OA_EVENT_SOURCE_RCS;
> +
> + if (sample_flags & SAMPLE_CTX_ID)
> + data.ctx_id = node->ctx_id;
> +
> + if (sample_flags & SAMPLE_OA_REPORT)
> + data.report = report;
> +
> + append_oa_sample(stream, read_state, &data);
> +
> + return true;
> +}
> +
> +static void oa_rcs_append_reports(struct i915_perf_stream
> *stream,
> + struct i915_perf_read_state
> *read_state)
> +{
> + struct drm_i915_private *dev_priv = stream->dev_priv;
> + struct i915_perf_cs_data_node *entry, *next;
> +
> + list_for_each_entry_safe(entry, next,
> + &dev_priv->perf.node_list,
> link) {
> + if (!
> i915_gem_request_completed(entry->request, true))
> + break;
> +
> + if (!append_oa_rcs_sample(stream, read_state,
> entry))
> + break;
> +
> + spin_lock(&dev_priv->perf.node_list_lock);
> + list_del(&entry->link);
> + spin_unlock(&dev_priv->perf.node_list_lock);
> +
> +
> i915_gem_request_unreference__unlocked(entry->request);
> + kfree(entry);
> + }
> +
> + /* Flush any remaining periodic reports */
> + dev_priv->perf.oa.ops.read(stream, read_state,
> U32_MAX);
>
> I don't think we can flush all remaining periodic reports here - at
> least not if we have any in-flight MI_RPC commands - in case the next
> request to complete might have reports with earlier timestamps than
> some of these periodic reports.
>
>
> Even if we have periodic reports available I think we need to throttle
> forwarding them based on the command stream requests completing.
>
>
> This is something that userspace should understand when it explicitly
> decides to use command stream mode in conjunction with periodic
> sampling.
>
I agree, there shouldn't be any flushing of remaining periodic reports
here, instead any periodic reports remaining here should be taken care
of during the next processing of command stream samples.
>
> +}
> +
> +static bool command_stream_buf_is_empty(struct
> i915_perf_stream *stream)
> {
> struct drm_i915_private *dev_priv = stream->dev_priv;
>
> - return !
> dev_priv->perf.oa.ops.oa_buffer_is_empty(dev_priv);
> + if (stream->cs_mode)
> + return list_empty(&dev_priv->perf.node_list);
> + else
> + return true;
> }
>
>
> I think this list_empty() check needs a lock around it, as it's called
> from stream_have_data__unlocked().
>
>
> Currently only checking list_empty() can lead to some false positives
> that may confuse the current i915_perf_read() (the false positives
> come from not checking that at least one entry->request has also
> completed).
>
> False positives here will affect the error reporting in
> i915_perf_read_locked(). The first thing i915_perf_read_locked() does
> is check for available data so it can return -EAGAIN for non-blocking
> file descriptors or block, waiting for data, for blocking fds. Once we
> believe there is data then if stream->read() returns zero then that's
> translated to a -ENOSPC error because we 'know' there's data but
> weren't able to copy anything to the user's buffer because it wasn't
> large enough to hold a complete record.
>
Thanks for pointing this out. This function should check that at least
one request has completed when the list is not empty. I'll aim to make
this change.
>
>
> -static int i915_oa_wait_unlocked(struct i915_perf_stream
> *stream)
> +static bool stream_have_data__unlocked(struct
> i915_perf_stream *stream)
> {
> struct drm_i915_private *dev_priv = stream->dev_priv;
>
> @@ -397,8 +739,29 @@ static int i915_oa_wait_unlocked(struct
> i915_perf_stream *stream)
> * can't be destroyed until completion (such as a
> read()) that ensures
> * the device + OA buffer can't disappear
> */
> +
> return !(dev_priv->perf.oa.ops.oa_buffer_is_empty(dev_priv)
> &&
> + command_stream_buf_is_empty(stream));
> +}
>
>
> In command stream mode, due to how we need to merge sort reports I
> think we need a more involved set of checks. We need to check that at
> least one entry->request in the node_list has completed, or else if
> the node_list is empty then we can check if !oa_buffer_is_empty().
>
>
> There's going to be a thorny race to consider though: if this code
> ends up reporting that there is data to read because the node_list is
> empty and there are periodic samples, but by the time we call into
> stream->read() there's a new node_list entry that's not completed that
> will end up returning 0 which could again result in an incorrect
> ENOSPC error.
>
> We should consider changing stream->read() so it can return -ENOSPC
> itself if appropriate and when it returns 0 we would instead either
> return -EAGAIN or loop back to waiting for blocking fds.
Yeah, this race needs to be handled, ideally best way would be through
the changes to stream->read() itself as you suggested. I'll aim to make
this change (while rebasing on your recent copy_to_user changes).
> B.t.w for reference the stream->read() interface is a little different
> in the series I sent out recently for Haswell vs the gen8+ oa-next
> branch (which I haven't rebased yet) because I had to make some last
> minute changes to consider -EFAULT errors if there was a
> copy_to_user() error while reading. If you look at this issue, it
> could be good to also compare with these related changes I made
> recently.
>
> +
> +static bool i915_oa_can_read(struct i915_perf_stream *stream)
> +{
> +
> + return stream_have_data__unlocked(stream);
> +}
>
>
>
> Kind regards,
>
> - Robert
>
More information about the Intel-gfx
mailing list