[Intel-gfx] [PATCH v5 08/10] drm/i915/perf: allow holding preemption on filtered ctx
Lionel Landwerlin
lionel.g.landwerlin at intel.com
Thu Jun 27 14:27:40 UTC 2019
On 27/06/2019 12:53, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2019-06-27 09:00:43)
>> We would like to make use of perf in Vulkan. The Vulkan API is much
>> lower level than OpenGL, with applications directly exposed to the
>> concept of command buffers (pretty much equivalent to our batch
>> buffers). In Vulkan, queries are always limited in scope to a command
>> buffer. In OpenGL, the lack of command buffer concept meant that
>> queries' duration could span multiple command buffers.
>>
>> With that restriction gone in Vulkan, we would like to simplify
>> measuring performance just by measuring the deltas between the counter
>> snapshots written by 2 MI_RECORD_PERF_COUNT commands, rather than the
>> more complex scheme we currently have in the GL driver, using 2
>> MI_RECORD_PERF_COUNT commands and doing some post processing on the
>> stream of OA reports, coming from the global OA buffer, to remove any
>> unrelated deltas in between the 2 MI_RECORD_PERF_COUNT.
>>
>> Disabling preemption only apply to a single context with which want to
>> query performance counters for and is considered a privileged
>> operation, by default protected by CAP_SYS_ADMIN. It is possible to
>> enable it for a normal user by disabling the paranoid stream setting.
>>
>> v2: Store preemption setting in intel_context (Chris)
>>
>> v3: Use priorities to avoid preemption rather than the HW mechanism
>>
>> v4: Just modify the port priority reporting function
>>
>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
>> ---
>> drivers/gpu/drm/i915/gt/intel_lrc.c | 7 ++++++-
>> drivers/gpu/drm/i915/i915_drv.c | 2 +-
>> drivers/gpu/drm/i915/i915_drv.h | 13 ++++++++++++
>> drivers/gpu/drm/i915/i915_perf.c | 22 +++++++++++++++++++--
>> drivers/gpu/drm/i915/i915_priolist_types.h | 7 +++++++
>> drivers/gpu/drm/i915/i915_request.c | 1 +
>> drivers/gpu/drm/i915/i915_request.h | 1 +
>> drivers/gpu/drm/i915/intel_guc_submission.c | 12 ++++++++++-
>> include/uapi/drm/i915_drm.h | 11 +++++++++++
>> 9 files changed, 71 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
>> index a6ea468986d1..52f4d69cdb2f 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
>> @@ -256,7 +256,12 @@ static inline int rq_prio(const struct i915_request *rq)
>>
>> static int effective_prio(const struct i915_request *rq)
>> {
>> - int prio = rq_prio(rq);
>> + int prio;
>> +
>> + if (rq->has_perf)
>> + prio = I915_USER_PRIORITY(I915_PRIORITY_PERF);
>> + else
>> + prio = rq_prio(rq);
> This doesn't cover everything. You will get timesliced now.
>
> But I do think this is the safest approach that evades all the PI sanity
> checks and possible deadlocks.
>
> There is also the problem where we check the priority on the last
> request in the queue, but preemption may need to be disabled at the head
> of the queue (the executing request). Is that a concern? I was thinking
> of marking the context as non-preemptible until after the oa is removed.
What do you mean by timesliced?
I don't really have a good way of working around the global counters so
whatever you think is best to allow queries in batch buffers to complete.
>
>> /*
>> * On unwinding the active request, we give it a priority bump
>> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
>> index ef54cf1c54a5..68e29b1ed1c8 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.c
>> +++ b/drivers/gpu/drm/i915/i915_drv.c
>> @@ -484,7 +484,7 @@ static int i915_getparam_ioctl(struct drm_device *dev, void *data,
>> value = INTEL_INFO(dev_priv)->has_coherent_ggtt;
>> break;
>> case I915_PARAM_PERF_REVISION:
>> - value = 1;
>> + value = 2;
>> break;
>> case I915_PARAM_HAS_EXEC_PERF_CONFIG:
>> /* Obviously requires perf support. */
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index 51b103493623..f1b8a0ada986 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -1232,6 +1232,19 @@ struct i915_perf_stream {
>> */
>> bool enabled;
>>
>> + /**
>> + * @hold_preemption: Whether preemption is put on hold for command
>> + * submissions done on the @ctx. This is useful for some drivers that
>> + * cannot easily post process the OA buffer context to subtract delta
>> + * of performance counters not associated with @ctx.
>> + */
>> + bool hold_preemption;
>> +
>> + /**
>> + * @last_config_request
>> + */
>> + struct i915_request *last_config_request;
>> +
>> /**
>> * @ops: The callbacks providing the implementation of this specific
>> * type of configured stream.
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>> index 34d74fa64c74..bf4f5fee6764 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -343,6 +343,8 @@ static const struct i915_oa_format gen8_plus_oa_formats[I915_OA_FORMAT_MAX] = {
>> * struct perf_open_properties - for validated properties given to open a stream
>> * @sample_flags: `DRM_I915_PERF_PROP_SAMPLE_*` properties are tracked as flags
>> * @single_context: Whether a single or all gpu contexts should be monitored
>> + * @hold_preemption: Whether the preemption is disabled for the filtered
>> + * context
>> * @ctx_handle: A gem ctx handle for use with @single_context
>> * @metrics_set: An ID for an OA unit metric set advertised via sysfs
>> * @oa_format: An OA unit HW report format
>> @@ -357,6 +359,7 @@ struct perf_open_properties {
>> u32 sample_flags;
>>
>> u64 single_context:1;
>> + u64 hold_preemption:1;
>> u64 ctx_handle;
>>
>> /* OA sampling state */
>> @@ -2352,6 +2355,8 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
>> stream->sample_flags |= SAMPLE_OA_REPORT;
>> stream->sample_size += format_size;
>>
>> + stream->hold_preemption = props->hold_preemption;
>> +
>> dev_priv->perf.oa.oa_buffer.format_size = format_size;
>> if (WARN_ON(dev_priv->perf.oa.oa_buffer.format_size == 0))
>> return -EINVAL;
>> @@ -2890,6 +2895,15 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv,
>> }
>> }
>>
>> + if (props->hold_preemption) {
>> + if (!props->single_context) {
>> + DRM_DEBUG("preemption disable with no context\n");
>> + ret = -EINVAL;
>> + goto err;
>> + }
>> + privileged_op = true;
>> + }
>> +
>> /*
>> * On Haswell the OA unit supports clock gating off for a specific
>> * context and in this mode there's no visibility of metrics for the
>> @@ -2904,8 +2918,9 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv,
>> * MI_REPORT_PERF_COUNT commands and so consider it a privileged op to
>> * enable the OA unit by default.
>> */
>> - if (IS_HASWELL(dev_priv) && specific_ctx)
>> + if (IS_HASWELL(dev_priv) && specific_ctx && !props->hold_preemption) {
>> privileged_op = false;
>> + }
>>
>> /* Similar to perf's kernel.perf_paranoid_cpu sysctl option
>> * we check a dev.i915.perf_stream_paranoid sysctl option
>> @@ -2914,7 +2929,7 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv,
>> */
>> if (privileged_op &&
>> i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
>> - DRM_DEBUG("Insufficient privileges to open system-wide i915 perf stream\n");
>> + DRM_DEBUG("Insufficient privileges to open i915 perf stream\n");
>> ret = -EACCES;
>> goto err_ctx;
>> }
>> @@ -3106,6 +3121,9 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
>> props->oa_periodic = true;
>> props->oa_period_exponent = value;
>> break;
>> + case DRM_I915_PERF_PROP_HOLD_PREEMPTION:
>> + props->hold_preemption = value != 0 ? 1 : 0;
> !!value.
Done.
>
>> + break;
>> case DRM_I915_PERF_PROP_MAX:
>> MISSING_CASE(id);
>> return -EINVAL;
>> diff --git a/drivers/gpu/drm/i915/i915_priolist_types.h b/drivers/gpu/drm/i915/i915_priolist_types.h
>> index 49709de69875..fc1c1fe7ce58 100644
>> --- a/drivers/gpu/drm/i915/i915_priolist_types.h
>> +++ b/drivers/gpu/drm/i915/i915_priolist_types.h
>> @@ -17,6 +17,13 @@ enum {
>> I915_PRIORITY_NORMAL = I915_CONTEXT_DEFAULT_PRIORITY,
>> I915_PRIORITY_MAX = I915_CONTEXT_MAX_USER_PRIORITY + 1,
>>
>> + /* Requests containing performance queries must not be preempted by
>> + * another context. They get scheduled with their default priority and
>> + * once they reach the execlist ports we bump them to
>> + * I915_PRIORITY_PERF so that they stick to the HW until they finish.
>> + */
>> + I915_PRIORITY_PERF = I915_CONTEXT_MAX_USER_PRIORITY + 2,
> Just let it be next, one above max?
Done.
>
>> +
>> I915_PRIORITY_INVALID = INT_MIN
>> };
>>
>> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
>> index 5ff87c4a0cd5..44b4b2bfedf9 100644
>> --- a/drivers/gpu/drm/i915/i915_request.c
>> +++ b/drivers/gpu/drm/i915/i915_request.c
>> @@ -685,6 +685,7 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
>> rq->batch = NULL;
>> rq->capture_list = NULL;
>> rq->waitboost = false;
>> + rq->has_perf = false;
>> rq->execution_mask = ALL_ENGINES;
>>
>> INIT_LIST_HEAD(&rq->active_list);
>> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
>> index b58ceef92e20..21be6b44602d 100644
>> --- a/drivers/gpu/drm/i915/i915_request.h
>> +++ b/drivers/gpu/drm/i915/i915_request.h
>> @@ -217,6 +217,7 @@ struct i915_request {
>> unsigned long emitted_jiffies;
>>
>> bool waitboost;
>> + bool has_perf;
> 2 bools, probably time for a flags if we stick with this approach.
>
> Also note previous patch didn't compile but rq->perf was out of
> sequence.
> -Chris
>
Done.
More information about the Intel-gfx
mailing list