[Intel-gfx] [PATCH v4 1/5] drm/i915: Add engine reset count in get-reset-stats ioctl

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Mon Feb 25 13:34:50 UTC 2019


On 21/02/2019 02:58, Carlos Santa wrote:
> From: Michel Thierry <michel.thierry at intel.com>
> 
> Users/tests relying on the total reset count will start seeing a smaller
> number since most of the hangs can be handled by engine reset.
> Note that if reset engine x, context a running on engine y will be unaware
> and unaffected.
> 
> To start the discussion, include just a total engine reset count. If it
> is deemed useful, it can be extended to report each engine separately.
> 
> Our igt's gem_reset_stats test will need changes to ignore the pad field,
> since it can now return reset_engine_count.
> 
> v2: s/engine_reset/reset_engine/, use union in uapi to not break compatibility.
> v3: Keep rejecting attempts to use pad as input (Antonio)
> v4: Rebased.
> v5: Rebased.
> 
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> Cc: Antonio Argenziano <antonio.argenziano at intel.com>
> Cc: Cc: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
> Signed-off-by: Michel Thierry <michel.thierry at intel.com>
> Signed-off-by: Carlos Santa <carlos.santa at intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_context.c | 12 ++++++++++--
>   include/uapi/drm/i915_drm.h             |  6 +++++-
>   2 files changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 459f8eae1c39..cbfe8f2eb3f2 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -1889,6 +1889,8 @@ int i915_gem_context_reset_stats_ioctl(struct drm_device *dev,
>   	struct drm_i915_private *dev_priv = to_i915(dev);
>   	struct drm_i915_reset_stats *args = data;
>   	struct i915_gem_context *ctx;
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
>   	int ret;
>   
>   	if (args->flags || args->pad)
> @@ -1907,10 +1909,16 @@ int i915_gem_context_reset_stats_ioctl(struct drm_device *dev,
>   	 * we should wrap the hangstats with a seqlock.
>   	 */
>   
> -	if (capable(CAP_SYS_ADMIN))
> +	if (capable(CAP_SYS_ADMIN)) {
>   		args->reset_count = i915_reset_count(&dev_priv->gpu_error);
> -	else
> +		for_each_engine(engine, dev_priv, id)
> +			args->reset_engine_count +=
> +				i915_reset_engine_count(&dev_priv->gpu_error,
> +							engine);

If access to global GPU reset count is privileged, why is access to 
global engine reset count not? It seems to be fundamentally same level 
of data leakage.

If we wanted to provide some numbers to unprivileged users I think we 
would need to store some counters per file_priv/context and return those 
when !CAP_SYS_ADMIN.

> +	} else {
>   		args->reset_count = 0;
> +		args->reset_engine_count = 0;
> +	}
>   
>   	args->batch_active = atomic_read(&ctx->guilty_count);
>   	args->batch_pending = atomic_read(&ctx->active_count);
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index cc03ef9f885f..3f2c89740b0e 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1642,7 +1642,11 @@ struct drm_i915_reset_stats {
>   	/* Number of batches lost pending for execution, for this context */
>   	__u32 batch_pending;
>   
> -	__u32 pad;
> +	union {
> +		__u32 pad;
> +		/* Engine resets since boot/module reload, for all contexts */
> +		__u32 reset_engine_count;
> +	};

Chris pointed out in some other review that anonymous unions are not 
friendly towards C++ compilers.

Not sure what is the best option here. Renaming the field could break 
old userspace building against newer headers. Is that acceptable?

>   };
>   
>   struct drm_i915_gem_userptr {
> 

Regards,

Tvrtko


More information about the Intel-gfx mailing list