[Intel-gfx] [PATCH v5 17/18] drm/i915: Watchdog timeout: DRM kernel interface to set the timeout

Michel Thierry michel.thierry at intel.com
Fri Apr 14 16:47:43 UTC 2017



On 14/04/17 09:05, Daniele Ceraolo Spurio wrote:
>
>
> On 24/03/17 18:30, Michel Thierry wrote:
>> Final enablement patch for GPU hang detection using watchdog timeout.
>> Using the gem_context_setparam ioctl, users can specify the desired
>> timeout value in microseconds, and the driver will do the conversion to
>> 'timestamps'.
>>
>> The recommended default watchdog threshold for video engines is 60000 us,
>> since this has been _empirically determined_ to be a good compromise for
>> low-latency requirements and low rate of false positives. The default
>> register value is ~106000us and the theoretical max value (all 1s) is
>> 353 seconds.
>>
>> v2: Fixed get api to return values in microseconds. Threshold updated to
>> be per context engine. Check for u32 overflow. Capture ctx threshold
>> value in error state.
>>
>> Signed-off-by: Tomas Elf <tomas.elf at intel.com>
>> Signed-off-by: Arun Siluvery <arun.siluvery at linux.intel.com>
>> Signed-off-by: Michel Thierry <michel.thierry at intel.com>
>> ---
>>  drivers/gpu/drm/i915/i915_drv.h         |  1 +
>>  drivers/gpu/drm/i915/i915_gem_context.c | 78 +++++++++++++++++++++++++++++++++
>>  drivers/gpu/drm/i915/i915_gem_context.h | 20 +++++++++
>>  drivers/gpu/drm/i915/i915_gpu_error.c   | 11 +++--
>>  drivers/gpu/drm/i915/intel_lrc.c        |  2 +-
>>  include/uapi/drm/i915_drm.h             |  1 +
>>  6 files changed, 108 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index b43c37a911bb..1741584d858f 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -1039,6 +1039,7 @@ struct i915_gpu_state {
>>              int ban_score;
>>              int active;
>>              int guilty;
>> +            int watchdog_threshold;
>>          } context;
>>
>>          struct drm_i915_error_object {
>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
>> index edbed85a1c88..f5c126c0c681 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_context.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
>> @@ -422,6 +422,78 @@ i915_gem_context_create_gvt(struct drm_device *dev)
>>      return ctx;
>>  }
>>
>> +/* Return the timer count threshold in microseconds. */
>> +int i915_gem_context_get_watchdog(struct i915_gem_context *ctx,
>> +                  struct drm_i915_gem_context_param *args)
>> +{
>> +    struct drm_i915_private *dev_priv = ctx->i915;
>> +    struct intel_engine_cs *engine;
>> +    enum intel_engine_id id;
>> +    u32 threshold_in_us[I915_NUM_ENGINES];
>> +
>> +    if (!dev_priv->engine[VCS]->emit_start_watchdog)
>> +        return -ENODEV;
>> +
>> +    for_each_engine(engine, dev_priv, id) {
>> +        struct intel_context *ce = &ctx->engine[id];
>> +
>> +        threshold_in_us[id] = watchdog_to_us(ce->watchdog_threshold);
>> +    }
>> +
>> +    mutex_unlock(&dev_priv->drm.struct_mutex);
>> +    if (__copy_to_user(u64_to_user_ptr(args->value),
>> +               &threshold_in_us,
>> +               sizeof(threshold_in_us))) {
>> +        mutex_lock(&dev_priv->drm.struct_mutex);
>> +        return -EFAULT;
>> +    }
>> +    mutex_lock(&dev_priv->drm.struct_mutex);
>> +    args->size = sizeof(threshold_in_us);
>> +
>> +    return 0;
>> +}
>> +
>> +/*
>> + * Calculate the timer count thresholds from the timeout value in
>> + * microseconds (us), based on the core frequency.
>> + * Watchdog can be disabled by setting it to 0.
>> + */
>> +int i915_gem_context_set_watchdog(struct i915_gem_context *ctx,
>> +                  struct drm_i915_gem_context_param *args)
>> +{
>> +    struct drm_i915_private *dev_priv = ctx->i915;
>> +    struct intel_engine_cs *engine;
>> +    enum intel_engine_id id;
>> +    u32 threshold_in_us[I915_NUM_ENGINES];
>> +
>> +    if (!dev_priv->engine[VCS]->emit_start_watchdog)
>> +        return -ENODEV;
>> +    else if (args->size < sizeof(threshold_in_us))
>> +        return -EINVAL;
>
> Won't we break userspace with this check if we ever get more engines on
> a new platform and thus bump I915_NUM_ENGINES?
>
> Thanks,
> Daniele
>

There's a v3 of this patch that incorporates Chris' feedback:
https://patchwork.freedesktop.org/patch/148805/
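
For reference, the concern raised above can be reproduced in a small
standalone sketch. Nothing here is driver code: OLD_NUM_ENGINES,
NEW_NUM_ENGINES and both functions are hypothetical names, and the
"tolerant" variant is only one possible way to handle a shorter buffer,
not necessarily what v3 does.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical counts: what old userspace was built against vs. a newer kernel. */
#define OLD_NUM_ENGINES 5
#define NEW_NUM_ENGINES 6

/*
 * Strict check, modelled on the quoted hunk: any buffer smaller than the
 * kernel's per-engine array is rejected.
 */
int set_watchdog_strict(const uint32_t *user, size_t user_size,
                        uint32_t out[NEW_NUM_ENGINES])
{
    size_t kernel_size = NEW_NUM_ENGINES * sizeof(uint32_t);

    if (user_size < kernel_size)
        return -EINVAL;

    memcpy(out, user, kernel_size);
    return 0;
}

/*
 * Tolerant variant (an assumption, for illustration only): accept a
 * shorter buffer and leave the remaining engines at 0, i.e. watchdog
 * disabled.
 */
int set_watchdog_tolerant(const uint32_t *user, size_t user_size,
                          uint32_t out[NEW_NUM_ENGINES])
{
    size_t kernel_size = NEW_NUM_ENGINES * sizeof(uint32_t);
    size_t n;

    /* Require at least one whole u32 entry. */
    if (user_size == 0 || user_size % sizeof(uint32_t))
        return -EINVAL;

    n = user_size < kernel_size ? user_size : kernel_size;
    memset(out, 0, kernel_size);
    memcpy(out, user, n);
    return 0;
}
```

With a buffer of OLD_NUM_ENGINES entries, the strict version returns
-EINVAL once the kernel's array grows, while the tolerant version
accepts it and leaves the extra engine's threshold at 0.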

