[Intel-gfx] [RFC 1/2] drm/i915: Improve record of hung engines in error state
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Wed Nov 4 13:03:56 UTC 2020
On 04/11/2020 12:30, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-11-04 12:20:42)
>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>
>> Between events which trigger engine and GPU resets and capturing the error
>> state we lose information on which engine triggered the reset. Improve
>> this by passing in the hung engine mask down to error capture.
>>
>> Result is that the list of engines in user visible "GPU HANG: ecode
>> <gen>:<engines>:<ecode>, <process>" is now a list of hanging and not just
>> active engines. Most importantly the displayed process is now the one
>> which was actually hung.
>
> You could also suggest to only include the hanging engine in the report,
> as is intended to be the normal means of generating the report.

I thought it was potentially useful to have the full picture, but I can do
that as well.
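
To make the two options above concrete, here is a minimal standalone sketch, not the actual i915 code. It assumes a per-engine bit mask and a hung-engine mask passed down by the reset path; `engine_mask_t`, `struct engine`, `struct engine_dump`, `capture_engines()` and the `hung_only` switch are all hypothetical stand-ins for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for intel_engine_mask_t. */
typedef uint32_t engine_mask_t;

/* Hypothetical, heavily simplified engine description. */
struct engine {
	const char *name;
	engine_mask_t mask;	/* one bit per engine */
	bool active;		/* engine had work in flight */
};

/* Hypothetical per-engine record, mirroring the new "hung" flag. */
struct engine_dump {
	const struct engine *engine;
	bool hung;
};

/*
 * Record engines into out[] and return how many were recorded.
 *
 * hung_only == false: record every active (or hung) engine and flag the
 *                     hung ones, i.e. the "full picture" variant.
 * hung_only == true:  record only engines present in hung_mask.
 */
static size_t capture_engines(const struct engine *engines, size_t count,
			      engine_mask_t hung_mask, bool hung_only,
			      struct engine_dump *out)
{
	size_t n = 0;

	for (size_t i = 0; i < count; i++) {
		bool hung = (engines[i].mask & hung_mask) != 0;

		if (hung_only ? !hung : !(engines[i].active || hung))
			continue;

		out[n].engine = &engines[i];
		out[n].hung = hung;
		n++;
	}

	return n;
}
```

With the full capture, a consumer building the "GPU HANG" line could still filter on `hung` to pick the engines and process to display, which is roughly the trade-off being discussed.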
>> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
>> index 0220b0992808..3a7ca90a3436 100644
>> --- a/drivers/gpu/drm/i915/i915_gpu_error.h
>> +++ b/drivers/gpu/drm/i915/i915_gpu_error.h
>> @@ -59,6 +59,7 @@ struct i915_request_coredump {
>> struct intel_engine_coredump {
>> 	const struct intel_engine_cs *engine;
>>
>> +	bool hung;
>> 	bool simulated;
>> 	u32 reset_count;
>>
>> @@ -218,8 +219,10 @@ struct drm_i915_error_state_buf {
>> __printf(2, 3)
>> void i915_error_printf(struct drm_i915_error_state_buf *e, const char *f, ...);
>>
>> -struct i915_gpu_coredump *i915_gpu_coredump(struct drm_i915_private *i915);
>> -void i915_capture_error_state(struct drm_i915_private *i915);
>> +struct i915_gpu_coredump *i915_gpu_coredump(struct intel_gt *gt,
>> +					    intel_engine_mask_t engine_mask);
>> +void i915_capture_error_state(struct intel_gt *gt,
>> +			      intel_engine_mask_t engine_mask);
>
> Don't forget the stubs.

Right, thanks.
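
For reference, the stubs in question would presumably be the no-op fallbacks used when error capture is compiled out, updated to the new signatures. A rough standalone sketch follows, assuming the usual pattern of inline stubs in the header; the `uint32_t` typedef and the plain `#ifdef` are stand-ins so the snippet compiles outside the kernel tree, and the stub bodies are my assumption rather than a quote from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins so the sketch builds outside the kernel tree. */
struct intel_gt;
struct i915_gpu_coredump;
typedef uint32_t intel_engine_mask_t;	/* assumed width, for illustration */

#ifdef CONFIG_DRM_I915_CAPTURE_ERROR

/* Real implementations, with the signatures from the patch. */
struct i915_gpu_coredump *i915_gpu_coredump(struct intel_gt *gt,
					    intel_engine_mask_t engine_mask);
void i915_capture_error_state(struct intel_gt *gt,
			      intel_engine_mask_t engine_mask);

#else

/* Error capture compiled out: nothing is ever captured. */
static inline struct i915_gpu_coredump *
i915_gpu_coredump(struct intel_gt *gt, intel_engine_mask_t engine_mask)
{
	(void)gt;
	(void)engine_mask;
	return NULL;
}

/* Error capture compiled out: capturing is a no-op. */
static inline void
i915_capture_error_state(struct intel_gt *gt, intel_engine_mask_t engine_mask)
{
	(void)gt;
	(void)engine_mask;
}

#endif
```

The point is only that both branches must carry the same parameter list, otherwise builds with capture disabled break at the call sites.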
Regards,
Tvrtko