[Intel-gfx] [PATCH resend v2 3/8] drm/i915: Cope with request list state change during error state capture
Chris Wilson
chris at chris-wilson.co.uk
Mon Oct 19 09:06:27 PDT 2015
On Mon, Oct 19, 2015 at 03:55:48PM +0100, Tomas Elf wrote:
> Since we're not synchronizing the ring request list during error state capture
> the request list state might change between the time the corresponding error
> request list was allocated and dimensioned to the time when the ring request
> list is actually captured into the error state. If this happens then do an
> early exit and be aware that the captured error state might not be fully
> reliable.
>
> * v2:
> - Chris Wilson: Removed WARN_ON from size check since having the error state
> request list and the live driver request list diverge like this is a
> legitimate behaviour.
>
> - Tomas Elf: Removed update of num_request field since this made no sense. Just
> exit and move on.
>
> * Resend:
> - Responded to the wrong mailthread
>
> Signed-off-by: Tomas Elf <tomas.elf at intel.com>
> ---
> drivers/gpu/drm/i915/i915_gpu_error.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 2f04e4f..b08a76b 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1071,6 +1071,18 @@ static void i915_gem_record_rings(struct drm_device *dev,
> list_for_each_entry(request, &ring->request_list, list) {
> struct drm_i915_error_request *erq;
>
> + if (count >= error->ring[i].num_requests) {
> + /*
> + * If the ring request list was changed in
> + * between the point where the error request
> + * list was created and dimensioned and this
> + * point then just exit early to avoid crashes.
> + */
> + DRM_ERROR("Request list changed size since allocation (%u->%u)\n",
> + error->ring[i].num_requests, count);
The error message simply isn't that interesting. That requests were
added after the gpu hang occurred doesn't affect post-mortem debugging
of the hang, and if it were at all interesting, that information should
be stored in the error state itself.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre