[Intel-gfx] [PATCH v3 7/8] drm/i915: Cope with request list state change during error state capture
Daniel Vetter
daniel at ffwll.ch
Thu Oct 22 03:53:33 PDT 2015
On Mon, Oct 19, 2015 at 05:51:57PM +0100, Tomas Elf wrote:
> Since we're not synchronizing the ring request list during error state capture,
> the request list state might change between the time the corresponding error
> request list was allocated and dimensioned and the time when the ring request
> list is actually captured into the error state. If this happens then do an
> early exit and be aware that the captured error state might not be fully
> reliable.
>
> * v2:
> - Chris Wilson: Removed WARN_ON from size check since having the error state
> request list and the live driver request list diverge like this is
> legitimate behaviour.
>
> - Tomas Elf: Removed update of num_request field since this made no sense. Just
> exit and move on.
>
> * v3:
> - Chris Wilson: Removed error message at the point of early exit. The user is
> not interested in any state changes happening during the error state capture,
> only in the state that we're trying to capture at the point of the error.
>
> Signed-off-by: Tomas Elf <tomas.elf at intel.com>
Queued for -next, thanks for the patch.
-Daniel
> ---
> drivers/gpu/drm/i915/i915_gpu_error.c | 19 +++++++++++++++++++
> 1 file changed, 19 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 2f04e4f..f3dc67b 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1071,6 +1071,25 @@ static void i915_gem_record_rings(struct drm_device *dev,
>  		list_for_each_entry(request, &ring->request_list, list) {
>  			struct drm_i915_error_request *erq;
>
> +			if (count >= error->ring[i].num_requests) {
> +				/*
> +				 * If the ring request list was changed in
> +				 * between the point where the error request
> +				 * list was created and dimensioned and this
> +				 * point then just exit early to avoid crashes.
> +				 *
> +				 * We don't need to communicate that the
> +				 * request list changed state during error
> +				 * state capture and that the error state is
> +				 * slightly incorrect as a consequence since we
> +				 * are typically only interested in the request
> +				 * list state at the point of error state
> +				 * capture, not in any changes happening during
> +				 * the capture.
> +				 */
> +				break;
> +			}
> +
>  			erq = &error->ring[i].requests[count++];
>  			erq->seqno = request->seqno;
>  			erq->jiffies = request->emitted_jiffies;
> --
> 1.9.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
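
For anyone reading this thread in the archive, the pattern the patch applies
(size a snapshot array in one pass, then stop copying once it is full because
the unsynchronized live list may have grown in the meantime) can be sketched in
standalone C. This is only an illustration under stated assumptions: struct
request, struct snapshot_entry and capture_requests() are hypothetical
stand-ins, not the i915 types or functions, and a plain singly linked list
replaces the kernel's list_for_each_entry().

```c
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures; not the i915 types. */
struct request {
	unsigned int seqno;
	struct request *next;
};

struct snapshot_entry {
	unsigned int seqno;
};

/*
 * Copy at most 'capacity' entries from the live list into 'out'.
 * 'capacity' was computed in an earlier pass, so the list may have
 * grown since then. Stop quietly once the snapshot is full instead of
 * writing past the end of 'out'.
 */
static size_t capture_requests(const struct request *head,
			       struct snapshot_entry *out, size_t capacity)
{
	size_t count = 0;
	const struct request *req;

	for (req = head; req != NULL; req = req->next) {
		if (count >= capacity)
			break;	/* list outgrew the snapshot; keep what we have */
		out[count++].seqno = req->seqno;
	}
	return count;
}
```

The return value tells the caller how many entries were actually filled in,
which can be fewer than the capacity if the list shrank between the two passes.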
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch