[Intel-gfx] [PATCH v2] drm/i915: Stop automatically retiring requests after a GPU hang
Mika Kuoppala
mika.kuoppala at linux.intel.com
Fri May 13 10:48:55 UTC 2016
Chris Wilson <chris at chris-wilson.co.uk> writes:
> Following a GPU hang, we break out of the request loop in order to
> unlock the struct_mutex for use by the GPU reset. However, if we retire
> all the requests at that moment, we cannot identify the guilty request
> after performing the reset.
>
> v2: Not automatically retiring requests forces us to recheck for
> available ringspace.
>
> Fixes: f4457ae71fd6 ("drm/i915: Prevent leaking of -EIO from i915_wait_request()")
> Testcase: igt/gem_reset_stats
Testcase: igt/gem_reset_stats/ban-*
Tested-by: Mika Kuoppala <mika.kuoppala at intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala at intel.com>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> Cc: Mika Kuoppala <mika.kuoppala at intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem.c | 8 ++++++--
> drivers/gpu/drm/i915/intel_ringbuffer.c | 2 ++
> 2 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 6e61738fab31..a3d826bb216b 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1462,7 +1462,10 @@ i915_wait_request(struct drm_i915_gem_request *req)
>  	if (ret)
>  		return ret;
>  
> -	__i915_gem_request_retire__upto(req);
> +	/* If the GPU hung, we want to keep the requests to find the guilty. */
> +	if (req->reset_counter == i915_reset_counter(&dev_priv->gpu_error))
> +		__i915_gem_request_retire__upto(req);
> +
>  	return 0;
>  }
>
> @@ -1519,7 +1522,8 @@ i915_gem_object_retire_request(struct drm_i915_gem_object *obj,
>  	else if (obj->last_write_req == req)
>  		i915_gem_object_retire__write(obj);
>  
> -	__i915_gem_request_retire__upto(req);
> +	if (req->reset_counter == i915_reset_counter(&req->i915->gpu_error))
> +		__i915_gem_request_retire__upto(req);
>  }
>
> /* A nonblocking variant of the above wait. This is a highly dangerous routine
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 0618dd34c3ec..8d35a3978f9b 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -2450,6 +2450,8 @@ int intel_ring_begin(struct drm_i915_gem_request *req, int num_dwords)
>  			return ret;
>  
>  		intel_ring_update_space(ringbuf);
> +		if (unlikely(ringbuf->space < wait_bytes))
> +			return -EAGAIN;
>  	}
>  
>  	if (unlikely(need_wrap)) {
> --
> 2.8.1
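
For readers following the thread, the two behaviours the patch relies on can be modelled outside the driver. The following is only a rough sketch, not i915 code: the model_* names and structures are hypothetical, and the reset counter is assumed to be a plain integer that is bumped once per hang, which is simpler than what i915_reset_counter() returns in the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver state; not the i915 definitions. */
struct model_gpu {
	unsigned int reset_counter;	/* assumed: bumped once per detected hang */
};

struct model_request {
	unsigned int reset_counter;	/* snapshot taken when the request was built */
	bool retired;
};

/* Retire every request up to and including index 'upto'. */
static void model_retire_upto(struct model_request *reqs, size_t upto)
{
	assert(reqs != NULL);
	for (size_t i = 0; i <= upto; i++)
		reqs[i].retired = true;
}

/*
 * Models the guarded retire from the patch: if the counter moved since the
 * request was built, a hang occurred, so the requests are kept for the
 * reset code to inspect. Returns true if the requests were retired.
 */
static bool model_wait_done(const struct model_gpu *gpu,
			    struct model_request *reqs, size_t idx)
{
	if (reqs[idx].reset_counter != gpu->reset_counter)
		return false;

	model_retire_upto(reqs, idx);
	return true;
}

/*
 * Models the v2 recheck: a wait that returns without retiring anything may
 * not have freed ring space, so the caller has to look again and back off.
 */
static int model_ring_begin(int space_after_wait, int wait_bytes)
{
	if (space_after_wait < wait_bytes)
		return -EAGAIN;

	return 0;
}
```

With a matching counter, model_wait_done() retires everything up to the waited request. After the counter is bumped it leaves the list untouched, which is what allows the guilty request to be found after the reset. model_ring_begin() returns -EAGAIN whenever the space seen after the wait is still smaller than the requested size.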