[Intel-gfx] [PATCH 3/8] drm/i915: Cope with request list state change during error state capture

Fri Oct 9 00:48:00 PDT 2015

On Thu, Oct 08, 2015 at 07:31:35PM +0100, Tomas Elf wrote:
> Since we're not synchronizing the ring request list during error state capture,
> the request list state might change between the time the corresponding error
> request list is allocated and sized and the time the ring request list is
> actually captured into the error state. If this happens, throw a WARNING, exit
> early, and be aware that the captured error state might not be fully reliable.
> 
> Signed-off-by: Tomas Elf <tomas.elf at intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gpu_error.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 32c1799..cc75ca4 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1071,6 +1071,19 @@ static void i915_gem_record_rings(struct drm_device *dev,
>  		list_for_each_entry_safe(request, tmpreq, &ring->request_list, list) {
>  			struct drm_i915_error_request *erq;
>  
> +			if (WARN_ON(!request || count >= error->ring[i].num_requests)) {

Request cannot be NULL, and count can legitimately be larger, so the WARN_ON
is inappropriate. Again, I have sent several patches over the past couple of
years to fix this.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre