[Intel-gfx] [PATCH 3/8] drm/i915: Cope with request list state change during error state capture

Fri Oct 9 04:25:26 PDT 2015

On 09/10/2015 08:48, Chris Wilson wrote:
> On Thu, Oct 08, 2015 at 07:31:35PM +0100, Tomas Elf wrote:
>> Since we're not synchronizing the ring request list during error state capture
>> the request list state might change between the time the corresponding error
>> request list was allocated and dimensioned to the time when the ring request
>> list is actually captured into the error state. If this happens, throw a
>> WARNING and do early exit and be aware that the captured error state might not
>> be fully reliable.
>>
>> Signed-off-by: Tomas Elf <tomas.elf at intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_gpu_error.c | 13 +++++++++++++
>>   1 file changed, 13 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
>> index 32c1799..cc75ca4 100644
>> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
>> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
>> @@ -1071,6 +1071,19 @@ static void i915_gem_record_rings(struct drm_device *dev,
>>   		list_for_each_entry_safe(request, tmpreq, &ring->request_list, list) {
>>   			struct drm_i915_error_request *erq;
>>
>> +			if (WARN_ON(!request || count >= error->ring[i].num_requests)) {
>
> Request cannot be null, count can legitmately be more, the WARN on is
> inappropriate. Again, I sent several patches over the past couple of
> years to fix this.
> -Chris
>

Ok, so having the driver request list change grow in between the point 
where we allocate and set up the error state request list to the same 
size as the driver request list (since that's what count being larger 
than the list size implies) is legitimate? Traversing into unallocated 
memory seems pretty dodgy to me but if you say so.

Do you mind pointing me to your patch that solves this your way?

Thanks,
Tomas