[Intel-gfx] [PATCH 3/8] drm/i915: Cope with request list state change during error state capture

Tomas Elf tomas.elf at intel.com
Fri Oct 9 04:45:07 PDT 2015


On 09/10/2015 09:28, Daniel Vetter wrote:
> On Thu, Oct 08, 2015 at 07:31:35PM +0100, Tomas Elf wrote:
>> Since we're not synchronizing the ring request list during error state capture
>> the request list state might change between the time the corresponding error
>> request list was allocated and dimensioned to the time when the ring request
>> list is actually captured into the error state. If this happens, throw a
>> WARNING and do early exit and be aware that the captured error state might not
>> be fully reliable.
>
> Please don't throw a WARNING since this is expected to occasionally
> happen. DRM_DEBUG_DRIVER is enough imo.
> -Daniel

I still don't see how it could happen without leading to reads of 
unallocated memory. The error state request list has been allocated to a 
certain size equal to num_requests and this loop seems to assume that 
the error state request list maintains the same size as the driver 
request list, which is not the case - leading to crashes, which is how I 
happened to notice it.

I can obviously remove the warning but are you saying we shouldn't even 
take action if it happens? Such as early exit?

Thanks,
Tomas

>
>>
>> Signed-off-by: Tomas Elf <tomas.elf at intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_gpu_error.c | 13 +++++++++++++
>>   1 file changed, 13 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
>> index 32c1799..cc75ca4 100644
>> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
>> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
>> @@ -1071,6 +1071,19 @@ static void i915_gem_record_rings(struct drm_device *dev,
>>   		list_for_each_entry_safe(request, tmpreq, &ring->request_list, list) {
>>   			struct drm_i915_error_request *erq;
>>
>> +			if (WARN_ON(!request || count >= error->ring[i].num_requests)) {
>> +				/*
>> +				 * If the ring request list was changed in
>> +				 * between the point where the error request
>> +				 * list was created and dimensioned and this
>> +				 * point then just update the num_requests
>> +				 * field to reflect this.
>> +				 */
>> +				error->ring[i].num_requests =
>> +					min(count, error->ring[i].num_requests);
>> +				break;
>> +			}
>> +
>>   			erq = &error->ring[i].requests[count++];
>>   			erq->seqno = request->seqno;
>>   			erq->jiffies = request->emitted_jiffies;
>> --
>> 1.9.1
>>
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
>



More information about the Intel-gfx mailing list