[Intel-gfx] [RFC] drm/i915: check that rpm ref is held when writing to ringbuf in stolen mem

Wed Jan 27 05:13:54 PST 2016

On 27/01/16 09:38, Chris Wilson wrote:
> On Wed, Jan 27, 2016 at 08:55:40AM +0000, daniele.ceraolospurio at intel.com wrote:
>> From: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>>
>> While running some tests on the scheduler patches with rpm enabled I
>> came across a corruption in the ringbuffer, which was root-caused to
>> the GPU being suspended while commands were being emitted to the
>> ringbuffer. The access to memory was failing because the GPU needs to
>> be awake when accessing stolen memory (where my ringbuffer was located).
>> Since we have this constraint it looks like a sensible idea to check that
>> we hold a refcount when we emit commands.
>>
>> Cc: John Harrison <John.C.Harrison at Intel.com>
>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>> ---
>>   drivers/gpu/drm/i915/intel_lrc.c | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index 3761eaf..f9e8d74 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -1105,6 +1105,11 @@ int intel_logical_ring_begin(struct drm_i915_gem_request *req, int num_dwords)
>>   	if (ret)
>>   		return ret;
>>   
>> +	// If the ringbuffer is in stolen memory we need to be sure that the
>> +	// gpu is awake before writing to it
>> +	if (req->ringbuf->obj->stolen && num_dwords > 0)
>> +		assert_rpm_wakelock_held(dev_priv);
> The assertion you want is that when iomapping through the GTT that we
> hold a wakeref.
> -Chris

If I'm not missing anything, we iomap the ringbuffer at request 
allocation time; however, with the scheduler a request could potentially 
wait in the queue for a time long enough to allow RPM to kick in, 
especially if the request is waiting on a fence object coming from a 
different driver. In this situation the rpm reference taken to cover the 
request allocation would have already been released and so we need to 
ensure that a new one has been taken before writing to the ringbuffer; 
that's why I originally placed the assert in ring_begin.
Scheduler code is still in review anyway and subject to change, so I 
guess that until that reaches its final form there is no point in 
debating where to put a possible second assert :-)
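
To make the ordering above concrete, here is a small userspace model of the sequence. It is only a sketch and not i915 code: every toy_* name is made up for illustration, and the "autosuspend" step is assumed to fire as soon as no reference is held. It shows that a reference taken only around request allocation does not cover a write that happens after the request has waited in the queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the device's runtime-PM state (hypothetical, not i915). */
struct toy_dev {
	int wakeref_count;	/* number of rpm references currently held */
	bool suspended;		/* true once runtime suspend has kicked in */
};

static void toy_rpm_get(struct toy_dev *dev)
{
	dev->wakeref_count++;
	dev->suspended = false;	/* taking a reference resumes the device */
}

static void toy_rpm_put(struct toy_dev *dev)
{
	assert(dev->wakeref_count > 0);
	dev->wakeref_count--;
}

/* Models the autosuspend timer: suspends only if nobody holds a reference. */
static void toy_rpm_idle(struct toy_dev *dev)
{
	if (dev->wakeref_count == 0)
		dev->suspended = true;
}

/* A write to a ring in stolen memory is safe only while the device is awake. */
static bool toy_ring_write_is_safe(const struct toy_dev *dev, bool ring_in_stolen)
{
	if (!ring_in_stolen)
		return true;
	return dev->wakeref_count > 0 && !dev->suspended;
}

/*
 * Replays the scenario: map the ring under a reference at request
 * allocation, drop the reference, let the request sit in the queue long
 * enough for runtime suspend, then emit commands. Returns true if the
 * emit-time write would be safe.
 */
static bool toy_queued_emit_is_safe(bool retake_ref_before_emit)
{
	struct toy_dev dev = { 0, false };
	bool safe;

	toy_rpm_get(&dev);	/* request allocation + iomap */
	toy_rpm_put(&dev);	/* reference released after allocation */

	toy_rpm_idle(&dev);	/* long wait, e.g. on a fence from another driver */

	if (retake_ref_before_emit)
		toy_rpm_get(&dev);

	safe = toy_ring_write_is_safe(&dev, true);

	if (retake_ref_before_emit)
		toy_rpm_put(&dev);

	return safe;
}
```

In this model toy_queued_emit_is_safe(false) reports an unsafe write, while toy_queued_emit_is_safe(true) reports a safe one, which is the condition an assert at emit time would be checking.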

I'll respin the patch with the assert at iomap time as you suggested.

Thanks,
Daniele