[Intel-gfx] [RFC] drm/i915: check that rpm ref is held when writing to ringbuf in stolen mem
Daniele Ceraolo Spurio
daniele.ceraolospurio at intel.com
Wed Jan 27 05:13:54 PST 2016
On 27/01/16 09:38, Chris Wilson wrote:
> On Wed, Jan 27, 2016 at 08:55:40AM +0000, daniele.ceraolospurio at intel.com wrote:
>> From: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>>
>> While running some tests on the scheduler patches with rpm enabled I
>> came across a corruption in the ringbuffer, which was root-caused to
>> the GPU being suspended while commands were being emitted to the
>> ringbuffer. The access to memory was failing because the GPU needs to
>> be awake when accessing stolen memory (where my ringbuffer was located).
>> Since we have this constraint it looks like a sensible idea to check that
>> we hold a refcount when we emit commands.
>>
>> Cc: John Harrison <John.C.Harrison at Intel.com>
>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>> ---
>> drivers/gpu/drm/i915/intel_lrc.c | 5 +++++
>> 1 file changed, 5 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index 3761eaf..f9e8d74 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -1105,6 +1105,11 @@ int intel_logical_ring_begin(struct drm_i915_gem_request *req, int num_dwords)
>>  	if (ret)
>>  		return ret;
>>
>> +	// If the ringbuffer is in stolen memory we need to be sure that the
>> +	// gpu is awake before writing to it
>> +	if (req->ringbuf->obj->stolen && num_dwords > 0)
>> +		assert_rpm_wakelock_held(dev_priv);
> The assertion you want is that when iomapping through the GTT that we
> hold a wakeref.
> -Chris
If I'm not missing anything, we iomap the ringbuffer at request
allocation time; however, with the scheduler a request could potentially
wait in the queue for a time long enough to allow RPM to kick in,
especially if the request is waiting on a fence object coming from a
different driver. In this situation the rpm reference taken to cover the
request allocation would have already been released and so we need to
ensure that a new one has been taken before writing to the ringbuffer;
that's why I originally placed the assert in ring_begin.
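To make the sequence concrete, here is a toy user-space model of the situation. It is only a sketch and not i915 code: every toy_* name is made up for illustration, and the wakeref is reduced to a plain counter. It shows a reference taken around request allocation, dropped while the request sits in the queue, and a write-time check that refuses to touch a ring in stolen memory unless a reference is held again.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for illustration only; these are NOT the i915 structures. */
struct toy_device {
	int wakeref_count;	/* runtime-PM references currently held */
};

struct toy_ringbuf {
	bool in_stolen;		/* backing store lives in stolen memory */
	unsigned int tail;	/* next dword slot to write */
	unsigned int data[16];
};

/* Hypothetical equivalents of taking and dropping a runtime-PM reference. */
static void toy_rpm_get(struct toy_device *dev)
{
	dev->wakeref_count++;
}

static void toy_rpm_put(struct toy_device *dev)
{
	assert(dev->wakeref_count > 0);
	dev->wakeref_count--;
}

/*
 * Same condition as in the RFC: only a ring in stolen memory that is
 * about to be written needs the device to be awake.
 */
static bool toy_ring_write_allowed(const struct toy_device *dev,
				   const struct toy_ringbuf *ring,
				   int num_dwords)
{
	if (ring->in_stolen && num_dwords > 0)
		return dev->wakeref_count > 0;
	return true;
}

/* Emit one dword; returns 0 on success, -1 if no wakeref is held. */
static int toy_ring_emit(struct toy_device *dev, struct toy_ringbuf *ring,
			 unsigned int dword)
{
	if (!toy_ring_write_allowed(dev, ring, 1))
		return -1;

	ring->data[ring->tail % 16] = dword;
	ring->tail++;
	return 0;
}
```

In this model, calling toy_rpm_get() and toy_rpm_put() around "allocation" and then toy_ring_emit() fails with -1, while taking a new reference first lets the write through. An assert at iomap time alone would not catch the first case, which is the gap I was trying to cover.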
Scheduler code is still in review anyway and subject to change, so I
guess that until that reaches its final form there is no point in
debating where to put a possible second assert :-)
I'll respin the patch with the assert at iomap time as you suggested.
Thanks,
Daniele