[Intel-gfx] [PATCH v3] drm/i915: Move CSB MMIO reads out of the execlists lock

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Thu Mar 17 16:30:47 UTC 2016

On 17/03/16 13:14, Chris Wilson wrote:
> On Thu, Mar 17, 2016 at 12:59:46PM +0000, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>> By reading the CSB (slow MMIO accesses) into a temporary local
>> buffer we can decrease the duration of holding the execlist
>> lock.
>> Main advantage is that during heavy batch buffer submission we
>> reduce the execlist lock contention, which should decrease the
>> latency and CPU usage between the submitting userspace process
>> and interrupt handling.
>> Downside is that we need to grab and release the forcewake twice,
>> but as the below numbers will show this is completely hidden
>> by the primary gains.
>> Testing with "gem_latency -n 100" (submit batch buffers with a
>> hundred nops each) shows more than doubling of the throughput
>> and more than halving of the dispatch latency, overall latency
>> and CPU time spent in the submitting process.
>> Submitting empty batches ("gem_latency -n 0") does not seem
>> significantly affected by this change with throughput and CPU
>> time improving by half a percent, and overall latency worsening
>> by the same amount.
>> Above tests were done in a hundred runs on a big core Broadwell.
>> v2:
>>    * Overflow protection to local CSB buffer.
>>    * Use closer dev_priv in execlists_submit_requests. (Chris Wilson)
>> v3: Rebase.
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>> ---
>>   drivers/gpu/drm/i915/intel_lrc.c | 77 ++++++++++++++++++++--------------------
>>   1 file changed, 38 insertions(+), 39 deletions(-)
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index f72782200226..c6b3a9d19af2 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -416,15 +416,23 @@ static void execlists_update_context(struct drm_i915_gem_request *rq)
>>   static void execlists_submit_requests(struct drm_i915_gem_request *rq0,
>>   				      struct drm_i915_gem_request *rq1)
>>   {
>> +	struct drm_i915_private *dev_priv = rq0->i915;
>> +
>>   	execlists_update_context(rq0);
>>   	if (rq1)
>>   		execlists_update_context(rq1);
>> +	spin_lock(&dev_priv->uncore.lock);
>> +	intel_uncore_forcewake_get__locked(dev_priv, FORCEWAKE_ALL);
> My only problem with this is that it puts the onus on the caller to
> remember to disable irqs first. (That would be a silent conflict with
> the patch to move the submission to a kthread for example.)

Oh well, in this code one has to know what they are doing anyway, so I 
consider this very minor.

> What's the cost of being upfront with spin_lock_irqsave() here?

No idea, but probably not much, if it is detectable at all. The issue I 
have with that is that elsewhere we take the approach of using the exact 
right variant for the use case.

> /* BUG_ON(!irqs_disabled());  */ ?

As a comment?

> I'm quite happy to go with the comment here and since it doesn't make
> the lockups any worse,
> Reviewed-by: Chris Wilson <chris at chris-wilson.co.uk>

Thanks, will add that comment.
