[Intel-gfx] [RFC 0/2] Excessive latency in GuC send action

Daniel Vetter daniel at ffwll.ch
Mon Nov 20 10:49:42 UTC 2017


On Fri, Nov 17, 2017 at 04:39:32PM -0800, John.C.Harrison at Intel.com wrote:
> From: John Harrison <John.C.Harrison at Intel.com>
> 
> While working on a customer project, it was noticed that the time
> taken to issue a pre-emption request to the GuC would vary quite
> significantly. The expected case was a few microseconds but the worst
> case was tens of milliseconds. Two separate issues were identified as
> causing this extra delay: the acquisition of the GuC send mutex lock,
> and the stall waiting for an ack from the GuC. Both steps are
> necessary because access to the send action mechanism must be
> serialised; it is not designed for multiple concurrent operations.
> 
> Mutex Lock Delay:
> The first issue appears to be with the way mutex_lock() itself is
> implemented in the Linux kernel. Even when the lock is available,
> Linux sometimes decides to do a task switch anyway and does not
> re-schedule the thread that requested the lock for some time. Hence
> the time taken to acquire the mutex lock was generally essentially
> zero, but in around 0.05% of cases it was in the 6-10ms range. Note
> that no actual contention of the mutex was ever observed; in all
> cases, the mutex was available.
> 
> The workaround implemented for the customer was to first use
> mutex_trylock(). If the try fails, it falls back to the regular
> mutex_lock() call. As the try lock is not permitted to sleep, it
> never hits the spurious Linux task switch issue. Running overnight,
> 3.2 million pre-emption events were logged without a single mutex
> delay over 10 microseconds (which was the granularity of the
> measurement being used).
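> 
> Roughly, the locking change looks like this (a sketch only; the
> wrapper function guc_send_lock() is hypothetical, while
> guc->send_mutex is the existing lock):
> 
>   static void guc_send_lock(struct intel_guc *guc)
>   {
>       /*
>        * Try to take the lock without sleeping first, and only fall
>        * back to the blocking mutex_lock() if the lock is genuinely
>        * contended. The non-sleeping path avoids the occasional
>        * multi-millisecond scheduling stall seen when mutex_lock()
>        * sleeps even though the lock is free.
>        */
>       if (!mutex_trylock(&guc->send_mutex))
>           mutex_lock(&guc->send_mutex);
>   }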

This sounds like a core mutex implementation bug. Have you talked with
relevant maintainers (some work for Intel even)?
-Daniel

> Ack Delay:
> The second issue was with the wait for the GuC acknowledgment of the
> send action command. In most cases no delay is required - the
> completion flag is set by the time of the first read. However, around
> 1% of the time the flag was not set and the wait_for() loop then took
> in the region of 8-10ms to return. Analysis showed that this delay
> was also entirely due to the Linux kernel task switching away, because
> the polling loop uses a sleeping wait. Using a non-sleeping wait
> brought the latency down to the 10-20us range in the majority of
> 'slow' cases. Certain tests showed a noticeable spike of about 120us,
> presumably when the delay was actually due to the GuC being busy. The
> worst extremes measured were around 200us, with maybe one or two hits
> of 300us in an overnight run.
> 
> The polling construct being used was a 'wait_for_us()' of 10
> microseconds (spinning poll) followed by a 'wait_for()' of 10ms
> (sleeping poll), with the final fallback being to give up and return
> an error. As the command quite frequently takes just over 10us to
> complete, the code would regularly drop into the second loop. Despite
> the sleep period being set to 1ms, the loop would actually sleep for
> an average of 1.5ms and, as noted, would commonly hit closer to 10ms.
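> 
> In simplified form (guc_action_acked() is a stand-in for the real
> completion-flag check; wait_for_us()/wait_for() are the existing i915
> polling macros):
> 
>   /* Spin for up to 10us, then fall back to a 10ms sleeping poll. */
>   err = wait_for_us(guc_action_acked(guc), 10);
>   if (err)
>       err = wait_for(guc_action_acked(guc), 10);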
> 
> The workaround was to extend the busy poll period to 100us in the
> case of a pre-emption request. For any other GuC send action command,
> the original 10us was kept. The reasoning is that a pre-emption
> request is a very high priority, urgent operation and is worth
> burning the extra CPU cycles on a busy poll in order to guarantee a
> suitably small latency, whereas the other send action commands are
> not performance critical and a longer but more system friendly sleep
> is not an issue.
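> 
> Sketched against the same loop (the fast_timeout_us plumbing is
> illustrative rather than the exact patch):
> 
>   #define GUC_SEND_FAST_TIMEOUT_US      10  /* normal actions */
>   #define GUC_PREEMPT_FAST_TIMEOUT_US  100  /* pre-emption requests */
> 
>   /* Busy poll for longer on urgent requests before sleeping. */
>   err = wait_for_us(guc_action_acked(guc), fast_timeout_us);
>   if (err)
>       err = wait_for(guc_action_acked(guc), 10);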
> 
> 
> Note that the two patches included here are based against the
> customer's 4.11 tree, so they might not apply cleanly to the current
> upstream pre-emption implementation. It was noted that a recent
> update to the gem_exec_latency IGT showed "additional latency in the
> mutex for guc preemption", which sounds like the same issue that was
> seen here. Hence posting what I have to a wider audience.
> 
> John Harrison (2):
>   drm/i915: Extend GuC action fast spin time
>   drm/i915: Avoid stalling on GuC send mutex lock
> 
>  drivers/gpu/drm/i915/i915_guc_submission.c | 12 ++++++------
>  drivers/gpu/drm/i915/intel_guc_log.c       |  6 +++---
>  drivers/gpu/drm/i915/intel_huc.c           |  2 +-
>  drivers/gpu/drm/i915/intel_uc.c            | 23 ++++++++++++++++++-----
>  drivers/gpu/drm/i915/intel_uc.h            |  2 +-
>  5 files changed, 29 insertions(+), 16 deletions(-)
> 
> -- 
> 2.13.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

