[Intel-gfx] [RFC 0/2] Excessive latency in GuC send action

John.C.Harrison at Intel.com John.C.Harrison at Intel.com
Sat Nov 18 00:39:32 UTC 2017


From: John Harrison <John.C.Harrison at Intel.com>

While working on a customer project, it was noticed that the time
taken to issue a pre-emption request to the GuC would vary quite
significantly. The correct case was low microseconds but the worst
case was tens of milliseconds. Two separate issues were identified as
causing this extra delay. The acquisition of the GuC send mutex lock
and the stall waiting for an ack from the GuC. Both of these are
necessary steps because the access to the send action mechanism must
be serialised. It is not designed for multiple concurrent operations.

Mutex Lock Delay:
The first issue seems to be with the way mutex_lock() itself is
implemented in the linux kernel. Even when the lock is available, it
seems that linux sometimes decides to do a task switch anyway and not
re-schedule the thread that requested the lock for some time. Hence
the time taken to acquire the mutex lock would generally be
essentially zero but in around 0.05% of cases it would be in the
6-10ms range. Note that there was never any actual contention of the
mutex observed. In all cases, the mutex was available.

The workaround implemented for the customer was to first use a
mutex_trylock(). If the try fails then it falls back to the regular
mutex_lock() call instead. As the try lock is not permitted to stall,
it never hits the random linux task switch issue. Running overnight,
3.2 million pre-emption events were logged with not a single mutex
delay over 10 microseconds (which was the granularity of the
measurement being used).

Ack Delay:
The second issue was with the wait for the GuC acknowledgment of the
send action command. In most cases, no delay is required - the
completion flag is set by the time of the first read. However, around
1% of the time the flag was not set and the wait_for() loop then took
in the region of 8-10ms to return. Analysis showed that this delay was
also entirely due to the linux kernel task switching away due to using
a sleeping wait in the polling loop. Using a non-sleeping wait brought
the latency down to the 10-20us range in the majority of 'slow' cases.
Certain tests had a noticeable spike of about 120us, presumably when
the delay was actually due to the GuC being busy. The worse extremes
measured were around 200us, with maybe one or two hits of 300us in an
overnight run.

The polling construct being used was a 'wait_for_us()' of 10
microseconds (spinning poll) followed by a 'wait_for()' of 10ms
(sleeping poll) with the final fall back being to give up and return
an error. As the command quite frequently takes just over 10us to
complete, this meant the code would regularly drop into the second
loop. Despite the sleep time being set for 1ms, it would actually
sleep for an average of 1.5ms and as noted would commonly hit closer
to 10ms.

The workaround was to extend the busy poll period to 100us in the case
of a pre-emption request. In the case of any other GuC send action
command, the original 10us was kept. The reasoning being that a
pre-emption request is a very high priority, urgent thing and is worth
burning the extra CPU cycles on a busy poll in order to guarantee a
suitably small latency. Whereas, the other send action commands are
not performance critical and a longer but more system friendly sleep
is not an issue.


Note that the two patches included here are based against the
customer's 4.11 tree. So they might not apply too well the current
upstream pre-emption implementation. It was noted that a recent update
to the gem_exec_latency IGT showed "additional latency in the mutex
for guc preemption". Which sounds like the same issue that was seen
here. Hence posting what I have to a wider audience.

John Harrison (2):
  drm/i915: Extend GuC action fast spin time
  drm/i915: Avoid stalling on GuC send mutex lock

 drivers/gpu/drm/i915/i915_guc_submission.c | 12 ++++++------
 drivers/gpu/drm/i915/intel_guc_log.c       |  6 +++---
 drivers/gpu/drm/i915/intel_huc.c           |  2 +-
 drivers/gpu/drm/i915/intel_uc.c            | 23 ++++++++++++++++++-----
 drivers/gpu/drm/i915/intel_uc.h            |  2 +-
 5 files changed, 29 insertions(+), 16 deletions(-)

-- 
2.13.0



More information about the Intel-gfx mailing list