[Intel-gfx] [PATCH v10b 4/6] drm/i915: Interrupt driven fences
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Tue Jun 21 10:44:53 UTC 2016
On 17/06/16 12:05, John.C.Harrison at Intel.com wrote:
> From: John Harrison <John.C.Harrison at Intel.com>
>
> The intended usage model for struct fence is that the signalled status
> should be set on demand rather than polled. That is, there should not
> be a need for a 'signaled' function to be called every time the status
> is queried. Instead, 'something' should be done to enable a signal
> callback from the hardware which will update the state directly. In
> the case of requests, this is the seqno update interrupt. The idea is
> that this callback will only be enabled on demand when something
> actually tries to wait on the fence.
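>
> As a rough sketch of the hook this implies (illustrative names only,
> not necessarily the exact ones used in the patch):
>
>   static bool i915_fence_enable_signaling(struct fence *fence)
>   {
>       struct drm_i915_gem_request *req =
>           container_of(fence, struct drm_i915_gem_request, fence);
>
>       /* Ask the engine for seqno user interrupts; the interrupt/worker
>        * path will then call fence_signal() on completed requests. */
>       req->engine->irq_get(req->engine);
>
>       return true;    /* signal will be delivered asynchronously */
>   }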
>
> This change removes the polling test and replaces it with the callback
> scheme. Each fence is added to a 'please poke me' list at the start of
> i915_add_request(). The interrupt handler (via a deferred work queue)
> then scans through the 'poke me' list when a new seqno pops out and
> signals any matching fence/request. The fence is then removed from the
> list so the entire request stack does not need to be scanned every
> time. The fence is added to the list before the commands to generate
> the seqno interrupt are added to the ring. Thus the sequence is
> guaranteed to be race free if the interrupt is already enabled.
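>
> In other words, roughly (a sketch; the list/field names in the real
> patch may differ):
>
>   /* i915_add_request(): queue the fence before emitting the seqno */
>   spin_lock(&engine->fence_lock);
>   list_add_tail(&req->signal_link, &engine->fence_signal_list);
>   spin_unlock(&engine->fence_lock);
>
>   /* worker, kicked from the user interrupt handler */
>   static void i915_gem_request_notify_worker(struct work_struct *work)
>   {
>       struct intel_engine_cs *engine =
>           container_of(work, struct intel_engine_cs, request_notify_work);
>       struct drm_i915_gem_request *req, *next;
>       u32 seqno = engine->get_seqno(engine);
>
>       spin_lock(&engine->fence_lock);
>       list_for_each_entry_safe(req, next, &engine->fence_signal_list,
>                                signal_link) {
>           if (!i915_seqno_passed(seqno, req->seqno))
>               continue;
>
>           /* Signal and drop it from the list so later scans never
>            * have to look at this request again. */
>           list_del_init(&req->signal_link);
>           /* _locked assuming the fence was initialised with fence_lock */
>           fence_signal_locked(&req->fence);
>       }
>       spin_unlock(&engine->fence_lock);
>   }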
>
> Note that the interrupt is only enabled on demand (i.e. when
> __wait_request() is called). Thus there is still a potential race when
> enabling the interrupt as the request may already have completed.
> However, this is simply solved by calling the interrupt processing
> code immediately after enabling the interrupt and thereby checking for
> already completed requests.
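>
> I.e. something along these lines in the wait path (sketch only):
>
>   /* __wait_request(), before sleeping: */
>   engine->irq_get(engine);
>
>   /* The request may have completed before the interrupt was enabled,
>    * in which case no further interrupt will come; run the notify
>    * code once by hand to pick up any already-finished requests. */
>   i915_gem_request_notify(engine, false /* lazy_coherency */);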
>
> Lastly, the ring clean-up code may need to cancel outstanding
> requests (e.g. because TDR has reset the ring). These
> requests will never get signalled and so must be removed from the
> signal list manually. This is done by setting a 'cancelled' flag and
> then calling the regular notify/retire code path rather than
> attempting to duplicate the list manipulation and clean up code in
> multiple places. This also avoids any race condition where the
> cancellation request might occur after/during the completion interrupt
> actually arriving.
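>
> Roughly (again a sketch with illustrative names):
>
>   /* ring clean-up: these requests will never see their seqno */
>   spin_lock(&engine->fence_lock);
>   list_for_each_entry(req, &engine->fence_signal_list, signal_link)
>       req->cancelled = true;
>   spin_unlock(&engine->fence_lock);
>
>   /* Reuse the normal path: notify treats 'cancelled' like a completed
>    * seqno, but signals the fence with an error status. */
>   i915_gem_request_notify(engine, false /* lazy_coherency */);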
>
> v2: Updated to take advantage of the request unreference no longer
> requiring the mutex lock.
>
> v3: Move the signal list processing around to prevent unsubmitted
> requests being added to the list. This was occurring on Android
> because the native sync implementation calls the
> fence->enable_signalling API immediately on fence creation.
>
> Updated after review comments by Tvrtko Ursulin. Renamed list nodes to
> 'link' instead of 'list'. Added support for returning an error code on
> a cancelled fence. Update list processing to be more efficient/safer
> with respect to spinlocks.
>
> v5: Made i915_gem_request_submit a static as it is only ever called
> from one place.
>
> Fixed up the low latency wait optimisation. The time delay between the
> seqno value being written to memory and the driver's ISR running can be
> significant, at least for the wait request micro-benchmark. This can
> be greatly improved by explicitly checking for seqno updates in the
> pre-wait busy poll loop. Also added some documentation comments to the
> busy poll code.
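>
> The busy poll part is roughly as below (sketch; the real spin limit
> and time source in the patch differ):
>
>   unsigned long timeout = jiffies + 1;    /* spin for ~one jiffy at most */
>
>   /* Spin briefly before sleeping on the interrupt. Check the seqno
>    * directly here: the fence may not be marked as signalled yet
>    * because the ISR + worker can lag the seqno memory write. */
>   while (time_before(jiffies, timeout)) {
>       if (i915_seqno_passed(engine->get_seqno(engine), req->seqno))
>           return 0;       /* completed without ever sleeping */
>       cpu_relax();
>   }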
>
> Fixed up support for the faking of lost interrupts
> (test_irq_rings/missed_irq_rings). That is, there is an IGT test that
> tells the driver to lose interrupts deliberately and then checks that
> everything still works as expected (albeit much slower).
>
> Updates from review comments: use non IRQ-save spinlocking, early exit
> on WARN and improved comments (Tvrtko Ursulin).
>
> v6: Updated to newer nightly and resolved conflicts around the
> wait_request busy spin optimisation. Also fixed a race condition
> between this early exit path and the regular completion path.
>
> v7: Updated to newer nightly - lots of ring -> engine renaming plus an
> interface change on get_seqno(). Also added a list_empty() check
> before acquiring spinlocks and doing list processing.
>
> v8: Updated to newer nightly - changes to request clean up code mean
> none of the deferred free mess is needed any more.
>
> v9: Moved the request completion processing out of the interrupt
> handler and into a worker thread (Chris Wilson).
>
> v10: Changed to an un-ordered work queue to allow parallel processing
> of different engines. Also set the high priority flag for reduced
> latency. Removed some unnecessary checks for invalid seqno values.
> Improved/added some comments and WARNs. Moved a spinlock release a few
> lines later to make the 'locked' parameter of
> i915_gem_request_enable_interrupt redundant and removed it. Also
> shuffled the function around in the file so as to make it static and
> remove it from the header file. Corrected the use of
> fence_signal_locked() to fence_signal() in the retire code. Dropped
> the irq save part of the spin lock calls in the notify code as this is
> no longer called from the ISR. Changed the call of
> i915_gem_retire_requests_ring() in the reset cleanup code to
> i915_gem_request_notify() instead as the former is just duplicating a
> lot of operations.
> [Review comments from Maarten Lankhorst & Tvrtko Ursulin]
>
> Made the call to _notify() from _retire_requests_ring() conditional on
> interrupts not being enabled as it is only a race condition
> workaround for that case. Also re-instated the lazy_coherency flag (but
> now on the _notify() function) to reduce the overhead of
> _retire_requests_ring() calling _notify() lots and lots (even with the
> anti-interrupt check).
>
> Updated for yet more nightly changes (u64 for fence context).
>
> v10b: Re-ordered the fence signal and IRQ release in _notify() to fix a race
> condition when disabling interrupts. Also, moved the wake_up_all()
> call from the IRQ handler to the worker thread to prevent the wake up
> of waiting threads from overtaking the signalling of the request.
I was concerned by the second part of this change, which will increase
the wake-up latency for waiters, and have asked John to do some quick
low-level (gem_latency) testing.
Preliminary results were a bit strange, with small batches experiencing
the expected slowdown but large ones being significantly faster.
Since he is out this week I will try to run some more benchmarks on this.
To reiterate, the concern is moving the wake_up_all(&engine->irq_queue)
from notify_ring (hard irq context) to the fence notify worker. This adds
one additional scheduling wake-up latency for waiters which use the i915
wait API.
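
Schematically (my paraphrase, not the exact patch hunks):

    notify_ring(engine)                       /* hard irq */
        queue_work(req_wq, &engine->request_notify_work);
        /* wake_up_all(&engine->irq_queue) used to happen here */

    i915_gem_request_notify_worker()          /* process context */
        /* ... signal completed fences ... */
        wake_up_all(&engine->irq_queue);

So i915 waiters sleeping on engine->irq_queue now pay an extra scheduler
hop (irq -> worker -> waiter) before they get to run.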
Regards,
Tvrtko