[Intel-gfx] [PATCH 3/4] drm/i915: Interrupt driven fences

Chris Wilson chris at chris-wilson.co.uk
Fri Jun 26 06:34:05 PDT 2015


On Fri, Jun 26, 2015 at 01:58:11PM +0100, John.C.Harrison at Intel.com wrote:
> From: John Harrison <John.C.Harrison at Intel.com>
> 
> The intended usage model for struct fence is that the signalled status should be
> set on demand rather than polled. That is, there should not be a need for a
> 'signaled' function to be called every time the status is queried. Instead,
> 'something' should be done to enable a signal callback from the hardware which
> will update the state directly. In the case of requests, this is the seqno
> update interrupt. The idea is that this callback will only be enabled on demand
> when something actually tries to wait on the fence.
> 
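A minimal sketch of that model, assuming the pre-rename struct fence API of
this era (the i915_* names below are illustrative, not the actual patch code):

#include <linux/fence.h>

/* Hypothetical helper: arms the ring's seqno user interrupt, returning
 * false if the request has in fact already completed. */
static bool i915_arm_seqno_interrupt(struct fence *f);

static const char *i915_fence_get_driver_name(struct fence *f)
{
	return "i915";
}

static const char *i915_fence_get_timeline_name(struct fence *f)
{
	return "ring";
}

static bool i915_fence_enable_signaling(struct fence *f)
{
	/*
	 * Called under f->lock the first time a waiter appears. From
	 * here on the interrupt handler, not a poll of .signaled, is
	 * responsible for calling fence_signal().
	 */
	return i915_arm_seqno_interrupt(f);
}

static const struct fence_ops i915_fence_ops = {
	.get_driver_name	= i915_fence_get_driver_name,
	.get_timeline_name	= i915_fence_get_timeline_name,
	.enable_signaling	= i915_fence_enable_signaling,
	/* no .signaled hook: completion is pushed by the IRQ handler */
	.wait			= fence_default_wait,
};
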
> This change removes the polling test and replaces it with the callback scheme.
> Each fence is added to a 'please poke me' list at the start of
> i915_add_request(). The interrupt handler then scans through the 'poke me' list
> when a new seqno pops out and signals any matching fence/request. The fence is
> then removed from the list so the entire request stack does not need to be
> scanned every time. Note that the fence is added to the list before the commands
> to generate the seqno interrupt are added to the ring. Thus the sequence is
> guaranteed to be race free if the interrupt is already enabled.
> 
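The two halves of that scheme might look as follows (the struct and function
names are made up for illustration; the wrap-safe seqno test mirrors
i915_seqno_passed()):

#include <linux/fence.h>
#include <linux/list.h>
#include <linux/spinlock.h>

struct i915_gem_request {
	struct fence fence;
	struct list_head signal_link;	/* entry on the "poke me" list */
	u32 seqno;
};

struct ring_signal_state {		/* per-ring state, trimmed down */
	spinlock_t lock;
	struct list_head signal_list;	/* seqno-ordered "poke me" list */
};

/* i915_add_request(): queue *before* emitting the seqno write so that,
 * with the interrupt already enabled, a completion cannot be missed. */
static void queue_for_signal(struct ring_signal_state *rs,
			     struct i915_gem_request *req)
{
	spin_lock_irq(&rs->lock);
	list_add_tail(&req->signal_link, &rs->signal_list);
	spin_unlock_irq(&rs->lock);
}

/* Seqno interrupt handler: signal and unlink whatever has completed,
 * so later interrupts never rescan the whole request stack. */
static void notify_ring(struct ring_signal_state *rs, u32 hw_seqno)
{
	struct i915_gem_request *req, *tmp;
	unsigned long flags;

	spin_lock_irqsave(&rs->lock, flags);
	list_for_each_entry_safe(req, tmp, &rs->signal_list, signal_link) {
		if ((s32)(hw_seqno - req->seqno) < 0)
			break;			/* list is seqno ordered */
		list_del_init(&req->signal_link);
		fence_signal(&req->fence);
	}
	spin_unlock_irqrestore(&rs->lock, flags);
}
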
> One complication here is that the 'poke me' system requires holding a reference
> count on the request to guarantee that it won't be freed prematurely.
> Unfortunately, it is unsafe to decrement the reference count from the interrupt
> handler: if that is the last reference then the clean up code runs, and that
> code is not IRQ safe. Hence, the request is added to a 'please clean me' list
> that gets processed at retire time. Any request on this list simply has its
> count decremented and is then removed from the list.
> 
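A sketch of that deferral; delayed_free_link/delayed_free_list are
illustrative additions to the structures above, and
i915_gem_request_unreference() is the driver's existing unreference helper:

/* In the IRQ handler, instead of dropping the reference directly: */
static void defer_unreference(struct ring_signal_state *rs,
			      struct i915_gem_request *req)
{
	/* rs->lock is already held by the signal scan */
	list_add_tail(&req->delayed_free_link, &rs->delayed_free_list);
}

/* At retire time, in ordinary process context: */
static void process_delayed_free(struct ring_signal_state *rs)
{
	struct i915_gem_request *req, *tmp;
	LIST_HEAD(free_list);

	/* Detach the whole list under the lock, drop references outside. */
	spin_lock_irq(&rs->lock);
	list_splice_init(&rs->delayed_free_list, &free_list);
	spin_unlock_irq(&rs->lock);

	list_for_each_entry_safe(req, tmp, &free_list, delayed_free_link) {
		list_del(&req->delayed_free_link);
		i915_gem_request_unreference(req); /* may run the free path */
	}
}
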
> Note that the interrupt is only enabled on demand (i.e. when __wait_request() is
> called). Thus there is still a potential race when enabling the interrupt as the
> request may already have completed. However, this is simply solved by calling
> the interrupt processing code immediately after enabling the interrupt and
> thereby checking for already completed requests.
> 
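So the wait path arms the interrupt and then runs the scan once by hand;
ring_irq_get()/ring_get_seqno() here are hypothetical stand-ins for the real
per-ring hooks, and notify_ring() is the scan sketched earlier:

#include <linux/errno.h>

/* Hypothetical per-ring hooks, standing in for the real ones. */
static bool ring_irq_get(struct ring_signal_state *rs);
static u32 ring_get_seqno(struct ring_signal_state *rs);

/* Called from __i915_wait_request() before the task sleeps. */
static int enable_signaling_for_wait(struct ring_signal_state *rs)
{
	if (!ring_irq_get(rs))		/* turn on the user interrupt */
		return -ENODEV;

	/*
	 * The request may already have completed before the interrupt
	 * was armed; running the interrupt scan once by hand closes
	 * that window.
	 */
	notify_ring(rs, ring_get_seqno(rs));
	return 0;
}
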
> Lastly, the ring clean up code may need to cancel outstanding requests (e.g.
> because TDR has reset the ring). These requests will never get signalled and so
> must be removed from the signal list manually. This is done by setting a
> 'cancelled' flag and then calling the regular notify/retire code path rather
> than attempting to duplicate the list manipulation and clean up code in
> multiple places. This also avoids any race where the cancellation occurs
> during or after the arrival of the completion interrupt.
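Cancellation then only needs to flag the affected requests and push them
through the same path; the 'cancelled' field is an illustrative addition to
the request struct sketched above:

/* Ring clean up (e.g. after a TDR reset): these requests will never
 * see a seqno interrupt, so flush them through the normal path. */
static void cancel_outstanding_requests(struct ring_signal_state *rs)
{
	struct i915_gem_request *req;

	spin_lock_irq(&rs->lock);
	list_for_each_entry(req, &rs->signal_list, signal_link)
		req->cancelled = true;	/* illustrative extra field */
	spin_unlock_irq(&rs->lock);

	/*
	 * Reuse the regular notify/retire path rather than duplicating
	 * the list manipulation; notify_ring()'s completion check is
	 * extended to treat a cancelled request as complete:
	 *
	 *	if (!req->cancelled && (s32)(hw_seqno - req->seqno) < 0)
	 *		break;
	 */
	notify_ring(rs, 0);
}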

-nightly nop:
Time to exec x 1:                15.000µs (ring=render)
Time to exec x 1:                 2.000µs (ring=blt)
Time to exec x 131072:            1.827µs (ring=render)
Time to exec x 131072:            1.555µs (ring=blt)

rq tuning patches nop:
Time to exec x 1:                12.200µs (ring=render)
Time to exec x 1:                 1.600µs (ring=blt)
Time to exec x 131072:            1.516µs (ring=render)
Time to exec x 131072:            0.812µs (ring=blt)

interrupt driven nop:
Time to exec x 1:                19.200µs (ring=render)
Time to exec x 1:                 5.200µs (ring=blt)
Time to exec x 131072:            2.381µs (ring=render)
Time to exec x 131072:            2.009µs (ring=blt)

The interrupt-driven numbers above are consistently worse, so the basic
question left unanswered from last time remains: why would we want to
slow down __i915_wait_request? And enabling interrupts still generates
very high system load when servicing the 30-40k IRQs per second seen
under some workloads.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

