[Intel-gfx] [RFC 7/9] drm/i915: Interrupt driven fences

John Harrison John.C.Harrison at Intel.com
Wed Oct 28 06:00:39 PDT 2015


On 27/07/2015 12:33, Tvrtko Ursulin wrote:
>
> Hi,
>
> On 07/17/2015 03:31 PM, John.C.Harrison at Intel.com wrote:
>> From: John Harrison <John.C.Harrison at Intel.com>
>>
>> The intended usage model for struct fence is that the signalled status
>> should be set on demand rather than polled. That is, there should not be
>> a need for a 'signaled' function to be called every time the status is
>> queried. Instead, 'something' should be done to enable a signal callback
>> from the hardware which will update the state directly. In the case of
>> requests, this is the seqno update interrupt. The idea is that this
>> callback will only be enabled on demand when something actually tries to
>> wait on the fence.
>>
>> This change removes the polling test and replaces it with the callback
>> scheme. Each fence is added to a 'please poke me' list at the start of
>> i915_add_request(). The interrupt handler then scans through the 'poke
>> me' list when a new seqno pops out and signals any matching
>> fence/request. The fence is then removed from the list so the entire
>> request stack does not need to be scanned every time. Note that the
>> fence is added to the list before the commands to generate the seqno
>> interrupt are added to the ring. Thus the sequence is guaranteed to be
>> race free if the interrupt is already enabled.
>>
>> Note that the interrupt is only enabled on demand (i.e. when
>> __wait_request() is called). Thus there is still a potential race when
>> enabling the interrupt as the request may already have completed.
>> However, this is simply solved by calling the interrupt processing code
>> immediately after enabling the interrupt and thereby checking for
>> already completed requests.
>>
>> Lastly, the ring clean up code has the possibility to cancel outstanding
>> requests (e.g. because TDR has reset the ring). These requests will
>> never get signalled and so must be removed from the signal list
>> manually. This is done by setting a 'cancelled' flag and then calling
>> the regular notify/retire code path rather than attempting to duplicate
>> the list manipulation and clean up code in multiple places. This also
>> avoids any race condition where the cancellation request might occur
>> after/during the completion interrupt actually arriving.
>>
>> v2: Updated to take advantage of the request unreference no longer
>> requiring the mutex lock.
>>
>> For: VIZ-5190
>> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.h         |   8 ++
>>   drivers/gpu/drm/i915/i915_gem.c         | 132 +++++++++++++++++++++++++++++---
>>   drivers/gpu/drm/i915/i915_irq.c         |   2 +
>>   drivers/gpu/drm/i915/intel_lrc.c        |   1 +
>>   drivers/gpu/drm/i915/intel_ringbuffer.c |   1 +
>>   drivers/gpu/drm/i915/intel_ringbuffer.h |   1 +
>>   6 files changed, 136 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index 61c3db2..d7f1aa5 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -2163,7 +2163,11 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
>>   struct drm_i915_gem_request {
>>       /** Underlying object for implementing the signal/wait stuff. */
>>       struct fence fence;
>> +    struct list_head signal_list;
>> +    struct list_head unsignal_list;
>
> In addition to what Daniel said (one list_head looks enough) it is 
> customary to call it _link.
>
>>       struct list_head delay_free_list;
>> +    bool cancelled;
>> +    bool irq_enabled;
>>
>>       /** On Which ring this request was generated */
>>       struct drm_i915_private *i915;
>> @@ -2241,6 +2245,10 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>>                  struct drm_i915_gem_request **req_out);
>>   void i915_gem_request_cancel(struct drm_i915_gem_request *req);
>>
>> +void i915_gem_request_submit(struct drm_i915_gem_request *req);
>> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req);
>> +void i915_gem_request_notify(struct intel_engine_cs *ring);
>> +
>>   int i915_create_fence_timeline(struct drm_device *dev,
>>                      struct intel_context *ctx,
>>                      struct intel_engine_cs *ring);
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index 482835a..7c589a9 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -1222,6 +1222,11 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>>       if (list_empty(&req->list))
>>           return 0;
>>
>> +    /*
>> +     * Enable interrupt completion of the request.
>> +     */
>> +    i915_gem_request_enable_interrupt(req);
>> +
>>       if (i915_gem_request_completed(req))
>>           return 0;
>>
>> @@ -1382,6 +1387,10 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
>>       list_del_init(&request->list);
>>       i915_gem_request_remove_from_client(request);
>>
>> +    /* In case the request is still in the signal pending list */
>> +    if (!list_empty(&request->signal_list))
>> +        request->cancelled = true;
>> +
>>       i915_gem_request_unreference(request);
>>   }
>>
>> @@ -2534,6 +2543,12 @@ void __i915_add_request(struct drm_i915_gem_request *request,
>>        */
>>       request->postfix = intel_ring_get_tail(ringbuf);
>>
>> +    /*
>> +     * Add the fence to the pending list before emitting the commands to
>> +     * generate a seqno notification interrupt.
>> +     */
>> +    i915_gem_request_submit(request);
>> +
>>       if (i915.enable_execlists)
>>           ret = ring->emit_request(request);
>>       else {
>> @@ -2653,6 +2668,9 @@ static void i915_gem_request_free(struct drm_i915_gem_request *req)
>>           i915_gem_context_unreference(ctx);
>>       }
>>
>> +    if (req->irq_enabled)
>> +        req->ring->irq_put(req->ring);
>> +
>
> We get here with interrupts still enabled only if userspace is 
> abandoning a wait on an unsignaled fence, did I get that right?
It implies the request has been abandoned in some manner, yes. E.g. TDR 
has killed it, user space has given up, ...

>
>> kmem_cache_free(req->i915->requests, req);
>>   }
>>
>> @@ -2668,24 +2686,105 @@ static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
>>       return req->ring->name;
>>   }
>>
>> -static bool i915_gem_request_enable_signaling(struct fence *req_fence)
>> +/*
>> + * The request has been submitted to the hardware so add the fence to the
>> + * list of signalable fences.
>> + *
>> + * NB: This does not enable interrupts yet. That only occurs on demand when
>> + * the request is actually waited on. However, adding it to the list early
>> + * ensures that there is no race condition where the interrupt could pop
>> + * out prematurely and thus be completely lost. The race is merely that the
>> + * interrupt must be manually checked for after being enabled.
>> + */
>> +void i915_gem_request_submit(struct drm_i915_gem_request *req)
>>   {
>> -    /* Interrupt driven fences are not implemented yet.*/
>> -    WARN(true, "This should not be called!");
>> -    return true;
>> +    fence_enable_sw_signaling(&req->fence);
>>   }
>>
>> -static bool i915_gem_request_is_completed(struct fence *req_fence)
>> +/*
>> + * The request is being actively waited on, so enable interrupt based
>> + * completion signalling.
>> + */
>> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req)
>> +{
>> +    if (req->irq_enabled)
>> +        return;
>> +
>> +    WARN_ON(!req->ring->irq_get(req->ring));
>> +    req->irq_enabled = true;
>
> req->irq_enabled manipulations look racy. Here and in request free it 
> is protected by struct_mutex, but that is not held in 
> i915_gem_request_notify. Initial feeling is you should use 
> ring->fence_lock everyplace you query/manipulate req->irq_enabled.

The only asynchronous access is from _notify(), which disables IRQs if 
the flag is set and then clears it. That can't race with the enable 
because the enable only sets the flag after turning IRQs on. The worst 
that can happen on a race is that IRQs are enabled and then immediately 
disabled - truly concurrent execution would result in one test or the 
other failing, so only one code path would be taken. The only other 
usage is in _request_free(), but that can only run when the last 
reference has been dropped, which means the request is no longer on any 
list that _notify() can see.


>> +
>> +    /*
>> +     * Because the interrupt is only enabled on demand, there is a race
>> +     * where the interrupt can fire before anyone is looking for it. So
>> +     * do an explicit check for missed interrupts.
>> +     */
>> +    i915_gem_request_notify(req->ring);
>> +}
>> +
>> +static bool i915_gem_request_enable_signaling(struct fence *req_fence)
>>   {
>>       struct drm_i915_gem_request *req = container_of(req_fence,
>>                            typeof(*req), fence);
>> +
>> +    i915_gem_request_reference(req);
>> +    WARN_ON(!list_empty(&req->signal_list));
>
> It looks very unsafe to proceed normally after this WARN_ON. It should 
> probably return false here to preserve data structure sanity.
This really should be a BUG_ON but Daniel doesn't like those. It should 
be an impossible code path and not something that can be hit by the user 
being dumb. Anyway, this code has all been changed in the latest 
incarnation.


>
>> +    list_add_tail(&req->signal_list, &req->ring->fence_signal_list);
>> +
>> +    /*
>> +     * Note that signalling is always enabled for every request before
>> +     * that request is submitted to the hardware. Therefore there is
>> +     * no race condition whereby the signal could pop out before the
>> +     * request has been added to the list. Hence no need to check
>> +     * for completion, undo the list add and return false.
>> +     *
>> +     * NB: Interrupts are only enabled on demand. Thus there is still a
>> +     * race where the request could complete before the interrupt has
>> +     * been enabled. Thus care must be taken at that point.
>> +     */
>> +
>> +    return true;
>> +}
>> +
>> +void i915_gem_request_notify(struct intel_engine_cs *ring)
>> +{
>> +    struct drm_i915_gem_request *req, *req_next;
>> +    unsigned long flags;
>>       u32 seqno;
>> +    LIST_HEAD(free_list);
>>
>> -    BUG_ON(req == NULL);
>> +    if (list_empty(&ring->fence_signal_list))
>> +        return;
>> +
>> +    seqno = ring->get_seqno(ring, false);
>> +
>> +    spin_lock_irqsave(&ring->fence_lock, flags);
>> +    list_for_each_entry_safe(req, req_next, &ring->fence_signal_list, signal_list) {
>> +        if (!req->cancelled) {
>> +            if (!i915_seqno_passed(seqno, req->seqno))
>> +                continue;
>>
>> -    seqno = req->ring->get_seqno(req->ring, false/*lazy_coherency*/);
>> +            fence_signal_locked(&req->fence);
>> +        }
>> +
>> +        list_del_init(&req->signal_list);
>
> I haven't managed to figure out why is this apparently removing 
> requests which have not been signalled from the signal_list?
>
> Shouldn't they be moved to free_list only if i915_seqno_passed?
Requests are only removed from the signal list if either a) the seqno 
has passed or b) the request has been cancelled and thus will never 
actually complete. Not sure what other scenario you are seeing.

>
>> +        if (req->irq_enabled) {
>> +            req->ring->irq_put(req->ring);
>> +            req->irq_enabled = false;
>> +        }
>>
>> -    return i915_seqno_passed(seqno, req->seqno);
>> +        /* Can't unreference here because that might grab fence_lock */
>> +        list_add_tail(&req->unsignal_list, &free_list);
>> +    }
>> +    spin_unlock_irqrestore(&ring->fence_lock, flags);
>> +
>> +    /* It should now be safe to actually free the requests */
>> +    while (!list_empty(&free_list)) {
>> +        req = list_first_entry(&free_list,
>> +                       struct drm_i915_gem_request, unsignal_list);
>> +        list_del(&req->unsignal_list);
>> +
>> +        i915_gem_request_unreference(req);
>> +    }
>>   }
>>
>>   static void i915_fence_timeline_value_str(struct fence *fence, char *str, int size)
>> @@ -2711,7 +2810,6 @@ static const struct fence_ops i915_gem_request_fops = {
>>       .get_driver_name    = i915_gem_request_get_driver_name,
>>       .get_timeline_name    = i915_gem_request_get_timeline_name,
>>       .enable_signaling    = i915_gem_request_enable_signaling,
>> -    .signaled        = i915_gem_request_is_completed,
>>       .wait            = fence_default_wait,
>>       .release        = i915_gem_request_release,
>>       .fence_value_str    = i915_fence_value_str,
>> @@ -2791,6 +2889,7 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>>           goto err;
>>       }
>>
>> +    INIT_LIST_HEAD(&req->signal_list);
>>       fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock,
>> ctx->engine[ring->id].fence_timeline.fence_context,
>> i915_fence_timeline_get_next_seqno(&ctx->engine[ring->id].fence_timeline));
>> @@ -2913,6 +3012,13 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
>>
>>           i915_gem_request_retire(request);
>>       }
>> +
>> +    /*
>> +     * Make sure any requests that were on the signal pending list get
>> +     * cleaned up.
>> +     */
>> +    i915_gem_request_notify(ring);
>> +    i915_gem_retire_requests_ring(ring);
>
> Would i915_gem_retire_requests_ring be enough given how it calls 
> i915_gem_request_notify itself as the first thing below?
>
Oops, left over from before retire called notify explicitly.

>>   }
>>
>>   void i915_gem_restore_fences(struct drm_device *dev)
>> @@ -2968,6 +3074,13 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>>   {
>>       WARN_ON(i915_verify_lists(ring->dev));
>>
>> +    /*
>> +     * If no-one has waited on a request recently then interrupts will
>> +     * not have been enabled and thus no requests will ever be marked as
>> +     * completed. So do an interrupt check now.
>> +     */
>> +    i915_gem_request_notify(ring);
>> +
>>       /* Retire requests first as we use it above for the early return.
>>        * If we retire requests last, we may use a later seqno and so clear
>>        * the requests lists without clearing the active list, leading to
>> @@ -5345,6 +5458,7 @@ init_ring_lists(struct intel_engine_cs *ring)
>>   {
>>       INIT_LIST_HEAD(&ring->active_list);
>>       INIT_LIST_HEAD(&ring->request_list);
>> +    INIT_LIST_HEAD(&ring->fence_signal_list);
>>       INIT_LIST_HEAD(&ring->delayed_free_list);
>>   }
>>
>> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
>> index d87f173..e446509 100644
>> --- a/drivers/gpu/drm/i915/i915_irq.c
>> +++ b/drivers/gpu/drm/i915/i915_irq.c
>> @@ -853,6 +853,8 @@ static void notify_ring(struct intel_engine_cs *ring)
>>
>>       trace_i915_gem_request_notify(ring);
>>
>> +    i915_gem_request_notify(ring);
>> +
>
> How many requests are typically on signal_list on some typical 
> workloads? This could be a significant performance change since on 
> every user interrupt it would walk it all, potentially only removing 
> one request at a time.
>
Obviously, some of the IGT tests can produce very large request lists 
(e.g. gem_exec_nop) but running 'normal' stuff rarely seems to generate 
a long list. E.g. running GLBench + GLXGears on an Ubuntu desktop, 95% 
of the time there are ten or fewer requests in the list, and the loop 
iterates only once (49%) or twice (46%) because the first request gets 
signalled and the second (if present) aborts the loop.

The biggest problem seems to be that the hardware is brain dead with 
respect to generating interrupts. So if two requests complete in quick 
succession and the ISR only gets to see the second seqno, it still gets 
called a second time. Thus we actually get a ridiculous figure of 60% of 
ISR calls being no-ops because the seqno has not actually advanced. The 
code now checks for duplicate seqnos and early exits. Not sure how to 
get rid of the call completely.


> These are just review comments on this particular patch without 
> thinking yet of the bigger design questions Daniel has raised.
>
> Regards,
>
> Tvrtko


